About the size of Google Scholar: playing the numbers
Isidro F. Aguillo
isidro.aguillo at CCHS.CSIC.ES
Thu Sep 11 11:36:48 EDT 2014
Pages 14-17 are free to read in the Google Books entry for _Science
of science and reflexivity_.
On 11/09/2014 17:21, Yves Gingras wrote:
> Administrative info for SIGMETRICS (for example unsubscribe):
> http://web.utk.edu/~gwhitney/sigmetrics.html
>
> Hello,
>
> Having worked with Bourdieu before his untimely death in 2002, I cannot
> let this opportunity pass without suggesting that, in addition to
> looking at the h-index just for the fun of it (which, by the way, we
> do not need in order to know that Bourdieu is among the very few great
> sociologists of the second half of the 20th century), people read his
> book _Science of science and reflexivity_ (Chicago Press, 2004). He
> talks briefly about scientometrics on p. 14, and putting more
> _reflexive_ sociology into our _thinking_ before _counting_ would be
> welcome...
>
> After “slow science” why not a new motto for all of us: “slow
> bibliometrics: thinking before counting”
>
>
> Best regards
>
> Yves Gingras
>
>
>
> Le 11/09/14 10:34, « Isidro F. Aguillo » <isidro.aguillo at CCHS.CSIC.ES>
> a écrit :
>
> Dear Stephen,
>
> Thanks for your comments. I understand the private nature of
> Google, but Mendeley (owned by Elsevier) and other similar
> biblio/altmetric sources are also commercially backed companies, and
> they offer good APIs for in-depth analysis of large data sets.
>
> As a matter of curiosity, I checked for the largest h-index in Google
> Scholar Citations, and it appears to be:
>
>
> Pierre Bourdieu
>
> Centre de Sociologie Européenne, Collège de France
> http://scholar.google.com/citations?user=d_lp40IAAAAJ&hl=en
>
> Citations 361973
> h-index 207
>
> Any better candidates?
>
>
> On 11/09/2014 16:13, Stephen J Bensman wrote:
>
> Isidro,
>
> Unfortunately, Google is a cautious private company with
> commercial interests and secrets. For example, it is very
> cautious when it comes to copyright. I really hate it when I
> find a book chapter of interest to me but cannot download it
> or copy/paste it. Moreover, with Google Scholar Citations it
> lets you choose whether you want your citations public or
> private. That keeps the door open for Harzing’s
> Publish-or-Perish program. Google does not want any lawsuits
> resulting from making your private data public without your
> permission.
>
> Google allows you large enough samples for most purposes. For
> example, when it comes to individuals, the main measure
> appears to be the h-index. For analytical purposes, your
> h-index has to be above 50 to provide a proper sample. Few
> people have h-indexes above 50, and I know of none with an
> h-index above 1000.
>
>
>
> Google’s database is Google’s private property. It can do
> with it what it wants. I imagine that—like Thomson
> Reuters—you could purchase a lot of data from it. However, you
> may be some sort of Bolshevik, who wants the right to
> expropriate it. As for the uselessness of Google Scholar, I
> will quote your compatriots below:
>
>
>
> “Now, when empirical studies
> (http://googlescholardigest.blogspot.com.es/p/bibliography.html)
> demonstrate every day that Google Scholar and its derivatives
>
> a) measure with similar credit to traditional bibliometric
> indicators,
>
> b) are the most used products by scientists
> (http://www.nature.com/news/online-collaboration-scientists-and-the-social-network-1.15711),”
>
> Why don’t you take up your case with them as well?
>
> Respectfully,
>
> Stephen J Bensman
>
> LSU Libraries
>
> Louisiana State University
>
> Baton Rouge, LA 70803
>
> *From:* ASIS&T Special Interest Group on Metrics
> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] *On Behalf Of *Isidro F.
> Aguillo
> *Sent:* Thursday, September 11, 2014 1:46 AM
> *To:* SIGMETRICS at LISTSERV.UTK.EDU
> *Subject:* Re: [SIGMETRICS] About the size of Google Scholar:
> playing the numbers
>
>
>
>
> Are you talking about Google Scholar?
>
> The useless bibliographic tool that does not allow you to extract
> large data sets?
>
> The system that blocks access for your whole organization if you
> try to do so?
>
> Are you suffering CAPTCHAs?
>
> Is somebody able to talk with them and convince them to change
> their approach to our community?
>
> On 10/09/2014 20:17, Stephen J Bensman wrote:
>
>
> Enrique and Emilio.
>
> I read your working paper with great interest as it deals
> with the same topic on which we are doing research here at
> LSU. To tell you the honest truth, I had trouble with
> its basic premise, i.e., that Google Scholar (GS) has a
> given size. I do not think that it does, and, if it does,
> it is meaningless. The real problem is the size of the
> documentary set that is relevant to the search query.
>
>
>
> The WWW and PageRank (the Google search engine) operate
> within what can be called the power-law or Lotkaian
> domain. Informetric laws also operate within this
> domain. On top of that, PageRank operates on what is
> called the probability ranking principle, by which the
> probability of relevance exponentially decreases as the
> number of inlinks decreases, i.e. below a certain point
> you are dealing with gibberish manufactured by the search
> engine itself. Therefore, there is a need for left
> truncation and determination of what can be termed the
> x-min. Since we are dealing with the Lotkaian domain, the
> x-min marks the point where the asymptote or “tail” on the
> x-axis for the items begins.
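As a rough illustration of the left truncation described above, the following Python sketch drops every observation below a chosen x-min and then estimates the power-law exponent of what remains. The estimator is the standard continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(x_i / x_min)); it is an assumption chosen here for illustration, not a method named in this thread. The citation counts, the cut-off value and the function names `left_truncate` and `estimate_alpha` are all hypothetical.

```python
import math


def left_truncate(values, x_min):
    """Keep only the observations at or above x_min (the power-law tail)."""
    return [v for v in values if v >= x_min]


def estimate_alpha(values, x_min):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    Uses alpha = 1 + n / sum(ln(x_i / x_min)) over the truncated sample.
    This estimator is an illustrative assumption, not taken from the thread.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = left_truncate(values, x_min)
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one observation strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Hypothetical citation counts for one author (made-up numbers).
citations = [950, 400, 210, 120, 80, 45, 30, 12, 7, 3, 1, 1, 0, 0]
x_min = 30  # hypothetical cut-off, chosen only for the example

tail = left_truncate(citations, x_min)
alpha = estimate_alpha(citations, x_min)
print(f"{len(tail)} items in the tail, estimated alpha = {alpha:.2f}")
```

Everything below the cut-off is simply discarded before fitting, which is why the choice of x-min matters so much for the result.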
>
>
>
> We are dealing with Nobelists, and what we have found is
> that with PageRank the set of relevant documents is
> conterminous with the researcher’s h-index and the “tail”
> of his GS citations distribution. In other words—whether
> by serendipity or not—the h-index is an excellent estimate
> of the x-min of a GS citations distribution. Below that
> is what the Germans would call a “Trümmerzone” or rubbish
> zone largely manufactured by the search engine itself.
> This conterminous-ness is a validation of both the h-index
> and Google Scholar. The relevance of the set is also
> proven by the fact that the extreme outliers on the right
> messing up the tail are usually works on the topics for
> which the Nobelist won the prize. Case closed.
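To make the claim above concrete, here is a minimal Python sketch that computes an h-index from a list of citation counts and then uses it as the x-min, separating the items at or above it from those below. The function names `h_index` and `split_at_h` and the sample counts are hypothetical; the split is only one reading of "the h-index as an estimate of the x-min", not the procedure used in the working papers.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def split_at_h(citations):
    """Use the h-index as x-min: return (tail, below) by citation count."""
    h = h_index(citations)
    ranked = sorted(citations, reverse=True)
    tail = [c for c in ranked if c >= h]   # items at or above the cut-off
    below = [c for c in ranked if c < h]   # items discarded by truncation
    return tail, below


# Hypothetical citation counts (made-up numbers).
counts = [25, 8, 5, 3, 3, 1, 0]
tail, below = split_at_h(counts)
print(h_index(counts), tail, below)
```

Note that thresholding on the citation count keeps ties at exactly h citations in the tail, so the tail can be slightly larger than the h papers that define the index.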
>
>
>
> Every field has its statistical problem. With medical
> research it is right truncation, for every patient has to
> die before the results are really known. With the WWW and
> scientometric research, it is left truncation.
>
>
>
> If you are interested in how I view how Google Scholar
> works, you can read our working papers at the following URLs:
>
>
>
> http://arxiv.org/abs/1312.3872
>
>
>
> http://arxiv.org/abs/1404.4904
>
>
>
> I hope to post another working paper there next week that
> will really clinch the point. But who knows? I may be wrong.
>
>
>
> Respectfully,
>
>
>
> Stephen J Bensman, Ph.D.
>
> LSU Libraries
>
> Louisiana State University
>
> Baton Rouge, LA 70803
>
> USA
>
>
>
>
>
> *From:* ASIS&T Special Interest Group on Metrics
> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] *On Behalf Of
> *Enrique Orduña
> *Sent:* Wednesday, September 10, 2014 5:15 AM
> *To:* SIGMETRICS at LISTSERV.UTK.EDU
> *Subject:* [SIGMETRICS] About the size of Google Scholar:
> playing the numbers
>
>
>
> Dear Colleagues,
>
> The purpose of this mail is to present our latest working
> paper, deposited on July 24, 2014.
> http://googlescholardigest.blogspot.com.es/2014/09/about-size-of-google-scholar-playing.html
>
> We take on the seemingly intractable task of determining the
> size of the huge black hole that Google Scholar (GS) resembles.
> As the title of the document (About the size of Google Scholar:
> playing the numbers) suggests, we have begun doing the sums:
> using 4 different empirical methods, we estimate that the
> number of unique documents (different versions of a document
> are excluded) should not be less than 160 million (as of May 2014).
>
> Regardless of this particular outcome, which is itself
> significant (especially when compared with other scientific
> databases, and because it gives us key clues about the amount
> of scientific knowledge that can be searched, found and
> accessed on the web), even more exciting is the
> methodological challenge of this estimate. It has not only
> forced us to devise various techniques for measuring the size
> of the dark object that GS is; in applying them we have also
> shed light, again, on various inconsistencies, uncertainties
> and limitations of the search interface tools used by Google.
> In short, we have learned more about what Google Scholar does
> and does not do, and we want to share it with you all.
>
> This research comes at a good time. We are not only about to
> celebrate the 10th anniversary of GS but are also hearing
> some voices (from somewhere in Europe…) finally coming to rely on
> the use of Google Scholar for scientific evaluation.
>
> Now, when empirical studies
> (http://googlescholardigest.blogspot.com.es/p/bibliography.html)
> demonstrate every day that Google Scholar and its derivatives
>
> a) measure with similar credit to traditional bibliometric
> indicators,
>
> b) are the most used products by scientists
> (http://www.nature.com/news/online-collaboration-scientists-and-the-social-network-1.15711),
> and
>
> c) have unfortunately finished off the competition
> (Microsoft Academic Search is in an unexplained hibernation,
> http://googlescholardigest.blogspot.com.es/2014/04/empirical-evidences-microsoft-academic-search-dead.html),
>
> it seems that a certain euphoria has been unleashed. We are pleased;
> better late than never…
>
> However, without wanting to dampen the expectations raised, we
> emphasize that the problems of Google Scholar for scientific
> evaluation are not technical or methodological (coverage,
> reliability and validity of the measures, record filtering
> performance…). The fundamental limitations are those related to:
>
> a) the ease with which GS indicators can be manipulated
> (http://ec3noticias.blogspot.com.es/2014/01/google-scholar-wins-ravesbut-can-it-be.htmt),
>
> b) the transience of the results and measures (in many
> cases difficult to replicate stably),
>
> c) the technological dependence on companies that develop
> tools that come and go on the consumer product market
> (http://ec3noticias.blogspot.com.es/2014/04/la-new-new-horizontes.html-bibliometrics).
>
> Google Scholar enthusiasts are now welcome; meanwhile, we
> will continue vigorously with what we proposed several years
> ago: to reveal with “data” (and not mere opinions) the bowels
> of Google Scholar, and at the same time its strengths and
> weaknesses. So, like the old published serials, we can only
> promise... TO BE CONTINUED…
>
> Best,
>
> Enrique Orduña-Malea
> Polytechnic University of Valencia
>
> Emilio Delgado López-Cózar
> Universidad de Granada
>
> Yves Gingras
>
> Professeur
> Département d'histoire
> Centre interuniversitaire de recherche
> sur la science et la technologie (CIRST)
> Chaire de recherche du Canada en histoire
> et sociologie des sciences
> Observatoire des sciences et des technologies (OST)
> UQAM
> C.P. 8888, Succ. Centre-Ville
> Montréal, Québec
> Canada, H3C 3P8
>
> Tel: (514)-987-3000-7053
> Fax: (514)-987-7726
>
> http://www.chss.uqam.ca
> http://www.cirst.uqam.ca
> http://www.ost.uqam.ca
--
************************************
Isidro F. Aguillo, HonDr.
The Cybermetrics Lab, IPP-CSIC
Grupo Scimago
Madrid. SPAIN
isidro.aguillo at csic.es
ORCID 0000-0001-8927-4873
ResearcherID: A-7280-2008
Scholar Citations SaCSbeoAAAAJ
Twitter @isidroaguillo
Rankings Web webometrics.info
************************************