About the size of Google Scholar: playing the numbers

Isidro F. Aguillo isidro.aguillo at CCHS.CSIC.ES
Thu Sep 11 11:36:48 EDT 2014


Pages 14-17 are free to read in the Google Books entry for _Science 
of Science and Reflexivity_.


On 11/09/2014 17:21, Yves Gingras wrote:
> Administrative info for SIGMETRICS (for example unsubscribe): 
> http://web.utk.edu/~gwhitney/sigmetrics.html
>
> Hello,
>
> Having worked with Bourdieu before his untimely death in 2002, I cannot 
> let pass this opportunity to suggest that it would be nice if, in 
> addition to just looking at the h-index for the fun of it (which, by the 
> way, we do not need in order to know that Bourdieu is among the very few 
> great sociologists of the second half of the 20th century), people read 
> his book: _Science of Science and Reflexivity_ (University of Chicago 
> Press, 2004). He talks briefly about scientometrics on p. 14, and putting 
> more _reflexive_ sociology into our _thinking_ before _counting_ would be 
> welcome...
>
> After “slow science” why not a new motto for all of us:  “slow 
> bibliometrics: thinking before counting”
>
>
> Best regards
>
> Yves Gingras
>
>
>
> On 11/09/14 10:34, "Isidro F. Aguillo" <isidro.aguillo at CCHS.CSIC.ES> 
> wrote:
>
>     Dear Stephen,
>
>      Thanks for your comments. I understand the private nature of
>     Google, but Mendeley (owned by Elsevier) and other similar
>     biblio/altmetric sources are also commercially backed companies, and
>     they offer good APIs for in-depth, large-scale data analysis.
>
>      As a matter of curiosity, I checked for the largest h-index in Google
>     Scholar Citations, and it appears to be:
>
>
>     Pierre Bourdieu
>
>     Centre de Sociologie Européenne, Collège de France
>     http://scholar.google.com/citations?user=d_lp40IAAAAJ&hl=en
>
>      Citations    361973
>      h-index             207
>
>      Any better candidates?
>
>
>      On 11/09/2014 16:13, Stephen J Bensman wrote:
>
>         Isidro,
>
>         Unfortunately, Google is a cautious private-enterprise company
>         with commercial interests and secrets.  For example, it is very
>         cautious when it comes to copyright.  I really hate it when I
>         find a book chapter of interest to me but cannot download it
>         or copy/paste it.  Moreover, with Google Scholar citations it
>         allows you to make the choice whether you want yours public or
>         private.  That keeps the door open for Harzing’s
>         Publish-or-Perish program.  Google does not want any lawsuits
>         resulting from making your private data public without your
>         permission.
>
>
>
>         Google allows you large enough samples for most purposes.  For
>         example, when it comes to individuals, the main measure
>         appears to be the h-index.  For analytical purposes, your
>         h-index has to be above 50 to provide a proper sample.  Few
>         people have h-indexes above 50, and I know of none with an
>         h-index above 1000.
>
>
>
>         Google’s database is Google’s private property.  It can do
>         with it what it wants.  I imagine that—like Thomson
>         Reuters—you could purchase a lot of data from it. However, you
>         may be some sort of Bolshevik, who wants the right to
>         expropriate it.   As for the uselessness of Google Scholar, I
>         will quote your compatriots below:
>
>
>
>         “Now, when empirical studies
>         (http://googlescholardigest.blogspot.com.es/p/bibliography.html)
>         demonstrate every day that Google Scholar and its derivatives
>
>         a) measure with similar credit to traditional bibliometric
>         indicators,
>
>         b) are the most used products by scientists
>         (http://www.nature.com/news/online-collaboration-scientists-and-the-social-network-1.15711),”
>
>
>
>         Why don’t you take up your case with them as well?
>
>         Respectfully,
>
>         Stephen J Bensman
>
>         LSU Libraries
>
>         Louisiana State University
>
>         Baton Rouge, LA 70803
>
>
>         *From:* ASIS&T Special Interest Group on Metrics
>         [mailto:SIGMETRICS at LISTSERV.UTK.EDU] *On Behalf Of *Isidro F.
>         Aguillo
>         *Sent:* Thursday, September 11, 2014 1:46 AM
>         *To:* SIGMETRICS at LISTSERV.UTK.EDU
>         *Subject:* Re: [SIGMETRICS] ​About the size of Google Scholar:
>         playing the numbers
>
>
>
>
>         Are you talking about Google Scholar?
>
>          The useless bibliographic tool that does not allow the extraction
>         of large data sets?
>
>          The system that blocks access for your whole organization
>         if you try to do so?
>
>          Are you suffering from CAPTCHAs?
>
>          Is anybody able to talk with them and convince them to change
>         their approach to our community?
>
>          On 10/09/2014 20:17, Stephen J Bensman wrote:
>
>
>             Enrique and Emilio.
>
>             I read your working paper with great interest as it deals
>             with the same topic on which we are doing research here at
>             LSU.  To tell you the  honest truth, I had trouble with
>             its basic premise, i.e., that Google Scholar (GS) has a
>             given size.  I do not think that it does, and, if it does,
>             it is meaningless.  The real problem is the size of the
>             documentary set that is relevant to the search query.
>
>
>
>             The WWW and PageRank (the Google search engine) operate
>             within what can be called the power-law or Lotkaian
>             domain.  Informetric laws also operate within this
>             domain.  On top of that, PageRank operates on what is
>             called the probability ranking principle, by which the
>             probability of relevance exponentially decreases as the
>             number of inlinks decreases, i.e. below a certain point
>             you are dealing with gibberish manufactured by the search
>             engine itself.  Therefore, there is a need for left
>             truncation and determination of what can be termed the
>             x-min.  Since we are dealing with the Lotkaian domain, the
>             x-min marks the point where the asymptote or “tail” on the
>             x-axis for the items begins.
>
>
>
>             We are dealing with Nobelists, and what we have found is
>             that with PageRank the set of relevant documents is
>             conterminous with the researcher’s h-index and the “tail”
>             of his GS citations distribution.  In other words—whether
>             by serendipity or not—the h-index is an excellent estimate
>             of the x-min of a GS citations distribution.  Below that
>             is what the Germans would call a “Trümmerzone” or rubbish
>             zone largely manufactured by the search engine itself. 
>             This conterminous-ness is a validation of both the h-index
>             and Google Scholar.  The relevance of the set is also
>             proven by the fact that the extreme outliers on the right
>             messing up the tail are usually works on the topics for
>             which the Nobelist won the prize.  Case closed.
>
>
>
>             Every field has its statistical problem.  With medical
>             research it is right truncation, for every patient has to
>             die before the results are really known.  With the WWW and
>             scientometric research, it is left truncation.
>
>
>
>             If you are interested in how I view how Google Scholar
>             works, you can read our working papers at the following URLs:
>
>
>
>             http://arxiv.org/abs/1312.3872
>
>
>
>             http://arxiv.org/abs/1404.4904
>
>
>
>             I hope to post another working paper there next week that
>             will really clinch the point.  But who knows?  I may be wrong.
>
>
>
>             Respectfully,
>
>
>
>             Stephen J Bensman, Ph.D.
>
>             LSU Libraries
>
>             Louisiana State University
>
>             Baton Rouge, LA 70803
>
>             USA
>
>
>
>
>
>             *From:* ASIS&T Special Interest Group on Metrics
>             [mailto:SIGMETRICS at LISTSERV.UTK.EDU] *On Behalf Of
>             *Enrique Orduña
>             *Sent:* Wednesday, September 10, 2014 5:15 AM
>             *To:* SIGMETRICS at LISTSERV.UTK.EDU
>             *Subject:* [SIGMETRICS] ​About the size of Google Scholar:
>             playing the numbers
>
>
>
>             Dear Colleagues,
>
>              The purpose of this mail is to present our latest working
>             paper, deposited on July 24, 2014.
>             http://googlescholardigest.blogspot.com.es/2014/09/about-size-of-google-scholar-playing.html
>
>             We take on the intractable task of estimating the size of
>             the huge black hole that Google Scholar (GS) appears to be.
>             As the title of the document (About the size of Google
>             Scholar: playing the numbers) suggests, we have started
>             doing the sums: using 4 different empirical methods, we
>             estimate that the number of unique documents (different
>             versions of a document are excluded) should not be less
>             than 160 million (as of May 2014).
>
>             Regardless of this particular outcome, which is itself
>             significant (especially when compared with other
>             scientific databases, and because it gives us key clues
>             about the amount of scientific knowledge that can be
>             searched, found and accessed on the web), even more
>             exciting is the methodological challenge of this
>             undertaking. It has not only forced us to devise various
>             techniques for measuring the size of this dark object that
>             GS is, but also, in applying them, we have again shed light
>             on various inconsistencies, uncertainties and limitations
>             of the search interface tools used by Google. In short, we
>             have learned more about what Google Scholar does and does
>             not do, and we want to share it with you all.
>
>             This research comes at a good time. Not only are we about
>             to celebrate the 10th anniversary of GS, but we are also
>             hearing some voices (from somewhere in Europe…) finally
>             relying on Google Scholar for scientific evaluation.
>
>             Now, when empirical studies
>             (http://googlescholardigest.blogspot.com.es/p/bibliography.html)
>             demonstrate every day that Google Scholar and its derivatives
>
>             a) measure with similar credit to traditional bibliometric
>             indicators,
>
>             b) are the most used products by scientists
>             (http://www.nature.com/news/online-collaboration-scientists-and-the-social-network-1.15711),
>             and
>
>             c) have unfortunately finished off the competition
>             (Microsoft Academic Search is in an unexplained
>             hibernation,
>             http://googlescholardigest.blogspot.com.es/2014/04/empirical-evidences-microsoft-academic-search-dead.html),
>
>             it seems that a certain euphoria has been unleashed. We are
>             pleased; better late than never…
>
>             However, without wishing to dampen the expectations
>             raised, we emphasize that the problems of Google
>             Scholar for scientific evaluation are not technical or
>             methodological (coverage, reliability and validity of the
>             measures, records filtering performance…). The fundamental
>             limitations are those related to:
>
>             a) the ease with which GS indicators can be manipulated
>             (http://ec3noticias.blogspot.com.es/2014/01/google-scholar-wins-ravesbut-can-it-be.htmt),
>
>             b) the transience of the results and measures (in many
>             cases difficult to replicate stably),
>
>             c) the technological dependence on companies that develop
>             tools that come and go on the consumer product market
>             (http://ec3noticias.blogspot.com.es/2014/04/la-new-new-horizontes.html-bibliometrics).
>
>             Google Scholar enthusiasts are now welcome; meanwhile, we
>             will continue vigorously with what we proposed several
>             years ago: to reveal with “data” (and not mere opinions)
>             the bowels of Google Scholar and, at the same time, its
>             strengths and weaknesses. So, like the old published
>             serials, we can only promise... TO BE CONTINUED…
>
>             Best,
>
>             Enrique Orduña-Malea
>             Polytechnic University of Valencia
>
>             Emilio Delgado López-Cózar
>             Universidad de Granada
>
> Yves Gingras
>
> Professeur
> Département d'histoire
> Centre interuniversitaire de recherche
> sur la science et la technologie (CIRST)
> Chaire de recherche du Canada en histoire
> et sociologie des sciences
> Observatoire des sciences et des technologies (OST)
> UQAM
> C.P. 8888, Succ. Centre-Ville
> Montréal, Québec
> Canada, H3C 3P8
>
> Tel: (514)-987-3000-7053
> Fax: (514)-987-7726
>
> http://www.chss.uqam.ca
> http://www.cirst.uqam.ca
> http://www.ost.uqam.ca


-- 

************************************
Isidro F. Aguillo, HonDr.
The Cybermetrics Lab, IPP-CSIC
Grupo Scimago
Madrid. SPAIN

isidro.aguillo at csic.es
ORCID 0000-0001-8927-4873
ResearcherID: A-7280-2008
Scholar Citations SaCSbeoAAAAJ
Twitter @isidroaguillo
Rankings Web webometrics.info
************************************


