About the size of Google Scholar: playing the numbers

Enrique Orduña riorma at GMAIL.COM
Wed Sep 10 06:15:11 EDT 2014

Dear Colleagues,

The purpose of this mail is to present our latest working paper, deposited
on July 24, 2014.

We take on the daunting task of estimating the size of the huge black
hole that Google Scholar (GS) appears to be. As the title of the document
(About the size of Google Scholar: playing the numbers) suggests, we have
begun to do the arithmetic: using 4 different empirical methods, we estimate
that the number of unique documents (different versions of a document are
excluded) should not be less than 160 million (as of May 2014).

Regardless of this particular outcome, which is itself significant
(especially when compared with other scientific databases, and because it
gives us key clues about the amount of scientific knowledge that can be
searched, found and accessed on the web), even more exciting is the
methodological challenge of this estimate. It has not only forced us to
devise various techniques for measuring the size of the dark object that
GS is; in applying them we have also shed light, again, on various
inconsistencies, uncertainties and limitations of the search interface
tools used by Google. In short, we have learned more about what Google
Scholar does and does not do, and we want to share it with you all.

This research comes at a good time. We are not only about to celebrate the
10th anniversary of GS but are also hearing some voices (from somewhere in
Europe…) finally relying on the use of Google Scholar for scientific

Now, when empirical studies (
http://googlescholardigest.blogspot.com.es/p/bibliography.html) demonstrate
every day that Google Scholar and its derivatives

a) measure with credibility similar to that of traditional bibliometric
indicators,

b) are the products most used by scientists (
and

c) have unfortunately finished off the competition (Microsoft Academic
Search is in an unexplained hibernation,

it seems that a certain euphoria has been unleashed. We are pleased; better late than

However, without wanting to dampen the expectations that have been raised,
we emphasize that the problems of Google Scholar for scientific evaluation
are not technical or methodological (coverage, reliability and validity of
the measures, record-filtering performance…). The fundamental limitations
are those related to:

a) the ease with which GS indicators can be manipulated,

b) the transience of the results and measures (in many cases difficult to
replicate stably),

c) the technological dependence on companies that develop tools that come
and go on the consumer product market (

New Google Scholar enthusiasts are welcome; meanwhile, we will continue
vigorously with what we proposed several years ago: to reveal with
- and not mere opinions - the inner workings of Google Scholar, and to
reveal at the same time its strengths and weaknesses. So, like the old
serials, we can only promise... TO BE CONTINUED…


Enrique Orduña-Malea
Polytechnic University of Valencia

Emilio Delgado López-Cózar
Universidad de Granada