About the size of Google Scholar: playing the numbers

Enrique Orduña riorma at GMAIL.COM
Fri Sep 12 08:08:28 EDT 2014

Dear Stephen,

Thank you very much again for your interesting email; it helps us debate
and discuss these issues in a very focused way.

As regards the studies by Barabási and Albert (and by Broder, Baeza-Yates
and many other colleagues), we have certainly learned a lot from them.
However, I think they belong to a different analytical context: they are
intended to measure the network, whereas we aim to measure only the
database (which obviously harvests the web, but filters its contents).

Therefore, our aim was to count the number of bibliographic records that
Google Scholar contains. Google Scholar is not only an academic search
engine but also a bibliographic database, and that number (and its
evolution over time) tells us a great deal. In addition, a record in the
Google Scholar database does not always correspond to an online digital
object (a node in the academic network), since many records are mere
citations that do not lead to any digital object.

In any case, apart from this point, which simply reflects our different
research interests, I think we actually agree about what constitutes a
defined or undefined universe of information. We would recommend the
following presentation:

In addition, Google Scholar can capture not only the scientific impact but
also the professional and social impact of researchers, something neither
WoS nor Scopus does. This certainly explains the high performance values
of economists and other researchers in the social sciences and

We therefore believe it is worthwhile to know the size of Google Scholar
measured in this way as well. And it seems we are not the only ones
concerned with this question. We recommend the following work by Khabsa &
Giles (2014), which certainly guided us along the way.

Khabsa & Giles (2014). "The number of scholarly documents on the public
web." PLoS ONE, 9(5): e93949. doi:10.1371/journal.pone.0093949

As regards Google Scholar's opacity, while we agree that Google is a
company and can do whatever it wants with its products, we should point out
that the bibliographic records Google Scholar catalogs and provides access
to are mostly harvested from public entities (such as the institutional
repositories of public universities). A little consideration toward the
institutions that feed its database for free would be appreciated, for
example an API for academic use. But this is a personal opinion.

And of course we can learn a lot from each other. We will be happy to read
your work and send you our ideas and suggestions, and vice versa. Let's
stay in touch.

Kind regards,

Enrique Orduña-Malea & Emilio Delgado López-Cózar

More information about the SIGMETRICS mailing list