I am looking forward to your paper, but I wonder how much is really known about ranking in Scholar. Google states: "Google Scholar aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature". There are so many vague points here.
First, the bulk of GS search results do not have full text; they are 'citations' parsed from the reference lists of primary indexed papers. Second, it says that Google weighs who wrote an article, but how is that measured? Citations indeed play an important role that cannot be switched off, and this results in rankings that, for most searches, give you older papers. The majority of GS users do not realize this and thus often miss the latest research findings.
I also wonder whether Google uses the same kind of PageRank in GS, but with citation counts instead of web links. Or do they also take into account web links to the various versions of papers? If so, how are the link counts for the various versions added up or corrected for? And if Google combines PageRank based on web links to papers with PageRank-type relevance based on citations, how does the resulting hybrid ranking work? Of course that is a company secret they won't share, but I wonder whether anybody has deduced how things really work behind the scenes of GS. I will add anything I learn here to our GS guide at
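Google has not published how GS ranks results, so the following is only a toy sketch of the idea raised above: PageRank run over a citation graph instead of a web-link graph, so that a citation counts for more when the citing paper is itself highly ranked. The function name `citation_pagerank`, the paper IDs, and the damping factor of 0.85 (a common choice for PageRank) are illustrative assumptions; this is not GS's actual algorithm.

```python
def citation_pagerank(cites, damping=0.85, iterations=100, tol=1e-10):
    """Toy PageRank over a citation graph (hypothetical, not GS's method).

    `cites` maps each paper ID to the list of paper IDs in its reference
    list. Each paper passes an equal share of its own score to every paper
    it cites, so citations from highly ranked papers are worth more.
    """
    # Collect every paper that appears as either citing or cited.
    papers = set(cites)
    for refs in cites.values():
        papers.update(refs)
    papers = sorted(papers)
    n = len(papers)
    score = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # "Random jump" term shared equally by all papers.
        new = {p: (1.0 - damping) / n for p in papers}
        # Papers with no indexed references spread their score evenly.
        dangling = sum(score[p] for p in papers if not cites.get(p))
        for p in papers:
            new[p] += damping * dangling / n
        # Each citing paper splits its score among the papers it cites.
        for p, refs in cites.items():
            if refs:
                share = damping * score[p] / len(refs)
                for q in refs:
                    new[q] += share
        diff = sum(abs(new[p] - score[p]) for p in papers)
        score = new
        if diff < tol:
            break
    return score


# Hypothetical graph: H is cited by A, B and C; X is cited once, by H;
# Y is also cited once, but by the uncited paper U.
example = {"A": ["H"], "B": ["H"], "C": ["H"], "H": ["X"], "U": ["Y"]}
scores = citation_pagerank(example)
for paper, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(paper, round(value, 4))
```

X and Y each have exactly one citation, yet X ends up ranked well above Y because its single citation comes from a heavily cited paper. A raw citation count cannot make that distinction.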

I will soon be posting on arXiv an article entitled “Eugene Garfield, Francis Narin, and PageRank:  the Theoretical Bases of the Google Search Engine.”  Below is its abstract:

This paper presents a test of the validity of using Google Scholar (GS) to evaluate the publications of researchers.  It does this by first comparing the theoretical premises on which the GS search engine PageRank algorithm operates to those on which Garfield based his theory of citation indexing.  It finds that the basic premise is the same, i.e., that subject sets of relevant documents are defined semantically better by linkages than by words.  Google incorporated this premise into PageRank, amending it with the addition of the citation influence method developed by Francis Narin and the staff of Computer Horizons, Inc. (CHI).  This method weighted more heavily citations from documents which themselves were more heavily cited.  Garfield himself essentially had also incorporated this method into his theory of citation indexing by restricting as far as possible the coverage of the Science Citation Index (SCI) to a small multidisciplinary core of journals most heavily cited.  Stealing a page from Garfield’s book, the paper presents a test of the validity of GS by tracing its citations to the h-index works of 5 Nobel laureates in chemistry—the discipline in which Garfield began his pioneering research—with Anne-Wil Harzing’s revolutionary Publish-or-Perish (PoP) software that has established bibliographic and statistical control over the GS database.  Most of these works were journal articles, and the rankings of the journals in which they appeared by both total cites (TC) and impact factor (IF) at the time of their publication were analyzed.  The results conformed to the findings of Garfield through citation analysis, confirming his law of concentration and view of the importance of review articles.  As a byproduct of this finding, it is shown that Narin had totally misunderstood and mishandled citations from review journals.  
The evidence of this paper is conclusive:  Garfield’s theory of citation indexing and PageRank validate each other, and Eugene Garfield is the grandfather of the Web search engine.
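For readers unfamiliar with the term, "h-index works" presumably means the h-core under Hirsch's standard definition: h is the largest number such that h of a researcher's papers have at least h citations each, and those h most-cited papers form the core. The minimal sketch below computes both from a list of per-paper citation counts. The function name `h_index_core` is hypothetical, and this is not Publish-or-Perish's code. Ties at the boundary are broken by input order, because Python's sort is stable.

```python
def h_index_core(citations):
    """Return (h, core) for a list of per-paper citation counts.

    h is the largest number such that h papers have at least h citations
    each; core lists the indices of those h most-cited papers.
    """
    # Rank paper indices from most to least cited (stable for ties).
    ranked = sorted(range(len(citations)), key=lambda i: citations[i], reverse=True)
    h = 0
    for rank, i in enumerate(ranked, start=1):
        if citations[i] >= rank:
            h = rank
        else:
            break  # counts only decrease from here on, so h is final
    return h, ranked[:h]


# Hypothetical citation counts for five papers.
print(h_index_core([3, 25, 0, 8, 5]))
```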

I will post this article as soon as my wife finishes her proofreading and copyediting. I will let you know when it has been posted, but I wanted to get the basic findings of the paper out as soon as possible.


