Thelwall M. "Can Google's PageRank be used to find the most important academic Web pages?" J Doc 59(2):205-217 2003
Eugene Garfield
garfield at CODEX.CIS.UPENN.EDU
Tue May 25 10:34:07 EDT 2004
Mike Thelwall : m.thelwall at wlv.ac.uk
TITLE Can Google's PageRank be used to find the most important
academic Web pages?
AUTHOR Thelwall M
JOURNAL JOURNAL OF DOCUMENTATION 59 (2): 205-217 2003
Document type: Article Language: English Cited References: 32
Times Cited: 0
Abstract:
Google's PageRank is an influential algorithm that uses a model of Web use
that is dominated by its link structure in order to rank pages by their
estimated value to the Web community. This paper reports on the outcome of
applying the algorithm to the Web sites of three national university
systems in order to test whether it is capable of identifying the most
important Web pages. The results are also compared with simple inlink
counts. It was discovered that the highest inlinked pages do not always
have the highest PageRank, indicating that the two metrics are genuinely
different, even for the top pages. More significantly, however, internal
links dominated external links for the high ranks in either method and
superficial reasons accounted for high scores in both cases. It is
concluded that PageRank is not useful for identifying the top pages in a
site and that it must be combined with powerful text matching techniques
in order to get the quality of information retrieval results provided by
Google.
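
The abstract's observation that the most inlinked page need not have the highest PageRank can be illustrated with a minimal sketch. This is not the paper's code or data: the toy link graph, the page names (pop, home, dept, a to d), the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration, and the PageRank routine is the common textbook power-iteration form.

```python
def inlink_counts(links):
    """Count distinct pages linking to each page (self-links ignored).

    Assumes every page appears as a key in `links`.
    """
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a dict of page -> outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source, targets in links.items():
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[source] / n
                for page in pages:
                    new_rank[page] += share
        rank = new_rank
    return rank


# Hypothetical toy site: four leaf pages link to "pop", which links to
# "home"; "home" and "dept" link to each other.
links = {
    "a": ["pop"],
    "b": ["pop"],
    "c": ["pop"],
    "d": ["pop"],
    "pop": ["home"],
    "home": ["dept"],
    "dept": ["home"],
}

inlinks = inlink_counts(links)
ranks = pagerank(links)

top_by_inlinks = max(inlinks, key=inlinks.get)
top_by_pagerank = max(ranks, key=ranks.get)
print("Most inlinked page:", top_by_inlinks)
print("Highest PageRank page:", top_by_pagerank)
```

In this toy graph "pop" has the most inlinks, but they come from pages that nobody links to, while "home" sits in a mutual-link loop with "dept" and accumulates more PageRank. The two measures therefore pick different top pages, which is the kind of disagreement the abstract reports.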
Author Keywords:
Internet, universities, information retrieval, algorithms, effectiveness
KeyWords Plus:
IMPACT FACTORS, CRAWLER
Addresses:
Thelwall M, Wolverhampton Univ, Sch Comp & Informat Technol, Wolverhampton,
England
Publisher:
EMERALD GROUP PUBLISHING LTD, 60/62 TOLLER LANE, BRADFORD BD8 9BY, W
YORKSHIRE, ENGLAND
IDS Number:
730YD
ISSN:
0022-0418
Cited Author    Cited Work            Volume  Page  Year
BHARAT K        10 INT WORLD WID WEB                2001
BRIN S          COMPUT NETWORKS ISDN  30      107   1998
BRODER A        COMPUT NETW           33      309   2000
GAO J           TREC10 WEB TRACK EXP                2001
GLASER J        SCIENTOMETRICS        52      411   2001
GOODRUM AA      INFORM PROCESS MANAG  37      661   2001
GOOGLE          GOOGL TECHN                         2002
HAVELIWALA T    EFFICIENT COMPUTATIO                1999
HAWKING D       INFORMATION TECHNOLO          307   2000
HEYDON A        WORLD WIDE WEB        2       219   1999
INGWERSEN P     J DOC                 54      236   1998
KLEINBERG JM    J ACM                 46      604   1999
LARSON RR       ASIS 96                             1996
LEYDESDORFF L   CYBERMETRICS          4             2000
LIFANTSEV M     P INT C INT COMP              143   2000
NG AY           P 24 ANN INT ACM SIG          258   2001
PAGE B          6285999 US                          1998
RAFIEI D        COMPUT NETW           33      823   2000
RICHARDSON M    NEURAL INFORMATION P                2001
ROUSSEAU R      CYBERMETRICS          1             1997
SMITH A         SCIENTOMETRICS        54            2002
SMITH AG        J DOC                 55      577   1999
SULLIVAN D      GOOGLE TOPS SEARCH H                2002
THELWALL M      J AM SOC INF SCI TEC  52      1157  2001
THELWALL M      J DOC                 58      60    2002
THELWALL M      J DOC                 57      177   2001
THELWALL M      J DOCUMENTATION                     2001
THELWALL M      J INFORM SCI          27      319   2001
THELWALL M      ONLINE INFORMATION R  26            2002
THELWALL M      ONLINE INFORMATION R  26      124   2002
THELWALL M      PUBLICLY ACCESSIBLE                 2001
XI W            TREC 2001                     686   2001