Thelwall M, Wilkinson D "Finding similar academic Web sites with links, bibliometric couplings and colinks" Information Processing & Management 40(3):515-526 May 2004

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Fri Jul 2 14:27:33 EDT 2004


Mike Thelwall  : m.thelwall at wlv.ac.uk
David Wilkinson: d.wilkinson at wlv.ac.uk

TITLE    Finding similar academic Web sites with links, bibliometric
         couplings and colinks
AUTHOR   Thelwall M, Wilkinson D
JOURNAL  INFORMATION PROCESSING & MANAGEMENT 40 (3): 515-526 MAY 2004


Document type: Article     Language: English     Cited References: 49
Times Cited: 0

Abstract:
A common task in both Webmetrics and Web information retrieval is to
identify a set of Web pages or sites that are similar in content. In this
paper we assess the extent to which links, colinks and couplings can be used
to identify similar Web sites. As an experiment, a random sample of 500
pairs of domains from the UK academic Web was taken and human assessments
of site similarity, based upon content type, were compared against ratings
for the three concepts. The results show that using a combination of all
three gives the highest probability of identifying similar sites, but
surprisingly this was only a marginal improvement over using links alone.
Another unexpected result was that high values for either colink counts or
couplings were associated with only a small increased likelihood of
similarity. The principal advantage of using couplings and colinks was found
to be greater coverage in terms of a much larger number of pairs of sites
being connected by these measures, instead of increased probability of
similarity. In information retrieval terminology, this is improved recall
rather than improved precision. (C) 2003 Elsevier Ltd. All rights reserved.

Author Keywords:
document clustering, webmetrics, Web information retrieval

KeyWords Plus:
SCIENCE, DEPARTMENTS, INFORMATION, COCITATION, IMPACT

Addresses:
Thelwall M, Wolverhampton Univ, Sch Comp & Informat Technol, Wulfruna St,
Wolverhampton WV1 1SB, England
Wolverhampton Univ, Sch Comp & Informat Technol, Wolverhampton WV1 1SB, England

Publisher:
PERGAMON-ELSEVIER SCIENCE LTD, THE BOULEVARD, LANGFORD LANE, KIDLINGTON,
OXFORD OX5 1GB, ENGLAND

IDS Number:
818PX

ISSN:
0306-4573

 Cited Author            Cited Work                Volume      Page Year
 AGUILLO IF            ONLINE INFORMATION 9                   239      1998
 ALMIND TC             J DOC                         53       404      1997
 ARASU A               ACM T INTERNET TECHN           1         2      2001
 BJORNEBORN L          P 12 ACM C HYP HYP                     133      2001
 BJORNEBORN L          SHARED OUTLINKS WEBO                            2001
 BORGMAN CL            ANNU REV INFORM SCI           36         3      2002
 BRIN S                COMPUT NETWORKS ISDN          30       107      1998
 BRODER A              COMPUT NETW                   33       309      2000
 CAWKELL T             ASIS MONOGRAPH SERIE                   177      2000
 CHAKRABARTI S         STRUCTURE BROAD TOPI                            2002
 CHEN C                INFORMATION VISUALIS                            1999
 CHEN CM               INTERACT COMPUT               10       353      1998
 CHU H                 J ED LIB INFORMATION          43       110      2002
 CRONIN B              J INFORM SCI                  27         1      2001
 DEARING R             REPORT NATL COMMITTE                            1997
 FLAKE GW              COMPUTER                      35        66      2002
 GAO J                 TREC 10 WEB TRACK EX                            2001
 GARRIDO M             CYBERACTIVISM ONLINE                   165      2003
 GLANZEL W             SCIENTOMETRICS                50       199      2001
 HAVELIWALA TH         SCALABLE TECHNIQUES                             2000
 INGWERSEN P           J DOC                         54       236      1998
 KLEINBERG JM          J ACM                         46       604      1999
 LI XM                 SCIENTOMETRICS                57       239      2003
 NG AY                 P 17 INT JOINT C ART                   903      2001
 NG AY                 P 24 ANN INT ACM SIG                   258      2001
 PARK HW               J AM SOC INF SCI TEC          53       592      2002
 PENNOCK DM            P NATL ACAD SCI USA           99      5207      2002
 PIROLLO P             CHI 96 P C HUM FACT                    118      1996
 POLANCO X             CLUSTERING MAPPING W                            2001
 ROGERS R              SCI CULTURE                   11       191      2002
 ROUSSEAU R            CYBERMETRICS                   1                1997
 SALTON G              INTRO MODERN INFORMA                            1983
 SCHVANEVELDT RW       PSYCHOL LEARN MOTIV           24       249      1989
 SMALL H               J AM SOC INFORM SCI           50       799      1999
 SMALL H               J AM SOC INFORM SCI           24       265      1973
 SMALL H               SCIENTOMETRICS                38       275      1997
 TANG R                IN PRESS DISCIPLINAR
 THELWALL M            IN PRESS J DOCUMENTA          59
 THELWALL M            INTERNET RES                  12       124      2002
 THELWALL M            J AM SOC INF SCI TEC          53       995      2002
 THELWALL M            J AM SOC INF SCI TEC          52      1157      2001
 THELWALL M            J DOC                         58       563      2002
 THELWALL M            J INFORM SCI                  27       319      2001
 THELWALL M            J INFORM SCI                  27       393      2001
 THELWALL M            PUBLICLY ACCESSIBLE                             2001
 THELWALL M            SCIENTOMETRICS                55       335      2002
 THOMAS O              J INFORM SCI                  26       421      2000
 WATTS DJ              NATURE                       393       440      1998
 WHITE HD              J AM SOC INFORM SCI           32       163      1981


