ABS&Comment: Dean, Finding related pages in the World Wide Web

Gretchen Whitney gwhitney at UTKUX.UTCC.UTK.EDU
Wed Sep 29 20:52:33 EDT 1999


J. Dean         : jdean at mysimon.com
MR Henzinger    : monika at pa.dec.com

TITLE:   Finding related pages in the World Wide Web
AUTHOR:  Dean J, Henzinger MR
JOURNAL: COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND
         TELECOMMUNICATIONS NETWORKING 31: (11-16) 1467-1479 MAY 17 1999

 Document type: Article
 Language: English

Abstract:
When using traditional search engines, users have to formulate queries to
describe their information need. This paper discusses a different approach
to Web searching where the input to the search process is not a set of query
terms, but instead is the URL of a page, and the output is a set of related
Web pages. A related Web page is one that addresses the same topic as the
original page. For example, www.washingtonpost.com is a page related to
www.nytimes.com, since both are online newspapers.

We describe two algorithms to identify related Web pages. These algorithms
use only the connectivity information in the Web (i.e., the links between
pages) and not the content of pages or usage information. We have
implemented both algorithms and measured their runtime performance. To
evaluate the effectiveness of our algorithms, we performed a user study
comparing our algorithms with Netscape's 'What's Related' service
(http://home.netscape.com/escapes/related/). Our study showed that the
precision at 10 for our two algorithms is 73% better and 51% better than
that of Netscape, despite the fact that Netscape uses both content and
usage pattern information in addition to connectivity information. (C) 1999
Published by Elsevier Science B.V. All rights reserved.
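
For readers unfamiliar with the "precision at 10" measure mentioned in the
abstract, the following is a minimal sketch of how such a figure is usually
computed: the fraction of the top 10 returned pages that were judged related.
The function names, page identifiers, and judgments below are hypothetical
and are not taken from the paper or its user study.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the first k ranked results that are in the relevant set."""
    top = ranked[:k]
    hits = sum(1 for page in top if page in relevant)
    return hits / k


def relative_improvement(ours, baseline):
    """Relative gain of one precision value over another, e.g. 0.5 means 50% better."""
    return (ours - baseline) / baseline


# Hypothetical ranked answers for one query URL and the pages a judge marked related.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"]
relevant = {"p1", "p3", "p4", "p9"}

p10 = precision_at_k(ranked, relevant, k=10)
```

A statement such as "73% better" would then correspond to a relative
improvement of 0.73 over the baseline's precision, assuming the comparison is
relative rather than in absolute percentage points.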

Author Keywords:
search engines, related pages, searching paradigms

Addresses:
Dean J, Mysimon Inc, Santa Clara, CA USA.
Compaq Syst Res Ctr, Palo Alto, CA 94301 USA.

Publisher:
ELSEVIER SCIENCE BV, AMSTERDAM

IDS Number:
202PZ

ISSN:
1389-1286

Copyright © 1999 Institute for Scientific Information
Please visit their website at www.isinet.com


Excerpt from the paper:

"Previous authors have suggested using cocitation and other forms of
connectivity to identify related Web pages.  Spertus observed that
cocitation can indicate that two pages are related [20].  That is, if page A
points to both pages B and C, then B and C might be related.  Various
researchers in the field of bibliometrics have also observed this [9-11,
19], and this observation forms the basis of our Cocitation algorithm.  The
notion of collaborative filtering, although it is based on user's
recommendations rather than hyperlinks, also relies on this observation
[21].  Pitkow and Pirolli [16] cluster Web pages based on cocitation
analysis.  Terveen and Hill [22] use the connectivity structure of the Web
to find groups of related Web sites."
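
The cocitation idea in the excerpt (if page A points to both B and C, then B
and C might be related) can be illustrated with a small sketch. This is only a
simplified illustration of counting shared parents, not the Cocitation
algorithm as specified in the paper; the function name and the toy link graph
are hypothetical.

```python
from collections import Counter


def cocitation_counts(links, target):
    """For each page, count how many parent pages link to both it and `target`."""
    counts = Counter()
    for parent, children in links.items():
        if target in children:
            # Every other page linked from this parent is cocited with target.
            for sibling in children:
                if sibling != target:
                    counts[sibling] += 1
    return counts


# Hypothetical toy link graph: parent page -> set of pages it links to.
links = {
    "A": {"B", "C"},
    "D": {"B", "C", "E"},
    "F": {"B", "E"},
    "G": {"C"},
}

counts = cocitation_counts(links, "B")
```

Pages with higher counts share more parents with the target and would be
ranked as more likely related under this simplified view.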

-------------------------------------------------------------
Eugene Garfield, Ph.D.
Chairman Emeritus, ISI, 3501 Market Street, Philadelphia, PA 19104
Publisher, THE SCIENTIST, 3600 Market St,
Philadelphia, PA 19104 (www.the-scientist.com)
Tel: 215-243-2205 // Fax: 215-387-1266
email:  garfield at codex.cis.upenn.edu
The Scientist: http://www.the-scientist.com
Home Page: http://garfield.library.upenn.edu


