He X, Zha HY, Ding CHQ, Simon HD "Web document clustering using hyperlink structures" COMPUTATIONAL STATISTICS & DATA ANALYSIS 41 (1): 19-45 NOV 28 2002
Eugene Garfield
garfield at CODEX.CIS.UPENN.EDU
Wed Dec 18 16:11:46 EST 2002
Xiaopeng HE : {xhe,zha}@cse.psu.edu
Title: Web document clustering using hyperlink structures
Author: He X, Zha HY, Ding CHQ, Simon HD
Journal: COMPUTATIONAL STATISTICS & DATA ANALYSIS 41 (1): 19-45 NOV 28 2002
Document type: Article
Language: English
Cited References: 35
Times Cited: 0
Abstract:
With the exponential growth of information on the World Wide Web, there is
great demand for developing efficient methods for effectively organizing the
large amount of retrieved information. Document clustering plays an
important role in information retrieval and taxonomy management for the Web.
In this paper we examine three clustering methods: K-means, multi-level
METIS, and the recently developed normalized-cut method using a new approach
of combining textual information, hyperlink structure and co-citation
relations into a single similarity metric. We found the normalized-cut
method with the new similarity metric is particularly effective, as
demonstrated on three datasets of web query results. We also explore some
theoretical connections between the normalized-cut method and the K-means
method. (C) 2002 Elsevier Science B.V. All rights reserved.
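
As a rough illustration of the ideas named in the abstract, the sketch below
mixes text similarity, direct hyperlinks and co-citation counts into one
weight matrix and scores a two-way split with the normalized-cut criterion.
The linear mix and its weights (alpha, beta, gamma), the function names, and
the five-page toy data are all assumptions made for this example; the abstract
does not give the paper's actual metric. The exhaustive search is a stand-in
that only works on tiny inputs and is not the eigenvector-based method the
paper studies.

```python
from itertools import combinations


def cocitation(links, n):
    """C[i][j] = number of pages that link to both page i and page j."""
    C = [[0] * n for _ in range(n)]
    for targets in links.values():
        for i, j in combinations(sorted(set(targets)), 2):
            C[i][j] += 1
            C[j][i] += 1
    return C


def combined_similarity(text_sim, links, n, alpha=0.5, beta=0.3, gamma=0.2):
    """Hypothetical linear mix of text, link and co-citation evidence.

    The weights are illustrative assumptions, not values from the paper.
    """
    C = cocitation(links, n)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Treat a hyperlink in either direction as an undirected edge.
            linked = 1.0 if (j in links.get(i, ()) or i in links.get(j, ())) else 0.0
            W[i][j] = alpha * text_sim[i][j] + beta * linked + gamma * C[i][j]
    return W


def normalized_cut(W, part_a):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V).

    Assumes every page has positive total similarity, so neither
    denominator is zero.
    """
    n = len(W)
    a = set(part_a)
    b = set(range(n)) - a
    cut = sum(W[i][j] for i in a for j in b)
    assoc_a = sum(W[i][j] for i in a for j in range(n))
    assoc_b = sum(W[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(W):
    """Exhaustive search over two-way splits (page 0 is always on side A)."""
    n = len(W)
    best = None
    for size in range(1, n):
        for rest in combinations(range(1, n), size - 1):
            part = (0,) + rest
            score = normalized_cut(W, part)
            if best is None or score < best[0]:
                best = (score, part)
    return best


# Toy data: pages 0-2 share a topic, pages 3-4 share another.
n = 5
group = [0, 0, 0, 1, 1]
text_sim = [[0.8 if group[i] == group[j] else 0.1 for j in range(n)]
            for i in range(n)]
links = {0: [1, 2], 3: [4], 4: [3]}  # page 0 co-cites pages 1 and 2

W = combined_similarity(text_sim, links, n)
score, part = best_bipartition(W)
print(part, round(score, 3))
```

On this toy input the lowest-scoring split separates pages 0, 1 and 2 from
pages 3 and 4. For collections of realistic size the exhaustive search would
have to be replaced by a spectral relaxation or a multilevel partitioner.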
Author Keywords:
World Wide Web, graph partitioning, Cheeger constant, clustering method,
K-means method, normalized cut method, eigenvalue decomposition, link
structure, similarity metric
KeyWords Plus:
GRAPHS, EIGENVECTORS, ALGORITHM, MATRICES
Addresses:
Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Univ Calif Berkeley, Lawrence Berkeley Lab, NERSC Div, Berkeley, CA 94720 USA
Publisher:
ELSEVIER SCIENCE BV, AMSTERDAM
IDS Number:
615NV
ISSN:
0167-9473
Cited Author Cited Work Volume Page Year
ANICK PG P 7 INT ACM SIGIR C 349 1994
BHARAT K P 7 INT WORLD WID WE 379 1998
CHAKRABARTI S COMPUT NETWORKS ISDN 30 65 1998
CHAKRABARTI S COMPUTER 32 60 1999
CHEEGER J LOWER BOUND SMALLEST 1970
CHUNG FRK SPECTRAL GRAPH THEOR 1997
CROFT WB PROVIDING GOVT INFOR 95
DONATH W IBM TECHNICAL DISCLO 15 938 1972
EFTHIMIADIS EN P 16 INT C ASS COMP 146 1993
EVERITT B CLUSTER ANAL 1993
FIEDLER M CZECH MATH J 25 619 1975
FIEDLER M CZECH MATH J 23 298 1973
FLAKE GW EFFICIENT IDENTIFICA 150 2000
FRIEZE A FAST MONTE CARLOL ME 2000
GIBSON D P 9 ACM C HYP HYP 225 1998
GOLUB G MATRIX COMPUTATIONS 1989
GORDON AD CLASSIFICATION 1981
HEARST MA P SIGIR 96 246 1996
HENDRICKSON B SIAM J SCI COMPUT 16 452 1995
KARYPIS G METISASTERIX SOFTWAR
KLEINBERG J P ACM SIAM S DISCR A 668 1998
KLEINBERG JM P 5 ANN INT COMP COM 26 1999
KUMAR R P 25 INT C VER LARG 639 1999
LARSON R P 59 ANN M AM SOC IN 71 1996
LI YH IEEE INTERNET COMPUT 2 24 1998
MOHAR B DISCRETE MATH 109 171 1992
PIROLLI P P ACM C HUM FACT COM 118 1996
PORTER MF PROGRAM 14 130 1980
POTHEN A SIAM J MATRIX ANAL A 11 430 1990
RIJSBERGERN CJV INFORMATION RETRIEVA 1979
SHI JB PROC CVPR IEEE 731 1997
SMALL H J AM SOC INFORM SCI 24 265 1973
SPIELMAN DA IN PRESS P 37 ANN IE 96 1996
WILLETT P INFORMATION PROCESSI 24 577 1988
ZAMIR O GROUPER DYNAMIC CLUS 1999