ART: Finding Cyber-communities
Gretchen Whitney
gwhitney at UTKUX.UTCC.UTK.EDU
Thu Jul 1 17:36:13 EDT 1999
Trawling the web for emerging cyber-communities
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins
WWW8 Conference Refereed Papers, Toronto 1999
http://www8.org/w8-papers/4a-search-mining/trawling/trawling.html
Abstract: The web harbors a large number of communities -- groups of
content-creators sharing a common interest -- each of which manifests
itself as a set of interlinked web pages. Newsgroups and commercial web
directories together contain of the order of 20,000 such communities; our
particular interest here is on emerging communities -- those that have
little or no representation in such fora. The subject of this paper is
the systematic enumeration of over 100,000 such emerging communities from
a web crawl: we call our process trawling. We motivate a graph-theoretic
approach to locating such communities, and describe the algorithms, and
the algorithmic engineering necessary to find structures that subscribe to
this notion, the challenges in handling such a huge data set, and the
results of our experiment.
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Gretchen Whitney, PhD tel 423.974.7919
School of Information Sciences fax 423.974.4967
University of Tennessee, Knoxville TN 37996 USA gwhitney at utk.edu
http://web.utk.edu/~gwhitney/