ART: Finding Cyber-communities

Gretchen Whitney gwhitney at UTKUX.UTCC.UTK.EDU
Thu Jul 1 17:36:13 EDT 1999

Trawling the web for emerging cyber-communities
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins

WWW8 Conference Refereed Papers, Toronto 1999

Abstract: The web harbors a large number of communities -- groups of
content-creators sharing a common interest -- each of which manifests
itself as a set of interlinked web pages.  Newsgroups and commercial web
directories together contain on the order of 20,000 such communities; our
particular interest here is in emerging communities -- those that have
little or no representation in such fora.  The subject of this paper is
the systematic enumeration of over 100,000 such emerging communities from
a web crawl: we call our process trawling.  We motivate a graph-theoretic
approach to locating such communities, and describe the algorithms and
algorithmic engineering needed to find structures that fit this notion,
the challenges of handling such a huge data set, and the results of our
experiment.

Gretchen Whitney, PhD                                     tel 423.974.7919
School of Information Sciences                            fax 423.974.4967
University of Tennessee, Knoxville TN 37996 USA           gwhitney at
