ART: Finding Cyber-communities

Gretchen Whitney gwhitney at UTKUX.UTCC.UTK.EDU
Thu Jul 1 17:36:13 EDT 1999


Trawling the web for emerging cyber-communities
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins

WWW8 Conference Refereed Papers, Toronto 1999

http://www8.org/w8-papers/4a-search-mining/trawling/trawling.html

Abstract: The web harbors a large number of communities -- groups of
content-creators sharing a common interest -- each of which manifests
itself as a set of interlinked web pages.  Newsgroups and commercial web
directories together contain of the order of 20,000 such communities; our
particular interest here is in emerging communities -- those that have
little or no representation in such fora.  The subject of this paper is
the systematic enumeration of over 100,000 such emerging communities from
a web crawl: we call our process trawling.  We motivate a graph-theoretic
approach to locating such communities, and describe the algorithms and
the algorithmic engineering necessary to find structures that subscribe to
this notion, the challenges in handling such a huge data set, and the
results of our experiment.


<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Gretchen Whitney, PhD                                     tel 423.974.7919
School of Information Sciences                            fax 423.974.4967
University of Tennessee, Knoxville TN 37996 USA           gwhitney at utk.edu
http://web.utk.edu/~gwhitney/


