Kim, GJ; Whang, KY; Kim, MS; Lim, HS; Lee, KH; Lee, BS. IEEE. 2009. An Incremental Clustering Crawler for Community-Limited Search. (ICADIWT 2009): 438-445.
Eugene Garfield
garfield at CODEX.CIS.UPENN.EDU
Mon Apr 25 14:09:49 EDT 2011
Kim, GJ; Whang, KY; Kim, MS; Lim, HS; Lee, KH; Lee, BS. IEEE. 2009. An
Incremental Clustering Crawler for Community-Limited Search. 2009 SECOND
INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL
INFORMATION AND WEB TECHNOLOGIES (ICADIWT 2009): 438-445.
presented at 2nd International Conference on the Applications of Digital
Information and Web Technologies in London, ENGLAND, AUG 04-06, 2009.
Author Full Name(s): Kim, Gye-Jeong; Whang, Kyu-Young; Kim, Min-Soo; Lim,
Hyo-Sang; Lee, Ki-Hoon; Lee, Byung Suk
Language: English
Document Type: Proceedings Paper
Abstract: We propose an incremental clustering crawler, a novel algorithm for
finding communities for community-limited search in the web. A web community
is a set of semantically related sites found through link-based clustering. The
key idea of the proposed algorithm is to perform clustering incrementally while
crawling is in progress. This algorithm does not need to crawl all the web pages
a priori, but needs to crawl only as many web pages as are relevant to the
clusters that are being formed. This ability to crawl on the fly is an important
advantage since it is infeasible to crawl the entire set of web pages in the
world and since we often do not even know which web pages or sites to crawl.
Another advantage is that the time spent on clustering is reduced because at
any time the clustering is performed on only the relevant web pages collected
thus far. An apparent disadvantage is that the resulting clusters are not optimal
since the algorithm does not have all the crawled sites available at the time of
clustering. Experiments show, however, that the achieved cluster quality is
comparable to the optimal cluster quality which, in our experiments, is achieved
using the minimum spanning tree clustering algorithm.
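
Illustrative note (not part of the indexed record): the abstract's key idea, clustering while the crawl is in progress and following only links relevant to the clusters formed so far, can be sketched roughly as below. This is a minimal toy under stated assumptions and is not the authors' algorithm: the in-memory link_graph stands in for fetching pages, and the function name, the min_links relevance threshold, and the max_sites cap are all hypothetical.

```python
from collections import deque


def incremental_cluster_crawl(link_graph, seeds, min_links=1, max_sites=100):
    """Crawl a toy link graph, assigning sites to clusters on the fly.

    link_graph: dict mapping a site to the sites it links to
                (assumption: a stand-in for real page fetches).
    seeds:      starting sites; each seed opens its own cluster.
    min_links:  hypothetical threshold, the number of in-links from
                already-crawled members a site needs to join a cluster.
    max_sites:  hypothetical cap on the number of sites crawled.
    """
    cluster_of = {seed: cid for cid, seed in enumerate(seeds)}
    frontier = deque(seeds)
    crawled = set()

    while frontier and len(crawled) < max_sites:
        site = frontier.popleft()
        if site in crawled:
            continue
        crawled.add(site)

        for target in link_graph.get(site, []):
            if target in cluster_of:
                continue
            # Count in-links to the candidate from each cluster, using
            # only the sites crawled so far (no global view of the web).
            votes = {}
            for member, cid in cluster_of.items():
                if member in crawled and target in link_graph.get(member, []):
                    votes[cid] = votes.get(cid, 0) + 1
            if not votes:
                continue
            best = max(votes, key=votes.get)
            # Only sites relevant to some cluster are queued for crawling.
            if votes[best] >= min_links:
                cluster_of[target] = best
                frontier.append(target)

    return cluster_of, crawled


if __name__ == "__main__":
    graph = {
        "a": ["b", "c"],
        "b": ["c"],
        "x": ["y"],
        "z": ["a"],  # links into a cluster but is never discovered
    }
    clusters, visited = incremental_cluster_crawl(graph, ["a", "x"])
    print(clusters)
    print(sorted(visited))
```

Because each assignment uses only the links seen up to that point, a site can land in a different cluster than a full offline clustering of the finished crawl would choose, which is the trade-off the abstract describes.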
Addresses: [Kim, Gye-Jeong] LG Elect Inst Technol, Seoul, South Korea
Reprint Address: Kim, GJ, LG Elect Inst Technol, Seoul, South Korea.
E-mail Address: gjkim at mozart.kaist.ac.kr; kywhang at mozart.kaist.ac.kr;
mskim at mozart.kaist.ac.kr; hslim at mozart.kaist.ac.kr; khlee at mozart.kaist.ac.kr;
bslee at cems.uvm.edu
ISBN: 978-1-4244-4456-4
URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5273940