Kim, GJ; Whang, KY; Kim, MS; Lim, HS; Lee, KH; Lee, BS. IEEE. 2009. An Incremental Clustering Crawler for Community-Limited Search. (ICADIWT 2009): 438-445.

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Mon Apr 25 14:09:49 EDT 2011


Kim, GJ; Whang, KY; Kim, MS; Lim, HS; Lee, KH; Lee, BS. IEEE. 2009. An 
Incremental Clustering Crawler for Community-Limited Search. 2009 SECOND 
INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL 
INFORMATION AND WEB TECHNOLOGIES (ICADIWT 2009): 438-445.

presented at 2nd International Conference on the Applications of Digital 
Information and Web Technologies in London, ENGLAND, AUG 04-06, 2009.

Author Full Name(s): Kim, Gye-Jeong; Whang, Kyu-Young; Kim, Min-Soo; Lim, 
Hyo-Sang; Lee, Ki-Hoon; Lee, Byung Suk
Language: English
Document Type: Proceedings Paper

Abstract: We propose an incremental clustering crawler, a novel algorithm for 
finding communities for community-limited search in the web. A web community 
is a set of semantically related sites found through link-based clustering. The 
key idea of the proposed algorithm is to perform clustering incrementally while 
crawling is in progress. This algorithm does not need to crawl all the web pages 
a priori, but needs to crawl only as many web pages as are relevant to the 
clusters that are being formed. This ability to crawl on the fly is an important 
advantage since it is infeasible to crawl the entire set of web pages in the 
world and since we often do not even know which web pages or sites to crawl. 
Another advantage is that the time spent on clustering is reduced because at 
any time the clustering is performed on only the relevant web pages collected 
thus far. An apparent disadvantage is that the resulting clusters are not optimal 
since the algorithm does not have all the crawled sites available at the time of 
clustering. Experiments show, however, that the achieved cluster quality is 
comparable to the optimal cluster quality which, in our experiments, is achieved 
using the minimum spanning tree clustering algorithm.
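
Illustrative sketch (not the authors' algorithm): the abstract's key idea, 
clustering while crawling is in progress, can be approximated in a few lines of 
Python. Everything named here is an assumption made for illustration: the 
fetch_links callback, the max_pages budget, and the use of a union-find 
structure that merges any two linked sites into one cluster. The paper's 
actual link-based clustering criterion is not given in this record.

```python
from collections import deque


def incremental_cluster_crawl(seeds, fetch_links, max_pages=100):
    """Crawl outward from seeds, merging linked sites into clusters on the fly.

    fetch_links(site) is a hypothetical callback returning the sites that
    `site` links to. Clusters are plain connected components here, a
    stand-in for the paper's link-based clustering.
    """
    parent = {}  # union-find forest over every site seen so far

    def find(site):
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    frontier = deque(seeds)
    visited = set()
    while frontier and len(visited) < max_pages:
        site = frontier.popleft()
        if site in visited:
            continue
        visited.add(site)
        find(site)  # register the site even if it has no outgoing links
        for target in fetch_links(site):
            union(site, target)  # clustering happens during the crawl
            if target not in visited:
                frontier.append(target)  # expand only from sites already clustered

    clusters = {}
    for site in parent:
        clusters.setdefault(find(site), set()).add(site)
    return sorted(sorted(members) for members in clusters.values())


# Toy in-memory link graph standing in for the web (hypothetical data).
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": [], "x": ["y"], "y": []}
result = incremental_cluster_crawl(["a", "x"], lambda s: toy_graph.get(s, []))
print(result)
```

Because the frontier grows only from links of sites already assigned to a 
cluster, the crawl never needs the full page set up front, which is the 
property the abstract emphasizes.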

Addresses: [Kim, Gye-Jeong] LG Elect Inst Technol, Seoul, South Korea
Reprint Address: Kim, GJ, LG Elect Inst Technol, Seoul, South Korea.
E-mail Address: gjkim at mozart.kaist.ac.kr; kywhang at mozart.kaist.ac.kr; 
mskim at mozart.kaist.ac.kr; hslim at mozart.kaist.ac.kr; khlee at mozart.kaist.ac.kr; 
bslee at cems.uvm.edu
ISBN: 978-1-4244-4456-4
URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5273940


