Cothey V. "Web-crawling reliability" JASIST 55 (14). DEC 2004. p.1228-1238

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Wed Dec 22 15:46:38 EST 2004


Viv Cothey : viv.cothey at wlv.ac.uk

TITLE: Web-crawling reliability (Article, English)

AUTHOR: Cothey, V

SOURCE: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND
TECHNOLOGY 55 (14). DEC 2004. p.1228-1238 JOHN WILEY & SONS INC, HOBOKEN

ABSTRACT:
In this article, I investigate the reliability, in the
social science sense, of collecting informetric data about the World Wide
Web by Web crawling. The investigation includes a critical examination of
the practice of Web crawling and contrasts the results of content
crawling with the results of link crawling. It is shown that Web crawling
by search engines is intentionally biased and selective. I also report
the results of a large-scale experimental simulation of Web crawling that
illustrates the effects of different crawling policies on data
collection. It is concluded that the reliability of Web crawling as a
data collection technique is improved by fuller reporting of relevant
crawling policies.

AUTHOR ADDRESS: V Cothey, Wolverhampton Univ, Sch Comp & Informat Technol,
Lichfield St, Wolverhampton WV1 1SB, England


