On some webometrics methods
Isidro F. Aguillo
isidro.aguillo at CCHS.CSIC.ES
Tue Jun 15 05:28:13 EDT 2010
Dear colleagues:
Many of the webometrics papers currently being published rely on methods
that use search engines to collect web data. I want to warn that some of
them still use obsolete procedures that should be discarded:
- The Altavista search engine is still a popular source of web data.
However, Altavista and Alltheweb, a popular European engine, were
bought by Yahoo around 2003. Since then both have used the Yahoo
database as their source while keeping their own (original)
interfaces. Our empirical results show that although the database is
the same, the "mirrors" (Altavista, Alltheweb) are updated less
frequently than the Yahoo database itself. Given the still rapid
growth of the Web, this means that most of the time Yahoo provides
MORE and fresher results than its mirrors. As the same query operators
can be used in all three engines, there is no technical reason to
prefer Altavista, and using it is no longer recommended.
- The Web Impact Factor (WIF) was the first web indicator to be proposed,
and it became popular because its links/webpages ratio closely resembles
the successful ISI Impact Factor, which uses citations/papers.
Unfortunately, both links and webpages follow "power-law"
distributions, so most of the time WIF values are useless: they are
artifacts that cannot be used for comparative purposes. Several recent
papers still cite the WIF as a useful tool, which is no longer true.
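A small simulation can make the power-law argument concrete. The sketch below treats the WIF as the plain ratio links/webpages (the total in-links of a site divided by its number of pages, i.e. the mean number of in-links per page) and generates hypothetical sites whose per-page in-link counts follow a Pareto law. All names (`web_impact_factor`, `top_page_share`, `simulate_site`, `wif_spread`) and parameters (alpha = 1.1, 1000 pages per site) are illustrative assumptions, not values fitted to real web data. For simplicity it also keeps the number of pages fixed, although in practice site sizes are heavy-tailed too.

```python
import random


def web_impact_factor(inlinks_per_page):
    """WIF as the plain ratio links / webpages for one site.

    inlinks_per_page: number of in-links each page of the site receives.
    The sum of the list plays the role of "links", its length "webpages".
    """
    if not inlinks_per_page:
        raise ValueError("a site needs at least one page")
    return sum(inlinks_per_page) / len(inlinks_per_page)


def top_page_share(inlinks_per_page):
    """Fraction of the site's in-links received by its most-linked page."""
    total = sum(inlinks_per_page)
    return max(inlinks_per_page) / total if total else 0.0


def simulate_site(n_pages, alpha, rng):
    """Hypothetical site with Pareto-distributed per-page in-link counts.

    paretovariate(alpha) returns floats >= 1 with tail P(X > x) = x**-alpha;
    truncating to int gives link counts >= 1. An alpha close to 1 gives a
    very heavy tail (an assumption for illustration only).
    """
    return [int(rng.paretovariate(alpha)) for _ in range(n_pages)]


def wif_spread(n_sites=20, n_pages=1000, alpha=1.1, seed=42):
    """WIFs and top-page shares of sites that are identical by construction."""
    rng = random.Random(seed)
    sites = [simulate_site(n_pages, alpha, rng) for _ in range(n_sites)]
    wifs = [web_impact_factor(site) for site in sites]
    shares = [top_page_share(site) for site in sites]
    return wifs, shares


if __name__ == "__main__":
    wifs, shares = wif_spread()
    print(f"WIF range across identical sites: {min(wifs):.2f} to {max(wifs):.2f}")
    print(f"largest share of a site's in-links held by one page: {max(shares):.1%}")
```

Because every simulated site is drawn from the same distribution, any spread in the printed WIFs is pure sampling noise. With a heavy tail the mean is driven by the few most-linked pages, which is the kind of artifact that makes WIF comparisons between sites unreliable.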
Comments?
--
===========================
Isidro F. Aguillo, HonPhD
Cybermetrics Lab (3C1)
IPP-CCHS-CSIC
Albasanz, 26-28
28037 Madrid. Spain
Editor of the Rankings Web
===========================
More information about the SIGMETRICS mailing list