On some webometrics methods

Isidro F. Aguillo isidro.aguillo at CCHS.CSIC.ES
Tue Jun 15 05:28:13 EDT 2010

Dear colleagues:

Many currently published webometrics papers rely on search engines to 
collect their web data. I want to warn that some of them use obsolete 
procedures that should be discarded:

- The Altavista search engine is still popular as a source of web data. 
However, Altavista and Alltheweb, a popular European engine, were 
bought by Yahoo about 2003. Since then they have used the Yahoo 
database as their source while keeping their different (original) 
interfaces. Our empirical results show that, although the databases are 
the same, the "mirrors" (Altavista, Alltheweb) are updated less 
frequently than the Yahoo database. Given the still fast growth of the 
Web, this means that most of the time Yahoo provides MORE and fresher 
results than its mirrors. As there is no technical reason not to use 
the same operators in the three engines, using Altavista is no longer 
recommended.

- The Web Impact Factor (WIF) was the first web indicator proposed, and 
it became popular because its ratio of links to webpages closely 
resembles the successful ISI Impact Factor, which uses citations/papers. 
Unfortunately, both links and webpages follow "power-law" 
distributions, so most of the time the WIF values are useless, yielding 
artifacts that cannot be used for comparative purposes. Several recent 
papers cite the WIF as a useful tool, which is no longer true.
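
To make the artifact problem concrete, here is a minimal Python sketch 
of the links/webpages ratio described above. The function name, the 
site names, and all counts are hypothetical, invented for illustration 
only; they are not measured data. With a heavily skewed number of 
pages, a very small site can easily obtain a far higher ratio than a 
large, heavily linked one.

```python
def web_impact_factor(inlinks: int, pages: int) -> float:
    """Ratio of links to webpages, as the WIF is described above."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return inlinks / pages


# Hypothetical counts, invented for illustration only (not measured data).
sites = {
    "tiny-site.example": {"inlinks": 500, "pages": 10},
    "large-university.example": {"inlinks": 2_000_000, "pages": 1_000_000},
}

# Compute the ratio for each site and rank the sites by it.
wif = {name: web_impact_factor(**counts) for name, counts in sites.items()}
ranking = sorted(wif, key=wif.get, reverse=True)

for name in ranking:
    print(f"{name}: WIF = {wif[name]:.1f}")
```

In this invented example the small site ranks first (50.0 against 2.0) 
even though the large one receives 4,000 times more links, which is 
the kind of result that cannot be used for comparative purposes.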



Isidro F. Aguillo, HonPhD
Cybermetrics Lab (3C1)
Albasanz, 26-28
28037 Madrid. Spain

Editor of the Rankings Web
