Isidro F. Aguillo isidro at CINDOC.CSIC.ES
Wed Mar 22 11:04:59 EST 2000

The aim of this very irregular newsletter is to exchange information
about our emerging topic in order to increase the number of
contributions to the field, including new papers for the Cybermetrics
e-journal. As electronic publishing becomes more popular, the
possibility of adding web periodicals to traditional databases could
have interesting effects on bibliometric indicators. In short, this is
a new plea to increase the number of scientometric articles in
Cybermetrics in particular and on the WWW in general.

Following the scheme of the previous bulletin, our first topic is the
size of the webspace. Recent estimates suggest figures of over 1 billion
pages, and several informal sources consider that the number already
exceeds 1.5 billion. From my point of view, the growth rate is even
more important than the size, with doubling periods of less than 6-8
months. In very few years 10 or 20 billion pages will be common and,
considering linguistic diversity, even these numbers are by no means
an asymptote. So the debate about size is important, as we are
"masters" of the only way to analyse such a future situation:
quantitative methods. Such large samples could also have a positive
effect on the statistical significance of our tests, making it easier
to uncover new patterns.
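The doubling-period argument can be checked with simple arithmetic. The sketch below assumes purely exponential growth with a constant doubling time T, so that N(t) = N0 * 2^(t/T); the function names are illustrative, and the 1.5 billion starting value is the informal estimate quoted above, not a measured figure.

```python
import math


def projected_size(initial_pages, doubling_months, months_elapsed):
    """Exponential growth with constant doubling time: N(t) = N0 * 2 ** (t / T)."""
    return initial_pages * 2 ** (months_elapsed / doubling_months)


def months_to_reach(initial_pages, target_pages, doubling_months):
    """Invert N(t) = N0 * 2 ** (t / T) to get t = T * log2(target / N0)."""
    return doubling_months * math.log2(target_pages / initial_pages)


if __name__ == "__main__":
    start = 1.5e9  # informal 1.5 billion estimate, assumed as the starting point
    for T in (6, 8):  # doubling periods mentioned in the text, in months
        for target in (10e9, 20e9):
            m = months_to_reach(start, target, T)
            print(f"T={T} months: {target / 1e9:.0f} billion pages after ~{m:.0f} months")
```

Under this constant-doubling assumption, reaching 10 billion pages takes roughly 16 to 22 months and 20 billion roughly 22 to 30 months, which is consistent with "very few years". Real growth need not stay exponential, so these are illustrations rather than forecasts.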

In the past the main drawback of cybermetric methods was the VERY
IMPORTANT inconsistency of search engine results. This situation
especially affects AltaVista and HotBot, arguably the most useful
engines for us due to their coverage and full Boolean support (Bar-Ilan,
Rousseau and others have described this situation). Some authors even
suggested that this was a structural problem of the web, so that direct
application of citation techniques (such as Ingwersen's WebIF
calculation) is not possible.
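A minimal sketch of the Web Impact Factor follows, assuming the usual formulation attributed to Ingwersen: the number of pages linking to a site divided by the number of pages in that site, with an "external" variant that ignores links from inside the site. In practice the counts would be hit counts reported by a search engine; the class, function names, and numbers here are hypothetical. Computing the ratio over repeated queries of the same site shows why engine inconsistencies matter: if the counts fluctuate between runs, so does the WIF.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class LinkCounts:
    """Hit counts for one site as a search engine might report them (hypothetical)."""
    site_pages: int           # pages indexed within the site
    self_link_pages: int      # pages inside the site that link to it
    external_link_pages: int  # pages outside the site that link to it


def web_impact_factor(c: LinkCounts, external_only: bool = False) -> float:
    """WIF = linking pages / pages in the site; external_only drops self-links."""
    if c.site_pages <= 0:
        raise ValueError("site must have at least one indexed page")
    linking = c.external_link_pages
    if not external_only:
        linking += c.self_link_pages
    return linking / c.site_pages


def wif_spread(samples):
    """Mean and population standard deviation of the WIF over repeated queries."""
    values = [web_impact_factor(s) for s in samples]
    return mean(values), pstdev(values)


if __name__ == "__main__":
    # Hypothetical counts for the same site obtained on three different days.
    runs = [
        LinkCounts(1000, 400, 600),
        LinkCounts(900, 300, 650),
        LinkCounts(1100, 450, 500),
    ]
    avg, sd = wif_spread(runs)
    print(f"WIF mean {avg:.3f}, std. dev. {sd:.3f}")
```

A non-zero spread for a single site is exactly the kind of instability that makes naive WIF comparisons between sites unreliable when the underlying engine is inconsistent.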

But the main source of such inconsistencies was the commercial character
of those engines, not the nature of the information available on the
Web. A more stable engine, with large coverage and some advanced
search options, could partly solve this problem. In my opinion such a
tool is already available: the NEW advanced search screen of FAST
allows more precise indicators to be calculated. As FAST is already
the largest engine (about 300 million pages), it is now possible to
study institutions and not only country domains.

Another important topic is the analysis of the "Invisible Internet", the
part of the web traditionally not indexed by search engine robots
because of gateways or similar barriers. Until now the most important
part of this "infranet" has been bibliographic or factual databases,
but with the increasing availability of full-text documents and the
popularisation of rich-text formats (robots are unable to extract data
from PDF or PostScript files) this could change in the near future.

The large deposits of "pre-prints", plus the huge number of full-text
archives on institutional or personal webpages, are important sources
of R&D information, and they are playing a relevant new role in the
scholarly communication process. We call them "quality islands", as
they originate from more or less controlled peer review, an aspect we
wish to stress.

Now it is possible to analyse this part of the infranet, as we can use
the anchor-related delimiters of the search engines. For example, we
can develop a strategy to discover "informetrics" papers in PDF format
using the "in the link name" option of FAST in order to carry out a
geographic-institutional analysis.

As usual, we recommend visiting the pages dedicated to new articles we
discover on the web and the wonderful site dedicated to the ISSI 2001
Conference in Sydney. In the meantime, we hope to see you in Leiden
during the next R&D indicators conference.

Cybermetrics is open to your comments, advice, criticism and

