Size of repositories (fwd)

Stevan Harnad harnad at ECS.SOTON.AC.UK
Tue Mar 20 13:15:49 EDT 2007

---------- Forwarded message ----------
Date: Tue, 20 Mar 2007 15:42:35 +0100
From: Isidro F. Aguillo <isidro at CINDOC.CSIC.ES>
Subject: Size of repositories

While preparing the new edition of the Webometrics Ranking of World
Universities, we ran an experiment to calculate how many documents in
web repositories are needed to reach a given level. We defined three
levels: Premier League for universities in the Top 200, World Class for
the Top 500 and Regional Class for those that appear among the Top 1000.

We collected data for rich file formats, namely Adobe Acrobat (pdf), MS
Word (doc), MS PowerPoint (ppt) and PostScript (ps), as well as for the
Google Scholar database.

The thresholds are as follows:

                  PDF     DOC    PPT      PS    SCHOLAR

PREMIER LEAGUE   19000    4000    2000    1000     3300

WORLD CLASS       7000    2000    1000     300     1200

REGIONAL CLASS    3000     500     300      50      400

These figures could be used as a reference in repository planning. All
the data refer to publicly accessible documents on the Web that are
indexed by the major search engines.
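
As a rough illustration of how the thresholds might be used in planning,
the Python sketch below looks up, for each format, the highest level whose
threshold a given document count reaches. The function names and the
example counts are hypothetical, and treating each format independently is
an assumption: the message does not say how the formats are combined.

```python
# Thresholds copied from the table above (minimum document counts).
# Levels are listed from highest to lowest.
THRESHOLDS = {
    "PREMIER LEAGUE": {"pdf": 19000, "doc": 4000, "ppt": 2000, "ps": 1000, "scholar": 3300},
    "WORLD CLASS":    {"pdf": 7000,  "doc": 2000, "ppt": 1000, "ps": 300,  "scholar": 1200},
    "REGIONAL CLASS": {"pdf": 3000,  "doc": 500,  "ppt": 300,  "ps": 50,   "scholar": 400},
}


def level_for(fmt, count):
    """Return the highest level whose threshold for `fmt` is met, else None."""
    # Dicts preserve insertion order, so the highest level is checked first.
    for level, limits in THRESHOLDS.items():
        if count >= limits[fmt]:
            return level
    return None


def levels_by_format(counts):
    """Map each format in `counts` to the level its count reaches."""
    return {fmt: level_for(fmt, n) for fmt, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for one institution, for illustration only.
    example = {"pdf": 8200, "doc": 450, "ppt": 1000, "ps": 20, "scholar": 3300}
    for fmt, level in levels_by_format(example).items():
        print(f"{fmt:8s} {level or 'below REGIONAL CLASS'}")
```

A count that falls below the Regional Class threshold is reported as None
rather than being forced into a level.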

Isidro F. Aguillo
isidro at
Ph:(+34) 91-5635482 ext. 313

Cybermetrics Lab
Joaquin Costa, 22
28002 Madrid. SPAIN
