New Ranking of Central and Institutional Repositories (fwd)

Sat Feb 9 21:48:50 EST 2008

---------- Forwarded message ----------
Date: Sun, 10 Feb 2008 08:36:08 +1100
From: Arthur Sale <ahjs--ozemail.com.au>
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM--LISTSERVER.SIGMAXI.ORG
Subject: Re: New Ranking of Central and Institutional Repositories

It looks as though the algorithm is the same as the one used for university
websites:

Rank every repository for inward-bound hyperlinks (VISIBILITY)
Rank every repository for number of pages (SIZE)
Rank every repository for number of 'interesting' documents, e.g. .doc, .pdf
(RICH FILES)
Rank every repository for number of records returned by a Google Scholar
search (GOOGLE SCHOLAR)
Compute (VISIBILITY x 50%) + (SIZE x 20%) + (RICH FILES x 15%) + (GOOGLE
SCHOLAR x 15%)
And then rank the repositories on this score.
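
The steps above can be sketched in a few lines of Python. This is only an
illustration of the procedure as I read it, not the ranking's actual code:
the indicator keys, the tie-breaking by name, and the convention that rank 1
means the largest raw count (so the lowest combined score wins) are all my
assumptions.

```python
from typing import Dict, List

# Weights as quoted in the message; the key names are hypothetical labels.
WEIGHTS = {"visibility": 0.50, "size": 0.20, "rich_files": 0.15, "scholar": 0.15}


def rank_positions(values: Dict[str, float]) -> Dict[str, int]:
    """Map each repository to its rank; rank 1 is the largest raw count.

    Ties are broken alphabetically by name (an assumption).
    """
    ordered = sorted(values, key=lambda name: (-values[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def combined_ranking(raw: Dict[str, Dict[str, float]]) -> List[str]:
    """raw maps repository -> {indicator: raw count}. Returns names, best first."""
    # Step 1: rank every repository separately on each indicator.
    per_indicator = {
        ind: rank_positions({repo: counts[ind] for repo, counts in raw.items()})
        for ind in WEIGHTS
    }
    # Step 2: weighted sum of the rank positions.
    score = {
        repo: sum(WEIGHTS[ind] * per_indicator[ind][repo] for ind in WEIGHTS)
        for repo in raw
    }
    # Step 3: rank on the combined score (lower is better).
    return sorted(score, key=lambda repo: (score[repo], repo))
```

Because VISIBILITY carries half the weight, a repository that is first on
links but last on everything else can still finish ahead of one that leads
on size and rich files alone.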

This is a poor measure in general. VISIBILITY (which accounts for 50% of the
score!) is not necessarily useful for repositories, where harvesting is more
important than hyperlinks. It will be strongly influenced by staff members
linking their publications off a repository search. Both SIZE and RICH FILES
measure absolute size and say nothing about currency or activity. Some of
the higher-placed Australian universities have simply had old stuff dumped
into their repositories, and are relatively inactive in acquiring current
material. Activity should be a major factor in metrics for repositories, and
this could easily be measured by a search limited to a year (e.g. 2007), or
by the way ROAR does it through OAI-PMH harvesting.
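
As a rough sketch of what a year-limited activity measure might look like
over OAI-PMH: the ListIdentifiers verb and its from/until/metadataPrefix
arguments come from the protocol itself, but the function names and the
example endpoint are hypothetical, and I am not claiming this is how ROAR
actually does it. Note also that OAI-PMH datestamps record when a record was
created or last modified in the repository, not when the paper was published.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlencode


def list_identifiers_url(base_url: str, year: int, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListIdentifiers request limited to one calendar year."""
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "from": f"{year}-01-01",
        "until": f"{year}-12-31",
    }
    return f"{base_url}?{urlencode(params)}"


def records_per_year(datestamps: Iterable[str]) -> Counter:
    """Count harvested records by the year of their ISO 8601 datestamp."""
    return Counter(int(stamp[:4]) for stamp in datestamps)
```

For example, list_identifiers_url("http://example.org/oai", 2007) gives the
request for records stamped during 2007, and records_per_year applied to the
harvested datestamps shows whether deposits are recent or a one-off dump.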

Arthur Sale
University of Tasmania

>
> (1) I don't know the Webometrics ranking formula, but it is
> clearly based on multiple weighted parameters, and not merely
> on total number of records (country, size, visibility, rich
> files, "scholar"), otherwise the rank order would have been
> the same as what ROAR gives you if you select "Sort by Total Records":
> http://roar.eprints.org/?action=home&q=&country=&version=&type=&order=recordcount&submit=Filter
>
> The Webometrics "Size" parameter seems to be the same as
> ROAR's "Total records" -- except Webometrics so far seems to
> omit PubMedCentral, which would otherwise be the biggest of
> the CRs. I expect that Webometrics'
> coverage and perhaps also their formula is still being
> refined. [They only seem to cover a total of 200 CRs and IRs
> right now.] And of course there is also still the
> not-yet-solved problem of distinguishing the records that are
> full-texts from those that are just metadata, and
> distinguishing OA content from other kinds of deposits. Stay tuned.
> http://trac.eprints.org/projects/iar/wiki/Missing
>