New Ranking of Central and Institutional Repositories

Mon Feb 11 19:42:27 EST 2008

 Isidro

As one of those who contributed to that discussion, may I be more specific?

The impact of a repository should be measured by things other than some of
the measures that you use. PageRank and Size are both very weak indicators.
I give examples below.

VISIBILITY
Visibility as you measure it has nothing to do with the purpose of
repositories, and is only a minor factor in their impact. Let me give examples:

*       Inward links to the repository itself are relatively rare, and
probably negligible in the total. Almost no-one really goes to a repository
to search its content except locally - its value is in federation. The
exceptions are (1) central repositories such as CERN, RePEc, arXiv, etc., and
(2) exemplar repositories such as Southampton and QUT. The component is
hugely biased towards these repositories.
*       The majority of links to institutional repositories on the Web are
probably from depositors' home pages, linking to their research records. At
UTas we will gain 600-1000 such links once the link is in the standard staff
member template. Is this visibility? Or does it measure university size?
*       In a few cases, viewers may link to a paper. However, to do this they
have to value the paper significantly, then copy the URL, and then post it
to a public website or blog. I expect these are a minority of the total
links. Is there any data to the contrary? In any case, such links depend on
the author's importance in the field, not on the value of the repository.

REAL VISIBILITY
Real visibility in the case of a repository consists in (a) whether it
provides a compliant OAI-PMH interface, and (b) whether that interface is
harvested by federated services, such as ROAR, OAIster, etc. One might also
add whether the repository is actively harvested as a flat file or via OAI
by Google and Google Scholar, Scopus, or Thomson. Nothing else really
matters in respect of visibility. All these are measurable. PageRank is
irrelevant, sorry.
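
To illustrate how measurable this is, here is a minimal sketch of the two OAI-PMH requests a harvester might send: an Identify request to check that the interface answers, and a ListIdentifiers request with a date range. The verbs and the metadataPrefix, from and until arguments come from the OAI-PMH protocol; the base URL and the helper name oai_request_url are hypothetical, and the sketch only builds the URLs without contacting any server.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, params=None):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params or {})
    return f"{base_url}?{urlencode(query)}"


# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.edu/oai"

# Does the repository answer a basic Identify request?
identify_url = oai_request_url(BASE_URL, "Identify")

# Which records carry a datestamp within a given 12-month window?
harvest_url = oai_request_url(
    BASE_URL,
    "ListIdentifiers",
    {"metadataPrefix": "oai_dc", "from": "2007-02-11", "until": "2008-02-11"},
)

print(identify_url)
print(harvest_url)
```

Whether a federated service actually harvests the interface cannot be seen from the request alone; that would have to be checked against the service's own list of sources.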

SIZE
Size is a terrible measure. Australia is full of examples where the
repository has been populated by uploading zillions of old stub records
going back to the 1930s or before. The full text is mostly missing, though
sometimes a grant has funded image scanning of the document. This is
fullness for the sake of fullness. To give one example in your list, the
Australasian Digital Thesis Program has 110,000 records of this type of old
PhD theses. The full-text simply says: contact the university for a
photocopy. That's OK, but the weighting of size ought to be low - less than
20%.

If it is necessary to measure size, and it probably is, then I suggest a
measure that counts the number of records with a publication date within the
last five years. Choose 10 years if you want, but ancient record-keeping
does not translate into impact.
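
The suggested size measure can be sketched as follows, assuming the publication dates have already been extracted from the harvested metadata. The function name recent_record_count and the sample dates are hypothetical; the window length is a parameter so that 5 or 10 years can be chosen.

```python
from datetime import date


def recent_record_count(publication_dates, today, years=5):
    """Count records whose publication date falls within the last `years` years."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # `today` is 29 February and the cutoff year is not a leap year.
        cutoff = today.replace(year=today.year - years, day=28)
    return sum(1 for d in publication_dates if cutoff <= d <= today)


# Hypothetical sample: two old stub records and two recent papers.
sample_dates = [
    date(1935, 6, 1),
    date(1972, 3, 15),
    date(2005, 9, 30),
    date(2007, 11, 2),
]
recent = recent_record_count(sample_dates, today=date(2008, 2, 11))
print(recent)
```

Under this measure the old stub records still exist in the repository but contribute nothing to the score.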

ACTIVITY
It is quite clear from ROAR that deposit activity is a major measure of
impact. There are three easy measures to derive.

*       The number of acquisitions in the last 12 months. Easily discovered
from the OAI interface.
*       The number of acquisitions with a publication date in the last 12 months.
Easily discovered from the OAI interface. This measures currency as well as
activity.
*       Some repositories are sporadic, some are continuous, the latter
reflecting a deep-seated integration within the university's activity. A
simple measure would be to derive a statistic from the traffic (see ROAR),
such as

        *       the number of days in the last 12 months with a deposit event
        *       the Fourier spectrum of the last 12 months' deposit events having
no component with a period longer than 7 days above 10% (I guess at what is
significant and perhaps this can be turned into a score).

RICH TEXT
This is a reasonable measure, though subject to error. For example we
sometimes put a full-text that gives instructions on how to ask for access
to the item concerned, or a bio of the creator of an artwork.

DOWNLOADS
I'd love to promote downloads as a measure of impact, but there is as yet no
federated way to access this data.

I'm happy to continue this dialogue.

Arthur Sale
Professor of Computer Science
University of Tasmania

> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM at LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Isidro F. Aguillo
> Sent: Monday, 11 February 2008 6:53 PM
> To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM at LISTSERVER.SIGMAXI.ORG
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] New
> Ranking of Central and Institutional Repositories
>
> Dear all:
>
> Thanks for your interest in the Ranking of repositories, part
> of our larger effort to rank the web presence of universities
> and research centers. A few comments on your messages:
>
> - Currently the Ranking of repositories is a beta version. We
> would welcome comments, suggestions and criticisms. Information
> about missed repositories is warmly welcomed. After the feedback
> received during the last few days we are considering a new
> edition before the scheduled one in July.
> - Our rank formula mimics PageRank in part, but our
> "inspiration" was in fact the impact factor. We maintain a 1:1
> ratio between visibility (impact) and size (activity), which
> is the basis of the IF. In order to take into account the
> diversity of web information we decided to split the size
> contribution according to additional criteria.
> - Freshness is a topic we are concerned about not only for
> repositories but for the rest of the rankings too. We are
> considering taking it into account in the Scholar
> contribution by giving more weight to recent publications.
> - There are methodological problems in producing relative
> indicators: percentage of global output, or institution size
> normalization. But you know rankings are usually built by GDP
> (US, Japan, Germany,...) and not GDP per capita (Luxembourg,
> United Arab Emirates, ...)
> - Our position as a research group has been stated previously
> but I am going to summarise it again: the rankings are made with
> the aim of increasing the volume of academic information
> available on the Web, promoting the electronic publication of
> all the activities of the universities, not only the research
> related ones, and especially from institutions in developing countries.
>
> Best regards,
>
> Leslie Carr wrote:
> >
> > On 9 Feb 2008, at 21:36, Arthur Sale wrote:
> >
> >> It looks as though the algorithm is the same as for university
> >> websites.
> >>
> >> Rank each repository for inward bound hyperlinks (VISIBILITY)
> >> Rank every repository for number of pages (SIZE)
> >> Rank every repository for number of 'interesting' documents eg .doc,
> >> .pdf (RICH FILES)
> >> Rank every repository for number of records returned by a Google
> >> Scholar search (GOOGLE SCHOLAR)
> >> Compute (VISIBILITY x 50%) + (SIZE x 20%) + (RICH FILES x 15%) +
> >> (GOOGLE SCHOLAR x 15%)
> >> And then rank the repositories on this score.
> >>
> >> This is a poor measure in general. VISIBILITY (accounts for 50% of
> >> score!) is not necessarily useful for repositories, when harvesting
> >> is more important than hyperlinks. It will be strongly influenced by
> >> staff members linking their publications off a repository search.
> >> Both SIZE and RICH FILES measure absolute size and say nothing about
> >> currency or activity. Some of the higher placed Australian
> >> universities have simply had old stuff dumped in them, and are
> >> relatively inactive in acquiring current material. Activity should be
> >> a major factor in metrics for repositories, and this could easily be
> >> measured by a search limited to a year (eg 2007), or by the way ROAR
> >> does it through OAI-PMH harvesting.
> >>
> > I believe that the Webometrics (ghastly name!) ranking of repositories
> > uses the same criteria as its ranking of universities ie it is
> > attempting to quantify the impact that the repository has had. This is
> > very different to the size, deposit activity, or even used-ness of the
> > repository and explains why the major contributing factor is
> > VISIBILITY. The main issue for this league table is "how much evidence
> > is there in the public web that your active research and scholarly
> > outputs are valued enough by your community of peers that they are
> > linking to them".
> >
> > This will probably seem entirely arbitrary to some people, and
> > entirely obvious to others, depending on how much they see "the web"
> > as a para-literature. It mimics Google's PageRank valuation of web
> > pages according to how many 'votes' (links/quasi-citations) they get
> > from other pages from independent sources.
> >
> > It is not possible to tell with any accuracy whether a University
> > Website is "a good website" simply by looking at the University's
> > place in the Webometrics Ranking of Universities. The website is
> > simply a channel which delivers visibility-impact for the University
> > (or not). Similarly for the repository.
> > --
> > Les Carr
> >
>
> --
> ****************************
> Isidro F. Aguillo
> Laboratorio de Cibermetría
> Cybermetrics Lab
> CCHS - CSIC
> Joaquin Costa, 22
> 28002 Madrid. Spain
>
> isidro @ cindoc.csic.es
> +34-91-5635482 ext 313
> ****************************
> 