How to Compare IRs and CRs - or maybe how not to?

Armbruster, Chris Chris.Armbruster at EUI.EU
Sat Feb 9 13:36:01 EST 2008

I also have my doubts that IRs, federated IRs and OAI-PMH will do the job, but CRs are also sometimes no better. Even assuming that content is self-archived, will it be found?

Consider this: It is often assumed that what stands in the way of enhanced functionality and quality is the lack of journal articles available in open access. However, a critical experiment has shown that databases already have problems with coverage even when items are available in open access. It has been found (Bergstrom/Lavaty 2007) that for 33 key economics journals, ninety percent of articles in the most-cited journals had been self-archived, and about fifty percent of articles in less-cited journals were also freely available online. All of the freely available articles were found through Google. Using Google Scholar, they found about 10% fewer. However, when using OAIster they found only a quarter of the freely available articles, and results were only marginally better for SSRN and RePEc searches. Given the high propensity of economists to self-archive and the availability of institutional and disciplinary repositories, the differences between Google and the non-commercial solutions are so dramatic as to warrant the conclusion that the non-commercial solutions, whatever their merits, have only very limited potential.

Chris Armbruster

-----Original Message-----
From: ASIS&T Special Interest Group on Metrics on behalf of dwojick at
Sent: Sat 09/02/2008 19:16
Subject: Re: [SIGMETRICS] How to Compare IRs and CRs

Steve, I am concerned when you say the following-- "It's from the local repositories
that the local produce can then be "harvested" (the limitations of a
mixed metaphor!) to some central site, if desired, or just straight to
an indexer like Google Scholar or Citebase."

OA in 10's of 1,000's of IRs is virtually worthless without some very 
good, central, global, search capability. How to build this capability 
is far from clear.

David Wojick

----Original Message----
From: harnad at ECS.SOTON.AC.UK
Date: 02/09/2008 11:49 AM
Subj: [SIGMETRICS] How to Compare IRs and CRs

On Sat, 9 Feb 2008, Leslie Carr wrote:

> On 9 Feb 2008, at 11:35, Thomas Krichel wrote:
>> Yeah, but E-LIS is really small, looking at it today it tells
>> us it has 7253 documents. That IRs struggle to compete with that
>> sort of effort demonstrates that IRs don't populate, even in the
>> presence of mandates. No amount of Driver summits will change this.
> If you go to ROAR you will find 62 "Institutional or Departmental"
> repositories that are bigger than E-LIS (that's out of a total set of
> 562). Admittedly that's just 1 in 8 institutional repositories 
> something approximating to their weight, but then there are only 89
> subject repositories listed in total.
> It's not a done deal by any means, but I think that the trend is
> looking a lot more positive than you suggest.

It's even a shade more subtle than that:

Not only is comparing IRs to CRs comparing apples to fruit, but the
genus and species have different respective denominators to answer to!

(1) Obviously, we would not be surprised if Harvard (with an output of,
say, 10K journal articles yearly) had a bigger IR than Mercer County
Community College (with a yearly output of 100 journal articles).

(2) But we would be surprised if the yearly deposit rate for Harvard's
10K annual articles was 1% and the yearly deposit rate for MCC was 
even if that meant that Harvard had 100 annual deposits and MCC had 

(3) So the right unit of comparison is not total repository content, of
course, but proportion of annual output self-archived.

(4) The comparison is more revealing (and exacting) when we compare CRs
with IRs: How to compare Harvard's IR to the CR for Biomedicine 

(5) We are not surprised if the total annual worldwide (or even just 
output in Biomedicine exceeds the total annual output of Harvard in 

(6) Again, the valid unit of comparison is total annual-deposits divided
by annual-output, and for a discipline, total annual output means all
articles published that year in that discipline, originating from all of
the world's research institutions.

And that (if you needed one) is yet another reason why direct IR deposit
is the systematic way to generate 100% OA. It's apples/apples vs
fruit/fruit -- and all the fruit, hence all the apples, oranges, etc.
are sown, grown and stocked locally. It's from the local repositories
that the local produce can then be "harvested" (the limitations of a
mixed metaphor!) to some central site, if desired, or just straight to
an indexer like Google Scholar or Citebase.

The moral of the story is that we have to normalize IR/IR, IR/CR and
CR/CR comparisons -- and that absolute, non-normalized totals are not
only meaningless, but especially misleading about CRs, which give a
spurious impression of magnitude simply by omitting their even-larger
magnitude denominators!
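
The normalization argued for above (annual deposits divided by annual
output) can be sketched in a few lines of Python. This is only an
illustration, not anything from the thread: the Repository class and the
deposit_rate function are hypothetical names, the Harvard figures are the
ones given in points (1) and (2), and the MCC deposit count is an assumed
value chosen purely to show the effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """Hypothetical record for one repository in one year."""
    name: str
    annual_deposits: int  # articles deposited during the year
    annual_output: int    # articles published that year by the population served


def deposit_rate(repo: Repository) -> float:
    """Proportion of the year's output that was self-archived."""
    if repo.annual_output <= 0:
        raise ValueError("annual_output must be positive")
    return repo.annual_deposits / repo.annual_output


repos = [
    # From the text: 10K articles a year, 1% deposited, hence 100 deposits.
    Repository("Harvard IR", annual_deposits=100, annual_output=10_000),
    # Output of 100 is from the text; the deposit count of 10 is an assumption.
    Repository("MCC IR (assumed deposits)", annual_deposits=10, annual_output=100),
]

# Ranking by raw totals and ranking by normalized rate can disagree.
largest_by_total = max(repos, key=lambda r: r.annual_deposits)
largest_by_rate = max(repos, key=deposit_rate)

for repo in sorted(repos, key=deposit_rate, reverse=True):
    print(f"{repo.name}: {repo.annual_deposits} deposits, "
          f"rate {deposit_rate(repo):.1%}")
```

The same function applies to a CR, provided annual_output is set to the
whole discipline's worldwide output for the year rather than to any single
institution's output.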

Stevan Harnad

If you have adopted or plan to adopt a policy of providing Open Access
to your own research article output, please describe your policy at:

     BOAI-1 ("Green"): Publish your article in a suitable toll-access journal
     BOAI-2 ("Gold"): Publish your article in an open-access journal if
     a suitable one exists.
     in BOTH cases self-archive a supplementary version of your article
     in your own institutional repository.
