How to Compare IRs and CRs

Stevan Harnad harnad at ECS.SOTON.AC.UK
Sat Feb 9 11:49:25 EST 2008

On Sat, 9 Feb 2008, Leslie Carr wrote:

> On 9 Feb 2008, at 11:35, Thomas Krichel wrote:
>> Yeah, but E-LIS is really small, looking at it today it tells
>> us it has 7253 documents. That IRs struggle to compete with that
>> sort of effort demonstrates that IRs don't populate, even in the
>> presence of mandates. No amount of Driver summits will change this.
> If you go to ROAR you will find 62 "Institutional or Departmental"
> repositories that are bigger than E-LIS (that's out of a total set of
> 562). Admittedly that's just 1 in 8 institutional repositories pulling
> something approximating to their weight, but then there are only 89
> subject repositories listed in total.
> It's not a done deal by any means, but I think that the trend is
> looking a lot more positive than you suggest .

It's even a shade more subtle than that:

Not only is comparing IRs to CRs comparing apples to fruit, but the
genus and species have different respective denominators to answer to!

(1) Obviously, we would not be surprised if Harvard (with an output of,
say, 10K journal articles yearly) had a bigger IR than Mercer County
Community College (with a yearly output of 100 journal articles).

(2) But we would be surprised if the yearly deposit rate for Harvard's
10K annual articles was 1% and the yearly deposit rate for MCC was 90%,
even if that meant that Harvard had 100 annual deposits and MCC had only
90.

(3) So the right unit of comparison is not total repository content, of
course, but proportion of annual output self-archived.

(4) The comparison is more revealing (and exacting) when we compare CRs
with IRs: How to compare Harvard's IR to the CR for Biomedicine (PubMed

(5) We are not surprised if the total annual worldwide (or even just US)
output in Biomedicine exceeds the total annual output of Harvard in all
fields.

(6) Again, the valid unit of comparison is total annual-deposits divided
by annual-output, and for a discipline, total annual output means all
articles published that year in that discipline, originating from all of
the world's research institutions.
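The normalization in points (1) to (6) is simple arithmetic: deposit rate = annual deposits / annual output. A minimal sketch follows, assuming a hypothetical `Repository` record (not any real repository software's API). For an IR the denominator is the institution's own yearly output; for a CR it would be the discipline's yearly output from all institutions worldwide. The figures are the Harvard/MCC numbers from point (2), with MCC's 90 deposits taken as 90% of its 100 articles.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical record: a repository and the annual output it answers to.

    For an IR, annual_output is the institution's own yearly article output.
    For a CR, it would be all articles published that year in the discipline,
    worldwide.
    """
    name: str
    annual_deposits: int
    annual_output: int

    def deposit_rate(self) -> float:
        """Proportion of annual output self-archived (deposits / output)."""
        if self.annual_output <= 0:
            raise ValueError("annual_output must be positive")
        return self.annual_deposits / self.annual_output


# Yearly figures from the Harvard vs. MCC example.
harvard = Repository("Harvard IR", annual_deposits=100, annual_output=10_000)
mcc = Repository("MCC IR", annual_deposits=90, annual_output=100)
repos = [harvard, mcc]

# Ranking by raw totals vs. ranking by normalized rate.
by_total = max(repos, key=lambda r: r.annual_deposits).name
by_rate = max(repos, key=lambda r: r.deposit_rate()).name

for repo in repos:
    print(f"{repo.name}: {repo.annual_deposits} deposits, "
          f"rate {repo.deposit_rate():.0%}")
print("Largest by total:", by_total)
print("Highest by rate:", by_rate)
```

Ranked by raw totals, Harvard comes out ahead (100 vs. 90 deposits). Ranked by the proportion of output actually deposited, MCC is far ahead (90% vs. 1%). This reversal is exactly why non-normalized totals mislead.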

And that (if you needed one) is yet another reason why direct IR deposit
is the systematic way to generate 100% OA. It's apples/apples vs.
fruit/fruit -- and all the fruit, hence all the apples, oranges, etc.,
are sown, grown and stocked locally. It's from the local repositories
that the local produce can then be "harvested" (the limitations of a
mixed metaphor!) to some central site, if desired, or just straight to
an indexer like Google Scholar or Citebase.

The moral of the story is that we have to normalize IR/IR, IR/CR and
CR/CR comparisons -- and that absolute, non-normalized totals are not
only meaningless but especially misleading about CRs, which give a
spurious impression of magnitude simply by omitting their even-larger
magnitude denominators!

Stevan Harnad

If you have adopted or plan to adopt a policy of providing Open Access
to your own research article output, please describe your policy at:

     BOAI-1 ("Green"): Publish your article in a suitable toll-access journal
     BOAI-2 ("Gold"): Publish your article in an open-access journal if/when
     a suitable one exists.
     in BOTH cases self-archive a supplementary version of your article
     in your own institutional repository.
