On Comparing Institutional Apples With Multi-Institutional Fruit: The Denominator Fallacy Again

Thu Jul 8 10:38:19 EDT 2010

On Thu, Jul 8, 2010 at 5:44 AM, Armbruster, Chris <Chris.Armbruster --
eui.eu> wrote:

> "Institution" is indeed not a very precise concept, but the repository
> ranking will not be improved if one were to spend much time trying to decide
> which repository is institutional and which is not

If there is any rationale for separately ranking and comparing -- as
the Ranking Web of World Repositories (RWWR,
http://repositories.webometrics.info/) does -- both the top 800
repositories and the top 800 *institutional* repositories (and there
is indeed an important rationale for doing so), then that rationale is
that the institutional repositories are indeed *institutional* and not
multi-institutional. The purpose is
to rank their relative size (and hence their success in capturing
their target content), and there is no point in comparing the size of
the category "apple" with the size of the category "fruit." This is
the "denominator fallacy": http://bit.ly/denominator-fallacy

The pros and cons of Chris Armbruster's advocacy of central
(multi-institutional) repositories over institutional repositories
have already been discussed many times over the years in this Forum and
elsewhere: http://bit.ly/Armbruster-central

The argument for institutional repositories is that (1) institutions
are the providers of all of OA's target content, (2) they have a stake
in managing their own output, and (most important of all) (3) they are
in a position to mandate the deposit of their own output.

The argument for multi-institutional (central) repositories is that
they look (superficially) as if they were bigger, hence more
"successful" in attracting OA's target content. (Hence Chris's
preference for keeping the two kinds of repositories and their sizes
conflated in the RWWR rankings.) They also look (superficially) more
manageable and sustainable.

The argument against multi-institutional (central) repositories is (a)
that multi-institutional entities (notably, funders) cannot mandate
the deposit of all institutional research output (because not all
research is funded), (b) that central deposit mandates compete with
instead of reinforcing institutional mandates (eliciting resistance
from authors facing the prospect of having to do double-deposits), and
(most relevantly here) (c) that the size and success of a repository
can only be evaluated and compared in relation to the size of that
repository's total target output: And although there are differences
among institutions in the size of their own total output (which can
and should be weighted to normalize it and make it comparable), the
difference in size between institutions and multi-institutions is the
difference in size between the number of apples and the number of
fruit. (The denominator fallacy.)

Multi-institutional (central) repositories' content would have to be
weighted by the output of all their actual and potential target
institutions and the total target content of each, in order to make
multi-institutional rankings comparable to those of individual
institutions. RWWR is not doing that kind of weighting -- nor would it
be easy to determine those weightings for each kind of
multi-institutional repository, though it may in principle eventually
be possible to estimate them. If it were done, however, there would hardly
be any need for two rankings (for repositories vs. institutional
repositories).
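
The weighting described above can be made concrete with a small sketch.
This is only an illustration under assumptions, not RWWR's methodology:
the repository names, deposit counts and target-output estimates below
are all invented, and `deposit_fraction` is a hypothetical helper that
divides a repository's deposited items by its estimated total target
output (its denominator).

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    deposited: int      # items actually deposited (invented figure)
    target_output: int  # estimated total target output, i.e. the denominator (invented figure)


def deposit_fraction(repo: Repository) -> float:
    """Share of the repository's own target output that it has captured."""
    return repo.deposited / repo.target_output


def rank_by_raw_size(repos):
    """Rank by absolute number of deposits, ignoring the denominator."""
    return sorted(repos, key=lambda r: r.deposited, reverse=True)


def rank_by_fraction(repos):
    """Rank by deposits weighted by each repository's own target output."""
    return sorted(repos, key=deposit_fraction, reverse=True)


# Hypothetical data: two single-institution repositories and one
# multi-institutional (central) repository with a far larger target.
repos = [
    Repository("Institution A", deposited=4_000, target_output=10_000),
    Repository("Institution B", deposited=1_500, target_output=2_500),
    Repository("Central C", deposited=50_000, target_output=1_000_000),
]

for label, ranking in (("raw size", rank_by_raw_size(repos)),
                       ("denominator-weighted", rank_by_fraction(repos))):
    print(label + ":", [r.name for r in ranking])
```

With these made-up figures the central repository comes first by raw
size and last once each count is divided by its own denominator. The
hard part, as noted, is estimating that denominator for a
multi-institutional repository in the first place.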

What would be clear from a proper denominator-weighted ranking of
institutional and multi-institutional repositories is that, contrary
to what Chris has argued, it is not at all true that the
multi-institutional repositories are bigger or more successful in
collecting their respective total target contents. Rather, it makes
much more sense for both institutions and funders to mandate that
researchers deposit in their own institutional repository -- from
which multi-institutional collections could then be automatically
harvested. (It would then be redundant to try to compare their
relative success, as one would clearly be derivative of the other.)

For management and sustainability, local institutional deposit and
central harvesting is the complementary -- and optimal -- solution.
But first the primary content-provision problem has to be solved,
otherwise there is next to nothing to manage and sustain!

> how about also deleting No 10 because it is only a departmental repository?

A departmental repository, in contrast, is *sub-institutional* rather
than multi-institutional. Hence, unless there is to be a separate RWWR
ranking of the top 800 *departmental* repositories, there is no harm in
listing the departmental repositories among the institutional
repositories -- *except* if the university has both an institutional
and a departmental repository, *and* the contents of the departmental
repository are also a proper subset of the contents of the
institutional repository, hence double-counted.

This is not the case in the instance of ["institutional"] repository
#10, University of Southampton School of Electronics and Computer
Science, whose contents are *not* part of institutional repository
#27, University of Southampton. Rather than resulting in an inflated
ranking for Southampton, this actually results in a *lower* ranking.
The joint RWWR ranking of the integrated institutional repository
would be higher for Southampton. (That said, with a properly weighted
denominator, separately tagged departmental repositories would be
useful at this time, to compare the relative success of
institution-wide mandates vs. departmental/school/faculty mandates --
i.e., Arthur Sale's "patchwork mandate" strategy:
http://bit.ly/Patchwork-Mandate .)

> Also, it is a
> bad idea to define repositories as institutional only if they restrict
> themselves to the output of a single institution. We already have too many
> repository managers who succumb to this kind of institutionalist logic - and
> reject OA content only because it is not from their own institution.

If only the problem were that of an overflowing cup, with so much OA
target content that it needs to be rejected!

Chris has the OA content problem completely upside-down! The problem
is that not enough of each institution's own OA target content is
being deposited, anywhere -- not that institutions are declining to
host the output of other institutions. (It is only Chris's
central-repository preoccupation that makes him imagine that the
latter is the problem.)

What's missing is not repositories to deposit in, but *mandates to
deposit*. The solution is for institutions and funders to mandate
institutional deposit of all content, funded and unfunded, across all
disciplines -- and then, if desired, to harvest that content into
various central collections, by discipline, funder, language or
nation, as desired. Institutions are the universal providers of all
that content; they are also the natural locus for deposit mandates.

> The CSIC has a sound methodology for ranking repositories, and it not their
> job to define exclusively what is an IR and what not. And in cyberspace it
> is much more interesting to compare repositories according to domains and
> services they offer...

I take it that by the CSIC Chris means the RWWR:
http://repositories.webometrics.info/about.html

And as far as I can tell, the only reason Chris finds the methodology
sound is that it conflates institutional and multi-institutional
repositories, which favors Chris's preference for multi-institutional
repositories.

What is much more interesting and important in cyberspace than the
locus of the distributed content is the *presence* of the content.
Most (80%) of OA's target content is still missing from anywhere on
the (free) web, and long overdue. Locus matters strategically for the
concrete, practical goal of capturing that target content (and making
it OA). Chris keeps systematically missing this point. If the content
were all there already, none of this would matter in the slightest.

(And a good intuition pump to bear in mind is that the key to the
success of Google and the like was not to try to get everyone to
deposit their content directly in Google: What happened, and worked,
was distributed, local deposit and hosting, followed by central
harvesting. Not a bad principle to generalize to OA...)

> Moreover, it would help if we could move beyond the often narrow
> understanding of what an institutional repository is and what not &
> acknowledge more clearly that a strategy of privileging institutional
> repositories as such has not helped.

Chris does not seem to have noticed the growing
institutional/departmental repository mandate movement (initiated in
2002 by Southampton ECS, but greatly accelerated since the 16th
mandate in 2008 by Harvard FAS, and now running well over 100
institutional/departmental mandates, including UCL, MIT and Stanford,
as well as over 40 funder mandates).
http://www.openoasis.org/index.php?option=com_content&view=article&id=144&Itemid=338

It is not (and never has been) a matter of merely "privileging"
institutional deposit, but *mandating* it.

> The value & sustainability of IRs
> (individually, as isolated instances, & if not embedeed in a national
> system) is rather limited for both scholarship and open access.

(1) Repository value is nil without content.

(2) With content, locus is irrelevant, as search is not local but
global, via central harvesters.

(3) Sustainability is a red herring (especially with today's sparse OA
content); institutional deposit loci and central harvesters are
complementary, insofar as preservation is concerned.

(4) Nations can and should mandate OA deposit. Nations can and should
harvest OA deposits centrally. But there is no earthly need (or
prospect) of nations directly hosting all their institutional OA
output centrally, any more than there is any earthly need for nations
to host all their institutions centrally.

(5) If Chris is worried about limitations on OA scholarship, he should
set his mind to thinking of how to induce the OA target content
providers (institutional researchers) to deposit their content, to
make it OA.

(6) IRs will take care of themselves.

> Hence, it is
> very welcome that more determined efforts are underway at building viable
> networks of research repositories and integrate IRs in national systems
> (e.g. Ireland as latest instance).

All true, but a non sequitur, insofar as the fundamental problem of
filling those repositories with their target contents is concerned.

> For a sustained argument, please see:
>
> Armbruster/Romary (2010) Comparing Repository Types: Challenges and Barriers
> for Subject-Based Repositories, Research Repositories, National Repository
> Systems and Institutional Repositories in Serving Scholarly Communication."
> (accepted for publication in IJDLS)
> http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1506905
>
> Romary/Armbruster (2010) Beyond Institutional Repositories. IJDLS 1(1)44-61
> http://ssrn.com/abstract=1425692

For a sustained critique and response, see:

Conflating OA Repository-Content, Deposit-Locus, and Central-Service Issues
http://openaccess.eprints.org/index.php?/archives/665-guid.html

Institutional vs. Central Repositories: 2 (of 2)
http://openaccess.eprints.org/index.php?/archives/659-guid.html

Institutional vs. Central Repositories: 1 (of 2)
http://openaccess.eprints.org/index.php?/archives/658-guid.html

Beyond Romary & Armbruster On Institutional Repositories
http://openaccess.eprints.org/index.php?/archives/606-guid.html

When Will the Research Community Take OA Matters Into Its Own Hands?
http://openaccess.eprints.org/index.php?/archives/244-guid.html

First Things First: OA Self-Archiving, Then Maybe OA Publishing
http://openaccess.eprints.org/index.php?/archives/155-guid.html

Well-Meaning Supporters of "OA + X" Inadvertently Opposing OA
http://openaccess.eprints.org/index.php?/archives/182-guid.html

Swan, A., Needham, P., Probets, S., Muir, A., Oppenheim, C., O’Brien,
A., Hardy, R., Rowland, F. and Brown, S. (2005) Developing a model for
e-prints and open access journal content in UK further and higher
education. Learned Publishing, 18 (1). pp. 25-40.
http://eprints.ecs.soton.ac.uk/11000/

Stevan Harnad