Craig et al.'s review of the OA citation advantage

Sat May 26 07:20:42 EDT 2007

On Wed, 23 May 2007 bernd-christoph.kaemper at ub.uni-stuttgart.de wrote:

> Elsevier said that citation rates of their journals had gone
> up considerably because of the increased access through widespread
> online availability of their journals...
>
> Online availability clearly increased the IF [journal citation
> impact factor]. In the FUTON subcategory, there was an IF gradient 
> favoring journals with freely available articles. ..."
>
> I think it is quite obvious why sources available with open access 
> will be used and cited more often than others...
>
> So the usefulness of open access is a matter of daily experience, 
> not so much of academic discussions whether there is any empirical 
> proof for a citation advantage of open access that may be isolated 
> by eliminating all possible confounders...
>
> That open access leads to more visibility and thereby potentially
> more citations is trivial, but this relative open access advantage
> will vary from journal to journal... 
>
> Due to the multitude of possible confounding factors I would not 
> believe any of the figures calculated by Stevan Harnad as the 
> cumulated lost impact, or conversely, the possible gain.

I couldn't quite follow the logic of this posting. It seemed to be
saying that, yes, there is evidence that OA increases impact, it is even
trivially obvious, but, no, we cannot estimate how much, because there
are possible confounding factors and the size of the increase varies.

All studies have found that the size of the OA impact differential varies
from field to field, journal to journal, and year to year. The range
of variation is from +25% to over +250%. But the differential
is always positive, and mostly quite sizeable. That is why I chose a
conservative overall estimate of +50% for the potential gain in impact if it
were not just the current 15% of research that was being made OA, but
also the remaining 85%. (If you think 50% is not conservative enough, use
the lower-bound 25%: You'll still find a substantial potential impact
gain/loss. If you think self-selection accounts for half the gain, split
it in half again: there's still plenty of gain, once you multiply by
85% of total citations.)
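
As a minimal sketch of the arithmetic in the parenthesis above: the
function name and the "causal_share" parameter are illustrative
assumptions, and treating the non-OA 85% of articles as 85% of total
citations is the same simplification the paragraph makes.

```python
def potential_gain(oa_advantage, non_oa_share=0.85, causal_share=1.0):
    """Potential citation gain, as a fraction of total citations.

    oa_advantage : relative OA citation advantage (0.50 means +50%)
    non_oa_share : share of research not yet OA (85% in the text)
    causal_share : assumed fraction of the advantage that is causal
                   rather than self-selection (1.0 = all of it)
    """
    return oa_advantage * causal_share * non_oa_share


scenarios = {
    "conservative estimate (+50%)": potential_gain(0.50),
    "lower bound (+25%)": potential_gain(0.25),
    "lower bound, half self-selection": potential_gain(0.25, causal_share=0.5),
}

for label, gain in scenarios.items():
    print(f"{label}: {gain:.3f} of total citations")
```

Even the most pessimistic of the three scenarios leaves a gain of
roughly a tenth of total citations under these assumptions.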

An interesting question that has since arisen (and could be answered by
similar studies) is this:

    Since it is known that (in science) the top 10% of articles published
    receive 90% of the total citations made (Seglen 1992), to what
    extent is the top 10% of articles published over-represented among
    the c. 15% of articles that are being spontaneously made OA by their
    authors today?

It is a logical possibility that all or most of the top 10% are already
among the 15% that are being made OA: I rather doubt it; but it would
be worth checking whether it is so. If it did turn out to be so, then
reaching 100% OA would be far less urgent and important than I had argued,
and OA mandates would likewise be less important.
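
The check could be sketched roughly as follows. The function name and
the toy sample are invented for illustration only; a real test would
need citation counts and OA status for a full journal or field.

```python
def oa_share_in_top_decile(articles):
    """articles: list of (citation_count, is_oa) pairs.

    Returns (OA share among the top-cited 10%, OA share overall).
    """
    ranked = sorted(articles, key=lambda a: a[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top = ranked[:n_top]
    top_share = sum(is_oa for _, is_oa in top) / n_top
    overall_share = sum(is_oa for _, is_oa in articles) / len(articles)
    return top_share, overall_share


# Invented toy sample: 20 articles, 3 of them OA (15%).
sample = [(120, True), (95, False), (30, True), (12, False), (9, True)]
sample += [(c, False) for c in range(15)]

top_share, overall_share = oa_share_in_top_decile(sample)
print(f"OA share in top decile: {top_share:.2f}, overall: {overall_share:.2f}")
```

If the top-decile share came out close to 1.0 on real data, most of
the top-cited articles would already be OA; a share near the overall
15% would mean no over-representation at all.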

The empirical studies of the relation between OA and impact have been
mostly motivated by the objective of accelerating the growth of OA -- and
thereby the growth of research usage and impact. Those who are confident
that the OA impact differential is merely or largely a non-causal
self-selection bias are encouraged to demonstrate that that is the case.

Note very carefully, though, that the observed correlation between OA
and citations takes the form of a correlation between the number of OA
articles, relative to non-OA articles, at each citation level. The more
highly cited an article, the more likely it is OA. This is true within
journals, and within and across years, in every field tested.
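
A minimal sketch of how such a tabulation might be computed, assuming
articles are given as (citation count, OA flag) pairs. The bin edges
and the sample values are arbitrary illustrations, not data from any
of the studies cited.

```python
from bisect import bisect_right


def oa_share_by_level(articles, edges=(0, 1, 4, 8, 16)):
    """Return {lower bin edge: OA share} for each citation level.

    An article with c citations falls in the bin whose lower edge is
    the largest edge <= c; the last bin is open-ended.
    """
    counts = {edge: [0, 0] for edge in edges}  # edge -> [n_oa, n_total]
    for citations, is_oa in articles:
        edge = edges[bisect_right(edges, citations) - 1]
        counts[edge][0] += int(is_oa)
        counts[edge][1] += 1
    return {edge: oa / total for edge, (oa, total) in counts.items() if total}


# Invented toy sample of (citations, is_oa) pairs.
sample = [(0, False), (0, False), (0, False), (0, True),
          (2, False), (2, False), (3, True),
          (5, False), (6, True),
          (9, True), (12, False),
          (20, True), (40, True)]

for edge, share in oa_share_by_level(sample).items():
    print(f"{edge}+ citations: OA share {share:.2f}")
```

A share that rises from bin to bin is the pattern described above; by
itself it says nothing about the direction of causation.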

And this correlation can arise because more-cited articles are more
likely to be made OA *or* because articles that are made OA are more
likely to be cited (or both -- which is what I think is in reality
the case). It is certainly *not* the case that self-selection is the
default or null hypothesis, and that those who interpret the effect as
OA causing the citation increase hence have the burden of proof: The
situation is completely symmetric numerically; so your choice between the
two hypotheses is not based on the numbers, but on other considerations,
such as prima facie plausibility -- or financial interest.

Until and unless it is shown empirically that today's OA 15% already
contains all or most of the top-cited 10% (and hence 90% of what
researchers cite), I think it is a much more plausible interpretation
of the existing findings that OA is a cause of the increased usage and
citations, rather than just a side-effect of them, and hence that there
is usage and impact to be gained by providing and mandating OA. (I can
quite understand why those who have a financial interest in its being
otherwise [Craig et al. 2007] might prefer the other interpretation,
but clearly prima facie plausibility cannot be their justification.)

I also think that 50% of total citations is a plausible overall estimate
of the potential gain from OA, as long as it is understood clearly
that the 50% gain does not apply to every article made OA. Many articles
are not found useful enough to cite no matter how accessible you make
them. The 50% citation gain will mostly accrue to the top 10% of articles,
as citations always do (though OA will no doubt also help to remedy some
inequities and will sometimes help some neglected gems to be discovered
and used more widely). In other words, the OA advantage to an article
will be roughly proportional to that article's intrinsic citation value
(independent of OA).

Other interesting questions: The top-cited articles are not evenly
distributed among journals. The top journals tend to get the top-cited
articles. It is also unlikely that journal subscriptions are evenly
distributed among journals: The top journals are likely to be subscribed
to more, and are hence more accessible.

So if someone is truly interested in these questions (as I am not!),
they might calculate a "toll-accessibility index" (TAI) for each article,
based on the number of researchers/institutions that have toll access to
the journal in which that article is published. An analysis of covariance
can then be done to see whether and how much the OA citation advantage
is reduced if one controls for the article's TAI. (I suspect the answer
will be: somewhat, but not much.)

Stevan Harnad

Bollen, J., Van de Sompel, H., Smith, J. and Luce, R. (2005) Toward
alternative metrics of journal impact: A comparison of download and
citation data. Information Processing and Management, 41(6): 1419-1440
http://arxiv.org/abs/cs.DL/0503007.

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics
as Predictors of Later Citation Impact. Journal of the American
Society for Information Science and Technology (JASIST) 57(8) pp.
1060-1072. http://eprints.ecs.soton.ac.uk/10713/

Craig, Ian; Andrew Plume, Marie McVeigh, James Pringle & Mayur Amin
(2007) Do Open Access Articles Have Greater Citation Impact? A critical
review of the literature. Journal of Informetrics.
http://www.publishingresearch.net/Citations-SummaryPaper3_000.pdf.pdf

Davis, P. M. and Fromerth, M. J. (2007) Does the arXiv lead to higher
citations and reduced publisher downloads for mathematics articles?
Scientometrics 71: 203-215.
http://arxiv.org/abs/cs.DL/0603056
See critiques: 
http://www.ecs.soton.ac.uk/%7Eharnad/Hypermail/Amsci/5221
http://www.ecs.soton.ac.uk/%7Eharnad/Hypermail/Amsci/5440.html

Diamond, A. M., Jr. (1986) What is a Citation Worth? Journal of Human
Resources 21: 200-15.
http://www.garfield.library.upenn.edu/essays/v11p354y1988.pdf

Eysenbach, G. (2006) Citation Advantage of Open Access Articles. PLoS
Biology 4: 157.

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
Cross-Disciplinary Comparison of the Growth of Open Access and How it
Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4)
pp. 39-47. http://eprints.ecs.soton.ac.uk/11688/

Hajjem, C. and Harnad, S. (2006) Manual Evaluation of Robot Performance
in Identifying Open Access Articles. Technical Report, Institut des
sciences cognitives, Universite du Quebec a Montreal.
http://eprints.ecs.soton.ac.uk/12220/

Hajjem, C. and Harnad, S. (2006) The Self-Archiving Impact Advantage:
Quality Advantage or Quality Bias? Technical Report, ECS, University of
Southampton. http://eprints.ecs.soton.ac.uk/13193/

Hajjem, C. and Harnad, S. (2007) Citation Advantage For OA
Self-Archiving Is Independent of Journal Impact Factor, Article Age, and
Number of Co-Authors. Technical Report, Electronics and Computer
Science, University of Southampton.
http://eprints.ecs.soton.ac.uk/13329/

Hajjem, C. and Harnad, S. (2007) The Open Access Citation Advantage:
Quality Advantage Or Quality Bias?. Technical Report, Electronics and
Computer Science, University of Southampton.
http://eprints.ecs.soton.ac.uk/13328/

Harnad, S. & Brody, T. (2004) Comparing the Impact of Open Access (OA)
vs. Non-OA Articles in the Same Journals, D-Lib Magazine 10 (6) June
(Japanese translation) http://eprints.ecs.soton.ac.uk/10207/

Harnad, S. (2005) Making the case for web-based self-archiving. Research
Money 19(16). http://eprints.ecs.soton.ac.uk/11534/

Harnad, S. (2005) Maximising the Return on UK's Public Investment in
Research. http://eprints.ecs.soton.ac.uk/11220/

Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) +
UA. http://eprints.ecs.soton.ac.uk/12085/

Harnad, S. (2005) On Maximizing Journal Article Access, Usage and
Impact. Haworth Press (occasional column).
http://eprints.ecs.soton.ac.uk/10793/

Harnad, S. (2006) Within-Journal Demonstrations of the Open-Access
Impact Advantage: PLoS, Pipe-Dreams and Peccadillos (LETTER). PLOS
Biology 4(5). http://eprints.ecs.soton.ac.uk/12607/

Henneken, E. A., Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C.,
Thompson, D., and Murray, S. S. (2006) Effect of E-printing on Citation
Rates in Astronomy and Physics. Journal of Electronic Publishing, Vol.
9, No. 2, Summer 2006
http://arxiv.org/abs/cs/0604061

Henneken, E. A., Kurtz, M. J., Warner, S., Ginsparg, P., Eichhorn, G.,
Accomazzi, A., Grant, C. S., Thompson, D., Bohlen, E. and Murray, S. S.
(2006) E-prints and Journal Articles in Astronomy: a Productive
Co-existence (submitted to Learned Publishing)
http://arxiv.org/abs/cs/0609126

Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C. S., Demleitner, M.,
Murray, S. S. (2005) The Effect of Use and Access on Citations.
Information Processing and Management, 41 (6): 1395-1402, December 2005
http://cfa-www.harvard.edu/%7Ekurtz/kurtz-effect.pdf

Kurtz, Michael and Brody, Tim (2006) The impact loss to authors and
research. In, Jacobs, Neil (ed.) Open Access: Key strategic, technical
and economic aspects. Oxford, UK, Chandos Publishing.
http://eprints.soton.ac.uk/40867/

Lawrence, S. (2001) Online or Invisible? Nature 411 (6837): 521.
http://www.neci.nec.com/lawrence/papers/online-nature01/

Metcalfe, Travis S (2006) The Citation Impact of Digital Preprint Archives
for Solar Physics Papers. Solar Physics 239: 549-553
http://adsabs.harvard.edu/abs/2006SoPh..239..549M

Moed, H. F. (2006) The effect of 'Open Access' upon citation impact: An
analysis of ArXiv's Condensed Matter Section
http://arxiv.org/abs/cs.DL/0611060

Perneger, T. V. (2004) Relation between online 'hit counts' and
subsequent citations: prospective study of research papers in the
British Medical Journal. British Medical Journal 329:546-547.
http://bmj.bmjjournals.com/cgi/content/full/329/7465/546

Seglen, P.O. (1992) The skewness of science. Journal of the American
Society for Information Science 43: 628-638
http://dx.doi.org/10.1002/(SICI)1097-4571(199210)43:9%3C628::AID-ASI5%3E3.0.CO;2-0