Open access publishing, article downloads, and citations
harnad at ECS.SOTON.AC.UK
Thu Jul 31 16:10:03 EDT 2008
** Apologies for Cross-Posting
On 31-Jul-08, at 11:14 AM, Phil Davis wrote:
> Open access publishing, article downloads, and citations: randomised
> controlled trial
> Philip M Davis, Bruce V Lewenstein, Daniel H Simon, James G Booth,
> Mathew J L Connolly
> BMJ 2008;337:a568
> Published 31 July 2008, doi:10.1136/bmj.a568
Overview (by SH):
Davis et al's study was designed to test whether the "Open Access (OA)
Advantage" (i.e., more citations to OA articles than to non-OA
articles in the same journal and year) is an artifact of a "self-
selection bias" (i.e., better authors are more likely to self-archive
or better articles are more likely to be self-archived by their
authors).
The control for self-selection bias was to select randomly which
articles were made OA, rather than having the author choose. The
result was that a year after publication the OA articles were not
cited significantly more than the non-OA articles (although they were
downloaded significantly more).
The authors write:
"To control for self selection we carried out a randomised controlled
experiment in which articles from a journal publisher’s websites were
assigned to open access status or subscription access only"
The authors conclude:
"No evidence was found of a citation advantage for open access
articles in the first year after publication. The citation advantage
from open access reported widely in the literature may be an artefact
of other causes."
To show that the OA advantage is an artefact of self-selection bias
(or any other factor), you first have to produce the OA advantage and
then show that it is eliminated by eliminating self-selection bias (or
any other artefact).
This is not what Davis et al did. They simply showed that they could
detect no OA advantage one year after publication in their sample.
This is not surprising, since most other studies don't detect an OA
advantage one year after publication either. It is too early.
To draw any conclusions at all from such a 1-year study, the authors
would have had to do a control condition, in which they managed to find
a sufficient number of self-selected self-archived OA articles (from
the same journals, for the same year) that do show the OA advantage,
whereas their randomized OA articles do not. In the absence of that
control condition, the finding that no OA advantage is detected in the
first year for this particular sample of journals and articles is
The authors did find a download advantage within the first year, as
other studies have found. This early download advantage for OA
articles has also been found to be correlated with a citation
advantage 18 months or more later. The authors try to argue that this
correlation would not hold in their case, but they give no evidence
(because they hurried to publish their study, originally intended to
run four years, three years too early.)
(1) The Davis study was originally proposed (in December 2006) as
intended to cover 4 years:
Davis, PM (2006) Randomized controlled study of OA publishing (see
It has instead been released after a year.
(2) The Open Access (OA) Advantage (i.e., significantly more citations
for OA articles, always comparing OA and non-OA articles in the same
journal and year) has been reported in all fields tested so far, for
Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-
Disciplinary Comparison of the Growth of Open Access and How it
Increases Research Citation Impact. IEEE Data Engineering Bulletin
28(4) pp. 39-47.
(3) There is always the logical possibility that the OA advantage is
not a causal one, but merely an effect of self-selection: The better
authors may be more likely to self-archive their articles and/or the
better articles may be more likely to be self-archived; those better
articles would be the ones that get more cited anyway.
(4) So it is a very good idea to try to control methodologically for
this self-selection bias: The way to control it is exactly as Davis et
al have done, which is to select articles at random for being made OA,
rather than having the authors self-select.
(5) Then, if it turns out that the citation advantage for randomized
OA articles is significantly smaller than the citation advantage for
self-selected-OA articles, then the hypothesis that the OA advantage
is all or mostly just a self-selection bias is supported.
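The comparison in (5) can be sketched as follows. This is only a minimal illustration, not the analysis used in any of the studies cited here: the function names and the ratio-of-means measure are assumptions, and a real analysis would also need significance tests and a matched journal and year.

```python
from statistics import mean


def oa_advantage(oa_citations, non_oa_citations):
    """Ratio of mean citations, OA over non-OA (same journal, same year).

    A value of 1.0 means no advantage. Hypothetical measure for illustration.
    """
    return mean(oa_citations) / mean(non_oa_citations)


def self_selection_share(randomized_adv, self_selected_adv):
    """Fraction of the self-selected OA advantage that randomized OA fails
    to reproduce, i.e. the part attributable to self-selection.

    Undefined until the self-selected arm actually shows an advantage.
    """
    if self_selected_adv <= 1.0:
        raise ValueError("no OA advantage detected yet; nothing to apportion")
    return (self_selected_adv - randomized_adv) / (self_selected_adv - 1.0)
```

With made-up numbers, a self-selected advantage of 2.0 and a randomized advantage of 1.5 would attribute half of the advantage to self-selection; equal advantages in both arms would attribute none of it.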
(6) But that is not at all what Davis et al. did.
(7) All Davis et al did was to find that their randomized OA articles
had significantly higher downloads than non-OA articles, but no
significant difference in citations.
(8) This was based on the first year after publication, when most of
the prior studies on the OA advantage likewise find no significant OA
advantage, because it is simply too early: the early results are too
noisy! The OA advantage shows up in later years (1-4).
(9) If Davis et al had been more self-critical, seeking to test and
perhaps falsify their own hypothesis, rather than just to confirm it,
they would have done the obvious control study, which is to test
whether articles that were made OA through self-selected self-
archiving by their authors (in the very same year, in the very same
journals) show an OA advantage in that same interval. For if they do
not, then of course the interval was too short, the results were
released prematurely, and the study so far shows nothing at all: It is
not until you have actually demonstrated an OA advantage that you can
estimate how much of that might be due to a self-selection artefact!
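The reasoning in (9) amounts to a simple decision order, sketched here under stated assumptions: the function name, the fixed threshold, and the use of a bare citation ratio in place of a proper significance test are all hypothetical simplifications, not anything taken from Davis et al's paper.

```python
def interpret(randomized_adv, self_selected_adv, min_advantage=1.1):
    """Interpret citation ratios (OA / non-OA) from the two arms.

    min_advantage is a hypothetical stand-in for "significantly above 1".
    The self-selected control arm is checked first: without an advantage
    there, the randomized arm tells us nothing about self-selection.
    """
    if self_selected_adv < min_advantage:
        return "too early: no OA advantage to explain yet"
    if randomized_adv < min_advantage:
        return "consistent with self-selection bias"
    return "advantage survives randomization"
```

The order of the checks is the point: a null result in the randomized arm is only interpretable once the control arm has shown an advantage over the same interval.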
(10) The study shows almost nothing at all, but not quite nothing,
because one would expect (based on our own previous study, which
showed that early downloads, at 6 months, predict enhanced citations a
year and a half or later) that Davis's increased downloads too would
translate into increased citations, once given enough time.
Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics
as Predictors of Later Citation Impact. Journal of the American
Society for Information Science and Technology (JASIST) 57(8) pp.
(11) The findings of Michael Kurtz and collaborators are also relevant
in this regard. They looked only at astrophysics, which is special, in
that (a) it is a field with only about a dozen journals, and every
research astronomer has subscription access to them -- and these days
also free online access via ADS -- and (b) it is a field in which most
authors self-archive their preprints very early in arXiv -- much
earlier than the date of publication.
Kurtz, M. J. and Henneken, E. A. (2007) Open Access does not increase
citations for research articles from The Astrophysical Journal.
Preprint deposited in arXiv September 6, 2007.
(12) Kurtz & Henneken found the usual self-archiving advantage in
astrophysics (i.e., about twice as many citations for OA papers as for
non-OA) but when they analyzed its cause, they found that most of the
cause was the Early Advantage of access to the preprint, as much as a
year before publication of the (OA) postprint. In addition, they found
a self-selection bias (for preprints -- which is all that were
involved here, because, as noted, as of publication, everything is
OA): The better articles by the better authors were more likely to
have been self-archived as preprints.
(13) Kurtz's results do not generalize to all fields, because in other
fields it is true neither that (a) they already have 100% OA for
their published postprints, nor that (b) many authors tend to self-
archive preprints before publication.
(14) However, the fact that early preprint self-archiving (in a field
that is 100% OA as of postprint publication) is sufficient to double
citations is very likely to translate into a similar effect, in a non-
OA field, if one reckons on the basis of the one-year access embargo
that many publishers are imposing on the postprint. (The yearlong "No-
Embargo" advantage in other fields might not turn out to be so big as
to double citations, as with the preprint Early Advantage in
astrophysics, because at least there is some subscription access to
the postprint, but the counterpart of the Early Advantage for the
postprint is likely to be there too.)
(15) Moreover, the preprint OA advantage is primarily Early Advantage,
and only secondarily Self-Selection.
(16) The size of the postprint self-selection bias would have been
what Davis et al tested -- if they had done the proper control, and
waited long enough to get an actual OA effect to compare against.
(17) We had reported in a pilot study that there was no statistically
significant difference between the size of the OA advantage for
mandated and unmandated self-archiving:
Hajjem, C & Harnad, S. (2007) The Open Access Citation Advantage:
Quality Advantage Or Quality Bias? Preprint deposited in arXiv January
(18) We will soon be reporting the results of a 4-year study on the OA
advantage in mandated and unmandated self-archiving that confirms
these earlier findings: Mandated self-archiving is like Davis et al's
randomized OA, and it does not reduce the OA advantage at all -- once
enough time has elapsed for there to be an OA Advantage at all.
American Scientist Open Access Forum