Open access publishing, article downloads, and citations

Stevan Harnad harnad at ECS.SOTON.AC.UK
Thu Jul 31 16:10:03 EDT 2008

                           ** Apologies for Cross-Posting

On 31-Jul-08, at 11:14 AM, Phil Davis wrote:

> Open access publishing, article downloads, and citations: randomised  
> controlled trial
> Philip M Davis, Bruce V Lewenstein, Daniel H Simon, James G Booth,  
> Mathew J L Connolly
> BMJ 2008;337:a568
> Published 31 July 2008, doi:10.1136/bmj.a568

Overview (by SH):

Davis et al's study was designed to test whether the "Open Access (OA)  
Advantage" (i.e., more citations to OA articles than to non-OA  
articles in the same journal and year) is an artefact of a
"self-selection bias" (i.e., better authors are more likely to
self-archive or better articles are more likely to be self-archived by
their authors).

The control for self-selection bias was to select randomly which  
articles were made OA, rather than having the author choose. The  
result was that a year after publication the OA articles were not  
cited significantly more than the non-OA articles (although they were  
downloaded more).

The authors write:
"To control for self selection we carried out a randomised controlled  
experiment in which articles from a journal publisher’s websites were  
assigned to open access status or subscription access only"

The authors conclude:
"No evidence was found of a citation advantage for open access  
articles in the first year after publication. The citation advantage  
from open access reported widely in the literature may be an artefact  
of other causes."

To show that the OA advantage is an artefact of self-selection bias  
(or any other factor), you first have to produce the OA advantage and  
then show that it is eliminated by eliminating self-selection bias (or  
any other artefact).

This is not what Davis et al did. They simply showed that they could  
detect no OA advantage one year after publication in their sample.  
This is not surprising, since most other studies don't detect an OA  
advantage one year after publication either. It is too early.

To draw any conclusions at all from such a 1-year study, the authors  
would have had to do a control condition, in which they managed to find
a sufficient number of self-selected self-archived OA articles (from  
the same journals, for the same year) that do show the OA advantage,  
whereas their randomized OA articles do not. In the absence of that  
control condition, the finding that no OA advantage is detected in the  
first year for this particular sample of journals and articles is  
completely uninformative.
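
The logic of that missing control condition can be sketched in a few lines of Python. This is only an illustration of the inference, not the authors' analysis: the function names, the citation counts, and the 1.1 cutoff are all made up for the example, and a real study would use a proper significance test rather than a fixed ratio.

```python
from statistics import mean


def oa_advantage(oa_citations, non_oa_citations):
    """Ratio of mean citations, OA over non-OA; 1.0 means no advantage."""
    return mean(oa_citations) / mean(non_oa_citations)


def interpret(randomized_adv, self_selected_adv, threshold=1.1):
    """Say what the three-group comparison licenses.

    threshold is an arbitrary illustrative cutoff, not a significance test.
    """
    if self_selected_adv < threshold:
        # No advantage even for self-selected OA: the interval is too short.
        return "uninformative: no OA advantage detectable yet in either group"
    if randomized_adv < threshold:
        # Advantage appears only when authors choose what to make OA.
        return "supports self-selection: advantage absent under randomization"
    # Advantage survives random assignment.
    return "supports causal OA advantage: advantage survives randomization"


# Hypothetical first-year citation counts (invented for illustration).
non_oa = [2, 3, 1, 4, 2]
randomized_oa = [2, 3, 2, 3, 2]
self_selected_oa = [3, 2, 2, 3, 2]

verdict = interpret(
    oa_advantage(randomized_oa, non_oa),
    oa_advantage(self_selected_oa, non_oa),
)
print(verdict)
```

With only the first two groups, as in the published study, the first branch can never be ruled out, which is the point of the objection.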

The authors did find a download advantage within the first year, as  
other studies have found. This early download advantage for OA  
articles has also been found to be correlated with a citation  
advantage 18 months or more later. The authors try to argue that this  
correlation would not hold in their case, but they give no evidence  
(because they hurried to publish their study, originally intended to
run four years, three years too early).

(1) The Davis study was originally proposed (in December 2006) as a
4-year study:

Davis, PM (2006) Randomized controlled study of OA publishing (see

It has instead been released after a year.

(2) The Open Access (OA) Advantage (i.e., significantly more citations  
for OA articles, always comparing OA and non-OA articles in the same  
journal and year) has been reported in all fields tested so far, for
example:

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
Cross-Disciplinary Comparison of the Growth of Open Access and How it
Increases Research Citation Impact. IEEE Data Engineering Bulletin
28(4) pp. 39-47.

(3) There is always the logical possibility that the OA advantage is
not a causal one, but merely an effect of self-selection: The better  
authors may be more likely to self-archive their articles and/or the  
better articles may be more likely to be self-archived; those better  
articles would be the ones that get more cited anyway.

(4) So it is a very good idea to try to control methodologically for  
this self-selection bias: The way to control it is exactly as Davis et  
al have done, which is to select articles at random for being made OA,  
rather than having the authors self-select.

(5) Then, if it turns out that the citation advantage for randomized  
OA articles is significantly smaller than the citation advantage for  
self-selected-OA articles, then the hypothesis that the OA advantage  
is all or mostly just a self-selection bias is supported.

(6) But that is not at all what Davis et al. did.

(7) All Davis et al did was to find that their randomized OA articles  
had significantly higher downloads than non-OA articles, but no  
significant difference in citations.

(8) This was based on the first year after publication, when most of  
the prior studies on the OA advantage likewise find no significant OA  
advantage, because it is simply too early: the early results are too  
noisy! The OA advantage shows up in later years (1-4).

(9) If Davis et al had been more self-critical, seeking to test and  
perhaps falsify their own hypothesis, rather than just to confirm it,  
they would have done the obvious control study, which is to test  
whether articles that were made OA through self-selected self- 
archiving by their authors (in the very same year, in the very same  
journals) show an OA advantage in that same interval. For if they do  
not, then of course the interval was too short, the results were  
released prematurely, and the study so far shows nothing at all: It is  
not until you have actually demonstrated an OA advantage that you can  
estimate how much of that might be due to a self-selection artefact!

(10) The study shows almost nothing at all, but not quite nothing,  
because one would expect (based on our own previous study, which  
showed that early downloads, at 6 months, predict enhanced citations a  
year and a half or later) that Davis's increased downloads too would  
translate into increased citations, once given enough time.

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics
as Predictors of Later Citation Impact. Journal of the American
Society for Information Science and Technology (JASIST) 57(8) pp.

(11) The findings of Michael Kurtz and collaborators are also relevant
in this regard. They looked only at astrophysics, which is special, in  
that (a) it is a field with only about a dozen journals, and every  
research astronomer has subscription access to them -- and these days  
also free online access via ADS -- and (b) it is a field in which most  
authors self-archive their preprints very early in arXiv -- much
earlier than the date of publication.

Kurtz, M. J. and Henneken, E. A. (2007) Open Access does not increase
citations for research articles from The Astrophysical Journal.
Preprint deposited in arXiv September 6, 2007.

(12) Kurtz & Henneken found the usual self-archiving advantage in
astrophysics (i.e., about twice as many citations for OA papers as
for non-OA) but when they analyzed its cause, they found that most of the
cause was the Early Advantage of access to the preprint, as much as a  
year before publication of the (OA) postprint. In addition, they found  
a self-selection bias (for preprints -- which is all that were  
involved here, because, as noted, as of publication, everything is  
OA): The better articles by the better authors were more likely to  
have been self-archived as preprints.

(13) Kurtz's results do not generalize to all fields, because in other
fields it is true neither that (a) they already have 100% OA for their
published postprints, nor that (b) many authors tend to self-archive
preprints before publication.

(14) However, the fact that early preprint self-archiving (in a field  
that is 100% OA as of postprint publication) is sufficient to double  
citations is very likely to translate into a similar effect, in a non- 
OA field, if one reckons on the basis of the one-year access embargo  
that many publishers are imposing on the postprint. (The yearlong "No- 
Embargo" advantage in other fields might not turn out to be so big as  
to double citations, as with the preprint Early Advantage in  
astrophysics, because at least there is some subscription access to  
the postprint, but the counterpart of the Early Advantage for the  
postprint is likely to be there too.)

(15) Moreover, the preprint OA advantage is primarily Early Advantage,  
and only secondarily Self-Selection.

(16) The size of the postprint self-selection bias would have been  
what Davis et al tested -- if they had done the proper control, and
waited long enough to get an actual OA effect to compare against.

(17) We had reported in a pilot study that there was no statistically  
significant difference between the size of the OA advantage for  
mandated and unmandated self-archiving:

Hajjem, C. & Harnad, S. (2007) The Open Access Citation Advantage:
Quality Advantage Or Quality Bias? Preprint deposited in arXiv January
22, 2007.

(18) We will soon be reporting the results of a 4-year study on the OA
advantage in mandated and unmandated self-archiving that confirms  
these earlier findings: Mandated self-archiving is like Davis et al's  
randomized OA, and it does not reduce the OA advantage at all -- once  
enough time has elapsed for there to be an OA Advantage at all.

Stevan Harnad
American Scientist Open Access Forum

More information about the SIGMETRICS mailing list