Does the arXiv lead to higher citations and reduced publisher downloads?

Wed Mar 15 11:12:54 EST 2006

On Wed, 15 Mar 2006, Phil Davis wrote:

> Thanks for your thoughtful response.  In the absence of a controlled
> experiment, the best one can do is 1) confirm covariance;

Agreed.

> 2) confirm temporal order (the cause precedes the effect);

Not yet methodologically feasible, alas.

> 3) confirm a theoretical basis for the phenomena;

There are several plausible theories for an OA Advantage (OAA), compatible
with the evidence to date. They don't all agree, but, as I said, there
are likely to be multiple causal factors. (Future research will have to
measure both their causality and their relative size.)

> and 4) systematically rule out all other explanatory causes.

This too certainly has not yet been done by anyone.

> I've been able to do all of these except 2) since our data is cumulative
> in nature.

Not just not (2), but also not (4), especially in light of the fact that
more than one explanation is compatible with your findings (3)...

> Michael Kurtz however was able to work with
> temporal data in his study of astrophysics journals, and was unable to
> confirm the Open Access Postulate.

I know about and admire his work. But astro is anomalous in that it is
100% OA and became that way at one fell swoop (via ADS). No intermediate
temporal stages in which 25% then 50% then 75% of astro became OA,
hence no way to test the Competitive Advantage (CA). Mike cites
three causal factors: QB (author self-selection Quality Bias) and EA
(strong Early Access effects, which you failed to find in your maths
data) and UA (Usage Advantage: doubled downloads -- you did not report
comparative totals, but only compared arXiv to publisher downloads).

Mike found no residual increased citations overall owing to 100% OA
(in fact, somewhat fewer!) and -- I think correctly -- interpreted this
as being because when *everything* is OA -- 100% OA, a level playing
field -- then authors do not cite *more*: they cite more *selectively*,
based on importance and relevance, rather than unconsciously biased, as
in most other fields, by affordability/accessibility constraints. That
might shift the citations around (from a less relevant, accessible
article that I might have cited before I had access to all articles to
a more relevant article that I can now access) but it does not increase
the total number of citations per article I write (in fact, it decreases
it slightly). (This is one manifestation of what I called QA, the Quality
Advantage, though a very hard one to pin down causally from the kinds of
data available: It almost requires an author study on citation strategies
before and after. I would say your high-end correlation between
downloads and citations is indirect evidence in support of it.)

What the astro data miss are the effects of CA, the Competitive Advantage
of OA vs. non-OA articles within the same journal and issue, and similar
comparisons of like with like (e.g., same citation band) because astro has
the boon and bane, as noted, of having gone 100% OA in one swoop. It
hence also lacks a means of testing the other form of QA, in which the
better articles get selectively advantaged if they are OA compared to
when they are not OA. (Correlations, citation bands, and other ways of
equating comparable content will need to be used to test this in fields
that are not yet 100% OA, hence allowing a basis for comparison!)
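
The like-with-like comparison can be sketched in a few lines of Python.
This is only a minimal illustration, not the method of any of the studies
discussed here: the function name, the grouping key (one label per
journal issue or citation band) and the toy citation counts are all
hypothetical. It computes, for each group, the ratio of mean OA citations
to mean non-OA citations, and it skips any group that lacks one of the
two kinds of article, which is exactly the situation of a 100% OA field.

```python
from collections import defaultdict
from statistics import mean


def oa_advantage_by_group(articles):
    """Return {group: mean OA citations / mean non-OA citations}.

    `articles` is an iterable of (group, is_oa, citations) tuples, where
    `group` is any like-with-like key (hypothetical here), such as a
    journal issue or a citation band.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for group, is_oa, citations in articles:
        groups[group][is_oa].append(citations)

    ratios = {}
    for group, by_access in groups.items():
        oa, non_oa = by_access[True], by_access[False]
        # No basis for comparison if the group is 0% or 100% OA.
        if not oa or not non_oa:
            continue
        non_oa_mean = mean(non_oa)
        # Avoid dividing by zero when no non-OA article was cited.
        if non_oa_mean == 0:
            continue
        ratios[group] = mean(oa) / non_oa_mean
    return ratios


# Toy, made-up counts purely for illustration.
toy_articles = [
    ("issue-A", True, 6),
    ("issue-A", False, 4),
    ("issue-A", False, 2),
    ("issue-B", True, 3),   # issue-B is 100% OA, so it is skipped
    ("issue-B", True, 5),
]

print(oa_advantage_by_group(toy_articles))
```

A ratio above 1 within a group is only a correlation; it does not by
itself separate CA or QA from QB.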

I am still betting most of my money on CA, QA, and EA (as well as UA)
rather than just the QB (and AA) that your article stresses, and also on
the download/citation correlation.

> I don't disagree at all that there may be multiple causes working
> simultaneously, and demonstrating interaction effects.  I also don't
> disagree that the causes may vary for different fields of study.  I am
> however, troubled by individuals who make universal and unqualified
> statements like, "Open Access increases citations by 50-250%!"

I am such an individual, and I stand by that statement (though certainly
not because I mean to trouble you!)

> The more
> precise answer is much more subtle, but I understand that a statement like,
> "open access may provide some citation benefit, but only for prestigious
> authors who publish in prestigious journals and whose article is already
> highly-cited", doesn't sound as convincing to administrators and policy
> makers.

Not just that, but I don't think it is a correct statement of the
thrust of the current body of findings on the OAA. It is merely your
interpretation of the result of your own study in 4 maths journals!

Stevan

> Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Demleitner, M.,
> Henneken, E., Murray, S. S. (2005) The effect of use and access on
> citations. Information Processing and Management 41, 1395-1402.
> Available: http://arxiv.org/abs/cs.DL/0503029
>
> >I think your results are very interesting, but I don't think they
> >have shown that the OA citation advantage (OAA)  is all or mostly a
> >self-selection Quality Bias (QB) correlate, rather than being causal.
> >It is still quite plausible that the OAA is a genuine causal factor,
> >but that it has a bigger effect on the high quality/citation end.
> >That could be a fuzzy threshold effect. And at the low/zero end there
> >could be a lot of articles that are just so weak that they're not
> >going to be cited even if you ram them down people's throats! In
> >other words, what I've called "QA" (Quality Advantage) rather than QB
> >(self-selected Quality Bias) could very well still be the true causal
> >factor: Self-Archiving gives the *better* articles a boost -- not an
> >equal linear boost to all articles!
> >
> >At any rate, the jury is definitely still out on the causal
> >components of OAA. I am still pretty convinced intuitively and
> >logically not only that it's causal, but that it's the biggest of the
> >causal factors, though I'm quite ready to believe the effect is
> >stronger for the better articles.
> >
> >Some of the differences in the reported findings may well also be
> >field differences (maths vs. astro vs. physics vs. bio) and, perhaps even
> >more importantly, differences arising from differences in overall %
> >OA, by field. (Surely self-selection is a less plausible component of
> >an OAA in a field that is 95% OA than in a field that is 5% OA.) But
> >to sort these out we need much bigger Ns tested across many different
> >fields, with different baseline %OA, and looked at within year and
> >within citation range.
> >
> >As for the apparent absence of Mike Kurtz's Early Access effect, this
> >*might* be an astro/math difference, or a 100%OA/30%OA difference.
> >Same for the finding of a much smaller download/citation correlation.
> >
> >Stevan
> >
> >On 14-Mar-06, at 8:31 PM, Philip Meir Davis wrote:
> >
> >>The paper is now available.  Please see the section where we
> >>address the three postulates (Open Access, Early View, and
> >>Self-Selection).  Of the three, Self-Selection was clearly the
> >>strongest explanation.  If Open Access is partially at work, it
> >>appears only to affect the highly-cited articles.  Early-View really
> >>could not be supported by the data.
> >>--Phil
> >>>
> >>>On Tue, 14 Mar 2006, Phil Davis wrote:
> >>>
> >>>>Liblicense, While our study confirms the same citation advantage
> >>>>reported by others, it does not attribute Open Access as the
> >>>>cause of more citations, but to Self-Selection. Open Access
> >>>>therefore may be a result, not a cause, of authors promoting
> >>>>higher-quality work.
> >>>>
> >>>>Does the arXiv lead to higher citations and reduced publisher
> >>>>downloads for mathematics articles?
> >>>>Authors: Philip M. Davis, Michael J. Fromerth
> >>>>Date: March 14, 2006
> >>>>http://arxiv.org/abs/cs.DL/0603056
> >>>
> >>>The full text of Phil Davis's paper is not yet accessible, so I can
> >>>only respond to the abstract.
> >>>
> >>>There are many plausible components of the OA advantage, of which
> >>>self-selection (Quality Bias: QB) is certainly one -- but not the
> >>>only one, and unlikely to be the principal one, except under a few
> >>>special conditions. QB is a temporary phenomenon, obviously,
> >>>disappearing completely at 100% OA. The same is true for the
> >>>Competitive Advantage (CA) of (comparable) OA papers over non-OA
> >>>papers in the same journal issue, as well as the Arxiv Advantage
> >>>(AA: the advantage of appearing jointly in a central, widely
> >>>consulted repository).
> >>>
> >>>Once 100% OA is reached, QB, CA and AA all vanish. (AA vanishes
> >>>because of OAI interoperability and central harvesting services.)
> >>>
> >>>But there are three other components that remain even at 100% OA:
> >>>
> >>>Early Access Advantage (EA): The permanent citation boost from
> >>>     earlier access
> >>>Quality Advantage (QA): The permanent advantage of quality once the
> >>>     playing field has been levelled and affordability/accessibility
> >>>     no longer biases what is and is not accessible
> >>>Usage Advantage (UA): Average downloads for OA articles are at least
> >>>     double those of non-OA articles
> >>>
> >>>     OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA
> >>>     http://eprints.ecs.soton.ac.uk/12085/
> >>>
> >>>>An analysis of 2,765 articles published in four math journals
> >>>>from 1997-2005 indicated that articles deposited in the arXiv
> >>>>received 35% more citations on average than non-deposited
> >>>>articles (an advantage of about 1.1 citations per article), and
> >>>>this difference was most pronounced for highly-cited articles.
> >>>>The most plausible explanation was not the Open Access or Early
> >>>>View postulates, but Self-Selection, which has led to higher
> >>>>quality articles being deposited in the arXiv.
> >>>
> >>>Without seeing the full text one cannot be sure of how this was
> >>>ascertained, but let us assume that it was by correlation (looking
> >>>at the authors' track records, and their comparable non-OA articles,
> >>>to show that there is a strong correlation between prior
> >>>author/article citation rates and probability of later
> >>>self-archiving).
> >>>
> >>>There is no doubt at all that this is a causal factor, and indeed it
> >>>is the example set by the high-quality authors that helps encourage
> >>>other authors to self-archive.
> >>>
> >>>But the only systematic way to show that QB is the *only* component
> >>>of the OA advantage, or the biggest one, is to test it at all levels
> >>>of self-archiving, from 1% to 99%. Obviously a citation advantage
> >>>that persists even as a larger and larger proportion of the research
> >>>in the field becomes OA is less and less likely to be due to the
> >>>fact that the best authors/articles are the ones being self-archived.
> >>>
> >>>And it also has to be tested for articles at all citation levels
> >>>(i.e., for comparable low, medium, and high-citation articles). The
> >>>OA advantage is bigger at the higher citation levels, to be sure,
> >>>but if it is even present at the lower ones, that already shows that
> >>>QB is unlikely to be the only factor.
> >>>
> >>>As to estimating the relative size of the causal contributions of
> >>>each of the 6 factors -- this will require a more fine-grained
> >>>analysis, taking into account not only %OA, citation level, and
> >>>article age, but also article deposit date. Equating average
> >>>citation levels for the authors and for the specialty domain will be
> >>>necessary in the comparisons, and a lot of journals will need to be
> >>>sampled, in diverse fields, to make sure patterns are not
> >>>specialty-specific.
> >>>
> >>>>Yet in spite of
> >>>>their citation advantage, arXiv-deposited articles received 23%
> >>>>fewer downloads from the publisher's website (about 10 fewer
> >>>>downloads per article) in all but the most recent two years after
> >>>>publication. The data suggest that arXiv and the publisher's
> >>>>website may be fulfilling distinct functional needs of the
> >>>>reader.
> >>>
> >>>That sounds like the Arxiv Advantage (AA) expressed in the downloads
> >>>(UA).
> >>>
> >>>Apart from total citation counts and downloads, other interesting
> >>>variables to look at (and compare for OA effects) include: citation
> >>>latency, citation longevity and other temporal measures; same for
> >>>downloads; also authority impact (similar to Google's PageRank:
> >>>citations by higher-cited citers count for more),
> >>>inbreeding/outbreeding coefficients, co-citations, and semantic
> >>>correlations.
> >>>
> >>>Stevan Harnad
> >>>
> >>>Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
> >>>Cross-Disciplinary Comparison of the Growth of Open Access and How it
> >>>Increases Research Citation Impact. IEEE Data Engineering Bulletin
> >>>28(4) pp. 39-47.
> >>>http://eprints.ecs.soton.ac.uk/11688/
>