Does the arXiv lead to higher citations and reduced publisher downloads?

Philip Meir Davis pmd8 at CORNELL.EDU
Tue Mar 14 20:31:43 EST 2006

The paper is now available.  Please see the section where we address the
three postulates (Open Access, Early View, and Self-Selection).  Of the
three, Self-Selection was clearly the strongest explanation.  If Open
Access is partially at work, it appears to affect only the highly-cited
articles.  Early View could not be supported by the data.

> On Tue, 14 Mar 2006, Phil Davis wrote:
>> Liblicense: While our study confirms the same citation advantage
>> reported by others, it attributes the additional citations not to
>> Open Access but to Self-Selection. Open Access therefore may be a
>> result, not a cause, of authors promoting higher-quality work.
>> Does the arXiv lead to higher citations and reduced publisher
>> downloads for mathematics articles?
>> Authors: Philip M. Davis, Michael J. Fromerth
>> Date: March 14, 2006
> The full text of Phil Davis's paper is not yet accessible, so I can only
> respond to the abstract.
> There are many plausible components of the OA advantage, of which
> self-selection (Quality Bias: QB) is certainly one -- but not the only
> one, and unlikely to be the principal one, except under a few special
> conditions. QB is a temporary phenomenon, obviously, disappearing
> completely at 100% OA. The same is true for the Competitive Advantage
> (CA) of (comparable) OA papers over non-OA papers in the same journal
> issue, as well as the Arxiv Advantage (AA: the advantage of appearing
> jointly in a central, widely consulted repository).
> Once 100% OA is reached, QB, CA and AA all vanish. (AA vanishes because
> of OAI interoperability and central harvesting services.)
> But there are three other components that remain even at 100% OA:
> Early Access Advantage (EA): The permanent citation boost from
>     earlier access
> Quality Advantage (QA): The permanent advantage of quality once the
>     playing field has been levelled and affordability/accessibility no
>     longer biases what is and is not accessible
> Usage Advantage (UA): Average downloads for OA articles are at least
>     double those of non-OA articles
>
> OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA
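
As a minimal sketch of how the sum above behaves, the snippet below
encodes the six components and drops the ones the message says vanish at
100% OA (QB, CA, AA). The dictionary layout and the function name
advantage_terms are illustrative assumptions, not anything taken from the
paper or from the quoted analysis; the parenthesized terms in the formula
appear to be exactly these transient components.

```python
# Components of the OA Impact Advantage as listed in the quoted message.
# True marks a component said to persist at 100% OA,
# False marks one said to vanish once everything is OA.
COMPONENTS = {
    "EA": True,   # Early Access Advantage
    "AA": False,  # Arxiv Advantage
    "QB": False,  # Quality Bias (self-selection)
    "QA": True,   # Quality Advantage
    "CA": False,  # Competitive Advantage
    "UA": True,   # Usage Advantage
}


def advantage_terms(full_oa: bool) -> str:
    """Return the sum as text, dropping the transient terms when full_oa is True."""
    kept = [name for name, permanent in COMPONENTS.items()
            if permanent or not full_oa]
    return " + ".join(kept)


print(advantage_terms(full_oa=False))  # EA + AA + QB + QA + CA + UA
print(advantage_terms(full_oa=True))   # EA + QA + UA
```

No magnitudes are attached to the terms here, since the message gives
none; the sketch only tracks which terms are claimed to survive.
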
>> An analysis of 2,765 articles published in four math journals
>> from 1997-2005 indicated that articles deposited in the arXiv
>> received 35% more citations on average than non-deposited
>> articles (an advantage of about 1.1 citations per article), and
>> this difference was most pronounced for highly-cited articles.
>> The most plausible explanation was not the Open Access or Early
>> View postulates, but Self-Selection, which has led to higher
>> quality articles being deposited in the arXiv.
> Without seeing the full text one cannot be sure of how this was
> ascertained, but let us assume that it was by correlation (looking
> at the authors' track records, and their comparable non-OA articles, to
> show that there is a strong correlation between prior author/article
> citation rates and probability of later self-archiving).
> There is no doubt at all that this is a causal factor, and indeed it is
> the example set by the high-quality authors that helps encourage other
> authors to self-archive.
> But the only systematic way to show that QB is the *only* component of
> the OA advantage, or the biggest one, is to test it at all levels of
> self-archiving, from 1% to 99%. Obviously a citation advantage that
> persists even as a larger and larger proportion of the research in the
> field becomes OA is less and less likely to be due to the fact that the
> best authors/articles are the ones being self-archived.
> And it also has to be tested for articles at all citation levels (i.e.,
> for comparable low, medium, and high-citation articles). The OA
> advantage is bigger at the higher citation levels, to be sure, but if it
> is even present at the lower ones, that already shows that QB is
> unlikely to be the only factor.
> As to estimating the relative size of the causal contributions of each
> of the 6 factors -- this will require a more fine-grained analysis,
> taking into account not only %OA, citation level, and article age, but
> also article deposit date. Equating average citation levels for the
> authors and for the specialty domain will be necessary in the
> comparisons, and a lot of journals will need to be sampled, in diverse
> fields, to make sure patterns are not specialty-specific.
>> Yet in spite of
>> their citation advantage, arXiv-deposited articles received 23%
>> fewer downloads from the publisher's website (about 10 fewer
>> downloads per article) in all but the most recent two years after
>> publication. The data suggest that arXiv and the publisher's
>> website may be fulfilling distinct functional needs of the
>> reader.
> That sounds like the Arxiv Advantage (AA) expressed in the downloads
> (UA).
> Apart from total citation counts and downloads, other interesting
> variables to look at (and compare for OA effects) include: citation
> latency, citation longevity and other temporal measures; same for
> downloads; also authority impact (similar to Google's PageRank:
> citations by higher-cited citers count for more), inbreeding/outbreeding
> coefficients, co-citations, and semantic correlations.
> Stevan Harnad
> Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
> Cross-Disciplinary Comparison of the Growth of Open Access and How it
> Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4)
> pp. 39-47.
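
For a rough sense of scale, the paired figures in the abstract quoted
above (35% more citations, about 1.1 citations per article; 23% fewer
downloads, about 10 per article) can be turned into implied baseline
means. This is only back-of-envelope arithmetic on rounded numbers: it
assumes each percentage is taken relative to the mean for non-deposited
articles, and the function name implied_baseline is hypothetical. The
figures in the paper itself may differ.

```python
def implied_baseline(relative_change: float, absolute_change: float) -> float:
    """Baseline mean implied by a relative change and its matching absolute change."""
    return absolute_change / relative_change


# Citations: +35% corresponds to about +1.1 citations per article.
cit_base = implied_baseline(0.35, 1.1)
print(round(cit_base, 2), round(cit_base + 1.1, 2))  # 3.14 4.24

# Publisher downloads: -23% corresponds to about -10 downloads per article.
dl_base = implied_baseline(-0.23, -10)
print(round(dl_base, 1), round(dl_base - 10, 1))  # 43.5 33.5
```

On these assumptions, non-deposited articles would average roughly 3
citations and 43 publisher downloads each, which shows how small the
absolute differences are.
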
