Critique of EPS/RIN/RCUK/DTI "Evidence-Based Analysis of Data Concerning Scholarly Journal Publishing"

Sat Oct 14 09:03:18 EDT 2006

Dear Michael,

Thanks for your (as always) very interesting and informative data! They
show that:

(1) In astronomy, where all active, publishing researchers already have
online access to all relevant journal articles (a very special case!),
researchers all use the versions "eprinted" (self-archived) in Arxiv
first, because those are available first; and they all switch to using
the journal version, instead of the self-archived one, as soon as the
journal version is available.

That is interesting, but hardly surprising, in view of the very special
conditions of astronomy: If I only had access to a self-archived
preprint or postprint first, I'd use that, faute de mieux. And as soon
as the official journal version was accessible -- assuming that it's
equally accessible -- I'd use that.

But these conditions -- (i) open accessibility of the eprint before
publication, (ii) in one longstanding central repository (Arxiv),
for many and in some cases most papers, and (iii) open accessibility
of the journal version of all papers upon publication -- are simply not
representative of most other fields! In most other fields, (i') only
about 15% of papers are available early as preprints or postprints,
(ii') they are self-archived in distributed IRs and websites, not one
central one (Arxiv), and (iii') the journal versions of many papers are
not accessible at all to many of the researchers after publication.

That's a very different ball game.

(2) Your data showing that astronomy journals are not cancelled despite
100% OA are very interesting, but they too follow almost tautologically
from (1): If virtually all researchers have access to the journal version,
and virtually all of them prefer to use that rather than the eprint,
it stands to reason that it is not being cancelled! (What is cause and
what is effect there is another question -- i.e., whether preference is
driving subscriptions or subscriptions are driving preference.)

(3) In astronomy, there is a small, closed circle of core journals,
and all active researchers worldwide already have access. In many
fields there is not a closed circle of core journals, and/or not all
researchers have access. Hence access to a small set of core journals
is not a precondition for being an active researcher in many fields --
which does not mean that lacking that access does not weaken the research
(and that is the point!).

(4) I agree completely that there is a component of self-selection
Quality Bias (QB) in the correlation between self-archiving and
citations. The question is (4a) how much of the higher citation count
for self-archived articles is due to QB (as opposed to Early Advantage,
Competitive Advantage, Quality Advantage, Usage Advantage, and Arxiv
(Central) Bias)? And (4b) does self-selection QB itself have any
causal consequences (or are authors doing it purely superstitiously,
since it has no causal effects at all)? The effects of course need
not be felt in citations; they could be felt in downloads (usage) or in
other measures of impact (co-citations, influence on research
direction, funding, fame, etc.).

The most important thing to bear in mind is that it would be absurd to
imagine that somehow OA guarantees a quality-blind linear increment to
the usage of any article, regardless of its quality. It is virtually
certain that OA will benefit the better articles more, because they are
more worth using and trying to build upon, hence more handicapped by
access-barriers (which *do* exist in fields other than astro). That's QA,
not QB. No amount of accessibility will help unciteable papers get used
and cited. And most papers are uncited, hence probably unciteable!

(5) I think we agree that the basic challenge in assessing causality
here is that we have a positive correlation (between proportion of papers
self-archived and citation-counts) but we need to analyze the direction
of the causation. The fact that higher citation-count papers tend to be
self-archived more, and lower citation-count papers less, is merely a
restatement of the correlation, not a causal analysis of it: Their
citation counts come *after* the self-archiving, not before!

The only methodologically irreproachable way to test causality would be
to randomly choose a (sufficiently large, diverse, and representative)
sample of N papers at the time of acceptance for publication
(postprints -- no previous preprint self-archiving) and randomly
*impose* self-archiving on N/2 of them, and not on the other N/2. That
way we have random selection and not self-selection. Then we count
citations for about 2-3 years, for all the papers, and compare them.
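
To make the logic of that design concrete, here is a minimal toy
simulation. It is only a sketch under stated assumptions: the model
(normally distributed latent quality, logistic self-selection, a
multiplicative citation boost) and the names simulate, advantage and
causal_boost are all hypothetical, chosen for illustration, and are not
taken from any real dataset or study.

```python
import math
import random
import statistics


def simulate(n_papers, causal_boost, self_selected, seed=0):
    """Return mean log-citations of (archived, non-archived) papers.

    Toy model (an assumption, not real data): each paper has a latent
    quality q ~ N(0, 1); its log-citations are q plus noise, plus
    log(causal_boost) if the paper is self-archived.
    """
    rng = random.Random(seed)
    archived, not_archived = [], []
    for _ in range(n_papers):
        quality = rng.gauss(0.0, 1.0)
        if self_selected:
            # Self-selection (QB): better papers are archived more often.
            p_archive = 1.0 / (1.0 + math.exp(-2.0 * quality))
            is_archived = rng.random() < p_archive
        else:
            # Imposed archiving: a coin flip, independent of quality.
            is_archived = rng.random() < 0.5
        log_cites = quality + rng.gauss(0.0, 0.5)
        if is_archived:
            log_cites += math.log(causal_boost)
        (archived if is_archived else not_archived).append(log_cites)
    return statistics.mean(archived), statistics.mean(not_archived)


def advantage(n_papers, causal_boost, self_selected, seed=0):
    """Difference in mean log-citations, archived minus non-archived."""
    mean_archived, mean_not_archived = simulate(
        n_papers, causal_boost, self_selected, seed
    )
    return mean_archived - mean_not_archived


if __name__ == "__main__":
    # No causal effect at all, but authors self-select:
    print("self-selected, no effect:", advantage(20000, 1.0, True))
    # No causal effect, archiving imposed at random:
    print("randomized, no effect:   ", advantage(20000, 1.0, False))
    # A true doubling of citations, archiving imposed at random:
    print("randomized, 2x effect:   ", advantage(20000, 2.0, False))
```

In this toy model, self-selection alone produces a sizeable apparent
advantage even when causal_boost is 1.0, whereas random assignment
gives a difference near zero in that case and recovers roughly
log(causal_boost) when there is a real effect. That is the whole point
of imposing the self-archiving rather than letting authors choose.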

No one will do that study, but an approximation to it can be done
(and we are doing it) by comparing (a) citation counts for papers that
are self-archived in IRs that have a self-archiving mandate with (b)
citation counts for papers in IRs without mandates and with (c) papers
(in the same journal and year) that are not self-archived.

Not a perfect method: it has problems with small Ns, short available
time-windows, and admixtures of self-selection and imposed self-archiving
even with mandates -- but an approximation nonetheless.  And other
metrics -- downloads, co-citations, hub/authority scores, endogamy
scores, growth-rates, funding, etc. -- can be used to triangulate and
disambiguate.
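
As a rough illustration of how the (a)/(b)/(c) comparison above might
be tabulated, here is a small sketch. The record layout, the group
labels ("mandated", "unmandated", "none") and the function name
group_advantages are hypothetical, and log(1 + citations) is just one
plausible way to tame the skew; none of this is meant as a description
of the actual analysis.

```python
import math
import statistics
from collections import defaultdict


def group_advantages(records):
    """Compare archived groups against non-archived papers.

    records: iterable of (journal, year, group, citations) tuples, where
    group is "mandated", "unmandated" or "none" (hypothetical labels).
    Returns {(journal, year): {group: difference in mean log(1 + cites)
    relative to the "none" papers of the same journal and year}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for journal, year, group, citations in records:
        buckets[(journal, year)][group].append(math.log1p(citations))

    result = {}
    for key, groups in buckets.items():
        if "none" not in groups:
            # No non-archived baseline for this journal and year: skip.
            continue
        baseline = statistics.mean(groups["none"])
        result[key] = {
            group: statistics.mean(values) - baseline
            for group, values in groups.items()
            if group != "none"
        }
    return result
```

If the advantage for mandated papers came out close to the advantage
for unmandated (self-selected) ones, that would suggest that
self-selection accounts for little of the effect; if it shrank towards
zero, QB would be doing most of the work.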

Stay tuned.

Now some comments:

On Tue, 10 Oct 2006, Michael Kurtz wrote:

> Dear Stevan and list,
> 
> Recently Stevan has copied me on two sets of correspondence concerning
> the OA citation advantage; I thought I would just briefly respond to both.
> 
> Besides our IPM article: 
> http://adsabs.harvard.edu/abs/2005IPM....41.1395K we have recently 
> published two short papers, both with graphs you might find interesting.
> 
> The preprint will appear in Learned Publishing
> http://adsabs.harvard.edu/abs/2006cs........9126H E-prints and Journal 
> Articles in Astronomy: a Productive Co-existence
> 
> and this is in the J. Electronic Publishing
> http://adsabs.harvard.edu/abs/2006JEPub...9....2H Effect of E-printing 
> on Citation Rates in Astronomy and Physics
> 
> There is a point I would like to emphasize from these papers.  Figure 2 
> of the Learned Publishing paper shows that the number of ADS users who 
> read the preprint version once the paper has been released drops to near 
> zero.  This shows that essentially every astronomer has subscriptions to 
> the main journals, as ADS treats both the arXiv links and the links to 
> the journals equally; also it shows that astronomers prefer the journals.

And it also shows how anomalous Astronomy is, compared to other fields,
where it is certainly not true that every researcher has subscriptions
to the main journals...

> Figure 5 of the J Electronic Publishing paper also shows that there is 
> no effect of cost on the OA reads (and thus by extension citation) 
> differential.  Note in the plot that there is no change in slope for the 
> obsolescence function of the reads (either of preprinted or 
> non-preprinted) at 36 months.  At 36 months the 3 year moving wall 
> allows the papers to be accessed by everyone, this shows clearly that 
> there is no cost effect portion of the OA differential in astronomy.  
> This confirms the conclusion of my IPM article.

And it underscores again how unrepresentative astronomy is of research as
a whole.

> Now three comments:
> 
> Citations are probably the least sensitive measure to see the effects of 
> OA.  This is because one must be able to read the core journals in order 
> to write a paper which will be published by them.  It is really not 
> possible for a person who has not been regularly reading journal 
> articles on, say, nuclear physics, to suddenly be able to write one, and 
> cite the OA articles which enabled that writing.  It takes some time for 
> a body of authors who did not previously have access to form and write 
> acceptable papers. 

That is so in astronomy, where the core journals are few and form a closed
circle, and all active researchers have access to them. But this is not true of
research as a whole, across disciplines (or around the world).
Researchers in most fields are no doubt handicapped by having less than
full access, but that does not prevent them from doing and publishing
research altogether.

> Any statistical analysis of the causal/bias distinction must take into 
> account the actual distribution of citations among articles.  This is 
> why  I made the monte carlo analysis in the IPM paper.  As a quick 
> example for papers published in the Astrophysical Journal in 2003: The 
> most cited 10% have 39% of all citations, and are 96% in the arXiv; the 
> lowest cited 10% have 0.7% of all citations and are 29% in the arXiv.  
> Showing the causal hypothesis is true will be very difficult under these 
> conditions.

(1) Since all of the published postprints in all these journals
are accessible to all research-active astronomers as of their date of
publication, we are of necessity speaking here mostly about an Early
Access effect (preprints). Most of the other components of the Open Access
Advantage (Competitive Advantage, Usage Advantage, Quality Advantage)
are minimized here by the fact that everything in astronomy is OA from
the date of publication onward. The remaining components are either
Arxiv-specific (the Arxiv Bias -- the tradition of archiving and hence
searching in one central repository) or self-selection [Quality Bias]
influencing who does and does not self-archive *early*, with their
prepublication preprint.

Since most fields don't post preprints at all, this comparison is mostly
moot. For most fields, the question about citation advantage concerns
the postprint only, and as of the date of acceptance for publication,
not before.

(2) In other fields too, there is the same correlation between citation
counts and percentage self-archived, but it is based on postprints,
self-archived at publication, not pre-refereeing preprints self-archived
much earlier. And, most important, it is not true in these fields that
the postprint is accessible to all researchers via subscription: Many
potential users cannot access the article at all if it is not
self-archived -- and that is the main basis for the OA advantage.
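
For anyone wanting to compute decile figures of the kind Michael quotes
above for their own field (share of all citations held by the most and
least cited 10% of papers, and the fraction of each that is
self-archived), a minimal sketch follows. The function name
decile_summary and the (citations, is_archived) record layout are
hypothetical, and the sample numbers are made up for illustration.

```python
def decile_summary(papers):
    """Summarize the top and bottom 10% of papers ranked by citations.

    papers: list of (citation_count, is_archived) tuples.
    Returns, for each of "top" and "bottom", the share of all citations
    held by that decile and the fraction of its papers that is archived.
    """
    ranked = sorted(papers, key=lambda paper: paper[0], reverse=True)
    decile_size = max(1, len(ranked) // 10)
    total_citations = sum(count for count, _ in ranked)

    def summarize(group):
        cited = sum(count for count, _ in group)
        share = cited / total_citations if total_citations else 0.0
        archived = sum(1 for _, is_archived in group if is_archived)
        return {
            "citation_share": share,
            "archived_fraction": archived / len(group),
        }

    return {
        "top": summarize(ranked[:decile_size]),
        "bottom": summarize(ranked[-decile_size:]),
    }


if __name__ == "__main__":
    # Made-up sample: ten papers with a skewed citation distribution.
    sample = [
        (100, True), (50, True), (20, True), (10, False), (8, True),
        (5, False), (3, False), (2, False), (1, False), (1, False),
    ]
    print(decile_summary(sample))
```

The same tabulation can be run separately on fields where the
postprint is not accessible to all would-be users, which is where the
comparison with astronomy becomes informative.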

> Perhaps the journal which is most sensitive to cancellations due to OA 
> archiving is Nuclear Physics B; it is 100% in arXiv, and is very 
> expensive.  I have several times seen librarians say that they would 
> like to cancel it.  One effect of OA on Nuclear Physics B is that its 
> impact factor (as we measure it, I assume ISI gets the same thing) has 
> gone up, just as we show in the J E Pub paper for Physical Review D.  
> Whether Nuclear Physics B has been cancelled more than Nuclear Physics A 
> or Physics Letters B must be well known at Elsevier.

It is an interesting question whether NPB is being cancelled, but if
it is, it clearly is not because of self-archiving, nor because of
astronomy's special "universal paid OA" to the published version: if
NPB is being cancelled, it is for the usual reason, which is that it is
not good enough to justify its share of the institution's journal budget.

Chrs, Stevan