Citation analysis of author-choice OA journals

Stevan Harnad harnad at ECS.SOTON.AC.UK
Mon Aug 25 23:57:58 EDT 2008



On 25-Aug-08, at 11:01 PM, Phil Davis wrote:

> Our study focuses on estimating the effect of author-choice open  
> access on article citations.  The 11 journals were selected because  
> they gathered sufficient paying open access submissions as to make a  
> statistical analysis even potentially possible.  Still, if the open  
> access effect is small, a larger sample size is required to detect a  
> signal amongst the noise, which is why I aggregated the 11 journals  
> for subsequent analyses.  PNAS contributed so many articles in the  
> aggregate dataset (about a third) that I didn’t want this one  
> journal to skew the results, hence the tables report the analyses  
> with and without PNAS.
>

Phil:

(1) Aggregating the journals was a good idea, to enhance the sample  
size.

(2) But that does not explain why self-archiving OA was not identified  
and counted in too, instead of crediting all unpaid articles to non-OA  
(which would of course reduce the size of the OA Advantage).

(3) How does the fact that the overall sample was small and the PNAS  
sample was large justify that the entire PNAS data-set was not  
analyzed? (I don't contest that it should be analyzed (i) within the
aggregate as well as (ii) separately, and that (iii) the rest should
also be analyzed separately, to avoid skewing; I just don't
understand why the full analyses were not done and their results
reported.)
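
To make point (2) concrete, here is a minimal sketch of the dilution
effect. Every number in it is hypothetical (none comes from the Davis
data), and the function name and parameters are invented for
illustration. It assumes, for simplicity, that paid OA and self-archived
OA articles enjoy the same true advantage over non-OA articles.

```python
def measured_oa_advantage(n_non_oa, n_self_archived, base_cites, true_advantage):
    """Advantage observed for paid OA when self-archived OA articles
    are counted in the non-OA comparison group.

    All inputs are hypothetical illustration values, not study data.
    """
    # Mean citations of any OA article (paid or self-archived).
    oa_cites = base_cites * (1 + true_advantage)
    # The comparison group mixes truly non-OA articles with
    # misclassified self-archived OA articles, raising its mean.
    comparison_mean = (
        n_non_oa * base_cites + n_self_archived * oa_cites
    ) / (n_non_oa + n_self_archived)
    return oa_cites / comparison_mean - 1


# Hypothetical case: 8,000 truly non-OA articles, 2,000 self-archived
# ones counted as non-OA, 10 citations at baseline, true advantage 30%.
diluted = measured_oa_advantage(8000, 2000, 10.0, 0.30)
undiluted = measured_oa_advantage(10000, 0, 10.0, 0.30)
print(f"measured: {diluted:.1%}, true: {undiluted:.1%}")
```

Under these assumed inputs a true 30% advantage is measured as roughly
23%, and the shortfall grows with the share of self-archived articles
hidden in the comparison group.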

> Secondly, while aggregating the journals resulted in increased  
> statistical power, we are combining articles published in  
> *different* scientific fields (biology, medicine, bioinformatics,  
> plant sciences, and multi-disciplinary sciences), which is why  
> journal impact factors are not used as an explanatory variable.   
> Please note that I did include the variable Journal as either a  
> random variable (Table 2) or a fixed variable (Table S2), so journal- 
> to-journal variation is being accounted for in the model
>

Yes, but to the extent that journal variance is included (across mixed
fields), it is not at all clear why journal impact factors should not
be included too: after all, different fields may differ not only in
their impact factors but also in their numbers of authors, pages,
references, Review articles, and US authors.

(In our multiple regression analyses, measuring many of the same
parameters you did (article age, number of authors, references, etc.),
journal impact factor was the second-biggest predictor of citations,
after age.)


> While I appreciate your routine post-acceptance advice on  
> methodological improvements, I encourage you to embark on similar  
> analyses that address your own personal research interests.  The  
> data are all public.


(My advice is available pre-acceptance too, if consulted. ;>) ).

Analyses addressing my personal research interests will be available  
shortly, fear not!

But the methodological points I made for both your studies -- the BMJ  
one and this one -- affect the interpretability of your results; they  
are not just matters of personal taste.

To put it another way: You cannot draw the conclusions you draw from  
your data and your analyses, unless you do some of the further  
analyses and controls I describe.

Cheers, Stevan


>
>
> Phil Davis
>
>
>
>
>
> Stevan Harnad wrote:
>>
>> Confirmation Bias and the Open Access Advantage:
>> Some Methodological Suggestions for Davis's Citation Study
>>
>> Stevan Harnad
>>
>> Full text: http://openaccess.eprints.org/index.php?/archives/451-guid.html
>>
>> SUMMARY: Davis (2008) -- http://arxiv.org/pdf/0808.2428v1 --  
>> analyzes citations from 2004-2007 in 11 biomedical journals. For  
>> 1,600 of the 11,000 articles (15%), their authors paid the  
>> publisher to make them Open Access (OA). The outcome, confirming  
>> previous studies (on both paid and unpaid OA), is a significant OA  
>> citation Advantage, but a small one (21%, 4% of it correlated with  
>> other article variables such as number of authors, references and  
>> pages). The author infers that the size of the OA advantage in this  
>> biomedical sample has been shrinking annually from 2004-2007, but  
>> the data suggest the opposite. In order to draw valid conclusions  
>> from these data, the following five further analyses are necessary:
>>     (1) The current analysis is based only on author-choice (paid)  
>> OA. Free OA self-archiving needs to be taken into account too, for  
>> the same journals and years, rather than being counted as non-OA,  
>> as in the current analysis.
>>     (2) The proportion of OA articles per journal per year needs to  
>> be reported and taken into account.
>>     (3) Estimates of journal and article quality and citability in  
>> the form of the Journal Impact Factor and the relation between the  
>> size of the OA Advantage and journal as well as article "citation- 
>> bracket" need to be taken into account.
>>     (4) The sample-size for the highest-impact, largest-sample  
>> journal analyzed, PNAS, is restricted and is excluded from some of  
>> the analyses. An analysis of the full PNAS dataset is needed, for  
>> the entire 2004-2007 period.
>>     (5) The analysis of the interaction between OA and time,  
>> 2004-2007, is based on retrospective data from a June 2008 total  
>> cumulative citation count. The analysis needs to be redone taking  
>> into account the dates of both the cited articles and the citing  
>> articles, otherwise article-age effects and any other real-time  
>> effects from 2004-2008 are confounded.
>> The author proposes that an author self-selection bias for  
>> providing OA to higher-quality articles (the Quality Bias, QB) is  
>> the primary cause of the observed OA Advantage, but this study does  
>> not test or show anything at all about the causal role of QB (or of  
>> any of the other potential causal factors, such as Accessibility  
>> Advantage, AA, Competitive Advantage, CA, Download Advantage, DA,  
>> Early Advantage, EA, and Quality Advantage, QA). The author also  
>> suggests that paid OA is not worth the cost, per extra citation.  
>> This is probably true, but with OA self-archiving, both the OA and  
>> the extra citations are free.
>>
>
> -- 
> Philip M. Davis
> PhD Student
> Department of Communication
> 336 Kennedy Hall
> Cornell University, Ithaca, NY 14853
> email: pmd8 at cornell.edu
> phone: 607 255-4735
> https://confluence.cornell.edu/display/~pmd8/resume
