Open Access: sample size, generalizability and self-selection

Tue Dec 21 16:08:12 EST 2010

Dear Yassine

That "the mean log citation differences ... are significantly greater than zero" is trivial information, especially when the studies are high-powered.

The important and difficult question in this debate about a potential OA citation advantage is the magnitude of the citation differences: when do we have an advantage? At 1, 5, or 10 citations? This question has not been discussed at the length it deserves; instead, we hope that p values will do the job for us. Importance (an advantage) cannot be determined by the dichotomous decision-making inherent in mindless significance testing.
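To make the distinction between significance and magnitude concrete, here is a minimal Python sketch. The function name and all numbers are hypothetical, and the p value uses a normal approximation rather than the exact t distribution. It holds a small standardized difference fixed and varies only the number of paired observations:

```python
import math
from statistics import NormalDist


def paired_summary(mean_diff: float, sd_diff: float, n: int) -> tuple[float, float, float]:
    """Return (effect size d_z, t statistic, approximate two-sided p value)
    for n paired differences with the given mean and standard deviation.

    The p value uses a normal approximation to the t distribution,
    which is only reasonable for large n."""
    d_z = mean_diff / sd_diff          # standardized magnitude, independent of n
    t = d_z * math.sqrt(n)             # test statistic grows with sqrt(n)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return d_z, t, p


# Hypothetical: a fixed, small mean difference of 0.05 log units (SD 0.5)
for n in (36, 400, 3578):
    d_z, t, p = paired_summary(0.05, 0.5, n)
    print(f"n={n:5d}  d_z={d_z:.2f}  t={t:.2f}  p={p:.6f}")
```

The effect size d_z stays at 0.10 throughout; only the p value moves, dropping below 0.05 once n reaches a few hundred. A significant result from a high-powered study therefore says little on its own about whether the difference is large enough to matter.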

And while we are at it, remember that p values are conditional probabilities of the data (or more extreme data) GIVEN the (exact) truth of H0, randomness, the actual sample size, and the assumptions underlying the test statistic. Several of these assumptions are ignored in many studies, which leaves their significance tests meaningless. One of them is randomness (random sampling and/or random assignment). Without randomness, significance tests really are meaningless, as the probability theory behind them breaks down and renders the p values inaccurate.
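One way to see why randomness matters is a randomization (sign-flip) test, in which the reference distribution under H0 is generated directly by the random assignment instead of being assumed from a parametric model. The sketch below illustrates that general technique; it is not the procedure used by either study discussed in this thread, and the function name and defaults are hypothetical. It assumes paired observations where chance alone decided which member of each pair received which condition.

```python
import random
from statistics import mean


def sign_flip_p_value(diffs, n_resamples=10_000, seed=0):
    """Two-sided randomization p value for the mean of paired differences.

    Under H0 (no effect) and random assignment within each pair, every
    difference was equally likely to come out with either sign. The p value
    is the share of sign-flipped datasets whose mean is at least as extreme
    as the observed mean, i.e. an estimate of P(data this extreme | H0)."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    extreme = 0
    for _ in range(n_resamples):
        # Re-run the random assignment: flip each sign with probability 1/2
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    # Add-one correction so the Monte Carlo estimate is never exactly zero
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical paired differences
print(sign_flip_p_value([0.4, 0.1, -0.2, 0.5, 0.3, 0.2, -0.1, 0.6]))
```

Without random assignment, the sign flips would have no counterpart in how the data actually arose, so the number returned would no longer be the probability of anything that happened.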

This is why Davis et al.'s study should be considered the most appropriate in relation to OA citation advantages, as it adheres to this assumption through random assignment.

Kind regards - Jesper Schneider

**********************************************
Jesper Wiborg Schneider, PhD, Associate Professor 
Royal School of Library & Information Science, Denmark 
Fredrik Bajers Vej 7K, 9220 Aalborg East 
Phone +45 98773041, Fax +45 98151042
E-mail: jws at iva.dk
Homepage: http://www.iva.dk/jws
**********************************************

-----Original Message-----
From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Yassine Gargouri
Sent: 21. december 2010 19:06
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: Re: [SIGMETRICS] Open Access: sample size, generalizability and self-selection

Our previous sample comparing self-selected self-archiving with mandatory self-archiving (27,197 articles from the publication interval 2002 to 2006: 6,215 mandated and 20,982 nonmandated) has now been extended to 63,518 articles (13,425 mandated and 50,093 nonmandated) published between 2002 and 2009 in 5,992 journals. For all OA vs non-OA (O/Ø) comparisons, regardless of whether the OA was Self-Selected (S) or Mandated (M), the mean log citation differences (after adding a constant value of 1 to all citations in order to include uncited papers) are significantly greater than zero (based on correlated-sample two-tailed t-tests for within-journal differences, α = 0.05).
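The within-journal comparison can be sketched as follows, assuming (hypothetically) that each journal contributes one difference: the mean of log(c + 1) over its OA articles minus the same mean over its non-OA articles. The function names and data layout are illustrative only, and the p value uses a normal approximation that is reasonable for thousands of journals but not for the three-journal toy example.

```python
import math
from statistics import NormalDist, mean, stdev


def within_journal_log_diffs(journals):
    """journals: mapping journal -> (oa_citations, non_oa_citations), each a
    list of per-article citation counts. Returns one difference per journal:
    mean log(c + 1) for OA articles minus mean log(c + 1) for non-OA articles.
    Adding 1 keeps uncited articles (c = 0) in the analysis."""
    diffs = []
    for oa, non_oa in journals.values():
        oa_mean = mean(math.log(c + 1) for c in oa)
        non_oa_mean = mean(math.log(c + 1) for c in non_oa)
        diffs.append(oa_mean - non_oa_mean)
    return diffs


def paired_t(diffs):
    """Correlated-sample (paired) t statistic for the journal-level differences,
    with a two-sided p value from the normal approximation (large n only)."""
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical toy data: citation counts per article in three journals
journals = {
    "J1": ([3, 0, 5], [1, 0, 2]),
    "J2": ([4, 2], [2, 1, 0]),
    "J3": ([10, 6, 1], [5, 3, 0]),
}
diffs = within_journal_log_diffs(journals)
t, p = paired_t(diffs)
print(diffs, t, p)
```

Pairing within journals removes between-journal differences in citation levels, so the test compares OA and non-OA articles that share the same journal.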

The t-tests applied to the 7 post hoc differences shown in this table, averaged across 2004-2009 (because mandates began to be adopted in 2004), have a statistical power of about 100% (except for the last difference, OM vs OS, whose power is only 11%; hence we discounted it in our interpretation).
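The power figures discussed here depend on the effect size implied by the means, standard deviations and correlation of the paired samples. A minimal sketch of that calculation for a two-sided paired t-test follows. It uses a normal approximation to the noncentral t distribution, so it will differ slightly from dedicated power software; the function names and example values are hypothetical, not the statistics behind the table.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def paired_effect_size(m1, m2, s1, s2, r):
    """Standardized mean difference d_z for paired data, computed from the
    two means, the two standard deviations and their correlation r."""
    sd_diff = math.sqrt(s1**2 + s2**2 - 2 * r * s1 * s2)
    return (m1 - m2) / sd_diff


def paired_power(d_z, n, alpha=0.05):
    """Approximate power of a two-sided paired t-test with n pairs,
    using the normal approximation to the noncentral t distribution."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d_z * math.sqrt(n)
    # Probability of rejecting in the upper tail plus the lower tail
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


# Hypothetical summary statistics
d_z = paired_effect_size(m1=1.10, m2=0.95, s1=0.60, s2=0.55, r=0.40)
for n in (36, 3578):
    print(n, round(paired_power(d_z, n), 3))
```

With the same d_z, power depends only on the number of pairs, which is why thousands of journals push it toward 100% while a few dozen leave it low.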

Using the same means, standard deviations and correlation coefficients as for the first pair of comparisons (O vs Ø) across 3,578 journals, the a priori estimate of statistical power shrinks to 23% when the sample is reduced to 36 journals (as in Davis's sample). A minimum sample size of 183 journals is required to detect a significant effect.
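Inverting the same normal approximation gives the smallest number of journals needed for a target power. The 80% default below is an assumption, since the text does not say which power level the figure of 183 journals refers to, and the function name and effect sizes are hypothetical.

```python
import math
from statistics import NormalDist


def min_pairs(d_z, power=0.80, alpha=0.05):
    """Smallest number of pairs (here: journals) for which a two-sided paired
    test reaches the target power, by the normal approximation
        n = ((z_{1-alpha/2} + z_{power}) / d_z) ** 2, rounded up.
    The negligible rejection probability in the opposite tail is ignored."""
    z = NormalDist()
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d_z) ** 2
    return math.ceil(n)


# Hypothetical effect sizes: smaller effects need many more journals
for d_z in (0.5, 0.2):
    print(d_z, min_pairs(d_z))
```

An exact noncentral t calculation typically asks for slightly more pairs than this approximation, especially when the required sample is small.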

Davis's study seems to have calculated the minimum sample size needed to reach a statistical power of 80% in terms of the number of articles within each journal, but not in terms of the number of journals (36).
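Why the number of journals matters separately from the number of articles can be sketched with a simple two-level model. This is a hypothetical illustration, not the design of either study: each journal's true OA effect is assumed to vary around delta with standard deviation tau, and article-level log citations have standard deviation sigma, with m articles per arm in each journal. The variance of one journal's observed difference is then tau^2 + 2 * sigma^2 / m, so adding articles shrinks only the second term.

```python
import math
from statistics import NormalDist


def two_level_power(delta, tau, sigma, n_journals, m_articles, alpha=0.05):
    """Approximate power of a paired test on journal-level OA-minus-non-OA
    differences under a hypothetical two-level model:
      - each journal's true effect varies around delta with SD tau,
      - article-level scores have SD sigma, m_articles per arm per journal.
    Uses the normal approximation to the paired t-test."""
    z = NormalDist()
    sd_diff = math.sqrt(tau**2 + 2 * sigma**2 / m_articles)
    shift = (delta / sd_diff) * math.sqrt(n_journals)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical parameters: more articles per journal helps only up to a ceiling
for m in (10, 100, 10_000):
    print("36 journals,", m, "articles:", round(two_level_power(0.1, 0.5, 1.0, 36, m), 3))
print("400 journals, 100 articles:", round(two_level_power(0.1, 0.5, 1.0, 400, 100), 3))
```

As m grows, the journal-level standard deviation approaches tau, so power with 36 journals levels off no matter how many articles each journal supplies; only more journals raise it further.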

It follows that a failure to replicate the OA citation advantage with such a sample size would not be improbable.