Open Access: sample size, generalizability and self-selection

K S Chudamani ksc at LIBRARY.IISC.ERNET.IN
Thu Dec 23 01:01:29 EST 2010




The population sample size is calculated based on the desired error
level. This has not been mentioned by either the pro or the con side of
the debate over Davis's sample size.

chudamani


On Tue, 21 Dec 2010, Stevan Harnad wrote:

> Administrative info for SIGMETRICS (for example unsubscribe):
> http://web.utk.edu/~gwhitney/sigmetrics.html
>
> On 2010-12-21, at 4:08 PM, Jesper Wiborg Schneider wrote:
>
>> That "the mean log citation differences ... are significantly greater than zero" is trivial information, especially when the studies are high powered.
>
> The point of Yassine Gargouri's analysis and posting was to show that finding no significant difference is not improbable when the tests are low-powered (Davis 2010a, 2010b).
>
>> The important and difficult question in this debate concerning a potential OA citation advantage is the magnitude of the citation differences - when do we have an advantage? At 1, 5, or 10 citations? This question has not been discussed at the length it should be; instead we hope p values will do the job for us. Importance (advantage) cannot be determined by the dichotomous decision making inherent in mindless significance testing.
>
> Yes, the magnitude has been tested and discussed. The Gargouri et al (2010) article shows that the size of the OA Advantage for mandated OA is just as big as for self-selected OA. This contradicts the hypothesis that the OA Advantage is just an artifact of a self-selection bias. Mandatory OA is not self-selected OA.
>
>> And while we are at it, remember that p values are conditional probabilities of data (or more extreme data) GIVEN the (exact) truth of H0, randomness, the actual sample size, and assumptions concerning test statistics. Several of these assumptions are often ignored in many studies, leaving significance tests meaningless. One of these is randomness (random sampling and/or random assignment). Without randomness, significance tests are meaningless, as probability theory breaks down, rendering p values inaccurate.
>
> It is not clear how these general methodological remarks about randomness and significance testing pertain to the result at hand, which was that the OA Advantage is just as big when it is mandated  (i.e. imposed) as it is when it is self-selected. Hence the advantage is not a result of self-selection.
>
>> This is why Davis et al's study should be considered the most appropriate in relation to OA citation advantages, as they adhere to this assumption through random assignment.
>
> What is being tested is whether or not the many times replicated OA Advantage is an artifact of self-selection.
>
> Any method that controls for self-selection can be a valid test of the self-selection artifact hypothesis. Randomizing OA is not the only way to eliminate self-selection: requiring it is another way.
>
> And when an institution requires all of its research output to be made OA (and there is still an OA advantage), the only way to save the self-selection hypothesis is to argue either (1) that the self-selective self-archiving bias is now a self-selective mandate-noncompliance bias, with the artifact being the result of preferentially withholding the worse articles rather than preferentially self-archiving the better articles (which becomes increasingly far-fetched as mandate compliance rates approach 100%), or (2) that the self-selective author self-archiving bias is now a self-selective institutional mandate-adoption bias, with the institutions with the better research output being the ones that adopt the OA mandates (which is unlikely given that the earliest mandating institutions -- and two of the four used in our sample -- were not Harvard and MIT but Queensland University of Technology and University of Minho, and the outcome was the same when the other two institutions, CERN and University of Southampton, were removed from the calculation).
>
> Note that Yassine Gargouri's posting was only addressing the question of the likelihood of finding a null effect given the sample size and test power. The much more fundamental flaw of Davis's results is the complete absence of a control for self-selected self-archiving. Unless it is shown, with the same sample and power, that with self-selection the usual OA advantage is detected, and with randomization it is eliminated, all we have is a non-replication of the OA advantage; the randomization does not even enter into it as a factor, until a self-selection control is performed, and it detects the usual OA advantage (which the study is designed to show to be the result of a self-selection artifact).
>
> Stevan Harnad
>
> Davis, P. (2010a) Does Open Access Lead to Increased Readership and Citations? A Randomized Controlled Trial of Articles Published in APS Journals.
> The Physiologist, 53 (6) http://www.the-aps.org/publications/tphys/2010html/December/open_access.htm
>
> Davis, P. (2010b) Access, Readership, Citations: A Randomized Controlled Trial Of Scientific Journal Publishing.
> eCommons at Cornell http://ecommons.cornell.edu/handle/1813/17788
>
> Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. and Harnad, S. (2010) Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research. PLOS ONE. http://eprints.ecs.soton.ac.uk/18493/
>
> Harnad, S., Correlation, Causation, and the Weight of Evidence, Open Access Archivangelism. http://openaccess.eprints.org/index.php?/archives/772guid.html
>
>>
>>
>> Kind regards - Jesper Schneider
>>
>> **********************************************
>> Jesper Wiborg Schneider, PhD, Associate Professor
>> Royal School of Library & Information Science, Denmark
>> Fredrik Bajers Vej 7K, 9220 Aalborg East
>> Phone +45 98773041, Fax +45 98151042
>> E-mail: jws at iva.dk
>> Homepage: http://www.iva.dk/jws
>> **********************************************
>>
>>
>> -----Original Message-----
>> From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Yassine Gargouri
>> Sent: 21. december 2010 19:06
>> To: SIGMETRICS at LISTSERV.UTK.EDU
>> Subject: Re: [SIGMETRICS] Open Access: sample size, generalizability and self-selection
>>
>>
>> Our previous sample comparing self-selective self-archiving with mandatory self-archiving (27,197 articles from the publication interval 2002 to 2006 -
>> 6,215 mandated and 20,982 nonmandated) has now been extended to 63,518 articles (13,425 mandated and 50,093 nonmandated) published between 2002 and
>> 2009 in 5,992 journals. For all OA vs Non-OA (O/Ø) comparisons, regardless of whether the OA was Self-Selected (S) or Mandated (M), the mean log citation differences (after adding a constant value of 1 to all citations in order to include uncited papers) are significantly greater than zero, based on correlated-sample two-tailed t-tests for within-journal differences (p = 0.05).
>>
>> The t-tests applied to the 7 post hoc differences shown in this table, averaged across 2004-2009 (because mandates began to be adopted in 2004), have a statistical power of about 100% (except for the last difference, OM vs OS, which is only 11% and which we hence discounted in our interpretation).
>>
>> Based on the same means, standard deviations and correlation coefficients as for the first pair of comparisons (O vs Ø) of 3,578 journals, the a priori estimate of statistical power shrinks to 23% when the sample of journals is reduced to 36 (as in Davis's sample). A minimum sample size of 183 journals is required to get a significant effect.
>>
>> Davis's study seems to have calculated the minimum sample size needed in order to reach a relative statistical power of 80% in terms of the number of articles within each journal, but not in terms of the number of journals (36).
>>
>> It follows that a failure to replicate the OA citation advantage with such a sample size would not be improbable.
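[The power reasoning above can be reproduced in outline. The sketch below assumes a paired-differences design and a hypothetical standardized effect size of d = 0.2; since the actual means, standard deviations and correlations from the 3,578-journal comparison are not reproduced here, the exact numbers will differ from the 23% and 183 quoted above.]

```python
import numpy as np
from scipy import stats

def paired_t_power(d, n, alpha=0.05):
    """Power of a two-sided paired t-test with standardized effect d and n pairs."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    ncp = d * np.sqrt(n)  # noncentrality parameter under the alternative
    # Probability of landing in either rejection region under the alternative.
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

# Power with only 36 journals, as in Davis's sample (d = 0.2 is an assumption).
power_36 = paired_t_power(0.2, 36)

# Smallest number of journals giving at least 80% power at the same effect size.
n = 5
while paired_t_power(0.2, n) < 0.80:
    n += 1
```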
>
>


