Jesper Wiborg Schneider jws at CFA.AU.DK
Sat Apr 19 01:59:57 EDT 2014

Dear Lutz and Loet,

As I said, this is not a matter of wanting more references to me, and certainly not more perfunctory ones like those in this paper; they bring nothing.  Of course I should NOT get credit for "introducing" effect sizes or CIs into our field - that would be ludicrous.  But I was surprised to see that some of the important "challenges" I bring forward, especially in the JoI paper (extended recently in a Scientometrics paper, of whose existence I think Lutz is aware), are basically ignored in this new work (or subsumed in a perfunctory reference), which I presume is written for our community and thus goes into "our literature" on this topic.

These issues cannot be brushed aside as "meta theoretical" and "repetitive".  Loet, I certainly agree that the use of inferential statistics should be related to the research questions, or rather to the design and settings used to answer these questions, but you cannot - as you seem to want - remove the "meta theoretical" questions underlying their use.  It is simply absurd in a scholarly field to wish such things away.  They are there and you have to confront them, like it or not.  There is more to statistical inference than calculation.  Statistical inference is based on different theories and assumptions.  Both need to be understood, because interpretation and knowledge claims depend upon them.  When it comes to uncertainty, the issues are so problematic and unresolved that most people just put the "binoculars in front of the blind eye", continue to use these methods, and make claims based upon them which in many cases we need to consider as basically flawed or, as Ioannidis has claimed, "... most published research findings are false" - I guess such practice causes the "repetitiveness" you dislike.

Yes, both of you responded to some of my criticisms in two brief letters, basically arguing for the use of statistical significance tests in the frequentist conception, supported by effect sizes and CIs.  Fair enough, but what about the important issues of when to use them, how to interpret the results they produce, etc.?  If you have read my papers you will notice that I (also) endorse the use of effect sizes and that I argue that CIs are superior to null hypothesis significance tests.  But you will also notice - contrary to their endorsement in the current paper by Lutz - that I am also highly critical of CIs and that I certainly do not see any point in using them as pseudo significance tests, especially since they can relieve us of the problematic null hypothesis.  These are not just my personal opinions; they are legitimate claims which I think should be reflected upon.  Lutz and his coauthor write in the paper:

"CIs provide a feel for the precision of measures. Put another way, they show the range that the true value of the mean may plausibly fall in. For example, if the observed mean was 40, the 95% CI might range between 35 and 45. So, while 40 is our "best guess" as to what the mean truly is, values ranging between 35 and 45 are also plausible alternative values."

I am not sure how people will perceive this, but in my reading the definition is unclear on the basic facts about frequentist CIs - which also happen to be their inherent weaknesses - namely that you cannot determine whether the "true" value of the parameter lies within the one interval you happen to calculate; it does or it does not.  You therefore need the long-run experimental interpretation which, as I have argued, is seldom addressed (especially not its consequences) and which does not per se give meaning or work equally well in all fields and situations, especially not in observational studies in the social sciences (it may work in physics, biology or experimental psychology?).  I am afraid that the above definition could leave readers with the impression that what we have is a fixed interval with a certain probability (95%) of including the "true parameter value", and that we therefore have a feeling for the 'uncertainty' - but this is not the case!
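
To make the long-run reading concrete, here is a small simulation.  It is only a sketch under stated assumptions: normally distributed data, a known standard deviation, and genuinely random sampling, which is exactly what observational bibliometric data often cannot offer.  The function name and all parameter values are made up for illustration and are not taken from the paper.  The "95%" describes the procedure over many repeated samples; any single interval either contains the true mean or it does not.

```python
import random
from statistics import NormalDist


def ci_coverage(true_mean=40.0, sigma=10.0, n=25, repeats=10_000,
                level=0.95, seed=1):
    """Share of repeated-sample intervals that contain the true mean.

    Assumptions (illustrative only): normal data, known sigma, random sampling.
    """
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repeats):
        # Draw a fresh random sample and compute its mean.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # This particular interval either covers the true mean or it does not.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    # Long-run share of intervals that cover the true mean.
    print(ci_coverage())
```

The share returned should be close to the nominal level, but it is a property of the whole collection of intervals, not a probability statement about the one interval an analyst actually reports.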

As I said, I was surprised that in this new paper/chapter by Lutz, where the aim I guess is to teach colleagues some better practices, none of the contested issues brought forward are reflected upon, or at least pointed to, so that readers are made aware of them (e.g., frequentist interpretation, logical issues, randomness or rather the lack of it, implausible null hypotheses).  As I see it, these issues are not "resolved" in your letters, and I reacted because I thought that two perfunctory references, one to me and one to you, were an "understatement" of these unresolved problems in a paper where it is relevant to mention them.  I do not argue that each and every question should be brought forward for discussion, but since CIs are endorsed, their assumptions and difficult interpretations could have been scrutinized more; but that is my opinion.  I can understand that you see this differently - fair enough, and let's leave it there.

Kind regards Jesper


Jesper W. Schneider
Senior Researcher, PhD

Aarhus University
Business and Social Sciences
Danish Centre for Studies in Research & Research Policy,
Department of Political Science & Government

Bartholins Allé 7
building 1331, room 027
DK-8000 Aarhus C

T: +45 8716 5241
M: jws at


From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Bornmann, Lutz
Sent: 17 April 2014 09:47
Subject: Re: [SIGMETRICS] Papers

Dear Jesper,

I agree with Loet. It is not clear to me what you expect, Jesper. You are cited in our papers, but I think it would be inappropriate to mention that you are the first one who used effect size measures (confidence intervals etc.) in bibliometrics. I and many colleagues have used them for many years. (I already used them in my master's thesis at the end of the 1990s.)



From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
Sent: Thursday, April 17, 2014 8:15 AM
Subject: Re: [SIGMETRICS] Papers

Dear Jesper,

Is there anything new to add to this debate? We thought that referencing the argument would be sufficient in this context.

At the time, we responded more fully in Bornmann & Leydesdorff (2013) and Leydesdorff (2013), and added power analysis (Cohen, 1988) to the statistical test of the Leiden (2011) rankings, available at (Leydesdorff & Bornmann, 2012) in response to your contributions (Schneider 2012 and 2013).

In my opinion, the issue of using significance testing, confidence intervals, and/or power analysis is to be decided from the perspective of the functionality of answering research questions. Otherwise, the debate tends to remain meta-theoretical and one risks becoming repetitive.



Bornmann, L., & Leydesdorff, L. (2013). Statistical Tests and Research Assessments: A comment on Schneider (2012). Journal of the American Society for Information Science and Technology, 64(6), 1306-1308.
Leydesdorff, L. (2013). Does the specification of uncertainty hurt the progress of scientometrics? Journal of Informetrics, 7(2), 292-293.
Leydesdorff, L., & Bornmann, L. (2012). Testing Differences Statistically with the Leiden Ranking. Scientometrics, 92(3), 781-783.
Schneider, J. W. (2012). Testing University Rankings Statistically: Why this Perhaps is not such a Good Idea after All. Some Reflections on Statistical Power, Effect Size, Random Sampling and Imaginary Populations. In É. Archambault, Y. Gingras & V. Larivière (Eds.), Science & Technology Indicators (STI) 2012 (Vol. 2, pp. 719-732). Montreal: Universite de Quebec a Montreal.
Schneider, J. W. (2013). Caveats for using statistical significance tests in research assessments. Journal of Informetrics, 7(1), 50-62.

-----Original Message-----
From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Jesper Wiborg Schneider
Sent: Wednesday, April 16, 2014 8:55 PM
Subject: Re: [SIGMETRICS] Papers

Dear Lutz,

Interesting paper, the latter one, and interesting to see how the 'debate' in our field is reflected in the references you and your coauthor give:

"In bibliometrics, it has been also recommended to go beyond statistical significance testing (Bornmann & Leydesdorff, 2013; Schneider, 2012)."

I guess you can call this quote an understatement, at least from my perspective. I do not think anyone recommended going 'beyond statistical significance testing' in scientometrics/bibliometrics before I criticized the current practice in "Caveats for using statistical significance tests in research assessments", first published on arXiv in 2011 and later, in 2013, in Journal of Informetrics.

In 2012, at the STI conference, I extended the critique in the paper you mention in the quote, discussing one of your papers on university rankings and exemplifying the use of effect sizes in relation to such rankings, in fact the use of Cohen's h in relation to the proportion of top 10 percent highly cited papers - basically the same example you bring forward in this paper.
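
For readers unfamiliar with Cohen's h: it is the difference between two arcsine-transformed proportions (Cohen, 1988).  The sketch below uses the standard formula; the function name and the two proportions are hypothetical and are not figures from either paper.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    # phi = 2 * arcsin(sqrt(p)) stabilises the variance of a proportion.
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2


if __name__ == "__main__":
    # Hypothetical shares of top 10 percent highly cited papers
    # for two institutions (made-up values).
    print(round(cohens_h(0.15, 0.10), 3))
```

Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large) are rules of thumb only; whether a given h matters substantively is a judgment about the field, not something the statistic decides.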

Only then - as far as I can follow the ever faster publishing chronology - did you and other colleagues react to some of my criticisms, including an endorsement of the use of effect sizes and CIs that was not visible until then.

Now I do not hunger for more references or the like, but I would appreciate it if, when we in the community have a debate or thread, that debate/thread were outlined thoroughly and honestly in the review section - which is the purpose of a review. This case is not the first one, and it gives one the impression that our literature is not read ... or worse ...? I am not sure whether this paper is under review, but I guess my writing this mail is the risk you run when announcing it on this list.

Kind regards Jesper


From: ASIS&T Special Interest Group on Metrics [SIGMETRICS at LISTSERV.UTK.EDU] on behalf of Bornmann, Lutz [lutz.bornmann at GV.MPG.DE]
Sent: 16 April 2014 15:53
Subject: [SIGMETRICS] Papers

BRICS countries and scientific excellence: A bibliometric analysis of most frequently-cited papers
Lutz Bornmann, Caroline Wagner, Loet Leydesdorff

(Submitted on 14 Apr 2014)

The BRICS countries (Brazil, Russia, India, China, and South Africa) are noted for their increasing participation in science and technology. The governments of these countries have been boosting their investments in research and development to become part of the group of nations doing research at a world-class level. This study investigates the development of the BRICS countries in the domain of top-cited papers (top 10% and 1% most frequently cited papers) between 1990 and 2010. To assess the extent to which these countries have become important players on the top level, we compare the BRICS countries with the top-performing countries worldwide. As the analyses of the (annual) growth rates show, with the exception of Russia, the BRICS countries have increased their output in terms of most frequently-cited papers at a higher rate than the top-cited countries worldwide. In a further step of analysis for this study, we generate co-authorship networks among authors of highly cited papers for four time points to view changes in BRICS participation (1995, 2000, 2005, and 2010). Here, the results show that all BRICS countries succeeded in becoming part of this network, whereby the Chinese collaboration activities focus on the USA.

Available at:

The substantive and practical significance of citation impact differences between institutions: Guidelines for the analysis of percentiles using effect sizes and confidence intervals
Richard Williams, Lutz Bornmann

(Submitted on 12 Apr 2014)

In our chapter we address the statistical analysis of percentiles: How should the citation impact of institutions be compared? In educational and psychological testing, percentiles are already used widely as a standard to evaluate an individual's test scores - intelligence tests for example - by comparing them with the percentiles of a calibrated sample. Percentiles, or percentile rank classes, are also a very suitable method for bibliometrics to normalize citations of publications in terms of the subject category and the publication year and, unlike the mean-based indicators (the relative citation rates), percentiles are scarcely affected by skewed distributions of citations. The percentile of a certain publication provides information about the citation impact this publication has achieved in comparison to other similar publications in the same subject category and publication year. Analyses of percentiles, however, have not always been presented in the most effective and meaningful way. New APA guidelines (American Psychological Association, 2010) suggest a lesser emphasis on significance tests and a greater emphasis on the substantive and practical significance of findings. Drawing on work by Cumming (2012) we show how examinations of effect sizes (e.g. Cohen's d statistic) and confidence intervals can lead to a clear understanding of citation impact differences.

Available at:


Dr. Dr. habil. Lutz Bornmann

Division for Science and Innovation Studies Administrative Headquarters of the Max Planck Society Hofgartenstr. 8

80539 Munich

Tel.: +49 89 2108 1265

Mobil: +49 170 9183667

Email: bornmann at


