skewed citation distributions should not be averaged

Wed Aug 31 12:46:18 EDT 2011

Dear Sigmetrics readers:

I recalled an interesting contribution by Professor Glänzel that discussed the validity of using means in bibliometric analysis (forget about medians here), and I wonder how others would reconcile Professor Leydesdorff's claim in the article below with Professor Glänzel's argument. I have always adhered to Professor Glänzel's view and would be interested in seeing this issue discussed in this forum. The excerpt from Professor Glänzel follows:

Seven Myths in Bibliometrics.
About facts and fiction in quantitative science studies
Wolfgang Glänzel (1, 2)
05 June 2008

(1) Steunpunt O&O Indicatoren, K.U. Leuven, Dept. MSI, Leuven, Belgium,
Wolfgang dot Glanzel at econ dot kuleuven dot be
(2) Institute for Research Policy Studies, Hungarian Academy of Sciences, Budapest, Hungary,
glanzw at iif dot hu

from:

H. Kretschmer & F. Havemann (Eds.): Proceedings of WIS 2008, Berlin
Fourth International Conference on Webometrics, Informetrics and Scientometrics & Ninth COLLNET Meeting
Humboldt-Universität zu Berlin, Institute for Library and Information Science (IBI)
This is an Open Access document licensed under the Creative Commons License BY

Available at: http://www.collnet.de/Berlin-2008/GlanzelWIS2008smb.pdf

Excerpt:

2.7 Myth #7: Don't use averages in bibliometrics

The myth: Methods of classical statistics may not be applied to bibliometric distributions since those are discrete and extremely skewed. Therefore the use of medians and quantiles should be preferred.

The background of this myth is quite obvious. The Gaussian normal distribution, being one of the most important families of continuous probability distributions, arises in many areas of statistics. If a statistical sample follows a normal distribution, then the observations should be symmetrically distributed around the sample mean and the standard deviation can be used to determine a tolerance threshold for individual observations. However, this is obviously not the case in bibliometrics. Most bibliometric distributions are far from being symmetric and discrete. Publication-activity and citation-impact distributions are often extremely skewed, the majority of the observations are below the sample mean and the rest of the sample elements are located in the long tail of the distributions. In such cases the mean value and the standard deviation seem to be completely useless. Therefore the application of classical tools of moment-based statistics seems not to be appropriate in research evaluation either. This is a misbelief. According to the central limit theorem, the distribution of the means of random samples is approximately normal for a large sample size, provided the underlying distribution of the population is in the domain of attraction of the Gaussian distribution. In other words, sample means approach a normal distribution regardless of the distribution of the population if the number of observations is large enough and the first statistical moments are finite. Consequently, means and shares of different samples drawn from the same populations can be compared with each other and the significance of the deviation can be determined. Means and shares are used as unbiased estimators of the expected value and the corresponding probabilities, respectively. Furthermore, in the case of skewed discrete distributions the mean value is superior to median. The underlying methods of application of mathematical statistics have been described, among others, by Schubert and Glänzel (1983), Glänzel and Moed (2002) and reliability-related statistics have been regularly and successfully applied to bibliometrics since. These statistical properties have severe effects on ranking issues as well. Different ranks can prove as ties because the underlying indicator values might not differ significantly (cf. Glänzel and Debackere 2007).

The myth of the inapplicability of Gaussian statistics in a bibliometric context actually arose from a misunderstanding, namely from the assumed comparison of individual observations with a standard. However, that is not what statistics does.
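
Glänzel's central-limit argument can be checked numerically. The sketch below is an illustration under stated assumptions, not his published method: it draws hypothetical "citation counts" from a floored log-normal distribution, chosen only because it is discrete, strongly skewed, and has finite moments (it is not claimed to be the true shape of citation distributions). It compares the skewness of individual papers with the skewness of sample means, then applies a large-sample z-test to two hypothetical units and reports a "tie" when their mean citation rates do not differ significantly at the 5% level. The helper names (`simulated_citations`, `z_test_means`) and all parameters are hypothetical.

```python
import math
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def simulated_citations(rng, n, mu=1.0, sigma=1.0):
    """Hypothetical skewed, discrete 'citation counts': floored log-normal draws."""
    return [math.floor(rng.lognormvariate(mu, sigma)) for _ in range(n)]


def z_test_means(a, b):
    """Large-sample z-test for a difference of two means; returns (z, two-sided p)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


rng = random.Random(2011)  # fixed seed so the run is reproducible
papers = simulated_citations(rng, 20_000)
# 2,000 independent samples of 400 papers each; keep only their means.
means = [statistics.fmean(simulated_citations(rng, 400)) for _ in range(2_000)]

print(f"skewness of individual papers: {skewness(papers):.2f}")
print(f"skewness of sample means (n=400): {skewness(means):.2f}")

# Two hypothetical units whose underlying distributions differ only slightly.
unit_a = simulated_citations(rng, 300, mu=1.00)
unit_b = simulated_citations(rng, 300, mu=1.10)
z, p = z_test_means(unit_a, unit_b)
print(f"z = {z:.2f}, p = {p:.3f} ->", "tie" if p >= 0.05 else "significant difference")
```

The individual counts remain heavily skewed while the means of 400-paper samples are close to symmetric, which is the property the significance test relies on. Whether the last line prints "tie" depends on the simulated draws; the point is that a ranking difference is only meaningful when the test rejects.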

--David Pendlebury
________________________________
From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
Sent: Tuesday, August 30, 2011 11:10 PM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: [SIGMETRICS] skewed citation distributions should not be averaged

A Rejoinder on Energy versus Impact Indicators <http://arxiv.org/abs/1108.5845>
Scientometrics (in press)
Citation distributions are so skewed that using the mean or any other central tendency measure is ill-advised. Unlike G. Prathap's scalar measures (Energy, Exergy, and Entropy or EEE), the Integrated Impact Indicator (I3) is based on non-parametric statistics using the (100) percentiles of the distribution. Observed values can be tested against expected ones; impact can be qualified at the article level and then aggregated.
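
The percentile-based alternative described in this abstract can be sketched as follows. The abstract says only that I3 uses the (100) percentiles of the distribution, qualifies impact at the article level, and then aggregates; the sketch therefore assumes the simplest reading: each paper's percentile rank is the share (in %) of papers in a reference set with strictly fewer citations, and the unit's score is the sum of those ranks. The published indicator's conventions (percentile definition, tie handling, any grouping into percentile-rank classes, and the statistical test) may differ and are not reproduced here. The function names and the reference data are hypothetical.

```python
from bisect import bisect_left


def percentile_ranks(citations, reference):
    """Percentile rank of each paper: % of reference papers with strictly fewer citations."""
    ref = sorted(reference)
    # bisect_left counts reference values < c in O(log N) per paper.
    return [100.0 * bisect_left(ref, c) / len(ref) for c in citations]


def integrated_impact(citations, reference):
    """Simplified I3-style score: the sum of the papers' percentile ranks."""
    return sum(percentile_ranks(citations, reference))


def expected_integrated_impact(n_papers, reference):
    """Expected score if a unit's papers were drawn at random from the reference set."""
    return n_papers * sum(percentile_ranks(reference, reference)) / len(reference)


reference = [0, 0, 1, 1, 2, 3, 5, 8, 13, 40]  # hypothetical field reference set
unit = [1, 8, 40]                              # hypothetical unit's papers
print(integrated_impact(unit, reference))                # 180.0 (observed)
print(expected_integrated_impact(len(unit), reference))  # 129.0 (expected)
```

Because the score is a sum rather than a mean, it grows with the number of papers as well as with their individual standing, which is the "integrated" aspect the abstract contrasts with scalar averages.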

pdf available at http://arxiv.org/ftp/arxiv/papers/1108/1108.5845.pdf

** Apologies for cross-postings
________________________________
Loet Leydesdorff
Professor, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111
loet at leydesdorff.net ; http://www.leydesdorff.net/
Visiting Professor, ISTIC, Beijing <http://www.istic.ac.cn/Eng/brief_en.html>; Honorary Fellow, SPRU, University of Sussex <http://www.sussex.ac.uk/spru/>
