skewed citation distributions should not be averaged

David A. Pendlebury david.pendlebury at THOMSONREUTERS.COM
Wed Aug 31 14:34:35 EDT 2011


Dear Professor Leydesdorff,


Thank you for your reply.



I noticed your example of individuals at the University of Amsterdam in your paper - and such small data sets are of course subject to many difficulties. My question arose because of the strong statement -- without qualification -- in your paper:
"Citation distributions are so skewed that using the mean or any other central tendency measure is ill-advised."

Best wishes, David

________________________________
From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
Sent: Wednesday, August 31, 2011 11:11 AM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: Re: [SIGMETRICS] skewed citation distributions should not be averaged

Dear David:

Wolfgang Glaenzel precisely defined the conditions:

either. This is a misbelief. According to the
central limit theorem, the distribution of the
means of random samples is approximately
normal for a large sample size, provided the
underlying distribution of the population is in
the domain of attraction of the Gaussian distribution.
In other words, sample means approach a
normal distribution regardless of the distribution
of the population if the number of observations
is large enough and the first statistical moments
are finite. Consequently, means and shares of
different samples drawn from the same populations
can be compared with each other and the
significance of the deviation can be determined.
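
[Illustration] As a hedged sketch of the claim quoted above, the following Python fragment draws repeated random samples from a skewed population and compares the skewness of the resulting sample means at two sample sizes. The lognormal population, its parameters, the sample sizes, and the helper names (`skewness`, `sample_means`) are assumptions made for this sketch and do not come from the quoted text; the lognormal is used only because it is skewed yet has finite moments.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def sample_means(n, trials, rng):
    """Means of `trials` random samples, each of size `n`, from a skewed population."""
    # Assumption: a lognormal stands in for citation counts (skewed, finite moments).
    return [
        statistics.fmean([rng.lognormvariate(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]


rng = random.Random(42)
skew_small = skewness(sample_means(1, 2000, rng))    # single observations
skew_large = skewness(sample_means(400, 2000, rng))  # means of 400 observations

print(f"skewness of means, n=1:   {skew_small:.2f}")
print(f"skewness of means, n=400: {skew_large:.2f}")
```

The distribution of the means should be far less skewed at the larger sample size. For a population without finite variance the same experiment would not be expected to settle down in this way, which is the condition the passage attaches to the claim.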

Gangan Prathap's contribution is interesting in this context because, using a physical metaphor, he distinguished between "energy" and "exergy". The difference (E - X), in his opinion, is "a kind of entropy"; indeed, only "a kind of," because the dimensionality of energy and entropy is different. If one assumes "a kind of ideal gas," then one can compute with the mean. In evaluation research, however, we do not have so large a number of observations that the constraints can be neglected. There is no reason to assume that the CLT applies. For example, there are principles in science, such as preferential attachment, that operate against the assumption of a tendency to the mean.

Rather than having to demonstrate this each time, one can use percentiles, an approach that does not require the assumption. The hundred percentiles can follow the citation curve as a continuous variable ("quantiles"). One can then use non-parametric statistics (which have been available for 50 or so years). Instead of determining the deviation from the mean, one can test the observation against the expectation (as when using chi-square). The specification of the expectation can enrich the research design.
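
[Illustration] A minimal sketch of testing observed values against an expectation, assuming quartile classes instead of all hundred percentiles to keep the example short. The citation counts are invented for illustration, the helper names are hypothetical, and the uniform expectation (a quarter of the papers in each class) is itself an assumption; ties in the reference set can shift it.

```python
from bisect import bisect_left


def percentile_rank(value, reference_sorted):
    """Percentage of reference papers cited less often than `value`."""
    return 100.0 * bisect_left(reference_sorted, value) / len(reference_sorted)


def quartile_counts(values, reference):
    """Count how many `values` fall in each quartile class of the reference set."""
    ref = sorted(reference)
    counts = [0, 0, 0, 0]
    for v in values:
        # A rank of exactly 100 is folded into the top class.
        cls = min(int(percentile_rank(v, ref) // 25), 3)
        counts[cls] += 1
    return counts


def chi_square(observed, expected):
    """Pearson chi-square statistic for observed versus expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical citation counts, invented for illustration only.
reference = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 13, 17, 25, 40, 120]
unit = [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 18, 20, 30, 45, 60, 90, 120, 150]

observed = quartile_counts(unit, reference)
expected = [len(unit) / 4] * 4          # assumption: 25% expected per class
stat = chi_square(observed, expected)

CRITICAL_5PCT_DF3 = 7.815               # chi-square critical value, 3 degrees of freedom
print(observed, round(stat, 2), stat > CRITICAL_5PCT_DF3)
```

No mean is computed at any point: each paper is qualified by its percentile rank in the reference set, and only the class counts are compared with the stated expectation.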

Best wishes,
Loet


Means and shares are used as unbiased estimators
of the expected value and the corresponding
probabilities, respectively. Furthermore, in the
case of skewed discrete distributions the mean
value is superior to median. The underlying
methods of application of mathematical statistics
have been described, among others, by
Schubert and Glänzel (1983), Glänzel and Moed
(2002) and reliability-related statistics have been
regularly and successfully applied to bibliometrics
since. These statistical properties have severe
effects on ranking issues as well. Different
ranks can prove as ties because the underlying
indicator values might not differ significantly
(cf. Glänzel and Debackere 2007).
The myth of the inapplicability of Gaussian
statistics in a bibliometric context actually arose
from a misunderstanding, namely from the assumed
comparison of individual observations
with a standard. However, that is not what statistics
does.
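
[Illustration] The point about ranks that prove to be ties can be sketched with a large-sample z statistic for the difference between two means. This is a minimal sketch under stated assumptions, not the procedure of the cited papers: the helper names, the 1.96 threshold (two-sided 5% level), the lognormal population, and the sample sizes are all chosen here for illustration.

```python
import math
import random
import statistics


def z_for_difference(a, b):
    """Large-sample z statistic for the difference between two sample means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def is_tie(a, b, critical=1.96):
    """True when the two means do not differ at the two-sided 5% level."""
    return abs(z_for_difference(a, b)) < critical


# Two hypothetical units drawn from the same skewed population (lognormal stand-in).
rng = random.Random(7)
unit_a = [rng.lognormvariate(1.0, 1.0) for _ in range(500)]
unit_b = [rng.lognormvariate(1.0, 1.0) for _ in range(500)]

print(f"mean A = {statistics.fmean(unit_a):.2f}, mean B = {statistics.fmean(unit_b):.2f}")
print("treated as a tie:", is_tie(unit_a, unit_b))
```

Two units would receive different ranks whenever their means differ at all, so a check of this kind is one way to decide whether the difference in rank is meaningful. It compares sample means with each other, not an individual observation with a standard.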

--David Pendlebury
________________________________
From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
Sent: Tuesday, August 30, 2011 11:10 PM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: [SIGMETRICS] skewed citation distributions should not be averaged

A Rejoinder on Energy versus Impact Indicators<http://arxiv.org/abs/1108.5845>
Scientometrics (in press)
Citation distributions are so skewed that using the mean or any other central tendency measure is ill-advised. Unlike G. Prathap's scalar measures (Energy, Exergy, and Entropy or EEE), the Integrated Impact Indicator (I3) is based on non-parametric statistics using the (100) percentiles of the distribution. Observed values can be tested against expected ones; impact can be qualified at the article level and then aggregated.

pdf available at http://arxiv.org/ftp/arxiv/papers/1108/1108.5845.pdf

** apologies for cross postings
________________________________
Loet Leydesdorff
Professor, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111
loet at leydesdorff.net <mailto:loet at leydesdorff.net> ; http://www.leydesdorff.net/
Visiting Professor, ISTIC, <http://www.istic.ac.cn/Eng/brief_en.html> Beijing; Honorary Fellow, SPRU, <http://www.sussex.ac.uk/spru/> University of Sussex
