Re: [SIGMETRICS] skewed citation distributions should not be averaged

Wed Aug 31 15:00:14 EDT 2011

Dear David,

The mean is strongly influenced by publications with high citation counts.
The median, another measure of central tendency, is not. Whether (a few)
highly cited papers have such an effect can be checked by comparing the mean
with the median of the citations of a publication set. If the mean is
significantly higher than the median, the highly cited papers are pulling
the mean upward.
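A minimal Python sketch of the mean-versus-median check described above. The citation counts and the function name `mean_median_gap` are hypothetical, chosen only for illustration. The sketch reports the size of the gap; deciding whether the mean is significantly higher would need a further test (for example, a bootstrap), which is not shown here.

```python
import statistics


def mean_median_gap(citations):
    """Return (mean, median, mean/median ratio) for a list of citation counts.

    A ratio well above 1 suggests that a few highly cited papers pull the
    mean upward. The ratio is reported as None when the median is 0.
    """
    mean = statistics.mean(citations)
    median = statistics.median(citations)
    ratio = mean / median if median else None
    return mean, median, ratio


# Hypothetical publication set: most papers modestly cited, one outlier.
papers = [0, 1, 1, 2, 3, 3, 4, 5, 6, 95]
print(mean_median_gap(papers))
```

Here the mean (12) is four times the median (3), almost entirely because of the single paper with 95 citations.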

Best,

Lutz

---------------------------------------
Dr. Dr. habil. Lutz Bornmann
Max Planck Society
Administrative Headquarters
Hofgartenstr. 8
80539 Munich
Tel.: 089/2108-1265
Email: bornmann at gv.mpg.de
WWW: www.lutz-bornmann.de
ResearcherID: http://www.researcherid.com/rid/A-3926-2008

________________________________

From: ASIS&T Special Interest Group on Metrics on behalf of David A.
Pendlebury
Sent: Wed 31.08.2011 20:34
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: Re: [SIGMETRICS] skewed citation distributions should not be
averaged

Dear Professor Leydesdorff,

Thank you for your reply.

I noticed your example of individuals at the University of Amsterdam in your
paper; such small data sets are, of course, subject to many difficulties.
My question arose because of the strong, unqualified statement in your
paper:

"Citation distributions are so skewed that using the mean or any other
central tendency measure is ill-advised."

Best wishes, David

________________________________

From: ASIS&T Special Interest Group on Metrics
[mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
Sent: Wednesday, August 31, 2011 11:11 AM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: Re: [SIGMETRICS] skewed citation distributions should not be
averaged

Dear David:

Wolfgang Glaenzel precisely defined the conditions:

[...] either. This is a misbelief. According to the central limit theorem,
the distribution of the means of random samples is approximately normal for
a large sample size, provided the underlying distribution of the population
is in the domain of attraction of the Gaussian distribution. In other words,
sample means approach a normal distribution regardless of the distribution
of the population if the number of observations is large enough and the
first statistical moments are finite. Consequently, means and shares of
different samples drawn from the same populations can be compared with each
other and the significance of the deviation can be determined.
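The central limit theorem argument quoted above can be illustrated with a small simulation. This sketch assumes a log-normal "population" (continuous, with finite moments) as a stand-in for a citation distribution; the distribution, the seed and the sample sizes are arbitrary choices, not taken from the thread. It compares the skewness of individual draws with the skewness of means of samples of 100 draws.

```python
import random
import statistics


def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m3 = statistics.fmean([(x - mu) ** 3 for x in xs])
    return m3 / m2 ** 1.5


rng = random.Random(42)  # fixed seed so the run is reproducible


def draw():
    # Assumed stand-in for a skewed citation distribution: log-normal(0, 1),
    # which has finite moments (the CLT's precondition).
    return rng.lognormvariate(0.0, 1.0)


raw = [draw() for _ in range(10_000)]  # individual observations
means = [statistics.fmean([draw() for _ in range(100)])  # means of samples of size 100
         for _ in range(1_000)]

print("skewness of observations:", round(sample_skewness(raw), 2))
print("skewness of sample means:", round(sample_skewness(means), 2))
```

The sample means come out far less skewed than the individual observations, as the theorem predicts when the moments are finite. Whether real citation distributions meet that condition is exactly what is disputed in the reply that follows.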

Gangan Prathap's contribution is interesting in this context because, using a
physical metaphor, he distinguished between "energy" and "exergy". The
difference (E - X) is, in his opinion, "a kind of entropy"; indeed, only "a
kind of", because energy and entropy have different dimensionalities. If one
assumes "a kind of ideal gas," then one can compute with the mean. In
evaluation research, however, we do not have such large numbers of
observations that the constraints can be neglected. There is no reason to
assume that the central limit theorem (CLT) applies. For example, there are
principles in science, such as preferential attachment, that operate against
the assumption of a tendency toward the mean.
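Preferential attachment can be sketched with a simple Price-type model. The model is our choice for illustration, not one specified in this thread: each new paper cites m earlier papers, chosen with probability roughly proportional to their citations plus one. The function name `price_model` and its parameters are hypothetical. Although every paper is created by the same rule, many papers end up uncited while a few early ones collect a large share of the citations.

```python
import random
import statistics


def price_model(n_papers, m, seed=0):
    """Simulate citation counts under a simple Price-type preferential-attachment model.

    Each new paper cites min(m, number of earlier papers) distinct earlier papers.
    Papers are drawn from an urn that holds each paper once (so uncited papers can
    still be cited) plus once per citation it has received, which makes the chance
    of being cited roughly proportional to citations + 1.
    """
    rng = random.Random(seed)
    citations = [0] * n_papers
    urn = []
    for new in range(n_papers):
        k = min(m, new)
        cited = set()
        while len(cited) < k:  # draw until k distinct earlier papers are chosen
            cited.add(rng.choice(urn))
        for old in cited:
            citations[old] += 1
            urn.append(old)  # every citation received adds one more ticket
        urn.append(new)  # the new paper enters the urn with one ticket
    return citations


counts = price_model(5_000, m=3)
print("mean:", round(statistics.mean(counts), 2))
print("median:", statistics.median(counts))
print("max:", max(counts))
```

With 5,000 papers and m = 3, the mean is close to 3 by construction (nearly every paper makes three citations), whereas the median is typically 0: the "cumulative advantage" keeps the distribution from tending toward its mean.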

Instead of having to show each time that the CLT applies, one can use
percentiles, an approach that does not require this assumption. The hundred
percentiles can follow the citation curve as a continuous variable
("quantiles"), and one can use non-parametric statistics (which have been
available for 50 years or so) instead. Rather than determining the deviation
from the mean, one can test the observation against the expectation (as when
using chi-square). Specifying the expectation can enrich the research design.
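A sketch of testing an observation against an expectation with percentiles. It rests on assumptions not stated in the thread: a paper's percentile rank is the share of a reference set cited strictly less often, the "top class" is percentile 90 or higher, and the expected share in that class is 10% (with tied citation counts, the real share will deviate somewhat from 10%). The unit's observed count in the top class is then compared with the expected count by a chi-square test with one degree of freedom. The helper names and the data are hypothetical.

```python
import math
from bisect import bisect_left


def percentile_rank(value, reference_sorted):
    """Share (in %) of reference papers cited strictly less often than `value`.

    Assumption: this is one of several possible percentile conventions.
    """
    return 100.0 * bisect_left(reference_sorted, value) / len(reference_sorted)


def chi_square_top_class(unit, reference, threshold=90.0, expected_share=0.10):
    """Chi-square test (1 df) of the unit's papers in the top class vs. the expectation."""
    ref = sorted(reference)
    n = len(unit)
    observed_top = sum(1 for c in unit if percentile_rank(c, ref) >= threshold)
    observed = [observed_top, n - observed_top]
    expected = [n * expected_share, n * (1 - expected_share)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df:
    # P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed_top, chi2, p_value


reference = list(range(100))  # hypothetical reference set: papers cited 0..99 times
unit = [95] * 30 + [5] * 70  # hypothetical unit: 30 of 100 papers in the top class
print(chi_square_top_class(unit, reference))
```

Observing 30 papers in the top class where 10 are expected gives a chi-square of about 44.4 and a p-value far below 0.001; no mean or standard deviation of the citation counts is involved.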

Best wishes, 

Loet

Means and shares are used as unbiased estimators of the expected value and
the corresponding probabilities, respectively. Furthermore, in the case of
skewed discrete distributions the mean value is superior to median. The
underlying methods of application of mathematical statistics have been
described, among others, by Schubert and Glänzel (1983), Glänzel and Moed
(2002) and reliability-related statistics have been regularly and
successfully applied to bibliometrics since. These statistical properties
have severe effects on ranking issues as well. Different ranks can prove as
ties because the underlying indicator values might not differ significantly
(cf. Glänzel and Debackere 2007).

The myth of the inapplicability of Gaussian statistics in a bibliometric
context actually arose from a misunderstanding, namely from the assumed
comparison of individual observations with a standard. However, that is not
what statistics does.

--David Pendlebury
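The point that different ranks can prove to be statistical ties can be sketched with a large-sample z-test for the difference between two mean citation rates. This relies on the normal approximation of sample means from the central limit theorem, the very assumption disputed in this thread. The function name `compare_means` and the two units are hypothetical.

```python
import math
import statistics


def compare_means(a, b):
    """Large-sample z-test for the difference between two mean citation rates.

    Relies on the normal approximation of sample means (central limit theorem).
    Returns (z, two-sided p-value).
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for standard normal Z
    return z, p_value


unit_a = [0, 1, 2, 3, 4] * 20  # hypothetical: 100 papers, mean 2.0
unit_b = [0, 1, 2, 3, 5] * 20  # hypothetical: 100 papers, mean 2.2
z, p = compare_means(unit_a, unit_b)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Unit B would rank above unit A on mean citation rate (2.2 versus 2.0), but with p of about 0.37 the difference is not significant, so under this approach the two ranks would count as a tie.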

________________________________

From: ASIS&T Special Interest Group on Metrics
[mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
Sent: Tuesday, August 30, 2011 11:10 PM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: [SIGMETRICS] skewed citation distributions should not be averaged

A Rejoinder on Energy versus Impact Indicators
<http://arxiv.org/abs/1108.5845> 
Scientometrics (in press)

Citation distributions are so skewed that using the mean or any other central
tendency measure is ill-advised. Unlike G. Prathap's scalar measures (Energy,
Exergy, and Entropy or EEE), the Integrated Impact Indicator (I3) is based on
non-parametric statistics using the (100) percentiles of the distribution.
Observed values can be tested against expected ones; impact can be qualified
at the article level and then aggregated. 
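To show what "qualified at the article level and then aggregated" means in contrast to averaging, here is a sketch that assumes the simplest reading of I3: the sum of the percentile ranks of a unit's papers within a reference set (the paper may instead weight percentile classes). The percentile convention, the reference set, and the units are hypothetical.

```python
from bisect import bisect_left


def percentile_rank(value, reference_sorted):
    """Share (in %) of reference papers cited strictly less often than `value`."""
    return 100.0 * bisect_left(reference_sorted, value) / len(reference_sorted)


def i3(unit, reference):
    """Integrated impact, read here as the sum of the papers' percentile ranks."""
    ref = sorted(reference)
    return sum(percentile_rank(c, ref) for c in unit)


reference = list(range(100))  # hypothetical reference set: papers cited 0..99 times
small_unit = [90]  # one paper at the 90th percentile
larger_unit = [90, 50]  # the same paper plus one at the 50th percentile

for name, unit in [("small", small_unit), ("larger", larger_unit)]:
    mean_pct = i3(unit, reference) / len(unit)
    print(name, "I3 =", i3(unit, reference), "mean percentile =", mean_pct)
```

Adding a paper at the 50th percentile lowers the unit's mean percentile from 90 to 70 but raises its integrated impact from 90 to 140: the additional paper counts as a contribution rather than diluting an average.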

pdf available at http://arxiv.org/ftp/arxiv/papers/1108/1108.5845.pdf 

** Apologies for cross-posting

________________________________

Loet Leydesdorff 

Professor, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111

loet at leydesdorff.net; http://www.leydesdorff.net/
Visiting Professor, ISTIC, Beijing <http://www.istic.ac.cn/Eng/brief_en.html>;
Honorary Fellow, SPRU, University of Sussex <http://www.sussex.ac.uk/spru/>