skewed citation distributions should not be averaged
Sylvan Katz
j.s.katz at SUSSEX.AC.UK
Wed Aug 31 16:29:15 EDT 2011
Here is my take on Loet's statement and David's question.
Perhaps the key point in Glanzel's paper is made in the following
statements:
"Therefore the application of classical tools of moment-based statistics
seems not to be appropriate in research evaluation either. This is a
misbelief. According to the central limit theorem, the distribution of the
means of random samples is approximately normal for a large sample size,
provided the underlying distribution of the population is in the domain of
attraction of the Gaussian distribution. In other words, sample means
approach a normal distribution regardless of the distribution of the
population if the number of observations is large enough and the first
statistical moments are finite."
The caveats are "... approximately normal for a large sample size", "... the
underlying distribution of the population is in the domain of attraction of
the Gaussian distribution", and "... the first statistical moments are
finite".
Power-law or other heavy-tailed distributions whose tail exponent is less
than 3.0 have infinite variance, so they do not satisfy the conditions of
the classical central limit theorem, and the use of classic moment-based
statistics can be inappropriate for comparative purposes (Willinger et al
2004).
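To spell out where the threshold of 3.0 comes from, here is a short
derivation. It assumes a pure continuous power-law density with a lower
cutoff; the symbols \alpha (tail exponent), x_{\min} (cutoff) and m (moment
order) are my notation, not Glanzel's.

```latex
p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha},
\qquad x \ge x_{\min},\ \alpha > 1

\langle x^{m} \rangle
  = \int_{x_{\min}}^{\infty} x^{m}\, p(x)\, dx
  = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^{m}
  \qquad \text{for } m < \alpha - 1,
\qquad
\langle x^{m} \rangle = \infty \quad \text{for } m \ge \alpha - 1
```

Under this assumption the mean (m = 1) is finite only for \alpha > 2 and the
second moment (m = 2), and hence the variance, only for \alpha > 3.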
Redner (1998) and Clauset et al (2009) reported that the tail exponent for
citation distributions is close to 3.0. Some of our recent investigations
have examined the evolution of the tail exponent of citation distributions
as a function of fixed citation window sizes. For the citation
distributions of the WoS and the 13 NSF field levels, the tail exponents can
go below 3.0, into the range of 2.7 to 2.8, even when a relatively short
citation window is used.
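For readers who have not used it, the tail exponent in this kind of analysis
is usually estimated by maximum likelihood. Below is a minimal sketch of the
continuous-data estimator given by Clauset et al (2009), alpha = 1 + n / sum
of ln(x_i / x_min). It is not our actual analysis code: the function names
are hypothetical, x_min is assumed to be known, and citation counts are
really discrete, so the continuous form is only an approximation.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, rng=None):
    """Draw n values from a continuous power law by inverse transform sampling."""
    rng = rng or random.Random()
    # 1 - u lies in (0, 1], so the power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def tail_exponent_mle(values, x_min):
    """Return (alpha_hat, standard_error, n_tail) for the values >= x_min.

    Assumes a pure continuous power law above x_min (an approximation for
    citation counts, which are integers).
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n < 2 or log_sum <= 0.0:
        raise ValueError("need at least two tail values above x_min")
    alpha_hat = 1.0 + n / log_sum
    # Leading-order standard error of the estimate.
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err, n


if __name__ == "__main__":
    synthetic = sample_power_law(20000, alpha=2.8, rng=random.Random(0))
    print(tail_exponent_mle(synthetic, x_min=1.0))
```

In practice x_min is not known and has to be chosen as well, which the
sketch leaves out.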
The difficulty one faces with such an analysis based on Clauset's
methodology is determining whether the tail of the distribution is actually
a power law or some other function, such as a power-law tail with an
exponential decay, a stretched exponential, etc (Clauset et al 2009). We are
in the process of investigating this issue further.
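To illustrate the kind of comparison involved, here is a minimal sketch of a
likelihood ratio test between a power-law tail and one alternative, a plain
exponential tail, in the spirit of Clauset et al (2009). The function name is
hypothetical, both models are the continuous forms fitted by maximum
likelihood above an assumed known x_min, and the alternatives named above
(power law with exponential decay, stretched exponential) need numerical
fitting that is not shown.

```python
import math
import random


def power_law_vs_exponential(values, x_min):
    """Return (R, p_value) for the tail values >= x_min.

    R > 0 favours the power law, R < 0 favours the exponential. The p-value
    is for the null hypothesis that neither model is favoured; it relies on
    a normal approximation for the sum of pointwise log-likelihood ratios.
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two tail values")
    # Maximum likelihood fits of both candidate tails.
    alpha = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    lam = 1.0 / (sum(tail) / n - x_min)
    # Pointwise log-likelihood ratio: ln p_powerlaw(x) - ln p_exponential(x).
    ratios = [
        (math.log((alpha - 1.0) / x_min) - alpha * math.log(x / x_min))
        - (math.log(lam) - lam * (x - x_min))
        for x in tail
    ]
    total = sum(ratios)
    mean = total / n
    sigma = math.sqrt(sum((r - mean) ** 2 for r in ratios) / n)
    p_value = math.erfc(abs(total) / (sigma * math.sqrt(2.0 * n)))
    return total, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    synthetic = [(1.0 - rng.random()) ** (-1.0 / 1.8) for _ in range(5000)]
    print(power_law_vs_exponential(synthetic, x_min=1.0))
```

A large p-value means the data cannot distinguish the two forms, which is
exactly the situation that makes the tail hard to classify.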
It is possible that means are an inappropriate measure for comparing
populations if the citation distribution has a power-law tail with an
exponent less than 3.0. Even if the exponent is not less than 3.0, our
measurements show that the variance of the mean can be on the order of 10 to
1000 times as large as the mean as the exponent gets closer to 3.0, making
comparisons of means still of questionable value for evaluative purposes. On
the other hand, if the citation distribution is not a power law, then the
comparison of means may be appropriate.
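A small simulation gives a feel for how unstable means become near an
exponent of 3.0. The sketch below is illustrative only and is not the
measurement referred to above: it assumes a pure continuous power law with
x_min = 1, all function names are hypothetical, and the closed-form ratio is
the population variance divided by the population mean for that idealised
model, x_min / ((alpha - 2)(alpha - 3)).

```python
import math
import random
import statistics


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law by inverse transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def variance_to_mean_ratio(alpha, x_min=1.0):
    """Population variance / mean for a pure power law; infinite for alpha <= 3."""
    if alpha <= 3.0:
        return math.inf
    return x_min / ((alpha - 2.0) * (alpha - 3.0))


def spread_of_sample_means(alpha, n, trials, x_min=1.0, seed=0):
    """Return (smallest, median, largest) sample mean over repeated samples."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(sample_power_law(n, alpha, x_min, rng))
        for _ in range(trials)
    )
    return means[0], statistics.median(means), means[-1]


if __name__ == "__main__":
    for alpha in (2.7, 3.01, 3.1, 4.0):
        print(alpha, variance_to_mean_ratio(alpha),
              spread_of_sample_means(alpha, n=2000, trials=100))
```

Comparing the gap between the smallest and largest sample mean across
exponents shows how much a comparison of two means could be driven by a few
extreme values.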
A closer look at the evolution of the citation distributions over a long
period of time may be necessary before a definitive answer can be given to
the question of whether "Citation distributions are so skewed that using
the mean or any other central tendency measure is ill-advised."
It would be interesting to hear what others have to say.
References
Clauset, A., Shalizi, C. S. & Newman, M. E. J. Power-law distributions in
empirical data. SIAM Review 51, 661-703 (2009).
Redner, S. How popular is your paper? An empirical study of the citation
distribution. The European Physical Journal B - Condensed Matter and
Complex Systems 4, 131-134 (1998).
Willinger, W., Alderson, D., Doyle, J. C. & Li, L. in Proceedings of the
2004 Winter Simulation Conference (eds. R. G. Ingalls, M. D. Rossetti, J. S.
Smith & B. A. Peters) (2004).
Dr. J. Sylvan Katz, Visiting Fellow
SPRU, University of Sussex
http://www.sussex.ac.uk/Users/sylvank
More information about the SIGMETRICS
mailing list