skewed citation distributions should not be averaged

Wed Aug 31 16:29:15 EDT 2011

Here is my take on Loet's statement and David's question.

Perhaps the key point in Glanzel's paper is made in the following 
statements:

"Therefore the application of classical tools of moment-based statistics 
seems not to be appropriate in research evaluation either. This is a 
misbelief. According to the central limit theorem, the distribution of the 
means of random samples is approximately normal for a large sample size, 
provided the underlying distribution of the population is in the domain of 
attraction of the Gaussian distribution. In other words, sample means 
approach a normal distribution regardless of the distribution of the 
population if the number of observations is large enough and the first 
statistical moments are finite."

The caveats are "... approximately normal for a large sample size", "... the 
underlying distribution of the population is in the domain of attraction of 
the Gaussian distribution", and "... the first statistical moments are 
finite".

Power law or other heavy tailed distributions whose tail exponent is less 
than 3.0 have infinite variance. They therefore do not satisfy the 
conditions of the classical central limit theorem, and the use of classic 
moment-based statistics can be inappropriate for comparative purposes 
(Willinger et al 2004).
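
To make the infinite-variance point concrete, here is a minimal Python 
sketch. It assumes the tail exponent alpha refers to a continuous density 
p(x) ~ x^(-alpha) for x >= xmin (the convention used by Clauset et al 2009). 
The function names are my own and purely illustrative. Under that 
assumption the m-th moment is finite only when m < alpha - 1, so the 
variance diverges once alpha is 3.0 or less.

```python
import math
import random


def raw_moment(alpha, m, xmin=1.0):
    """m-th raw moment of a continuous power law p(x) ~ x**(-alpha), x >= xmin.

    Assumes alpha > 1 so that the density can be normalised.
    """
    if m >= alpha - 1:
        return math.inf  # the defining integral diverges
    return (alpha - 1) / (alpha - 1 - m) * xmin ** m


def variance(alpha, xmin=1.0):
    """Population variance; infinite whenever the second moment diverges."""
    second = raw_moment(alpha, 2, xmin)
    if math.isinf(second):
        return math.inf
    return second - raw_moment(alpha, 1, xmin) ** 2


def sample(alpha, n, rng, xmin=1.0):
    """Draw n values by inverse-transform sampling of the same power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]


def sample_variance(xs):
    """Unbiased sample variance of a list of numbers."""
    mean = math.fsum(xs) / len(xs)
    return math.fsum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    rng = random.Random(2011)  # arbitrary seed, only for repeatability
    for alpha in (2.7, 3.5):
        print("alpha =", alpha, "population variance =", variance(alpha))
        for n in (10**3, 10**4, 10**5):
            print("  n =", n, "sample variance =", sample_variance(sample(alpha, n, rng)))
```

For alpha = 3.5 the sample variance should settle near the finite 
population value as n grows. For alpha = 2.7 there is no finite value for 
it to settle on, so it tends to keep jumping whenever a very large 
observation is drawn.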

Redner (1998) and Clauset et al (2009) reported that the tail exponent for 
citation distributions is close to 3.0. Some of our recent investigations 
have examined the evolution of the tail exponent of citation distributions 
as a function of fixed citation window sizes. For the citation 
distributions of the WoS and of the 13 NSF field levels, the tail exponents 
can go below 3.0, into the range of 2.7 to 2.8, even when a relatively 
short citation window is used.

The difficulty one faces with such an analysis based on Clauset's 
methodology is determining whether the tail of the distribution is actually 
a power law or some other function, such as a power law tail with an 
exponential decay, a stretched exponential, etc (Clauset et al 2009). We are 
in the process of investigating this issue further.
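
For readers who want to experiment, the following is a rough Python sketch 
of the maximum-likelihood estimate of the tail exponent given in Clauset et 
al (2009) for the continuous case, alpha_hat = 1 + n / sum(ln(x_i / xmin)). 
It is not the code used in our investigations. It assumes xmin is already 
known and treats citation counts as continuous, both of which are 
simplifications. Choosing xmin and testing a power law against the 
alternative tail shapes mentioned above are separate steps that are not 
shown. The function name and the synthetic data are illustrative only.

```python
import math
import random


def fit_tail_exponent(values, xmin):
    """Continuous-case MLE of alpha for the tail x >= xmin.

    Returns (alpha_hat, approximate standard error, tail sample size).
    """
    tail = [x for x in values if x >= xmin]
    n = len(tail)
    log_sum = math.fsum(math.log(x / xmin) for x in tail)
    if n == 0 or log_sum <= 0.0:
        raise ValueError("need tail observations strictly above xmin")
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)  # leading-order approximation
    return alpha_hat, std_err, n


if __name__ == "__main__":
    # Synthetic data with a known exponent, NOT real citation counts.
    rng = random.Random(42)
    true_alpha = 2.7
    data = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1)) for _ in range(100000)]
    alpha_hat, std_err, n = fit_tail_exponent(data, xmin=1.0)
    print("alpha_hat =", alpha_hat, "+/-", std_err, "from n =", n)
```

Note that a small standard error only says the exponent is well determined 
if the tail really is a power law. It says nothing about whether a power 
law is the right functional form in the first place.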

It is possible that means are an inappropriate measure for comparing 
populations if the citation distribution has a power law tail with an 
exponent less than 3.0. Even if the exponent is not less than 3.0, our 
measurements show that the variance of the mean can be on the order of 
10-1000 times as large as the mean as the exponent gets closer to 3.0, 
making comparisons of means still of questionable value for evaluative 
purposes. On the other hand, if the citation distribution is not a power 
law, then the comparison of means may be appropriate.

A closer look at the evolution of the citation distributions over a long 
period of time may be necessary before a definitive answer can be given to 
the question of whether "Citation distributions are so skewed that using 
the mean or any other central tendency measure is ill-advised."

It would be interesting to hear what others have to say.

References

Clauset, A., Shalizi, C. S. & Newman, M. E. J. Power-law distributions in 
empirical data. SIAM Review 51, 661-703 (2009).

Redner, S. How popular is your paper? An empirical study of the citation 
distribution. The European Physical Journal B - Condensed Matter and 
Complex Systems 4, 131-134 (1998).

Willinger, W., Alderson, D., Doyle, J. C. & Lun, L. in Proceedings of the 
2004 Winter Simulation Conference (eds. R. G. Ingalls, M. D. R., J. S. 
Smith & B. A. Peters) (2004).

Dr. J. Sylvan Katz, Visiting Fellow
SPRU, University of Sussex
http://www.sussex.ac.uk/Users/sylvank