skewed citation distributions should not be averaged

Wed Aug 31 13:43:21 EDT 2011

Interesting question. However, Seglen rejects the validity of the impact factor as a measure of journal worth precisely because it is the arithmetic mean of a skewed distribution. Here is what he wrote:

    Citational heterogeneity is thus a fundamental irreducible property of the articles in a journal (as well as of other units of science...). Very few articles will actually have a citedness close to the journal mean, thus the journal impact factor cannot be used as a representative indicator for individual journal articles. The overall journal impact can be heavily determined by a few very highly cited articles.... (p. 145)
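Seglen's two claims (few articles sit near the journal mean; a handful of highly cited articles drive it) can be checked on any citation list. The sketch below is a minimal illustration in Python, assuming a hypothetical journal of 20 articles whose citation counts are invented for the example and are not taken from Seglen. The function name `describe` and the "within 25% of the mean" band are likewise arbitrary, hypothetical choices.

```python
from statistics import mean, median

# Hypothetical citation counts for 20 articles in one journal (invented for
# illustration): most articles are rarely cited, a few are cited heavily.
CITATIONS = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 10, 15, 40, 93]


def describe(citations, band=0.25, top_fraction=0.10):
    """Summarize how well the arithmetic mean represents a citation list."""
    m = mean(citations)
    # Articles whose citedness lies within +/- band (relative) of the mean.
    near_mean = sum(1 for c in citations if abs(c - m) <= band * m)
    below_mean = sum(1 for c in citations if c < m)
    # Share of all citations received by the most-cited top_fraction of articles.
    k = max(1, round(top_fraction * len(citations)))
    top = sorted(citations, reverse=True)[:k]
    return {
        "mean": m,
        "median": median(citations),
        "near_mean": near_mean,
        "below_mean": below_mean,
        "top_share": sum(top) / sum(citations),
    }


if __name__ == "__main__":
    print(describe(CITATIONS))
```

With these invented counts the mean is 10 and the median is 3; 16 of the 20 articles fall below the mean, only 2 lie within 25% of it, and the two most-cited articles account for 66.5% of all citations.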

Seglen follows the traditional, historical interpretation of the arithmetic mean as representative of a type, or the "typical" case, an interpretation under which the arithmetic mean is typical only if the distribution is normal. However, Karl Pearson proved that the normal distribution is virtually nonexistent in reality and that most real distributions are of the negative binomial type, because reality is not random and additive but causal and multiplicative. To approximate the law of error, you are given the transformational mantra of log(x + 1), where the + 1 accommodates zero counts, and this makes the geometric mean the typical value. From this perspective, it would seem, Professor Leydesdorff has a better grasp of the situation, and I stand with him. He's my buddy.
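A minimal sketch of the "log + 1" recipe, assuming it means taking ln(x + 1) of each citation count, averaging on the log scale, and transforming back with exp(...) minus 1 (one common reading; other variants exist). The counts are invented, illustrative data rather than figures from this thread, and the function name `log1p_mean` is hypothetical.

```python
import math
from statistics import mean, median

# Invented, illustrative citation counts (not real data).
CITATIONS = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 10, 15, 40, 93]


def log1p_mean(counts):
    """Mean on the log(x + 1) scale, transformed back to citation units.

    The +1 keeps uncited articles (zeros) in the calculation; the result equals
    the geometric mean of (x + 1), minus 1.
    """
    return math.expm1(mean(math.log1p(c) for c in counts))


if __name__ == "__main__":
    print("arithmetic mean:", mean(CITATIONS))
    print("median:", median(CITATIONS))
    print("log(x + 1) mean:", round(log1p_mean(CITATIONS), 2))
```

For these counts the back-transformed value is about 3.6, close to the median of 3 and far below the arithmetic mean of 10, which is the sense in which the transformation makes the geometric mean "typical."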

Stephen J Bensman
LSU Libraries
Louisiana State University
Baton Rouge, LA 70803

From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of David A. Pendlebury
Sent: Wednesday, August 31, 2011 11:46 AM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: Re: [SIGMETRICS] skewed citation distributions should not be averaged

Dear Sigmetrics readers:

I recalled an interesting contribution by Professor Glänzel that discussed the validity of using means in bibliometric analysis (setting medians aside here), and I wonder how others would reconcile Professor Leydesdorff's claim in the article below with Professor Glänzel's argument. I have always adhered to Professor Glänzel's view and would be interested in seeing a discussion of this issue in this forum. The excerpt from Professor Glänzel appears here:

Seven Myths in Bibliometrics. About facts and fiction in quantitative science studies

Wolfgang Glänzel (1, 2)

05 June 2008

(1) Steunpunt O&O Indicatoren, K.U. Leuven, Dept. MSI, Leuven, Belgium, Wolfgang dot Glanzel at econ dot kuleuven dot be
(2) Institute for Research Policy Studies, Hungarian Academy of Sciences, Budapest, Hungary, glanzw at iif dot hu

from: H. Kretschmer & F. Havemann (Eds.): Proceedings of WIS 2008, Berlin. Fourth International Conference on Webometrics, Informetrics and Scientometrics & Ninth COLLNET Meeting. Humboldt-Universität zu Berlin, Institute for Library and Information Science (IBI)

This is an Open Access document licensed under the Creative Commons License BY

Available at: http://www.collnet.de/Berlin-2008/GlanzelWIS2008smb.pdf

Excerpt:

2.7 Myth #7: Don't use averages in bibliometrics

The myth: Methods of classical statistics may not be applied to bibliometric distributions since those are discrete and extremely skewed. Therefore the use of medians and quantiles should be preferred.

The background of this myth is quite obvious. The Gaussian normal distribution, being one of the most important families of continuous probability distributions, arises in many areas of statistics. If a statistical sample follows a normal distribution, then the observations should be symmetrically distributed around the sample mean and the standard deviation can be used to determine a tolerance threshold for individual observations. However, this is obviously not the case in bibliometrics. Most bibliometric distributions are far from being symmetric and discrete. Publication-activity and citation-impact distributions are often extremely skewed, the majority of the observations are below the sample mean and the rest of the sample elements are located in the long tail of the distributions. In such cases the mean value and the standard deviation seem to be completely useless. Therefore the application of classical tools of moment-based statistics seems not to be appropriate in research evaluation either. This is a misbelief. According to the central limit theorem, the distribution of the means of random samples is approximately normal for a large sample size, provided the underlying distribution of the population is in the domain of attraction of the Gaussian distribution. In other words, sample means approach a normal distribution regardless of the distribution of the population if the number of observations is large enough and the first statistical moments are finite. Consequently, means and shares of different samples drawn from the same populations can be compared with each other and the significance of the deviation can be determined. Means and shares are used as unbiased estimators of the expected value and the corresponding probabilities, respectively. Furthermore, in the case of skewed discrete distributions the mean value is superior to the median. The underlying methods of application of mathematical statistics have been described, among others, by Schubert and Glänzel (1983) and Glänzel and Moed (2002), and reliability-related statistics have been regularly and successfully applied to bibliometrics since. These statistical properties have severe effects on ranking issues as well. Different ranks can prove to be ties because the underlying indicator values might not differ significantly (cf. Glänzel and Debackere 2007).

The myth of the inapplicability of Gaussian statistics in a bibliometric context actually arose from a misunderstanding, namely from the assumed comparison of individual observations with a standard. However, that is not what statistics does.

--David Pendlebury
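Glänzel's argument rests on the central limit theorem: even when individual citation counts are highly skewed, the means of reasonably large samples are close to normally distributed, so two means can be compared with a standard significance test, and units whose means do not differ significantly are effectively tied in a ranking. The Python sketch below illustrates both points under stated assumptions. The "citation counts" are simulated by rounding down lognormal draws, an arbitrary stand-in for a skewed discrete distribution rather than a model proposed in this thread, and `compare_means` is a hypothetical large-sample z-test, not the specific reliability statistics cited in the excerpt.

```python
import math
import random
from statistics import fmean, pstdev, variance


def skewness(xs):
    """Population skewness: third central moment over the cubed standard deviation."""
    m, s = fmean(xs), pstdev(xs)
    return fmean((x - m) ** 3 for x in xs) / s ** 3


def simulate(n_samples=500, sample_size=200, seed=1):
    """Return (skewness of raw counts, skewness of the sample means)."""
    rng = random.Random(seed)
    # Assumed stand-in for skewed, discrete citation counts.
    raw = [int(rng.lognormvariate(1.0, 1.0)) for _ in range(n_samples * sample_size)]
    # Split the raw draws into consecutive samples and average each one.
    means = [
        fmean(raw[i * sample_size:(i + 1) * sample_size]) for i in range(n_samples)
    ]
    return skewness(raw), skewness(means)


def compare_means(a, b):
    """Large-sample z-test for the difference between two means.

    Returns (z, two-sided p-value); a large p-value suggests the two units'
    ranks could be treated as a statistical tie.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(a) - fmean(b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under N(0, 1)
    return z, p


if __name__ == "__main__":
    raw_skew, mean_skew = simulate()
    print(f"skewness of raw counts:   {raw_skew:.2f}")
    print(f"skewness of sample means: {mean_skew:.2f}")
```

The raw simulated counts come out strongly right-skewed, while the means of samples of 200 are much closer to symmetric. Exact values depend on the seed and the assumed distribution, so none are quoted here.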

________________________________

From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
Sent: Tuesday, August 30, 2011 11:10 PM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: [SIGMETRICS] skewed citation distributions should not be averaged

A Rejoinder on Energy versus Impact Indicators <http://arxiv.org/abs/1108.5845> 
Scientometrics (in press)

Citation distributions are so skewed that using the mean or any other central tendency measure is ill-advised. Unlike G. Prathap's scalar measures (Energy, Exergy, and Entropy or EEE), the Integrated Impact Indicator (I3) is based on non-parametric statistics using the 100 percentiles of the distribution. Observed values can be tested against expected ones; impact can be qualified at the article level and then aggregated.
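The percentile approach scores each article by its position in a reference distribution and then sums the scores. The Python sketch below shows that idea in its simplest form and is not the published I3 procedure. It assumes a mid-rank convention for ties (the share of reference papers cited less, plus half of those cited equally), continuous percentiles rather than any percentile-rank classes, and an expected value of 50 per paper, which holds under this convention for a unit drawn at random from the reference set. The function names, the reference counts, and the evaluated unit are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, reference_sorted):
    """Mid-rank percentile (0-100) of one article's citations in a sorted reference set."""
    below = bisect_left(reference_sorted, value)          # cited strictly less
    ties = bisect_right(reference_sorted, value) - below  # cited equally
    return 100.0 * (below + 0.5 * ties) / len(reference_sorted)


def integrated_impact(unit_citations, reference):
    """Return (observed sum of article percentiles, expected sum for a random unit)."""
    ref = sorted(reference)
    observed = sum(percentile_rank(c, ref) for c in unit_citations)
    # Under the mid-rank convention the reference set averages exactly 50,
    # so n articles drawn at random from it are expected to score 50 * n.
    expected = 50.0 * len(unit_citations)
    return observed, expected


if __name__ == "__main__":
    # Hypothetical reference set (e.g. one field and publication year).
    reference = [0, 0, 1, 2, 3, 5, 8, 13, 21, 40]
    unit = [3, 13, 40]  # hypothetical unit under evaluation
    print(integrated_impact(unit, reference))
```

For the three hypothetical articles the observed sum is 215 against an expected 150. Because each article is scored individually before aggregation, one extremely cited paper can add at most 100 points, unlike its effect on an arithmetic mean.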

pdf available at http://arxiv.org/ftp/arxiv/papers/1108/1108.5845.pdf 

** Apologies for cross-postings

________________________________

Loet Leydesdorff 

Professor, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111

loet at leydesdorff.net; http://www.leydesdorff.net/
Visiting Professor, ISTIC, Beijing <http://www.istic.ac.cn/Eng/brief_en.html>; Honorary Fellow, SPRU, University of Sussex <http://www.sussex.ac.uk/spru/>
