CWTS Journal Indicators

Sylvan Katz j.s.katz at SUSSEX.AC.UK
Mon Sep 30 10:43:55 EDT 2013


Nees Jan

> the citation distributions underlying the SNIP calculation. In general,
> however, citation distributions do not exactly follow a power law (I am
> assuming that this is what you mean by a ‘scaling distribution’),
> although their tail may have power law properties, at least in an
> approximate sense. Given the skewed nature of citation distributions, I

In a scaling or power-law distribution, only the tail of the 
distribution exhibits a power law. The magnitude of the tail's scaling 
exponent determines whether or not the distribution can be 
characterized by its mean and variance.
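The threshold comes from a standard calculation. The following is a minimal sketch. It assumes a pure continuous power law p(x) = C x^(-alpha) for x >= x_min, normalized by C = (alpha - 1) x_min^(alpha - 1). This is an idealization, because citation counts are integers and only the tail is expected to scale. The k-th moment is finite only when alpha > k + 1:

```latex
\langle x^k \rangle
  = C \int_{x_{\min}}^{\infty} x^{k-\alpha}\,dx
  = \frac{\alpha - 1}{\alpha - 1 - k}\, x_{\min}^{k}
  \qquad (\alpha > k + 1)

\langle x \rangle = \frac{\alpha - 1}{\alpha - 2}\, x_{\min} \quad (\alpha > 2),
\qquad
\langle x^2 \rangle = \frac{\alpha - 1}{\alpha - 3}\, x_{\min}^{2} \quad (\alpha > 3)
```

The mean therefore exists for alpha > 2. The variance, sigma^2 = <x^2> - <x>^2, is finite only for alpha > 3. For alpha <= 3, the integral for the second moment diverges, and at alpha = 3 itself it diverges logarithmically.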

When the exponent is greater than 3.0, the distribution can be 
characterized by its mean and variance. However, when the exponent is 
less than 3.0, the variance becomes infinite, the central limit theorem 
(CLT) no longer applies, and the distribution can no longer be 
characterized by its mean and variance. (At exactly 3.0 the variance 
also diverges, although only logarithmically.) This has important 
implications for any average-based measure. Newman explained this as 
follows in a SIGMETRICS posting two years ago:

https://listserv.utk.edu/cgi-bin/wa?A2=ind1109&L=sigmetrics&T=0&F=&S=&X=1CF66970C633426B19&P=3693

"for the Central Limit Theorem to be applicable, and hence for the mean 
to be valid, the distribution has to fall in the "domain of attraction 
of the Gaussian distribution".  As others have pointed out, the Pareto 
or power-law distribution to which the citation distribution is believed 
to approximate, does not fall in this domain of attraction if its 
exponent is less than 3.  Thus, the theorem is not wrong, but it's not 
applicable here."

"What does this mean in practice?  Of course one can always calculate a 
mean number of citations for a given data sample.  But if one calculates 
such means for different samples -- even samples drawn from the exact 
same underlying distribution -- one will get wildly different answers. 
Indeed, it can be shown that the values of the mean themselves follow a 
power law under these circumstances, and hence can themselves vary over 
orders of magnitude."
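A small simulation can check Newman's point about sample means. The sketch below is illustrative only. It assumes a pure continuous power law with x_min = 1. Real citation counts are discrete and only approximately power-law in the tail. The code draws 200 independent samples of 1,000 values each using inverse-transform sampling, then reports the smallest and largest sample mean. The function names `sample_power_law` and `spread_of_sample_means` and all parameter values are hypothetical choices for this example.

```python
import random
import statistics


def sample_power_law(alpha: float, n: int, x_min: float, rng: random.Random):
    """Draw n values from a continuous power law p(x) ~ x**(-alpha), x >= x_min.

    Inverse-transform sampling: x = x_min * (1 - u) ** (-1 / (alpha - 1)),
    with u uniform on [0, 1). Requires alpha > 1.
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def spread_of_sample_means(alpha: float, n_samples: int = 200,
                           sample_size: int = 1000, seed: int = 42):
    """Mean of each of n_samples independent samples; return (min, max) of those means."""
    rng = random.Random(seed)
    means = [statistics.fmean(sample_power_law(alpha, sample_size, 1.0, rng))
             for _ in range(n_samples)]
    return min(means), max(means)


if __name__ == "__main__":
    for alpha in (4.0, 2.5, 2.1):
        lo, hi = spread_of_sample_means(alpha)
        print(f"alpha={alpha}: sample means from {lo:.2f} to {hi:.2f} "
              f"(max/min = {hi / lo:.1f})")
```

With an exponent of 4.0, the 200 means should cluster around the population mean of 1.5. At 2.5, and even more at 2.1, the spread typically widens markedly, because each sample mean is dominated by its few largest values.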

For a detailed explanation see Newman, M. E. J. (2005). Power laws, 
Pareto distributions and Zipf’s law. Contemporary Physics, 46(5), 
323-351. See section 3.2.

For examples of how this affects average-based bibliometric 
indicators, see my keynote presentation at STI 2012, "Scale Independent 
Measures: Theory and Practice" 
(http://sticonference.org/index.php?page=proc).

> when average-based measures such as SNIP are complemented with stability
> intervals, I believe that this offers a sufficiently robust approach to
> deal with the skewed nature of citation distributions.

When using average-based indicators, it is important to know the 
distribution of the underlying primary measures. If that distribution 
is a scaling distribution with an exponent less than 3.0, then the 
average may be calculable but meaningless, because the variance is 
infinite and the central limit theorem no longer applies.

It seems to me that, for any bibliometric indicator based on averages 
to be robust, the underlying distributions of the primary measures need 
to be shown to fall within the Gaussian domain of attraction. And since 
the exponent of a citation distribution can be greater than, less than, 
or equal to 3.0, the distribution probably has to be determined each 
time the indicator is calculated: in some instances it will fall within 
the Gaussian domain, and at other times it may be a Pareto distribution 
with a meaningless average.
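One way to check the exponent each time is the maximum-likelihood estimator given in Newman (2005). The following is a minimal sketch with several assumptions. The lower bound x_min of the scaling region is taken as given. Choosing it, for example by the Kolmogorov-Smirnov approach of Clauset, Shalizi and Newman (2009), is a separate step not shown here. So is testing whether a power law fits at all. The `discrete` option uses the common x_min - 0.5 approximation for integer data. The function name `estimate_alpha` and the sample counts are hypothetical.

```python
import math


def estimate_alpha(counts, x_min, discrete=True):
    """Maximum-likelihood estimate of the power-law tail exponent alpha.

    Only values >= x_min are used. The continuous estimator is
        alpha = 1 + n / sum(ln(x_i / x_min)),
    with standard error (alpha - 1) / sqrt(n). With discrete=True,
    x_min is replaced by x_min - 0.5 inside the logarithm, an
    approximation for integer data such as citation counts.

    Returns (alpha_hat, standard_error, n_tail).
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    if discrete and x_min < 1:
        raise ValueError("discrete data needs x_min >= 1")
    tail = [x for x in counts if x >= x_min]
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two values at or above x_min")
    base = x_min - 0.5 if discrete else x_min
    log_sum = sum(math.log(x / base) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    alpha_hat = 1.0 + n / log_sum
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n), n


if __name__ == "__main__":
    # Hypothetical citation counts, for illustration only.
    citations = [0, 0, 1, 1, 2, 3, 3, 5, 8, 13, 21, 40, 75, 160]
    alpha, se, n = estimate_alpha(citations, x_min=3)
    print(f"alpha = {alpha:.2f} +/- {se:.2f} from {n} papers; "
          f"below 3.0: {alpha < 3.0}")
```

An estimate that falls clearly below 3.0, allowing for its standard error, would signal that an average over that sample should be treated with caution. A sample of a few dozen tail values gives a wide error, so the estimate is only as good as the size of the tail.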

It would be useful to know whether the SNIP indicator shows any of 
these sensitivities. Hence it would be useful to know whether the 
distribution of citations in a given year to papers published in the 
preceding three years scales and, if it does, whether the exponent can 
be less than 3.0.
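Answering that question starts with the raw distribution. The following is a minimal sketch that assumes a hypothetical data layout: a mapping from paper ID to publication year, and a list of (citing year, cited paper ID) pairs. For one citing year, it collects the number of citations received by each paper published in the three preceding years. It does not reproduce SNIP's source normalization. It only builds the per-paper counts whose tail one would then examine for scaling. Uncited papers are kept as zeros, because they count toward any average even though they lie outside a power-law tail. The function name `window_citation_counts` is hypothetical.

```python
from collections import Counter


def window_citation_counts(papers, citations, year, window=3):
    """Count the citations each paper receives in `year`, for every paper
    published in the `window` years before `year`.

    papers:    dict mapping paper_id -> publication year (hypothetical layout)
    citations: iterable of (citing_year, cited_paper_id) pairs (hypothetical layout)

    Returns one count per eligible paper, sorted ascending; uncited papers
    contribute a 0.
    """
    eligible = {pid for pid, pub_year in papers.items()
                if year - window <= pub_year <= year - 1}
    received = Counter(pid for citing_year, pid in citations
                       if citing_year == year and pid in eligible)
    # Counter returns 0 for eligible papers never cited in `year`.
    return sorted(received[pid] for pid in eligible)


if __name__ == "__main__":
    papers = {"a": 2010, "b": 2011, "c": 2012, "d": 2009, "e": 2013}
    citations = [(2013, "a"), (2013, "a"), (2013, "c"),
                 (2012, "b"), (2013, "d"), (2013, "e")]
    print(window_citation_counts(papers, citations, 2013))  # [0, 1, 2]
```

In the usage example, papers "d" (2009) and "e" (2013) fall outside the 2010 to 2012 window. The 2012 citation to "b" is not counted, because only citations made in 2013 are considered.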

Cheers
Sylvan



More information about the SIGMETRICS mailing list