CWTS Journal Indicators
j.s.katz at SUSSEX.AC.UK
Tue Oct 8 13:10:19 EDT 2013
I will give your question some thought. It may take a bit of time as I am
up to my eyeballs with work at the moment.
Have you seen Clauset's web page? It has lots of hints and loads of
software for determining distributional characteristics.
However, I have found that while the R routines are fast, the MATLAB
routines needed to determine the p-values on the distributions can take
days and sometimes weeks to compute on a regular MATLAB server. This is
because the p-values are determined using Monte Carlo simulations.
Lately I have been using the high performance computing lab at the local
university. On these machines I can run parallel MATLAB, which reduces
the computational time by at least an order of magnitude.
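
To illustrate why the p-value step is so expensive, here is a minimal
Python sketch of a Monte Carlo goodness-of-fit test. It is not Clauset's
code. It assumes a continuous power law and holds xmin fixed, whereas the
published procedure also re-estimates xmin on every synthetic data set,
which is a large part of the cost. All function names are hypothetical.

```python
import numpy as np

def fit_alpha(x, xmin):
    """Continuous power-law MLE for the exponent, with xmin held fixed."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(x, xmin, alpha):
    """Largest gap between the empirical and fitted CDFs on the tail."""
    tail = np.sort(x[x >= xmin])
    n = len(tail)
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    upper = np.arange(1, n + 1) / n  # empirical CDF just after each point
    lower = np.arange(0, n) / n      # empirical CDF just before each point
    return max(np.max(np.abs(upper - model_cdf)),
               np.max(np.abs(model_cdf - lower)))

def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from a continuous power law."""
    u = rng.random(n)
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def monte_carlo_p_value(x, xmin, n_sims=200, seed=0):
    """Share of synthetic data sets that fit worse than the observed data."""
    rng = np.random.default_rng(seed)
    tail = x[x >= xmin]
    alpha_hat = fit_alpha(tail, xmin)
    d_obs = ks_distance(tail, xmin, alpha_hat)
    exceed = 0
    for _ in range(n_sims):
        # Simplification: only the tail is simulated and xmin is not refitted.
        synth = sample_power_law(len(tail), xmin, alpha_hat, rng)
        alpha_sim = fit_alpha(synth, xmin)
        if ks_distance(synth, xmin, alpha_sim) >= d_obs:
            exceed += 1
    return alpha_hat, exceed / n_sims
```

Each simulation refits the model, so the run time grows with both the
number of simulations and the size of the data set. The simulations are
independent of one another, which is why running them in parallel helps.
In the Clauset-Shalizi-Newman paper the power law is ruled out when the
p-value is 0.1 or less.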
Recently, I was awarded an Elsevier data set from Scopus under the
Elsevier Bibliometric Research Program. As you may know, for years I have
been publishing on the scaling correlation between group sizes, measured
using peer-reviewed papers, and group impacts, measured using citations to
the groups' papers. In every instance the scaling correlation had an
exponent greater than 1.0. Now I am going to look at the scaling
correlation between group sizes and (1) within-field citations and (2)
out-of-field citations. It is my hypothesis that while the within-field
scaling correlation may have an exponent > 1.0 (cumulative advantage),
the out-of-field scaling correlation will have an exponent < 1.0
(cumulative disadvantage). It is going to be a busy winter using
Clauset's routines :)
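
As a rough illustration of what a scaling exponent means here, the sketch
below fits citations = k * papers^a by ordinary least squares on
log-transformed values. This is one common way to estimate such an
exponent and is assumed here only for illustration. The function name and
the numbers are made up and are not from the Scopus data.

```python
import numpy as np

def scaling_exponent(papers, citations):
    """Fit citations ~ k * papers**a by OLS on log10-log10 values."""
    papers = np.asarray(papers, dtype=float)
    citations = np.asarray(citations, dtype=float)
    keep = (papers > 0) & (citations > 0)  # logs need positive values
    slope, intercept = np.polyfit(np.log10(papers[keep]),
                                  np.log10(citations[keep]), 1)
    return slope, 10.0 ** intercept

# Synthetic groups: sizes in papers, impacts built with known exponents.
sizes = np.array([10, 30, 100, 300, 1000, 3000], dtype=float)
within_field = 2.0 * sizes ** 1.2  # exponent > 1: cumulative advantage
out_of_field = 5.0 * sizes ** 0.8  # exponent < 1: cumulative disadvantage

a_in, k_in = scaling_exponent(sizes, within_field)
a_out, k_out = scaling_exponent(sizes, out_of_field)
print(f"within-field exponent: {a_in:.2f}")
print(f"out-of-field exponent: {a_out:.2f}")
```

An exponent above 1.0 means impact grows faster than size, and an
exponent below 1.0 means it grows more slowly.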
Yes - I would sure like to know if a truncated power law with a tail
exponent of less than 3.0 can be claimed to have a defined variance. I
am currently retired and living in Saskatoon, Canada, with little access
to knowledgeable resources these days. If your mathematicians can shed
some light on the issue it would be sincerely helpful.
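
For reference, the calculation at issue can be sketched as follows. It
assumes a hard upper cut-off x_max, which is only one reading of
"truncated". The symbols C, alpha, x_min and x_max are introduced here for
illustration, and the cases alpha = 1, 2 and 3 are excluded.

```latex
p(x) = C\,x^{-\alpha}, \quad x_{\min} \le x \le x_{\max},
\qquad
C = \frac{\alpha - 1}{x_{\min}^{1-\alpha} - x_{\max}^{1-\alpha}}

E[X] = C\,\frac{x_{\max}^{2-\alpha} - x_{\min}^{2-\alpha}}{2-\alpha},
\qquad
E[X^2] = C\,\frac{x_{\max}^{3-\alpha} - x_{\min}^{3-\alpha}}{3-\alpha}

\operatorname{Var}(X) = E[X^2] - \left(E[X]\right)^2
```

Under this assumption both moments are finite for any finite x_max, so
the variance is defined. For alpha < 3 the second moment grows like
x_max^(3 - alpha) as x_max increases, so the variance depends strongly on
where the cut-off lies and diverges when the cut-off is removed.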
Visiting Research Fellow
On 10/8/2013 10:30 AM, Stephen J Bensman wrote:
> From: Stephen J Bensman
> Sent: Tuesday, October 08, 2013 11:28 AM
> To: 'j.s.katz at sussex.ac.uk'
> Subject: RE: [SIGMETRICS] CWTS Journal Indicators
> I have read and am reading all that you suggested below. What has happened to me is that I stumbled into a scientific revolution that I did not know about--complex systems, and this revolution is encompassing informetrics. I am very interested in the Yule-Simon distribution, because I am interested in Google Scholar. What I call the "Yule-Simon" model and you call the power-law model is of interest to me, because it is an accepted model for both the structure of the Web and scientometric laws. Therefore it is a perfect model to serve as a foil for cross-disciplinary comparisons to determine how well different disciplines approximate this model. It really reveals the differences between the structures of disciplines. For example, we find that economics may approximate what Clauset-Shalizi-Newman in their SIAM article call "the power-law + cut-off" because Google citations concentrate on the key book of economist Nobelists, shooting it far to the right and distorting the distribution, whereas mathematics cannot come anywhere near the power-law model due to the insularity of the subfields. Mathematicians cannot communicate with each other across sub-disciplines and have difficulty in farting out a right asymptote at all (pardon my Shakespearian language).
> Perhaps you can help me. I need to make the tests recommended by Clauset-Shalizi-Newman in their SIAM article. That way I can go from r^2 approximations to "null-and-alternate hypotheses." I cannot handle their tests and need a computer program that can do this for me--power-law testing for dummies. We have found that, if you have a power-law + cut-off, it gives false, ridiculous readings on the r^2 approximations. We find that if you truncate the book, the model more closely approaches a power-law model. I am working with some mathematicians on this, and I hope that they can handle this, but mathematicians are not physicists. Perhaps they can help you with your problem. If you want, I can send you my binomial analysis of the Clauset-Shalizi-Newman findings. For example, discrete or counting distributions are more likely than continuous distributions to approximate the power-law model.
> Hopefully, if I can survive Chico, I may be able to survive this. As Nietzsche once said, "What does not kill me, makes me stronger."
> Stephen J Bensman
> LSU Libraries
> Louisiana State University
> Baton Rouge, LA 70803
> -----Original Message-----
> From: Sylvan Katz [mailto:j.s.katz at sussex.ac.uk]
> Sent: Tuesday, October 08, 2013 10:10 AM
> To: Stephen J Bensman
> Subject: Re: [SIGMETRICS] CWTS Journal Indicators
> PLEASE NOTE -- I am replying off the reflector. My concern was not with the Journal Impact Factor; it was with SNIP.
>> That is what is meant by "scale free"--there is no measure of central
>> tendency representative of the population. All scientometric
>> measures--it seems--should be based on the characteristics of the
>> tail or right asymptote. However, we are finding that this differs
>> wildly depending on the structure of the field. There seems to be no one shoe that fits all.
> Perhaps you have not read Newman's article (Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5), 323-351).
> In particular, see section 3.2.
> Or perhaps you have not seen my paper that accompanied my keynote speech, "Scale-independent measures: theory and practice". It shows that power-law distributions can occur in any field or subfield. While a distribution may not be a power law at one point in time, it may be at another point in time.
> This analysis was done using the gold-standard technique for determining if a distribution is a power law, as given in the article Clauset, A., Shalizi, C. S., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661-703.
>> However, this does not seem to be the case empirically. Ordinal
>> rankings of journals by impact factor are remarkably stable over time.
>> I proved this in my article on Garfield and the impact factor posted
>> on Gene's site (see pp. 66-68):
> This is a ranking not an analysis of the underlying distribution.
> Stability over time could mean that the same error occurs again and again, i.e. means are calculated for distributions that have an infinite variance, and over time this error is stable. Unfortunately, an analysis of rankings will not tell you anything about the citation distribution. An analysis of the distribution of the primary measures, such as citations, needs to be performed before one can assert that the CLT holds.
> Nees did make a good point that truncated power-law distributions with exponents less than 3.0 may have a defined variance. I am not a mathematician, so I cannot confirm whether this is possible for truncated power laws. I am looking for a mathematician who can clarify this.