Pearson's r and ACA
Loet Leydesdorff
loet at LEYDESDORFF.NET
Wed Jan 21 05:20:40 EST 2004
P.S. I now understand from a private email from one of the authors that
they did not apply the logarithmic transformation to the data (despite
some text on p. 555 responding to the referee). The high values of
Pearson's r result from treating the diagonal values as missing data
and not as zeros. This is noted on p. 554. Of course, zeros depress the
Pearson correlation in the case of otherwise positive values. This
explanation completely clarifies the misunderstanding.
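
The effect of the diagonal can be illustrated with a small numerical
sketch. The matrix below is hypothetical (it is not the Ahlgren et al.
data), the author labels A to E and the two helper functions are made up
for the illustration, and NumPy is assumed to be available. It compares
Pearson's r between two columns when the diagonal cells are skipped as
missing and when they are set to zero.

```python
import numpy as np

# Hypothetical symmetric co-citation counts for five authors A..E.
# These numbers are invented for illustration only.
# The main diagonal is undefined and is marked as NaN.
nan = np.nan
cocit = np.array([
    [nan, 30.0, 10.0, 5.0, 2.0],
    [30.0, nan, 12.0, 4.0, 3.0],
    [10.0, 12.0, nan, 6.0, 1.0],
    [5.0, 4.0, 6.0, nan, 2.0],
    [2.0, 3.0, 1.0, 2.0, nan],
])


def pearson_diagonal_missing(m, i, j):
    """Pearson r between columns i and j, dropping rows where either is NaN."""
    keep = ~(np.isnan(m[:, i]) | np.isnan(m[:, j]))
    return np.corrcoef(m[keep, i], m[keep, j])[0, 1]


def pearson_diagonal_zero(m, i, j):
    """Pearson r between columns i and j with the diagonal set to zero."""
    z = np.where(np.isnan(m), 0.0, m)
    return np.corrcoef(z[:, i], z[:, j])[0, 1]


r_missing = pearson_diagonal_missing(cocit, 0, 1)  # columns A and B
r_zero = pearson_diagonal_zero(cocit, 0, 1)
print(f"diagonal as missing: r = {r_missing:.3f}")
print(f"diagonal as zero:    r = {r_zero:.3f}")
```

In this made-up example each author's zero on the diagonal is paired with
the other author's largest count, which is why the zero-filled version
yields the lower correlation.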
Perhaps it is useful in this context to note that the treatment of the
main diagonal has been the subject of some early work in scientometrics
by Noma and Price. The references are:
Noma, Elliott (1982). An Improved Method for Analyzing Square
Scientometric Transaction Matrices. Scientometrics 4, 297-316.
Price, Derek J. de Solla (1981). The Analysis of Square Matrices of
Scientometric Transactions. Scientometrics 3, 55-63.
With kind regards, Loet
Dear Stephen,
Your mail clarifies the incomprehensible use of the statistics in Ahlgren
et al. (2003) because they report r = 0.89 between the variables
"Braun" and "Schubert" while upon computation one finds only r = 0.456.
However, they state in very cryptic wording (on p. 555) that they
performed a logarithmic transformation and then, indeed, one finds r =
0.89.
I understand that this has been done for reasons of significance testing
given the assumption of a bivariate normal distribution. (Peter van den
Besselaar and I had an exchange in a recent issue of JASIST on
significance testing in the case of descriptive statistics versus
inferential statistics.) However, these authors do not wish to test for
significance. It is confusing.
Even more confusing is the statement on p. 556 that the correlation
(after logarithmic transformation) would rise to r = 0.94 in this case
by adding only zeros. The zeros should have no effect after the
transformation, should they? But this is crucial to the argument of the
paper ???
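
One way to probe this question numerically is sketched below. The data
are hypothetical, and because log(0) is undefined the sketch assumes the
transformation log(x + 1) (np.log1p); that choice is an assumption and
is not stated in the paper or in this thread. The sketch appends joint
zero pairs to two positive variables and compares Pearson's r before and
after, on both the raw and the transformed values.

```python
import numpy as np

# Hypothetical positive counts for two variables (invented for illustration).
x = np.array([10.0, 5.0, 2.0, 8.0])
y = np.array([12.0, 4.0, 3.0, 6.0])

# Append three cases in which both variables are zero.
n_zeros = 3
x_padded = np.concatenate([x, np.zeros(n_zeros)])
y_padded = np.concatenate([y, np.zeros(n_zeros)])


def pearson(a, b):
    """Pearson product-moment correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Raw values, without and with the added zero pairs.
r_raw = pearson(x, y)
r_raw_padded = pearson(x_padded, y_padded)

# Assumed transformation: log(x + 1), which maps 0 to 0.
r_log = pearson(np.log1p(x), np.log1p(y))
r_log_padded = pearson(np.log1p(x_padded), np.log1p(y_padded))

print(f"raw:       r = {r_raw:.3f}, with zeros r = {r_raw_padded:.3f}")
print(f"log(x+1):  r = {r_log:.3f}, with zeros r = {r_log_padded:.3f}")
```

Under log(x + 1) the added cases remain at the origin, so in this
made-up example they still pull the correlation upward after the
transformation. Whether this corresponds to what was done on p. 556
cannot be decided from the sketch.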
Anyhow, my point was about using information theory. This implies a
logarithmic transformation as you wish to emphasize. More importantly,
it allows for a unique and exact solution to the problem of the
dividedness. I have proven that in "The Challenge of Scientometrics"
(Chapter 9, pp. 166 ff. of the 2001-edition). I'll apply the algorithm
to the matrix under discussion and submit a brief contribution to JASIST
on the subject. The decomposition using information theory is not
disturbed by outliers because the measure is non-parametric.
Perhaps you can do me the favour of explaining the difference in the
value of the correlations between Table 8 and Table 9 in Ahlgren et al.
(2003). Let us focus on "Braun" and "Schubert" as variables. How did
they arrive at r = 0.94?
With kind regards,
Loet
_____
Loet Leydesdorff
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam
Tel.: +31-20- 525 6598; fax: +31-20- 525 3681
loet at leydesdorff.net ; http://www.leydesdorff.net/
The Challenge of Scientometrics
<http://www.upublish.com/books/leydesdorff-sci.htm> ;
The Self-Organization of the Knowledge-Based Society
<http://www.upublish.com/books/leydesdorff.htm>
> -----Original Message-----
> From: ASIS&T Special Interest Group on Metrics
> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Stephen J Bensman
> Sent: Tuesday, January 20, 2004 4:40 PM
> To: SIGMETRICS at LISTSERV.UTK.EDU
> Subject: [SIGMETRICS] Pearson's r and ACA
>
>
> Dear Loet et al.
>
> In respect to your suggestion of using nonparametric statistics to
> handle non-normal distributions, I will answer you only in general
> terms. I was trained in statistics by an ecologist, who introduced me to
> biometric statistics. Through him I became intrigued how
> biological, social, and information phenomena act precisely
> in the same way and that biostatistics are therefore the
> statistics applicable to information science. I became
> absolutely fascinated by the unity of society and nature in
> this respect. My first inclination was to use nonparametric
> statistics to counter the nonnormal distributions, but he
> just took my model and contemptuously threw it in the waste
> basket. He insisted that you must use the more powerful
> parametric statistics whenever possible, using the
> logarithmic transformation. To emphasize his point, he took
> off a shelf above his desk a little log from his brother's
> woodlot in Maine, which had a little "n" painted on its end.
> He slammed it on his desk and stated, "This is my log natural."
>
> The use of mathematical transformations to normalize
> distributions raises some interesting philosophical
> questions. From the perspective of the normal law of error,
> biological, social, and information reality makes a person
> feel that he is caught in a fun house full of distorting
> mirrors. In order to see and measure error, you have to put
> on mathematical eye glasses, which transform the reality to
> that of the perspective of the normal distribution. This
> makes you wonder--what is actual reality--that of the raw
> data, or that of the data logarithmically transformed to the
> requirements of the normal distribution? B. C. Brookes in
> the article below dealt with this philosophical question,
> and, basing himself on the psychometric work of Gustav
> Fechner, Brookes argued that the logarithmic perspective was
> the proper one for information science. Interestingly enough,
> John Maynard Keynes in his treatise on probability thought
> that the lognormal distribution centered on the geometric
> mean was the proper law of error for society.
>
> However, lately I have been switching over to nonparametric
> techniques for reasons stemming out of what seems to be your
> main research interest--classifying phenomena into sets or
> groups with mathematical and statistical techniques such as
> clustering, factor analysis, etc. Precise mathematical
> techniques including many statistical ones are really not
> applicable to information science due to Bradford's Law of
> Scattering, which causes all information science sets to be
> fuzzy. Therefore, your sets are always plagued by foreign
> contaminants that distort estimates of parameters and result
> in tremendous outliers. To counter this, I have been
> switching to nonparametric techniques like the chi-squared
> test for homogeneity instead of correlation because of the
> ability to work within broad categories instead of in terms of
> precise fits. In other words, one has to use cruder methods
> to counter the fuzzy outliers unless one can more precisely
> define set membership. To tell you the honest truth,
> defining precise sets with mathematical techniques like
> cluster analysis is probably beyond my mental capacities and
> will have to be done by the likes of you. All I can say is
> that you should use whatever works as long as you can explain
> to laymen like me what does work and why it does work. This
> would be tremendously helpful.
>
> In respect to entropy I did take a fling at this at the end
> of the article below. I did it on the basis of the theories
> of the famous French statistician Emile Borel, who postulated
> total homogeneity and randomness as a function of entropy.
> According to Borel, tremendously skewed distributions
> resulting in vast inhomogeneities--like those found in
> information science--require vast energy inputs, and, as
> energy inputs decline, the entire system collapses with a
> declining mean and variance around this mean until the system
> can be modeled by the Poisson distribution. A very
> interesting way to model obsolescence for purposes of weeding
> library collections. However, in general, I prefer
> biological models to physical ones such as Borel's borrowing
> from thermodynamics.
>
> Anyhow, I hope the above did not bore you and that you find
> the observations useful.
>
> Respectfully,
>
> Stephen J. Bensman
>
> Brookes, Bertram C. 1980a. The foundations of information
> science, part I: Philosophical aspects. Journal of
> information science 2: 125-33.
>
> Bensman, Stephen J. 2000. Probability Distributions in Library and
> Information Science: A Historical and Practitioner Viewpoint.
> Journal of the American Society for Information Science 51: 816-833.
>