A message from a colleague on Pearsons r and other statistics

Loet Leydesdorff loet at LEYDESDORFF.NET
Sun Jan 4 08:29:01 EST 2004


-----Original Message-----
From: ASIS&T Special Interest Group on Metrics
[mailto:SIGMETRICS at listserv.utk.edu] On Behalf Of Dr. Don Kraft
Sent: Friday, January 02, 2004 4:18 PM
To: SIGMETRICS at listserv.utk.edu
Subject: [SIGMETRICS] A message from a colleague on Pearsons r and other
statistics


  Several articles now in press in JASIST raise issues that need
clarification.  Pearson's r does have the characteristic of being
responsive to 0s, and this may affect the outcome of the analysis,
depending upon what the purpose of the analysis is.  My basic point
still holds:  If people think that Pearson's r is distortive because of
this characteristic, then it is incumbent upon them to demonstrate in a
practical manner how this can affect conclusions and how a measure that
does not distort in this manner can lead to a different interpretation
of the data. This some authors adamantly refused to do.  Mathematical
fine points are all very well and good, but there needs to be a "so
what" section at the end.  I have spent the last few years reading Karl
Pearson, R.A. Fisher, Student, Richard von Mises, Bortkiewicz, etc., and
their mathematical capabilities far exceeded those of Rousseau and
company. However, their papers--sometimes described as "a jungle of
formulae"--always had "so what" sections, in which the issue at stake
was clearly stated in clear language and pictures.  If people want a
model, then they should read the 3rd (1911) edition of Karl Pearson's
"The Grammar of Science," where he explains contingency and correlation
in clear terms to laymen.  It is a model of clear thinking, which is
often lacking in present-day work.

Stephen J Bensman notsjb at lsu.edu


Dear colleague,

I have argued in a series of articles (in Scientometrics, in the early
1990s, and later compiled in a book entitled "The Challenge of
Scientometrics") that one cannot expect scientific communications to be
normally distributed and that therefore parametric measures like the
Pearson correlation are often distorting. Negative power laws, for
example, cannot be represented by the mean of the distribution.
The large number of zeros in scientometric matrices is a consequence of
the shape of the distributions and is therefore unavoidable in most
applications.
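
To illustrate why the mean is a poor summary of a skewed distribution,
here is a minimal sketch in Python. The data are hypothetical: a
deterministic Zipf-like series in which the k-th ranked paper receives
1000 // k citations. Neither the numbers nor the variable names come
from the articles discussed in this thread.

```python
from statistics import mean, median

# Hypothetical, deterministic Zipf-like citation counts (illustrative only):
# the paper at rank k receives 1000 // k citations.
counts = [1000 // k for k in range(1, 201)]

avg = mean(counts)
mid = median(counts)

# Fraction of papers whose count lies below the arithmetic mean.
share_below_mean = sum(c < avg for c in counts) / len(counts)

print(f"mean = {avg:.1f}, median = {mid}")
print(f"share of papers below the mean = {share_below_mean:.0%}")
```

In a series like this the mean is pulled upward by a few highly cited
items, so most items fall below it and it no longer describes a
"typical" case.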

Fortunately, a mathematical theory of communication is available based
on non-parametric statistics. I have elaborated this theory for
scientometric applications in the book mentioned above. The
probabilistic entropy measures are non-parametric. Unfortunately, most
of the available software and most statistical techniques (e.g., factor
analysis) are based on parametric measures because attributes of agents
are often normally distributed. Many scientometric indicators (e.g.,
impact factors) are based on averages although one is aware that the
underlying processes are the result of a dynamic at the network level.
(Of course, one can still use the average as an indicator, but the
interpretation is different from the case of a normal distribution.)

Salton's cosine has certain properties that merit discussion. It allows
for a hierarchical representation (because one can also use the cosine
among centroids). The Vector Space Model has more recently been
developed as a form of multidimensional scaling (e.g., Ortego Priego,
2003). Furthermore, this measure is easy to compute, whereas
information-theoretical (probabilistic) measures are often
computationally intensive. I have argued in previous postings in this
thread that the cosine enables us to make a spatial representation
different from that of factor analysis (which is usually based on the
Pearson correlation). The cosine, for example, enables us to visualize a
hierarchy in the network, while factor analysis exhibits
(heterarchical) dimensions. Thus, there may be a beginning of an answer
to your "so what?" question.
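
The different response of the two measures to zeros can be made
concrete with a small sketch. The vectors below are hypothetical
co-occurrence profiles chosen for illustration, and the helper names
are my own. Pearson's r is computed here as the cosine of the
mean-centered vectors, which is a standard identity.

```python
import math

def cosine(x, y):
    """Salton's cosine between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Hypothetical profiles of two documents over four attributes.
x = [3, 1, 0, 2]
y = [1, 3, 2, 0]

# The same profiles after adding 20 attributes on which both score zero.
x_padded = x + [0] * 20
y_padded = y + [0] * 20

print(f"cosine:  {cosine(x, y):.2f} -> {cosine(x_padded, y_padded):.2f}")
print(f"pearson: {pearson(x, y):.2f} -> {pearson(x_padded, y_padded):.2f}")
```

Adding the shared zeros leaves the cosine unchanged, because zeros add
nothing to the dot product or to the norms. They do change the means,
however, and in this example Pearson's r moves from negative to
positive.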

With kind regards,


Loet
  _____

Loet Leydesdorff
Science & Technology Dynamics, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam
Tel.: +31-20-525 6598; fax: +31-20-525 3681
loet at leydesdorff.net; http://www.leydesdorff.net


