Pearson's r and ACA

Stephen J Bensman notsjb at LSU.EDU
Tue Jan 20 10:40:15 EST 2004


Dear Loet et al.

With respect to your suggestion of using nonparametric statistics to handle
non-normal distributions, I will answer you only in general terms.  I was
trained in statistics by an ecologist, who introduced me to biometric
statistics.  Through him I became intrigued by how biological, social, and
information phenomena act in precisely the same way and that biostatistics
are therefore the statistics applicable to information science.  I became
absolutely fascinated by the unity of society and nature in this respect.
My first inclination was to use nonparametric statistics to counter the
nonnormal distributions, but he just took my model and contemptuously threw
it in the waste basket.  He insisted that you must use the more powerful
parametric statistics whenever possible, using the logarithmic
transformation.  To emphasize his point, he took off a shelf above his desk
a little log from his brother's woodlot in Maine, which had a little "n"
painted on its end.  He slammed it on his desk and stated, "This is my log
natural."
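
As a rough illustration of what the logarithmic transformation does, here
is a minimal Python sketch.  The data are simulated, not real: I assume a
lognormal sample (parameters chosen arbitrarily) as a stand-in for skewed
counts such as citations or uses, and the skewness() helper is my own
illustrative function, not part of any package.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical skewed data (assumption: lognormal with mu=1, sigma=1).
raw = [random.lognormvariate(1.0, 1.0) for _ in range(5000)]

# The natural-log transformation of every observation.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness of raw data: {raw_skew:.2f}")
print(f"skewness of ln(data): {log_skew:.2f}")
```

The raw sample should show a strong positive skew, while the logged sample
should have skewness near zero, which is what makes the parametric
techniques usable on it.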

The use of mathematical transformations to normalize distributions raises
some interesting philosophical questions.  From the perspective of the
normal law of error, biological, social, and information reality makes a
person feel that he is caught in a fun house full of distorting mirrors.
In order to see and measure error, you have to put on mathematical eye
glasses, which transform the reality to that of the perspective of the
normal distribution.  This makes you wonder--what is actual reality--that
of the raw data, or that of the data logarithmically transformed to the
requirements of the normal distribution?  B. C. Brookes in the article
below dealt with this philosophical question, and, basing himself on the
psychometric work of Gustav Fechner, Brookes argued that the logarithmic
perspective was the proper one for information science.  Interestingly
enough, John Maynard Keynes in his treatise on probability thought that the
lognormal distribution centered on the geometric mean was the proper law of
error for society.

However, lately I have been switching over to nonparametric techniques for
reasons stemming from what seems to be your main research
interest--classifying phenomena into sets or groups with mathematical and
statistical techniques such as clustering, factor analysis, etc.  Precise
mathematical techniques, including many statistical ones, are really not
applicable to information science due to Bradford's Law of Scattering,
which causes all information science sets to be fuzzy.  Therefore, your
sets are always plagued by foreign contaminants that distort estimates of
parameters and result in tremendous outliers.  To counter this, I have been
switching to nonparametric techniques like the chi-squared test for
homogeneity instead of correlation because of the ability to work within
broad categories instead of in terms of precise fits.  In other words, one has
to use cruder methods to counter the fuzzy outliers unless one can more
precisely define set membership.  To tell you the honest truth, defining
precise sets with mathematical techniques like cluster analysis is probably
beyond my mental capacities and will have to be done by the likes of you.
All I can say is that you should use whatever works as long as you can
explain to laymen like me what does work and why it does work.  This would
be tremendously helpful.
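
To make the idea of working within broad categories concrete, here is a
minimal sketch of a chi-squared test for homogeneity in plain Python.  The
counts are invented for illustration: two hypothetical journal sets, each
sorted into low, medium, and high use.  The function name is my own.  The
p-value line relies on the fact that for exactly 2 degrees of freedom the
chi-squared upper tail probability is exp(-x/2); for other table sizes you
would need a statistical package or a table of critical values.

```python
import math

def chi_squared_homogeneity(table):
    """Return (statistic, degrees of freedom) for a rows x columns table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every row shared the same distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are two journal sets, columns are the broad
# categories low / medium / high use.
counts = [
    [30, 50, 20],
    [20, 50, 30],
]

stat, dof = chi_squared_homogeneity(counts)

# Valid only because dof == 2 here (upper tail of chi-squared with 2 df).
p_value = math.exp(-stat / 2)

print(f"chi-squared = {stat:.2f}, df = {dof}, p = {p_value:.3f}")
# statistic 4.00 with 2 degrees of freedom, p about 0.135
```

Because the test only asks whether the two sets fall into the categories
in the same proportions, a single tremendous outlier lands in the "high"
category like any other observation instead of dominating the estimate.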

With respect to entropy, I did take a fling at this at the end of the article
below.  I did it on the basis of the theories of the famous French
statistician Emile Borel, who postulated total homogeneity and randomness
as a function of entropy.  According to Borel, tremendously skewed
distributions resulting in vast inhomogeneities--like those found in
information science--require vast energy inputs, and, as energy inputs
decline, the entire system collapses with a declining mean and variance
around this mean until the system can be modeled by the Poisson
distribution.  This is a very interesting way to model obsolescence for
purposes of weeding library collections.  However, in general, I prefer
biological models to physical ones such as Borel's borrowing from
thermodynamics.
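
One simple way to see how far a set of counts is from the Poisson case is
the variance-to-mean ratio, which is close to 1 for Poisson data and much
larger for highly skewed data.  The sketch below is only an illustration
of that diagnostic, not Borel's own procedure.  Both data sets are
invented, and the Poisson sampler (Knuth's multiplication method) and the
function names are my own choices.

```python
import math
import random
import statistics

def variance_to_mean_ratio(counts):
    """Index of dispersion: about 1 for Poisson, far above 1 when skewed."""
    return statistics.pvariance(counts) / statistics.fmean(counts)

def poisson_sample(lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = random.random()
    while product > threshold:
        k += 1
        product *= random.random()
    return k

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical uses per title in an active, highly skewed collection.
skewed_uses = [0, 0, 0, 0, 1, 1, 2, 3, 10, 83]

# Hypothetical uses per title after the system has "collapsed" to random,
# homogeneous use (assumption: Poisson with mean 2).
collapsed_uses = [poisson_sample(2.0) for _ in range(5000)]

skewed_ratio = variance_to_mean_ratio(skewed_uses)
poisson_ratio = variance_to_mean_ratio(collapsed_uses)
print(f"skewed collection:    variance/mean = {skewed_ratio:.2f}")
print(f"collapsed collection: variance/mean = {poisson_ratio:.2f}")
```

A ratio drifting down toward 1 over time would be consistent with the
kind of collapse described above.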

Anyhow, I hope the above did not bore you and that you find the
observations useful.

Respectfully,

Stephen J. Bensman

Brookes, Bertram C. 1980a. The foundations of information science, part I:
Philosophical aspects. Journal of information science 2: 125–33.

Bensman, Stephen J. 2000. Probability distributions in library and
information science: A historical and practitioner viewpoint. Journal of
the American Society for Information Science 51: 816–33.


