Pearson's r and ACA

Fri Jan 16 23:21:52 EST 2004

Dear Stephen and colleagues,

I have a higher appreciation of Ahlgren et al.'s contribution because
the problem of the non-normality of the distributions is a serious one,
particularly if one extends the multivariate perspective with the
time-series one. Problems of auto-correlation and auto-covariation make
the design almost intractable. Non-parametric statistics can provide a
much more transparent solution than making transformations in order to
rescue the assumptions of normality.
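
As a minimal sketch of this point (the counts below are hypothetical and
the helper names are illustrative, not taken from any of the cited
work): a rank-based coefficient such as Spearman's rho is unchanged by a
logarithmic transformation, because the transformation preserves the
ordering of the values, whereas Pearson's r depends on whether and how
the data are transformed.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical, highly skewed counts (illustrative only).
a = [1, 2, 3, 5, 8, 40, 200]
b = [2, 1, 4, 6, 30, 25, 900]
log_a = [math.log(v) for v in a]
log_b = [math.log(v) for v in b]

r_raw, r_log = pearson(a, b), pearson(log_a, log_b)
rho_raw, rho_log = spearman(a, b), spearman(log_a, log_b)

print("Pearson  raw / log:", round(r_raw, 3), round(r_log, 3))
print("Spearman raw / log:", round(rho_raw, 3), round(rho_log, 3))
```

The two Pearson values differ, while the two Spearman values coincide,
so no decision about a transformation is needed for the rank-based
measure.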

These considerations brought me to an interest in information theory. In
"The Static and Dynamic Analysis of Network Data Using Information
Theory," Social Networks 13 (1991) 301-345 I provided a set of
algorithms, which I elaborated in the concrete studies collected
later in "The Challenge of Scientometrics" (Leiden: DSWO Press/Leiden
University, 1995). Thus, my critique of Ahlgren et al. (2003) would be
that they do not take the next step and move more definitively into
non-parametric statistics. Given the increasing interest in the last
decade or so in entropical systems, entropy statistics seems an obvious
candidate. The explanatory power of these statistics is considerable.

For example, using entropy statistics one can provide an exact solution
for the divisive clustering problem. I provide the proof at pp. 166 ff.
of the second (2001) edition of "The Challenge of Scientometrics" and
apply it there to a small set of chemistry journals. Furthermore, one can
allow for asymmetries in distances while similarity criteria in the
parametric tradition are always symmetrical. This is just an example.
All measurement can be expressed in bits of information and therefore be
compared.
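
A minimal sketch of the asymmetry, assuming the Kullback-Leibler
divergence as the information measure (one standard choice, used here
for illustration only; the algorithms of the publications cited above
are not reproduced). The two distributions are hypothetical. The
information needed to move from q to p, expressed in bits, is in
general not the same as that needed to move from p to q.

```python
import math

def normalize(counts):
    """Turn raw counts into a relative-frequency distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def entropy_bits(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_bits(p, q):
    """Kullback-Leibler divergence I(p:q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over three categories (illustrative only).
p = normalize([8, 1, 1])
q = normalize([4, 3, 3])

d_pq = kl_bits(p, q)
d_qp = kl_bits(q, p)

print("H(p), H(q) in bits:", entropy_bits(p), entropy_bits(q))
print("I(p:q), I(q:p) in bits:", d_pq, d_qp)
```

Both divergences are positive, but they differ, which a symmetrical
similarity coefficient cannot express. Since all quantities are in
bits, they can be compared directly with one another.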

With kind regards,

Loet

> -----Original Message-----
> From: ASIS&T Special Interest Group on Metrics
> [mailto:SIGMETRICS at listserv.utk.edu] On Behalf Of Stephen J Bensman
> Sent: Friday, January 16, 2004 6:36 PM
> To: SIGMETRICS at listserv.utk.edu
> Subject: [SIGMETRICS] Pearson's r and ACA
>
>
> Loet and Steven,
>
> Don Kraft forwarded to me your discussion on the utilization
> of measures in ACA, and I have decided to add my two cents.
> Please excuse my interference in your discussion, but I have
> already become involved in this debate in various ways.  A
> comment of mine on this matter will shortly be published in
> JASIST, so I might as well make clearer some of the reasoning
> that underlies my position.  I am neither an expert in author
> cocitation analysis (ACA) nor a mathematical expert.  In
> general I may be considered a data person, who uses standard
> statistical techniques, and it is from this position that I
> approached the controversy that erupted in the pages of JASIST
> in the two articles cited below.
>
> In general I favor the Pearson approach that was developed by
> White over the measures suggested by Ahlgren, Jarneving, and
> Rousseau.  This is for the following reasons.
>
> 1)  Utilization of the Pearson has been the standard method
> in ACA, and it has worked quite well until now.  There is now
> a body of research based upon it, to which one can compare one's
> results to gain insights from the work of others.
>
> 2)  The Pearson operates within an established system of
> hypothesis testing.  However, for it to operate properly
> within this system, all the variables have to be normally
> distributed.  This is usually accomplished by testing for the
> underlying distributions, and, if these are not normal, one
> performs a mathematical transformation to accomplish this
> objective--square root if the distributions are Poisson, and
> logarithmic, if the distributions are highly skewed.  As in
> most social and biological research, where the processes are
> also multiplicative, information science data usually
> requires the logarithmic transformation to meet the linear
> and additive requirements of the Pearson.  However, as White
> demonstrates, the Pearson is very distributionally robust.
> In general, White seems to consider the Pearson as only a
> sorting mechanism and does not consider tests of significance
> as important in ACA.
>
> 3)  The Pearson measures both similarities and dissimilarities,
> showing similarities as positive correlations and
> dissimilarities as negative correlations.  This is crucial in
> ACA, whose main purpose is to partition authors into
> different sets.  One can easily see partitions in matrices,
> because similarities are positive, dissimilarities are
> negative, and no relationship is zero.  Therefore, there is a
> full scale of measurement. The measures suggested by Ahlgren,
> Jarneving, and Rousseau measure only similarities on a scale
> from 0 to 1 with partition somewhat incongruously at 0.5, and
> it is difficult for me to see the logic of such measures in ACA.
>
> 4)  The Pearson is sensitive to zeros, and correlations do
> change when persons not related to members of a given set of
> persons are added to this set.  However, unlike Ahlgren,
> Jarneving, and Rousseau, who regard this as a major fault of
> the Pearson and axiomatically posit that the relations must
> remain invariant, I regard this as a major advantage of the
> Pearson.  First, it is only natural for relations among
> persons to change when foreign persons are added to their
> mix.  One can think of numerous social situations where this
> happens.  Second, the very changes may be themselves
> informative and lead to further analyses and understanding of
> the relationships.  Ahlgren, Jarneving, and Rousseau
> axiomatically block this.
>
> However, there may be situations, where such changes in
> correlations upon the addition of zeros may be detrimental to
> the proper understanding of the data.  This Ahlgren,
> Jarneving, and Rousseau have adamantly refused to
> demonstrate, baldly stating to me that their job is only to
> dream up axioms and does not include testing them against any
> reality.  In his demonstration White clearly showed that the
> Pearson leads to exactly the same results as the measures
> proposed by Ahlgren, Jarneving, and Rousseau, and Rousseau
> has admitted to me that their measures are not superior to
> the Pearson.  However, White was only working with the same
> extreme case set up by Ahlgren, Jarneving, and Rousseau, and
> what really needs to be done is to take a set of data, apply
> both measures to it, and see how their different actions
> could lead to different interpretations of the data.  If both
> measures lead to the same result, then there is no case at
> all for utilizing the measures proposed by Ahlgren,
> Jarneving, and Rousseau.
>
> I hope you find the above useful.  If you have any comments
> or criticisms, please send them to me.  I would be interested
> in hearing your opinions.
>
> Respectfully,
> Stephen J. Bensman
> LSU Libraries
> Louisiana State University
> Baton Rouge, LA
> USA
>
> ATTACHMENT BELOW:
>
> (See attached file: Rousseau-White.doc)
>
> Ahlgren, P., Jarneving, B., and Rousseau, R.  (2003).
> Requirements for a cocitation similarity measure, with special
> reference to Pearson's correlation coefficient.  Journal of
> the American Society for Information Science and Technology,
> 54, 550-560.
>
> White, H.D.  (2003).  Author cocitation analysis and
> Pearson's r.  Journal of the American Society for Information
> Science and Technology, 54, 1250-1259.
>