Pearson's r and ACA
Stephen J Bensman
notsjb at LSU.EDU
Fri Jan 16 12:35:53 EST 2004
Loet and Steven,
Don Kraft forwarded to me your discussion on the utilization of measures in
ACA, and I have decided to add my two cents. Please excuse my interference
in your discussion, but I have already become involved in this debate in
various ways. A comment of mine on this matter will shortly be published in
JASIST, so I might as well make clearer some of the reasoning that
underlies my position. I am neither an expert in author cocitation
analysis (ACA) nor a mathematical expert. In general I may be considered a
data person who uses standard statistical techniques, and it is from this
position that I approached the controversy that erupted on the pages of
JASIST in the two articles cited below.
In general I favor the Pearson approach that was developed by White over
the measures suggested by Ahlgren, Jarneving, and Rousseau. This is for the
following reasons.
1) Utilization of the Pearson has been the standard method in ACA, and it
has worked quite well until now. There is now a body of research based
upon it, to which one can compare one's results to gain insights from the work
of others.
2) The Pearson operates within an established system of hypothesis
testing. However, for it to operate properly within this system, all the
variables have to be normally distributed. This is usually accomplished by
testing for the underlying distributions, and, if these are not normal, one
performs a mathematical transformation to accomplish this objective--square
root if the distributions are Poisson, and logarithmic, if the
distributions are highly skewed. As in most social and biological
research, where the processes are also multiplicative, information science
data usually requires the logarithmic transformation to meet the linear and
additive requirements of the Pearson. However, as White demonstrates, the
Pearson is very distributionally robust. In general, White seems to
consider the Pearson as only a sorting mechanism and does not consider
tests of significance as important in ACA.
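To illustrate the transformation described in point 2, here is a minimal
Python sketch. The counts are hypothetical and chosen only so that the two
variables are related multiplicatively (y = 3 * x**2); the helper name
pearson is an illustrative choice, not something taken from either article.
It shows the raw Pearson falling short of 1 while the log-log Pearson
recovers the exact linear relationship.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a multiplicative relationship: y = 3 * x**2.
x = [1, 2, 4, 8, 16, 32]
y = [3 * v ** 2 for v in x]

# Pearson on the raw values: the relationship is curved, so r < 1.
r_raw = pearson(x, y)

# Pearson after taking logs: log y = log 3 + 2 * log x is exactly linear.
# Plain log needs strictly positive values; zero counts would need some
# offset (an assumption not discussed in the text above).
r_log = pearson([math.log(v) for v in x], [math.log(v) for v in y])

print(f"raw r = {r_raw:.3f}, log-log r = {r_log:.3f}")
```

For Poisson-like counts the same pattern would apply with math.sqrt in place
of math.log.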
3) The Pearson measures both similarities and dissimilarities, showing
similarities as positive correlations and dissimilarities as negative
correlations. This is crucial in ACA, whose main purpose is to partition
authors into different sets. One can easily see partitions in matrices,
because similarities are positive, dissimilarities are negative, and no
relationship is zero. Therefore, there is a full scale of measurement.
The measures suggested by Ahlgren, Jarneving, and Rousseau measure only
similarities on a scale from 0 to 1 with partition somewhat incongruously
at 0.5, and it is difficult for me to see the logic of such measures in
ACA.
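The difference in scale can be seen in a small Python sketch. The cosine is
used here only as a stand-in for a similarity measure bounded by 0 and 1 on
count data; that choice, the helper names, and the two cocitation profiles
are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine(x, y):
    """Cosine similarity; lies in [0, 1] when all entries are non-negative."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical cocitation counts of two authors with the same four others.
# The profiles are mirror images: where one is high, the other is low.
author_a = [10, 8, 1, 0]
author_b = [0, 1, 8, 10]

r = pearson(author_a, author_b)
c = cosine(author_a, author_b)

print(f"Pearson r = {r:.2f}, cosine = {c:.2f}")
```

With these made-up profiles the Pearson is strongly negative (about -0.99),
whereas the cosine is merely small (about 0.10): the dissimilarity shows up
as a sign on one scale and only as a low value on the other.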
4) The Pearson is sensitive to zeros, and correlations do change when
persons not related to members of a given set of persons are added to this
set. However, unlike Ahlgren, Jarneving, and Rousseau, who regard this as
a major fault of the Pearson and axiomatically posit that the relations
must remain invariant, I regard this as a major advantage of the Pearson.
First, it is only natural for relations among persons to change when
foreign persons are added to their mix. One can think of numerous social
situations where this happens. Second, the very changes may be themselves
informative and lead to further analyses and understanding of the
relationships. Ahlgren, Jarneving, and Rousseau axiomatically block this.
However, there may be situations where such changes in correlations upon
the addition of zeros may be detrimental to the proper understanding of the
data. This Ahlgren, Jarneving, and Rousseau have adamantly refused to
demonstrate, baldly stating to me that their job is only to dream up axioms
and does not include testing them against any reality. In his
demonstration White clearly showed that the Pearson leads to exactly the
same results as the measures proposed by Ahlgren, Jarneving, and Rousseau,
and Rousseau has admitted to me that their measures are not superior to the
Pearson. However, White was only working with the same extreme case set up
by Ahlgren, Jarneving, and Rousseau, and what really needs to be done is to
take a set of data, apply both measures to it, and see how their different
actions could lead to different interpretations of the data. If both
measures lead to the same result, then there is no case at all for
utilizing the measures proposed by Ahlgren, Jarneving, and Rousseau.
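The sensitivity to zeros discussed in point 4 can be sketched in Python on
made-up numbers. This is not the comparison on real data called for above,
only a toy case: two hypothetical authors whose cocitation profiles are
extended with six "foreign" authors with whom neither is cocited. The cosine
again stands in, as an assumption, for a measure that is invariant to added
zeros.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine(x, y):
    """Cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical cocitation counts of two authors with four others.
author_a = [5, 1, 4, 2]
author_b = [1, 5, 2, 4]

# Add six unrelated authors: both profiles gain six zeros.
extended_a = author_a + [0] * 6
extended_b = author_b + [0] * 6

r_before = pearson(author_a, author_b)
r_after = pearson(extended_a, extended_b)
cos_before = cosine(author_a, author_b)
cos_after = cosine(extended_a, extended_b)

print(f"Pearson: {r_before:.2f} -> {r_after:.2f}")
print(f"cosine:  {cos_before:.2f} -> {cos_after:.2f}")
```

In this toy case the Pearson moves from -1 to about 0.37 once the zeros are
added, since the two authors now look alike relative to the newcomers, while
the cosine stays where it was. Whether that shift is informative or
misleading is exactly the question that real data would have to settle.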
I hope you find the above useful. If you have any comments or criticisms,
please send them to me. I would be interested in hearing your opinions.
Respectfully,
Stephen J. Bensman
LSU Libraries
Louisiana State University
Baton Rouge, LA
USA
ATTACHMENT BELOW:
(See attached file: Rousseau-White.doc)
Ahlgren, P., Jarneving, B., and Rousseau, R. (2003). Requirements for a
cocitation similarity measure, with special reference to Pearson’s
correlation coefficient. Journal of the American Society for Information
Science and Technology, 54, 550-560.
White, H.D. (2003). Author cocitation analysis and Pearson’s r. Journal
of the American Society for Information Science and Technology, 54,
1250-1259.