Pearson's r and ACA

Steven A. Morris samorri at OKSTATE.EDU
Fri Jan 16 15:19:20 EST 2004


I don't think the letter attached to Dr. Bensman's posting made it through
the listserver.  A copy of his letter can be found at:

http://samorris.ceat.okstate.edu/web/Rousseau-White.doc


Stephen,


A couple of quick comments:

1) How is hypothesis testing applied to ACA?  Do you mean that after
clustering authors, we can apply a hypothesis test to confirm that each
author is indeed a member of a specific group?

2) "Utilization of the Pearson has been the standard method in ACA" is
actually a good argument, but remember, paradigms were meant to be
overthrown.  Maybe ACA is going through some sort of "Kuhnian crisis" at the
moment.  More likely, this discussion is just a "tempest in a teapot." ;-)

3) I wonder if there is some dataset out there where the "true" clustering
of authors is known well enough to allow direct comparison of clustering and
mapping based on different similarity measures.  I think this would help
answer the "so what?" question that was posed by Dr. Kraft.  It would be
nice to have a "chili cookoff" style contest, similar to some of the signal
processing contests at certain IEEE conferences, to show off ACA author
classification algorithms.
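
Point 3 can be made concrete. Given a benchmark set of authors whose "true" groups are known, each similarity measure's clustering could be scored against those groups. The sketch below is only an illustration under stated assumptions. It uses the plain Rand index, which is the share of author pairs on which two partitions agree. The choice of that statistic, the function name `rand_index`, and all labels are hypothetical. In practice an adjusted Rand index, which corrects for chance agreement, would likely be preferred.

```python
from itertools import combinations

def rand_index(true_labels, found_labels):
    """Fraction of author pairs on which two partitions agree.

    A pair agrees if both partitions put the two authors in the same group,
    or both put them in different groups. 1.0 means the partitions are
    identical up to relabeling of the groups.
    """
    if len(true_labels) != len(found_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = 0
    for i, j in pairs:
        same_true = true_labels[i] == true_labels[j]
        same_found = found_labels[i] == found_labels[j]
        agree += same_true == same_found
    return agree / len(pairs)

# Hypothetical benchmark: six authors whose "true" specialties are known.
truth = ["A", "A", "A", "B", "B", "B"]
# Hypothetical outputs of two ACA pipelines using different similarity measures.
clusters_measure_1 = [0, 0, 0, 1, 1, 1]  # recovers the known groups
clusters_measure_2 = [0, 0, 1, 1, 1, 1]  # misplaces the third author

print(rand_index(truth, clusters_measure_1))  # 1.0
print(rand_index(truth, clusters_measure_2))  # 0.6666666666666666 (10 of 15 pairs agree)
```

A contest of the kind described would then amount to running each measure's pipeline on the same benchmark and comparing these scores.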

Thanks,

Steven Morris


On Fri, 16 Jan 2004 11:35:53 -0600, Stephen J Bensman <notsjb at LSU.EDU> wrote:

>Loet and Steven,
>
>Don Kraft forwarded to me your discussion on the utilization of measures in
>ACA, and I have decided to add my two cents.  Please excuse my interference
>in your discussion, but I have already become involved in this debate in
>various ways.  A comment of mine on this matter will shortly be published
>in JASIST, so I might as well make clearer some of the reasoning that
>underlies my position.  I am neither an expert in author cocitation
>analysis (ACA) nor a mathematical expert.  In general I may be considered
>a data person who uses standard statistical techniques, and it is from
>this position that I approached the controversy that erupted on the pages
>of JASIST in the two articles cited below.
>
>In general I favor the Pearson approach that was developed by White over
>the measures suggested by Ahlgren, Jarneving, and Rousseau, for the
>following reasons.
>
>1)  Utilization of the Pearson has been the standard method in ACA, and it
>has worked quite well until now.  There is now a body of research based
>upon it, to which one can compare one's results to gain insights from the
>work of others.
>
>2)  The Pearson operates within an established system of hypothesis
>testing.  However, for it to operate properly within this system, all the
>variables have to be normally distributed.  This is usually accomplished by
>testing the underlying distributions and, if these are not normal,
>performing a mathematical transformation to make them so: square root if
>the distributions are Poisson, and logarithmic if the distributions are
>highly skewed.  As in most social and biological research, where the
>processes are also multiplicative, information science data usually
>require the logarithmic transformation to meet the linear and additive
>requirements of the Pearson.  However, as White demonstrates, the Pearson
>is very robust to departures from these distributional assumptions.  In
>general, White seems to consider the Pearson only a sorting mechanism and
>does not consider tests of significance important in ACA.
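
The procedure in point 2 is to transform the counts, compute r, and test it. It can be sketched as follows. This is a minimal illustration, not White's or Bensman's actual workflow. The counts are hypothetical. `log1p`, the log of (1 + count), is an assumed variant of the logarithmic transformation, chosen because cocitation counts include zeros. The t statistic tests the null hypothesis that the population correlation is zero, with n - 2 degrees of freedom, under the usual normality assumption.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t for H0: rho = 0, with n - 2 degrees of freedom (undefined if |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical, highly skewed cocitation counts of two authors with eight others.
a = [0, 1, 2, 3, 5, 8, 40, 120]
b = [1, 0, 3, 2, 9, 6, 55, 90]

transforms = {
    "raw": lambda v: v,
    "sqrt": math.sqrt,   # suggested for Poisson-like counts
    "log": math.log1p,   # log(1 + v): assumed here so that zero counts are defined
}

for label, f in transforms.items():
    x = [f(v) for v in a]
    y = [f(v) for v in b]
    r = pearson_r(x, y)
    print(f"{label:>4}: r = {r:.3f}, t = {t_statistic(r, len(x)):.2f}, df = {len(x) - 2}")
```

With counts this skewed, the raw r is dominated by the two largest values. The transformed versions weigh the whole profile more evenly, which is the point of transforming before testing.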
>
>3)  The Pearson measures both similarities and dissimilarities, showing
>similarities as positive correlations and dissimilarities as negative
>correlations.  This is crucial in ACA, whose main purpose is to partition
>authors into different sets.  One can easily see partitions in the
>matrices, because similarities are positive, dissimilarities are negative,
>and no relationship is zero.  Therefore, there is a full scale of
>measurement.  The measures suggested by Ahlgren, Jarneving, and Rousseau
>measure only similarities, on a scale from 0 to 1, with the partition
>placed somewhat incongruously at 0.5, and it is difficult for me to see
>the logic of such measures in ACA.
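
The difference in range described in point 3 shows up with two authors who are cocited with opposite groups. Salton's cosine is used below only as an assumed example of a measure confined to 0..1 for nonnegative counts. The letter does not say which of the Ahlgren, Jarneving, and Rousseau measures it has in mind, and the profiles are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: ranges from -1 to 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine(x, y):
    """Salton's cosine: for nonnegative counts it ranges from 0 to 1."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Hypothetical cocitation profiles over six other authors:
# author_1 is cocited mostly with the first three, author_2 with the last three.
author_1 = [10, 8, 9, 1, 0, 1]
author_2 = [1, 0, 1, 9, 10, 8]

print(f"Pearson r = {pearson_r(author_1, author_2):.3f}")  # negative: opposite profiles
print(f"cosine    = {cosine(author_1, author_2):.3f}")     # small but positive (36/247)
```

On the Pearson scale the two authors sit on opposite sides of zero. On the cosine scale they are merely "not very similar", and the dissimilarity is never expressed as a sign.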
>
>4)  The Pearson is sensitive to zeros: correlations do change when
>persons not related to members of a given set of persons are added to this
>set.  However, unlike Ahlgren, Jarneving, and Rousseau, who regard this as
>a major fault of the Pearson and axiomatically posit that the relations
>must remain invariant, I regard this as a major advantage of the Pearson.
>First, it is only natural for relations among persons to change when
>outsiders are added to their mix; one can think of numerous social
>situations where this happens.  Second, the changes themselves may be
>informative and lead to further analysis and understanding of the
>relationships.  Ahlgren, Jarneving, and Rousseau axiomatically rule this
>out.
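
The zero sensitivity in point 4 is easy to reproduce. Append authors who are never cocited with either of two authors, so that both profiles get zeros, and recompute. The sketch below uses hypothetical counts. Salton's cosine again stands in, as an assumption, for a measure that is invariant to such zeros. The cosine does not move, because the added zeros contribute nothing to its dot product or norms. Pearson's r does move, because the zeros shift both means. In this example r rises.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Hypothetical cocitation profiles of two authors over five others.
x = [5, 3, 0, 2, 4]
y = [4, 1, 1, 3, 5]

for extra_zeros in (0, 5, 20):
    # Add hypothetical authors who are cocited with neither x nor y.
    xp = x + [0] * extra_zeros
    yp = y + [0] * extra_zeros
    print(f"zeros added: {extra_zeros:2d}  "
          f"r = {pearson_r(xp, yp):.3f}  cosine = {cosine(xp, yp):.3f}")
```

Whether the changing r is a fault or, as argued above, information about the new context is exactly the point in dispute. The code only shows that the two measures behave differently.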
>
>However, there may be situations where such changes in correlations upon
>the addition of zeros are detrimental to the proper understanding of the
>data.  This Ahlgren, Jarneving, and Rousseau have adamantly refused to
>demonstrate, baldly stating to me that their job is only to dream up axioms
>and does not include testing them against any reality.  In his
>demonstration, White clearly showed that the Pearson leads to exactly the
>same results as the measures proposed by Ahlgren, Jarneving, and Rousseau,
>and Rousseau has admitted to me that their measures are not superior to the
>Pearson.  However, White was working only with the same extreme case set up
>by Ahlgren, Jarneving, and Rousseau.  What really needs to be done is to
>take a set of data, apply both measures to it, and see how their different
>behavior could lead to different interpretations of the data.  If both
>measures lead to the same result, then there is no case at all for
>utilizing the measures proposed by Ahlgren, Jarneving, and Rousseau.
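
The comparison proposed here is to take one data set, apply both measures, and look at the resulting partitions. It could be sketched as follows. Everything in it is an assumption for illustration. The counts are invented. Salton's cosine is a stand-in for the alternative measures. Average-linkage clustering is one common choice, not necessarily the one used in the articles. The columns are reference authors outside the analyzed set, which sidesteps the separate question of how to treat the diagonal of a cocitation matrix.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def similarity_matrix(rows, measure):
    return [[measure(r, s) for s in rows] for r in rows]

def average_linkage(sim, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest mean pairwise similarity until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            mean = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean > best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations yields a < b
    return sorted(sorted(c) for c in clusters)

# Hypothetical cocitation counts: six authors (rows) by six reference authors.
authors = ["A1", "A2", "A3", "B1", "B2", "B3"]
counts = [
    [12, 9, 10, 0, 1, 0],
    [8, 11, 7, 1, 0, 0],
    [10, 6, 9, 0, 0, 2],
    [0, 1, 0, 9, 12, 8],
    [1, 0, 0, 7, 10, 11],
    [0, 2, 1, 10, 8, 9],
]

partitions = {}
for name, measure in (("Pearson", pearson_r), ("cosine", cosine)):
    partitions[name] = average_linkage(similarity_matrix(counts, measure), k=2)
    print(name, [[authors[i] for i in g] for g in partitions[name]])

print("same partition:", partitions["Pearson"] == partitions["cosine"])
```

On this clear-cut toy set the two measures agree. The interesting test is real ACA data, where any disagreement between the partitions would show what, if anything, the choice of measure changes.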
>
>I hope you find the above useful.  If you have any comments or criticisms,
>please send them to me.  I would be interested in hearing your opinions.
>
>Respectfully,
>Stephen J. Bensman
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA
>USA
>
>ATTACHMENT BELOW:
>(See attached file: Rousseau-White.doc)
>
>Ahlgren, P., Jarneving, B., and Rousseau, R.  (2003).  Requirements for a
>cocitation similarity measure, with special reference to Pearson’s
>correlation coefficient.  Journal of the American Society for Information
>Science and Technology, 54, 550-560.
>
>White, H.D.  (2003).  Author cocitation analysis and Pearson’s r.  Journal
>of the American Society for Information Science and Technology, 54,
>1250-1259.



More information about the SIGMETRICS mailing list