Pearson's r and ACA

Fri Jan 16 16:36:29 EST 2004

Steven,

As I have pointed out, I am not an expert in ACA.  Therefore, my answer may
not cover all the possibilities of hypothesis testing in ACA.  The main
purpose of hypothesis testing in ACA, as near as I can judge, would be to
see if the relationships were significant or not.  It provides just another
basis of judgment on the strength of the relationship.  Moreover, another
advantage of utilizing the Pearson is that SAS and SPSS automatically give
you the test.  Therefore, it entails no further effort on your part.
However, as I pointed out,  Howard White--who, after all, played a major
role in developing this technique--does not seem to regard tests of
significance as crucial in ACA.  His opinion would trump mine, as he is the
expert here.
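
The test that SAS and SPSS print next to the Pearson can be sketched by
hand.  What follows is only a minimal illustration, not anyone's published
procedure: the cocitation counts are hypothetical, the helper names
pearson_r and t_statistic are invented for this sketch, and the test
assumes the usual conditions (independent observations, roughly normal
variables).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical cocitation counts of two authors with the same 10 other authors.
author_a = [12, 7, 0, 3, 25, 1, 9, 14, 2, 6]
author_b = [10, 9, 1, 2, 30, 0, 7, 11, 4, 5]

r = pearson_r(author_a, author_b)
t = t_statistic(r, len(author_a))

# Two-tailed 5% critical value of Student's t with 8 degrees of freedom.
T_CRIT_8DF = 2.306
significant = abs(t) > T_CRIT_8DF

print(f"r = {r:.3f}, t = {t:.2f}, significant at the 5% level: {significant}")
```

The t statistic is compared here with a tabulated critical value only to
keep the sketch free of external libraries; a statistics package would
report the exact p-value instead.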

Second, if it were the purpose of Ahlgren, Jarneving, and Rousseau to
accomplish a Kuhnian overthrow of the established paradigm, then they did a
poor job of it.  To accomplish a Kuhnian overthrow, they would not only
have had to prove the Pearson incorrect but to have substituted a better
measure for it.  They refused to demonstrate how their measures are better,
leaving your "tempest in a teapot" hypothesis still standing.

Finally, Don Kraft did not pose the "so what" question.  I did.  He was
only posting it for me, for which he was roundly berated by a reader of
this LISTSERV for posting Bensman's "diatribe."

Respectfully,
Steve B.

"Steven A. Morris" <samorri at OKSTATE.EDU>@LISTSERV.UTK.EDU> on 01/16/2004
02:19:20 PM

Please respond to ASIS&T Special Interest Group on Metrics
       <SIGMETRICS at LISTSERV.UTK.EDU>

Sent by:    ASIS&T Special Interest Group on Metrics
       <SIGMETRICS at LISTSERV.UTK.EDU>

To:    SIGMETRICS at LISTSERV.UTK.EDU
cc:     (bcc: Stephen J Bensman/notsjb/LSU)

Subject:    Re: [SIGMETRICS] Pearson's r and ACA

I don't think the letter attached to Dr. Bensman's posting made it through
the listserver.  A copy of his letter can be found at:

http://samorris.ceat.okstate.edu/web/Rousseau-White.doc

Stephen,

A couple of quick comments:

1) How is hypothesis testing applied to ACA?  Do you mean that after
clustering authors, we can apply a hypothesis test to confirm that each
author is indeed a member of a specific group?

2) "Utilization of the Pearson has been the standard method in ACA" is
actually a good argument, but remember, paradigms were meant to be
overthrown.  Maybe ACA is going through some sort of "Kuhnian crisis" at the
moment.  More likely, this discussion is just a "tempest in a teapot." ;-)

3) I wonder if there is some dataset out there where the "true" clustering
of authors is known well enough to allow direct comparison of clustering and
mapping based on different similarity measures.  I think this would help
answer the "so what?" question that was posed by Dr. Kraft.  It would be
nice to have a "chili cookoff" style contest, similar to some of the signal
processing contests at certain IEEE conferences, to show off ACA author
classification algorithms.

Thanks,

Steven Morris

On Fri, 16 Jan 2004 11:35:53 -0600, Stephen J Bensman <notsjb at LSU.EDU>
wrote:

>Loet and Steven,
>
>Don Kraft forwarded to me your discussion on the utilization of measures in
>ACA, and I have decided to add my two cents.  Please excuse my interference
>in your discussion, but I have already become involved in this debate in
>various ways.  A comment of mine on this matter will shortly be published
>in JASIST, so I might as well make clearer some of the reasoning that
>underlies my position.  I am neither an expert in author cocitation
>analysis (ACA) nor a mathematical expert.  In general I may be considered
>a data person, who uses standard statistical techniques, and it is from
>this position that I approached the controversy that erupted on the pages
>of JASIST in the two articles cited below.
>
>In general I favor the Pearson approach that was developed by White over
>the measures suggested by Ahlgren, Jarneving, and Rousseau.  This is for
>the following reasons.
>
>1)  Utilization of the Pearson has been the standard method in ACA, and it
>has worked quite well until now.  There is now a body of research based
>upon it, to which one can compare one's results to gain insights from the work
>of others.
>
>2)  The Pearson operates within an established system of hypothesis
>testing.  However, for it to operate properly within this system, all the
>variables have to be normally distributed.  This is usually accomplished
>by testing for the underlying distributions, and, if these are not normal,
>one performs a mathematical transformation to accomplish this
>objective--square root if the distributions are Poisson, and logarithmic
>if the distributions are highly skewed.  As in most social and biological
>research, where the processes are also multiplicative, information science
>data usually requires the logarithmic transformation to meet the linear and
>additive requirements of the Pearson.  However, as White demonstrates, the
>Pearson is very distributionally robust.  In general, White seems to
>consider the Pearson as only a sorting mechanism and does not consider
>tests of significance as important in ACA.
>
>3)  The Pearson measures both similarities and dissimilarities, showing
>similarities as positive correlations and dissimilarities as negative
>correlations.  This is crucial in ACA, whose main purpose is to partition
>authors into different sets.  One can easily see partitions in matrices,
>because similarities are positive, dissimilarities are negative, and no
>relationship is zero.  Therefore, there is a full scale of measurement.
>The measures suggested by Ahlgren, Jarneving, and Rousseau measure only
>similarities on a scale from 0 to 1 with partition somewhat incongruously
>at 0.5, and it is difficult for me to see the logic of such measures in
>ACA.
>
>4)  The Pearson is sensitive to zeros, and correlations do change when
>persons not related to members of a given set of persons are added to this
>set.  However, unlike Ahlgren, Jarneving, and Rousseau, who regard this as
>a major fault of the Pearson and axiomatically posit that the relations
>must remain invariant, I regard this as a major advantage of the Pearson.
>First, it is only natural for relations among persons to change when
>foreign persons are added to their mix.  One can think of numerous social
>situations where this happens.  Second, the very changes may themselves be
>informative and lead to further analyses and understanding of the
>relationships.  Ahlgren, Jarneving, and Rousseau axiomatically block this.
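
The transformations named in point 2 can be sketched as follows.  This is
only an illustration under stated assumptions: the counts are hypothetical,
the skewness helper is invented for the sketch, and log(1 + x) is used in
place of a plain logarithm so that zero counts would remain defined, which
is a choice of this sketch and not something specified in the discussion.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance to the 1.5 power."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, highly skewed cocitation counts for one author.
counts = [1, 2, 2, 3, 4, 5, 8, 13, 40, 150]

# Square-root transformation (mentioned above for Poisson-like counts).
sqrt_counts = [math.sqrt(c) for c in counts]

# Logarithmic transformation (mentioned above for highly skewed counts).
# log1p(c) computes log(1 + c); the "+ 1" is an assumption of this sketch.
log_counts = [math.log1p(c) for c in counts]

print(f"skewness, raw counts:  {skewness(counts):.2f}")
print(f"skewness, square root: {skewness(sqrt_counts):.2f}")
print(f"skewness, logarithm:   {skewness(log_counts):.2f}")
```

On these made-up counts both transformations pull in the long right tail,
the logarithm more strongly than the square root; the Pearson would then be
computed on the transformed values.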
>
>However, there may be situations where such changes in correlations upon
>the addition of zeros may be detrimental to the proper understanding of
>the data.  This Ahlgren, Jarneving, and Rousseau have adamantly refused to
>demonstrate, baldly stating to me that their job is only to dream up
>axioms and does not include testing them against any reality.  In his
>demonstration White clearly showed that the Pearson leads to exactly the
>same results as the measures proposed by Ahlgren, Jarneving, and Rousseau,
>and Rousseau has admitted to me that their measures are not superior to
>the Pearson.  However, White was only working with the same extreme case
>set up by Ahlgren, Jarneving, and Rousseau, and what really needs to be
>done is to take a set of data, apply both measures to it, and see how
>their different actions could lead to different interpretations of the
>data.  If both measures lead to the same result, then there is no case at
>all for utilizing the measures proposed by Ahlgren, Jarneving, and
>Rousseau.
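
The behavior described in points 3 and 4 can be seen on a toy example.
This is a minimal sketch, not the comparison on real data called for
above: the counts are hypothetical, and the cosine is used merely as one
stand-in for a similarity measure on a 0-to-1 scale, which is an assumption
of the sketch and not a statement of which measures were proposed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; ranges from -1 to +1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine(x, y):
    """Cosine similarity; ranges from 0 to 1 for non-negative counts."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical cocitation counts of two authors with five other authors.
author_a = [5, 1, 4, 0, 3]
author_b = [1, 5, 0, 4, 3]

# Add ten "foreign" authors who are cocited with neither (all zeros).
extra_zeros = [0] * 10
author_a_ext = author_a + extra_zeros
author_b_ext = author_b + extra_zeros

r_before = pearson_r(author_a, author_b)
r_after = pearson_r(author_a_ext, author_b_ext)
cos_before = cosine(author_a, author_b)
cos_after = cosine(author_a_ext, author_b_ext)

print(f"Pearson: {r_before:+.3f} before, {r_after:+.3f} after adding zeros")
print(f"cosine:  {cos_before:.3f} before, {cos_after:.3f} after adding zeros")
```

With these made-up counts the Pearson moves from a negative to a positive
value once the zeros are added, while the cosine does not change.  Whether
that movement is a defect or a source of information is exactly the point
in dispute.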
>
>I hope you find the above useful.  If you have any comments or criticisms,
>please send them to me.  I would be interested in hearing your opinions.
>
>Respectfully,
>Stephen J. Bensman
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA
>USA
>
>ATTACHMENT BELOW:
>
>
>
>(See attached file: Rousseau-White.doc)
>
>Ahlgren, P., Jarneving, B., and Rousseau, R.  (2003).  Requirements for a
>cocitation similarity measure, with special reference to Pearson's
>correlation coefficient.  Journal of the American Society for Information
>Science and Technology, 54, 550-560.
>
>White, H.D.  (2003).  Author cocitation analysis and Pearson's r.  Journal
>of the American Society for Information Science and Technology, 54,
>1250-1259.