White HD "Author cocitation analysis and ...

Loet Leydesdorff loet at LEYDESDORFF.NET
Tue Jan 6 14:20:20 EST 2004


Dear Steven,

Thank you for communicating these experimental results. They are
interesting.

It seems to me that you have convincingly shown that the two measures
(the binary and the non-binary one) differ when information is available
at a measurement scale higher than dichotomous (e.g., at the interval
level). Of course, if one has only binary information, one can use the
binary formulation of the formula, but that works only because the square
or the root of one is also one, and the square or root of zero is also
zero. Thus, the cosine is defined more generally in terms of what you
call the non-binary formulation.
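
For concreteness, here is a minimal Python/numpy sketch, with two invented
0/1 citation profiles, showing that the general formula applied to binary
data reduces to n(i,j)/sqrt[n(i)*n(j)]:

import numpy as np

def cosine(x, y):
    # general (non-binary) cosine: Sigma x*y / sqrt(Sigma x^2 * Sigma y^2)
    return np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))

x = np.array([1, 0, 1, 1, 0, 1])   # invented binary citation profile of author i
y = np.array([1, 1, 0, 1, 0, 0])   # invented binary citation profile of author j

n_ij = int(np.sum(x * y))          # papers citing both authors
n_i, n_j = int(x.sum()), int(y.sum())

print(cosine(x, y))                # general formula
print(n_ij / np.sqrt(n_i * n_j))   # binary shortcut: same value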

I don't agree with the overlap function. It seems to me most natural
to return to the original matrix with the cited authors as cases and the
citations as variables (columns). A cocitation is then the case in which
two cells in the same column are both filled. One can then compute
cosines between the authors as cases. Choose Analyze > Correlate >
Distances within SPSS and you will find all the options, including
cosines between cases. There is no need for the invention of a new
function, in my opinion.
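
As an illustration of that route, a minimal Python/numpy sketch with an
invented authors-by-citations matrix (authors as rows/cases, citing papers
as columns/variables):

import numpy as np

# Invented example: rows = cited authors (the cases), columns = citing papers;
# a cell holds how often the paper cites that author (or 1/0 with binary data).
A = np.array([[2., 0., 1., 3., 0.],
              [1., 1., 0., 2., 0.],
              [0., 2., 1., 0., 1.]])

# Pairwise cosines between the rows (authors), i.e. what
# Analyze > Correlate > Distances reports for cases.
norms = np.linalg.norm(A, axis=1)
cosines = (A @ A.T) / np.outer(norms, norms)
print(np.round(cosines, 3))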

With kind regards,


Loet



Dear Loet,

Thanks very much for your interesting remarks.
In answer to item 1 below, I have always converted the paper-to-reference-author
matrix and the paper-to-term matrix to binary matrices so that
co-occurrences can be calculated easily by multiplying the matrices
by their transpose.  I'd actually never thought of using the cosine
formula that you give below.  I did try that calculation on non-binary
paper-to-reference-author matrices using:

       cosine(x,y) = Sigma(i) x(i)y(i) / sqrt( Sigma(i) x(i)^2 * Sigma(i) y(i)^2 )
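
For concreteness, both routes can be sketched in a few lines of Python/numpy;
the paper-to-reference-author matrix P below is invented for illustration:

import numpy as np

# Invented paper-to-reference-author matrix: rows = papers,
# columns = cited authors, cells = number of times cited.
P = np.array([[2., 1., 0.],
              [0., 3., 1.],
              [1., 1., 2.],
              [0., 0., 1.]])

# Binary route: reduce to 0/1 and multiply by the transpose;
# the (i, j) entry of C then counts the papers citing both authors i and j.
B = (P > 0).astype(float)
C = B.T @ B
print(C)

# Non-binary route: cosine between the author columns, per the formula above.
norms = np.linalg.norm(P, axis=0)
print(np.round((P.T @ P) / np.outer(norms, norms), 3))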

I crossplotted the similarity values thus obtained against "binary"
cosine similarity values.  The results can be seen at:
http://samorris.ceat.okstate.edu/web/non_bin_cos/default.htm
There does appear to be a lot of scatter between these two measures,
though in most of the paper collections it doesn't appear to be biased
off the 1:1 line.  I don't know what effect this difference would have
on clustering of authors. I'm not sure I agree with you that using the
binary version of the cosine similarity is "throwing away information."

After all, references are cited multiple times in papers but the data we
have available (from ISI) only shows that a reference showed up at least
once, yet the data is still very useful.  Granted that knowing the exact
number of times an author was cited in a paper adds more information,
I'm still not sure that using the non-binary cosine formula above is the
most appropriate way to exploit that extra information.  Alternate
approaches are available, for example, using the 'overlap' measure.

I have tried using an "overlap" function to compute cocitation counts
for cosine calculations.  For a paper, the overlap of ref author i and
ref author j is defined as min[m(i), m(j)], where m(i) and m(j) are the
number of times author i and author j were cited in the paper,
respectively.  This appears to be a reasonable measure of multiple
co-citation as it doesn't give a lot of weight to co-citations with
authors that tend to appear many times in papers.  So "overlap cosine
similarity" can be calculated using s(i,j) = sum[overlap(i,j)] /
sqrt(n(i)*n(j)), where the sum is over all papers and n(i) and n(j)
are the sums over all papers of the number of citations to authors i
and j, respectively.  For the datasets I have, you can see crossplots of
"overlap cosine similarity" against "binary cosine similarity" at:
http://samorris.ceat.okstate.edu/web/overlap/default.htm .  These plots
show that overlap similarity tends to be a little larger than binary
similarity. This may imply that the overlap method generally tends to
increase similarity over the binary method, but proportionally, so that
there is no effect on distances between authors and thus no effect, bad
or good, on clustering.
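
A minimal Python/numpy sketch of this overlap calculation, again on an
invented paper-to-reference-author count matrix:

import numpy as np

# Invented count matrix: rows = papers, columns = cited authors.
P = np.array([[2., 1., 0.],
              [0., 3., 1.],
              [1., 1., 2.]])

n = P.sum(axis=0)     # n(i): total citations to each author over all papers
k = P.shape[1]

# overlap(i, j) summed over papers: per paper, the smaller of the two counts
overlap = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        overlap[i, j] = np.minimum(P[:, i], P[:, j]).sum()

# "overlap cosine similarity": s(i, j) = sum[overlap(i, j)] / sqrt(n(i) * n(j))
s = overlap / np.sqrt(np.outer(n, n))
print(np.round(s, 3))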

On point 2 below, the similarity between a pair of authors computed from
a co-citation count matrix is based on whether those two authors are
cocited in the same proportions among the other authors.  Correlation
seems a natural measure for this, as it is the measure used for
estimating linear dependence.  Also, it would seem that negative
correlation would be applicable:

Suppose there are two "camps" among a group of 10 authors and that the
1st and 10th authors are the leaders of the two groups, respectively.
Assume two authors have the following co-citation counts:

x = [  1     2     3     4     5     6     7     8     9    10 ]
y = [ 10     9     8     7     6     5     4     3     2     1 ]

so author x is in author 10's camp and author y is in author 1's camp.

In this case rxy = -1, and (1+rxy)/2 gives a similarity of 0, while the
cosine similarity is 0.5714.

So the rxy-based similarity shows the authors as dissimilar (logical,
since they belong to different camps), while the cosine similarity shows
that they are similar.  Wouldn't this type of effect be a problem when
using the cosine similarity for co-citation count matrices?
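
The numbers in this example are easy to check with a few lines of Python/numpy:

import numpy as np

x = np.arange(1, 11)            # co-citation counts [1, 2, ..., 10]
y = np.arange(10, 0, -1)        # co-citation counts [10, 9, ..., 1]

rxy = np.corrcoef(x, y)[0, 1]
cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(rxy)            # -1.0
print((1 + rxy) / 2)  # 0.0    -> maximally dissimilar
print(round(cos, 4))  # 0.5714 -> looks fairly similar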

With correlation there is still the problem of what to do with authors
that have zero variance or cocitation count matrices that have large
numbers of zeros.

Thanks kindly,

Steven Morris




Loet Leydesdorff wrote:

> Dear Steve,
>
> Thank you for the interesting contribution. Let me make a few remarks:
>
> 1. Why did you reduce the matrices studied to binary ones? ("The
> (i,j)th element of O(p,ra) is unity if paper i cites reference author
> j one or more times, zero otherwise." at
> http://samorris.ceat.okstate.edu/web/rxy/default.htm .) Both r and the
> cosine are well defined for frequency distributions.
>
> The cosine between two vectors x(i) and y(i) is defined as:
>
>         cosine(x,y) = Sigma(i) x(i)y(i) / sqrt( Sigma(i) x(i)^2 * Sigma(i) y(i)^2 )
>
> In the case of the binary matrix this formula degenerates to the
> simpler format that you used:
>
>             cos=n(i,j)/sqrt[n(i)*n(j)]
>
> SPSS calls this simpler format the "Ochiai". Salton & McGill (1983)
> provided the full formula in their "Introduction to Modern Information
> Retrieval" (Auckland, etc.: McGraw-Hill).
>
> There seems to be no reason to throw away part of the information that
> is available in your datasets. I would be curious to see what your
> curves would look like using the full data. I expect some effects.
>
> 2. Why would your reasoning not hold for ACA? For rough-and-ready
> purposes, one may wish to use either measure, as White (2003) posits.
> However, the fundamental points remain the same, don't they? One could
> also have a zero variance in an ACA matrix, couldn't one? The problem
> with the zeros signalled by Ahlgren et al. (2003) also remains in this
> case, doesn't it?
>
> 3. In addition to the technical differences, there may be differences
> stemming from the research design that make the researcher decide to
> use one or the other measure. For example, in a factor analytic design
> one uses Pearson's r. For mapping purposes one may also consider the
> Euclidean distance, but this is expected to provide very different
> results. The theoretical purposes of the research have first to be
> specified, in my opinion.
>
> 4. My interest in this issue is driven by my interest in the evolution
> of communication systems. One can expect communication systems to
> develop in different phases, such as segmentation, stratification, and
> differentiation. In a segmented communication system only mutual
> relations would count; Euclidean distances may then be the right measure.
>
> In a fully differentiated one, one would expect the eigenvectors to be
> spanned orthogonally at the network level. Here factor analysis
> provides us with insights into the structural differentiation. In the
> in-between stage a stratified communication system is expected to be
> hierarchically organized. The grouping is then reduced to a ranking.
> For this case, the cosine seems a good mapping tool since it organizes
> the "star" of the network in the center of the map (using a
> visualization tool). Pearson's r in this case has the disadvantages
> mentioned previously during this discussion.
>
> The Jaccard index seems to operate somewhere between the Euclidean
> distance and the cosine. It focusses on segments, but the
> interpretation is closer to the cosine than to the Euclidean distance
> measure. Thus, I am not sure that one should use this measure in an
> evolutionary analysis.
>
> I mentioned the forthcoming paper by Caroline Wagner and me about
> coauthorship relations (http://www.leydesdorff.net/sciencenets ) in
> which we showed how the cosine-based analysis and mapping versus the
> Pearson-correlation-based factor analysis enabled us to explore
> different aspects of the same matrix. These different aspects can be
> provided with different interpretations: the hierarchy in the network
> and the competitive relations among leading countries, respectively.
> But I still have to develop the fundamental argument more
> systematically.
>
> With kind regards,
>
>
> Loet
> ------------------------------------------------------------------------
> Loet Leydesdorff
> Amsterdam School of Communications Research (ASCoR)
> Kloveniersburgwal 48, 1012 CX Amsterdam
> Tel.: +31-20- 525 6598; fax: +31-20- 525 3681
> loet at leydesdorff.net <mailto:loet at leydesdorff.net>;
> http://www.leydesdorff.net/
>
> The Challenge of Scientometrics
> <http://www.upublish.com/books/leydesdorff-sci.htm> ; The
> Self-Organization of the Knowledge-Based Society
> <http://www.upublish.com/books/leydesdorff.htm>
>
> > -----Original Message-----
> > From: ASIS&T Special Interest Group on Metrics
> > [mailto:SIGMETRICS at listserv.utk.edu] On Behalf Of Steven Morris
> > Sent: Tuesday, December 23, 2003 3:26 AM
> > To: SIGMETRICS at listserv.utk.edu
> > Subject: Re: [SIGMETRICS] White HD "Author cocitation analysis and
> > ...
> >
> >
> > Dear colleagues,
> >
> > Regarding rxy vs. cosine similarity:
> >
> > When working with a collection of papers downloaded from the Web of
> > Science, where a paper to reference author citation matrix can be
> > extracted, the calculation of both cosine similarity and rxy, the
> > correlation coefficient, is straightforward. Similarity is
> > based on the number of times a pair of authors is cited together. N
> > is the number of papers in the collection, n(i) and n(j) are the
> > numbers of citations received by ref authors i and j, and n(i,j) is
> > the number of papers citing both ref author i and ref author j. The
> > correlation coefficient is calculated from
> > rxy = [N*n(i,j) - n(i)*n(j)] / sqrt[(N*n(i) - n(i)^2) * (N*n(j) - n(j)^2)]
> > while the cosine similarity is calculated using
> > s = n(i,j)/sqrt[n(i)*n(j)]. If N is large compared to the
> > product of the number of cites received by a pair of authors,
> > then rxy and the cosine formula give nearly identical results.  See
> > http://samorris.ceat.okstate.edu/web/rxy/default.htm
> > for crossplots of cosine similarity vs. rxy for reference
> > authors from several collections of papers.
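
As a rough numerical illustration of that convergence (the counts below
are invented, not taken from the collections above):

import numpy as np

def rxy(N, ni, nj, nij):
    # Pearson r computed from the binary counts, as in the formula above
    return (N*nij - ni*nj) / np.sqrt((N*ni - ni**2) * (N*nj - nj**2))

def cosine(ni, nj, nij):
    return nij / np.sqrt(ni * nj)

# Rarely cited reference authors in a large collection: the two nearly agree
print(rxy(10000, 30, 40, 10), cosine(30, 40, 10))   # ~0.286 vs ~0.289

# Dominant authors cited by a large fraction of the papers: rxy is much smaller
print(rxy(100, 60, 70, 45), cosine(60, 70, 45))     # ~0.134 vs ~0.694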
> >
> > For collections of papers without dominant reference authors there
> > is very little difference between cosine and rxy.  For collections
> > with dominant reference authors that are cited by a large fraction
> > of the total number of papers, rxy can be much less than the cosine
> > similarity.
> >
> > The correlation coefficient is problematic in this case because it is
> > possible for pairs of authors with large co-citation counts to have
> > zero rxy.  For example, two authors, both cited by half the papers
> > in the collection, but cocited by 1/4 of the papers, will have a
> > correlation coefficient of zero but a cosine similarity of 1/2.
> > Also, the correlation coefficient is not defined for any author that
> > is cited by all papers in the collection, since that author has zero
> > variance. Recall that rxy is cov(x,y)/sqrt[var(x)*var(y)], so
> > zero variance drives the denominator to zero in the rxy
> > equation, and thus rxy is undefined.
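
This particular example is easy to verify numerically; a short Python/numpy
sketch with an invented collection of N = 100 papers:

import numpy as np

N = 100
x = np.zeros(N); x[:50] = 1     # author i cited by half of the papers
y = np.zeros(N); y[25:75] = 1   # author j cited by half, overlapping i on N/4 papers

print(np.corrcoef(x, y)[0, 1])              # 0.0
print(x @ y / np.sqrt(x.sum() * y.sum()))   # 0.5

z = np.ones(N)                  # an author cited by every paper: zero variance
print(np.corrcoef(x, z)[0, 1])  # nan (with a warning): rxy is undefined here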
> >
> > For this reason it's probably better to use the cosine similarity than
> > rxy for ACA based on a paper to ref author matrix.
> > Converting similarities to distances for clustering is less
> > problematic as well.
> >
> > The situation is different for ACA based on a co-citation count
> > matrix. In this case the similarity between two authors is not based
> > on how often they are cited together, but on whether the two authors
> > are co-cited in the same proportions among the other authors in the
> > collection.  In this case it would seem that rxy would be the
> > appropriate measure of similarity to use.
> >
> > S. Morris
> >
> >
> >
> > Loet Leydesdorff wrote:
> > >  > -----Original Message-----
> > >  > From: ASIS&T Special Interest Group on Metrics
> > >  > [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Eugene Garfield
> > >  > Sent: Monday, December 01, 2003 9:57 PM
> > >  > To: SIGMETRICS at LISTSERV.UTK.EDU
> > >  > Subject: [SIGMETRICS] White HD "Author cocitation analysis
> > >  > and Pearson's r" Journal of the American Society for
> > >  > Information Science and Technology 54(13):1250-1259 November 2003
> > >  > Howard D. White : Howard.Dalby.White at drexel.edu
> > >  >
> > >  > TITLE    Author cocitation analysis and Pearson's r
> > >  >
> > >  > AUTHOR   White HD
> > >  >
> > >  > JOURNAL  JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION
> > >  >          SCIENCE AND TECHNOLOGY 54 (13): 1250-1259 NOV 2003
> > >
> > > Dear Howard and colleagues,
> > >
> > > I read this article with interest and I agree that for most practical
> > > purposes Pearson's r will do a job similar to Salton's cosine.
> > > Nevertheless, the argument of Ahlgren et al. (2003) seems convincing
> > > to me. Scientometric distributions are often highly skewed and the
> > > mean can easily be distorted by the zeros. The cosine elegantly solves
> > > this problem.
> > >
> > > A disadvantage of the cosine (in comparison to r) may be that it
> > > does not become negative in order to indicate dissimilarity. This is
> > > particularly important for the factor analysis. I have thought about
> > > inputting the cosine matrix into the factor analysis (SPSS allows for
> > > importing a matrix in this analysis), but that seems a bit tricky.
> > >
> > > Caroline Wagner and I did a study on coauthorship relations entitled
> > > "Mapping Global Science using International Coauthorships: A
> > > comparison of 1990 and 2000" (Intern. J. of Technology and
> > > Globalization, forthcoming) in which we used the same matrix for
> > > mapping using the cosine (and then Pajek for the visualization) and
> > > for the factor analysis using Pearson's r. The results are provided
> > > as factor plots in the preprint version of the paper at
> > > http://www.leydesdorff.net/sciencenets/mapping.pdf .
> > >
> > > While the cosine maps exhibit the hierarchy by placing the central
> > > cluster in the center (including the U.S.A. and some Western-European
> > > countries), the factor analysis reveals the main structural axes
> > > of the system as competitive relations between the U.S.A., U.K.,
> > > and continental Europe (Germany + Russia). The French system can
> > > be considered as a fourth axis. These eigenvectors function as
> > > competitors for collaboration with authors from other (smaller or
> > > more peripheral) countries.
> > >
> > > Thus, the two measures enable us to show something different:
> > > Salton's cosine exhibits the hierarchy, and one might say that the
> > > factor analysis on the basis of Pearson's r enables us to show the
> > > heterarchy among competing axes in the system.
> > >
> > > With kind regards,
> > >
> > > Loet
> > >
> > >
> > > ------------------------------------------------------------------------
> > > Loet Leydesdorff
> > > Amsterdam School of Communications Research (ASCoR)
> > > Kloveniersburgwal 48, 1012 CX Amsterdam
> > > Tel.: +31-20- 525 6598; fax: +31-20- 525 3681
> > > loet at leydesdorff.net <mailto:loet at leydesdorff.net>;
> > > http://www.leydesdorff.net/
> > >
> > > The Challenge of Scientometrics
> > > <http://www.upublish.com/books/leydesdorff-sci.htm> ; The
> > > Self-Organization of the Knowledge-Based Society
> > > <http://www.upublish.com/books/leydesdorff.htm>
> > >
> > >
> > >
> > >  >
> > >  >
> > >  >  Document type: Article   Language: English   Cited References: 20
> > >  >  Times Cited: 0
> > >  > Abstract:
> > >  > In their article "Requirements for a cocitation similarity
> > >  > measure, with special reference to Pearson's correlation
> > >  > coefficient," Ahlgren, Jarneving, and Rousseau fault
> > >  > traditional author cocitation analysis (ACA) for using
> > >  > Pearson's r as a measure of similarity between authors because
> > >  > it fails two tests of stability of measurement. The
> > >  > instabilities arise when rs are recalculated after a first
> > >  > coherent group of authors has been augmented by a second
> > >  > coherent group with whom the first has little or no cocitation.
> > >  > However, AJ&R neither cluster nor map their data to demonstrate
> > >  > how fluctuations in rs will mislead the analyst, and the
> > >  > problem they pose is remote from both theory and practice in
> > >  > traditional ACA. By entering their own rs into multidimensional
> > >  > scaling and clustering routines, I show that, despite rs
> > >  > fluctuations, clusters based on it are much the same for the
> > >  > combined groups as for the separate groups. The combined groups
> > >  > when mapped appear as polarized clumps of points in
> > >  > two-dimensional space, confirming that differences between the
> > >  > groups have become much more important than differences within
> > >  > the groups, an accurate portrayal of what has happened to the
> > >  > data. Moreover, r produces clusters and maps very like those
> > >  > based on other coefficients that AJ&R mention as possible
> > >  > replacements, such as a cosine similarity measure or a chi
> > >  > square dissimilarity measure. Thus, r performs well enough for
> > >  > the purposes of ACA. Accordingly, I argue that qualitative
> > >  > information revealing why authors are cocited is more important
> > >  > than the cautions proposed in the AJ&R critique. I include
> > >  > notes on topics such as handling the diagonal in author
> > >  > cocitation matrices, lognormalizing data, and testing r for
> > >  > significance.
> > >  > KeyWords Plus:
> > >  > INTELLECTUAL STRUCTURE, SCIENCE
> > >  >
> > >  > Addresses:
> > >  > White HD, Drexel Univ, Coll Informat Sci & Technol, 3152
> > >  > Chestnut St, Philadelphia, PA 19104 USA Drexel Univ, Coll
> > >  > Informat Sci & Technol, Philadelphia, PA 19104 USA
> > >  >
> > >  > Publisher:
> > >  > JOHN WILEY & SONS INC, 111 RIVER ST, HOBOKEN, NJ 07030 USA
> > >  >
> > >  > IDS Number:
> > >  > 730VQ
> > >  >
> > >  >
> > >  >  Cited Author     Cited Work              Volume  Page  Year
> > >  >  AHLGREN P        J AM SOC INF SCI TEC      54     550   2003
> > >  >  BAYER AE         J AM SOC INFORM SCI       41     444   1990
> > >  >  BORGATTI SP      UCINET WINDOWS SOFTW                   2002
> > >  >  BORGATTI SP      WORKSH SUNB 20 INT S                   2000
> > >  >  DAVISON ML       MULTIDIMENSIONAL SCA                   1983
> > >  >  EOM SB           J AM SOC INFORM SCI       47     941   1996
> > >  >  EVERITT B        CLUSTER ANAL                           1974
> > >  >  GRIFFITH BC      KEY PAPERS INFORMATI             R6    1980
> > >  >  HOPKINS FL       SCIENTOMETRICS             6      33   1984
> > >  >  HUBERT L         BRIT J MATH STAT PSY      29     190   1976
> > >  >  LEYDESDORFF L    INFORMETRICS 87/88               105   1988
> > >  >  MCCAIN KW        J AM SOC INFORM SCI       41     433   1990
> > >  >  MCCAIN KW        J AM SOC INFORM SCI       37     111   1986
> > >  >  MCCAIN KW        J AM SOC INFORM SCI       35     351   1984
> > >  >  MULLINS NC       THEORIES THEORY GROU                   1973
> > >  >  WHITE HD         BIBLIOMETRICS SCHOLA              84   1990
> > >  >  WHITE HD         J AM SOC INF SCI TEC      54     423   2003
> > >  >  WHITE HD         J AM SOC INFORM SCI       49     327   1998
> > >  >  WHITE HD         J AM SOC INFORM SCI       41     430   1990
> > >  >  WHITE HD         J AM SOC INFORM SCI       32     163   1981
> > >  >
> > >  >
> > >  > When responding, please attach my original message
> > >  > ______________________________________________________________
> > >  > _________
> > >  > Eugene Garfield, PhD.  email: garfield at codex.cis.upenn.edu
> > >  > home page: www.eugenegarfield.org
> > >  > Tel: 215-243-2205 Fax 215-387-1266
> > >  > President, The Scientist LLC. www.the-scientist.com
> > >  > Chairman Emeritus, ISI www.isinet.com
> > >  > Past President, American Society for Information Science and
> > >  > Technology
> > >  > (ASIS&T)  www.asis.org
> > >  > ______________________________________________________________
> > >  > _________
> > >  >
> > >  >
> > >  >
> > >  > ISSN:
> > >  > 1532-2882
> > >  >
> > >
> >
> >
> > --
> > ---------------------------------------------------------------
> > Steven A. Morris                            samorri at okstate.edu
> > Electrical and Computer Engineering        office: 405-744-1662
> > 202 Engineering So.
> > Oklahoma State University
> > Stillwater, Oklahoma 74078
> > http://samorris.ceat.okstate.edu
> >
>
Dear Dr. Leydesdorff,

The message above was sent as a reply to you on the SIGMETRICS mailing
list about a week ago. However, I'm not sure if the list server is
working at the moment. If you've received this before then please
forgive me for having sent it to you twice.

Very kind regards,

Steven Morris



More information about the SIGMETRICS mailing list