No subject

Loet Leydesdorff loet at LEYDESDORFF.NET
Wed Nov 5 15:05:48 EST 2008

Dear Ludo, 

Yes, I know your paper, as you know my 2005 paper, "Similarity measures,
author cocitation analysis, and information theory" [JASIST, 56(7),
769-772], in which I argued for using the type of measures you advocate.
They are very elegant, but they are not included in the major software
packages and are therefore not computationally user-friendly. 

In practice, the cosine is a convenient similarity measure, and it has
advantages over the Pearson correlation when the occurrence matrix contains
many zeros (Ahlgren et al., 2003). Because the cosine runs from zero to one,
it then becomes an urgent question which cosine value corresponds to a
positive correlation, because r = 0 is a clear and easily understood
threshold value. Much depends, of course, on the research question; one may
be particularly interested in negative correlations, for example. In many
cases, however, one is interested in distinguishing between positive and
negative correlations.
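
As a small illustration of how far the two measures can diverge, the
following sketch (Python, standard library only) computes both for the
cocitation counts of authors A and B given in the quoted message below. The
function and variable names are chosen for illustration only and do not come
from either paper; Pearson's r is computed here as the cosine of the
mean-centered vectors, which is a standard identity.

```python
import math

def cosine(x, y):
    """Salton's cosine: dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Cocitation counts with authors C, D, E, and F (from the quoted message).
author_a = [101, 102, 103, 104]
author_b = [104, 103, 102, 101]

cos_ab = cosine(author_a, author_b)
r_ab = pearson(author_a, author_b)
print(f"cosine = {cos_ab:.4f}, Pearson r = {r_ab:.4f}")
# prints: cosine = 0.9998, Pearson r = -1.0000
```

The cosine is dominated by the large common level of the counts, whereas the
Pearson correlation only sees the deviations from each author's mean.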

More generally, there is a large number of similarity measures. I agree that
information-theoretical ones are very elegant because they allow for
asymmetries in the similarity, for combining the similarity measure and the
cluster analysis, and for an extension with the time dimension. My 1995 book
("The Challenge of Scientometrics"; particularly Chapter 9) is on using
information theory for addressing research questions in science and
technology studies. 

Best wishes, 



Loet Leydesdorff 
Amsterdam School of Communications Research (ASCoR), 
Kloveniersburgwal 48, 1012 CX Amsterdam. 
Tel.: +31-20- 525 6598; fax: +31-20- 525 3681 
loet at ; 


> -----Original Message-----
> From: ASIS&T Special Interest Group on Metrics 
> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Ludo Waltman
> Sent: Wednesday, November 05, 2008 5:29 PM
> Subject: [SIGMETRICS]
> Administrative info for SIGMETRICS (for example unsubscribe):
> Dear Leo and Loet,
> Your paper contains some nice theoretical results. However, 
> we have serious doubts about the way in which you use these 
> results for visualization purposes. The general idea of your 
> approach seems to be that a visualization of authors can best 
> be made using the cosine as a similarity measure, but 
> preferably in such a way that in the visualization authors 
> are not connected if their Pearson correlation is below 0. 
> But what is wrong with a Pearson correlation below 0? 
> Consider two authors, A and B. Author A has, respectively, 
> 101, 102, 103, and 104 cocitations with authors C, D, E, and 
> F. Author B has, respectively, 104, 103, 102, and 101 
> cocitations with authors C, D, E, and F. Hence, the Pearson 
> correlation for authors A and B equals -1. Consequently, 
> according to your reasoning, there should be no connection 
> between A and B. But why not? Authors A and B are very 
> similar, since they have almost the same cocitation profile 
> with authors C, D, E, and F. Therefore, there should 
> definitely be a connection between authors A and B.
> In our opinion, it makes no sense to ask the question whether 
> the Pearson correlation for two authors is above or below 0. 
> This question is completely irrelevant for visualization 
> purposes. We discuss this in detail in a paper recently 
> published in JASIST (59(10):1653-1661, 2008). 
> By the way, this paper 
> also contains an empirical example showing that sometimes the 
> choice between the cosine and the Pearson correlation results 
> in significantly different visualizations.
> Best regards,
> Ludo Waltman and Nees Jan van Eck
> ========================================================
> Ludo Waltman MSc
> PhD student
> Econometric Institute
> Erasmus School of Economics
> Erasmus University Rotterdam
> P.O. Box 1738
> 3000 DR Rotterdam
> The Netherlands
> Room H9-13
> Tel:      (+31) 10 4088938
> Fax:      (+31) 10 4089162
> E-mail:   lwaltman at
> Homepage:
> ========================================================
> ________________________________
> 	From: ASIS&T Special Interest Group on Metrics 
> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Loet Leydesdorff
> 	Sent: Wednesday 5 November 2008 11:15
> 	Subject: [SIGMETRICS]
> 	The relation between Pearson's correlation coefficient 
> r and Salton's cosine measure 
> 	Journal of the American Society for Information Science 
> & Technology (forthcoming)
> 	The relation between Pearson's correlation coefficient 
> and Salton's cosine measure is revealed based on the 
> different possible values of the division of the L1-norm and 
> the L2-norm of a vector. These different values yield a sheaf 
> of increasingly straight lines which form together a cloud of 
> points, being the investigated relation. The theoretical 
> results are tested against the author co-citation relations 
> among 24 informetricians for whom two matrices can be 
> constructed, based on co-citations: the asymmetric occurrence 
> matrix and the symmetric co-citation matrix. Both examples 
> completely confirm the theoretical results. The results 
> enable us to specify an algorithm which provides a threshold 
> value for the cosine above which none of the corresponding 
> Pearson correlations would be negative. Using this threshold 
> value can be expected to optimize the visualization of the 
> vector space.
> 	Leo Egghe and Loet Leydesdorff
