Re: [SIGMETRICS] On the Normalization and Visualization of Co-citation Data
Jesper Wiborg Schneider
JWS at DB.DK
Mon Jan 22 03:35:35 EST 2007
Dear Loet and colleagues,
In connection with the ongoing debate on proximity measures in co-citation studies, you may find the following forthcoming two-part article of interest:
* Schneider, J. & Borlund, P. [PDF <http://www.db.dk/binaries/JASIST_%20part1_preprint.pdf> ]
Matrix comparison, Part 1: Motivation and important issues for measuring the resemblance between proximity measures or ordination results.
Accepted for publication in the Journal of the American Society for Information Science and Technology.
* Abstract: The present two-part article introduces matrix comparison as a formal means of evaluation in informetric studies such as co-citation analysis. This first part introduces and discusses the motivation behind bringing matrix comparison to informetric studies, as well as two important issues influencing such comparisons. The motivation is spurred by the recent debate on the choice of proximity measures and their potential influence upon clustering and ordination results. The two issues discussed in this first part are matrix generation and the composition of proximity measures. This part of the article demonstrates that the approach to matrix generation for the same data set, that is, how data are represented and transformed in a matrix, evidently determines the 'behaviour' of proximity measures. Two different matrix generation approaches will therefore, in all probability, lead to different proximity rankings of objects, which in turn lead to different ordination and clustering results for the same set of objects. Further, this part also demonstrates that a resemblance in the composition of formulas indicates whether two proximity measures may produce similar ordination and clustering results. However, as shown in the case of the angular correlation and cosine measures, a small deviation in otherwise similar formulas can lead to different rankings, depending on the contour of the data matrix transformed. Ultimately, the 'behaviour' of proximity measures, that is, whether they produce similar rankings of objects, is more or less data-specific. Consequently, we recommend the use of empirical matrix comparison techniques in individual studies in order to investigate the degree of resemblance between proximity measures or their ordination results.
Part two of the article introduces and demonstrates two related statistical matrix comparison techniques: the Mantel test and Procrustes analysis. These techniques can compare and evaluate the degree of monotonicity between different proximity measures or their ordination results. As such, the Mantel test and Procrustes analysis can be used as statistical validation tools in informetric studies and thus help in choosing suitable proximity measures.
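For readers who want to try the idea, here is a minimal sketch of a Mantel-type test in Python with NumPy. It is an illustration only, not the procedure from the article: the function name `mantel_test` and the parameters `n_perm` and `seed` are hypothetical, the statistic is the plain Pearson correlation between the off-diagonal entries of two symmetric proximity matrices, and the p-value is one-sided. A rank-based variant (correlating the ranks of the entries instead of their values) would speak more directly to monotonicity.

```python
import numpy as np

def mantel_test(a, b, n_perm=999, seed=0):
    """Sketch of a one-sided Mantel test for two symmetric proximity matrices.

    Returns the observed correlation between the upper-triangle entries
    and a permutation p-value. Names and defaults are illustrative.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("a and b must be square matrices of the same shape")

    n = a.shape[0]
    upper = np.triu_indices(n, k=1)        # off-diagonal upper triangle
    x = a[upper]
    r_obs = np.corrcoef(x, b[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Permute the objects, i.e. rows and columns of b together.
        p = rng.permutation(n)
        b_perm = b[np.ix_(p, p)]
        if np.corrcoef(x, b_perm[upper])[0, 1] >= r_obs:
            at_least_as_large += 1

    # The observed arrangement counts as one of the permutations.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return r_obs, p_value
```

Rows and columns are permuted together because the entries of a proximity matrix are not independent observations; shuffling the objects keeps each matrix internally consistent while breaking the correspondence between the two.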
* Schneider, J. & Borlund, P. [PDF <http://www.db.dk/binaries/JASIST_%20part2_preprint.pdf> ]
Matrix comparison, Part 2: Measuring the resemblance between proximity measures or ordination results by use of the Mantel and Procrustes statistics.
Accepted for publication in the Journal of the American Society for Information Science and Technology.
* Abstract: The present two-part article introduces matrix comparison as a formal means of evaluation in informetric studies such as co-citation analysis. The first part introduces and discusses the motivation behind bringing matrix comparison to informetric studies, as well as two important issues influencing such comparisons: matrix generation and the composition of proximity measures. In this second part, we introduce and thoroughly demonstrate two related matrix comparison techniques: the Mantel test and Procrustes analysis. These techniques can compare and evaluate the degree of monotonicity between different proximity measures or their ordination results. Common to both techniques is the application of permutation procedures to test hypotheses about matrix resemblances. The choice of technique depends on the validation at hand. In the case of the Mantel test, the degree of resemblance between two measures forecasts their potentially different effects upon ordination and clustering results. In principle, two proximity measures with a very strong resemblance will most likely produce identical results; the choice between the two thus becomes less important. Alternatively, or as a supplement, Procrustes analysis compares the actual ordination results, without investigating the underlying proximity measures, by matching two configurations of the same objects in a multidimensional space. An advantage of Procrustes analysis is the graphical solution provided by the superimposition plot and the resulting decomposition of variance components. Accordingly, Procrustes analysis provides not only a measure of general fit between configurations but also values for individual objects, enabling more elaborate validations. As such, the Mantel test and Procrustes analysis can be used as statistical validation tools in informetric studies and thus help in choosing suitable proximity measures.
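The matching of two configurations can be sketched in a few lines of Python with NumPy. This is a generic least-squares Procrustes fit under stated assumptions, not the implementation used in the article: the function name `procrustes_fit` is hypothetical, both configurations are centred and scaled to unit size, and the second is rotated (or reflected) and rescaled onto the first. It returns an overall misfit value together with one residual per object; the permutation test and the superimposition plot are left out.

```python
import numpy as np

def procrustes_fit(config_a, config_b):
    """Sketch of a least-squares Procrustes fit of config_b onto config_a.

    Both inputs are (objects x dimensions) coordinate arrays for the same
    objects in the same order. Returns (disparity, per_object_residuals).
    """
    a = np.asarray(config_a, dtype=float)
    b = np.asarray(config_b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("configurations must be 2-D arrays of equal shape")

    # Remove location and size: centre each configuration, scale to unit norm.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each configuration needs at least two distinct points")
    a, b = a / norm_a, b / norm_b

    # Best orthogonal transformation and scale, obtained from the SVD of b'a.
    u, singular_values, vt = np.linalg.svd(b.T @ a)
    rotation = u @ vt
    scale = singular_values.sum()
    b_fitted = scale * (b @ rotation)

    # Squared distance between each object's two positions after fitting.
    residuals = np.sum((a - b_fitted) ** 2, axis=1)
    return residuals.sum(), residuals
```

The overall value lies between 0 (perfect match after fitting) and 1, and the per-object residuals indicate which objects move most between the two ordinations. SciPy's `scipy.spatial.procrustes` provides a comparable ready-made computation.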
Kind regards - Jesper
**********************************************
Jesper Wiborg Schneider, PhD, Assistant Professor
Department of Information Studies Royal School of Library & Information Science
Sohngårdsholmsvej 2, DK-9000 Aalborg, DENMARK Tel. +45 98773041, Fax. +45 98151042
E-mail: jws at db.dk <mailto:jws at db.dk>
Homepage: http://www.db.dk/jws
**********************************************
________________________________
From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On behalf of Loet Leydesdorff
Sent: 21 January 2007 19:46
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: [SIGMETRICS] On the Normalization and Visualization of Co-citation Data
On the Normalization and Visualization of Author Co-Citation Data <http://www.leydesdorff.net/aca07/index.htm>
Click here for PDF <http://www.leydesdorff.net/aca07/aca07.pdf>
The debate about which similarity measure one should use for the normalization in the case of Author Co-citation Analysis (ACA) is further complicated when one distinguishes between the symmetrical co-citation (or, more generally, co-occurrence) matrix and the underlying asymmetrical citation (occurrence) matrix. In the Web environment, the approach of retrieving original citation data and then using Salton's cosine or the Pearson correlation to construct a similarity matrix is often not feasible. In that case, one should use the Jaccard index, but preferably after adding the number of total citations (occurrences) on the main diagonal. Unlike Salton's cosine and the Pearson correlation, the Jaccard index abstracts from the distribution and focuses only on the intersection and the sum of the two sets. Since the distributions in the co-occurrence matrix may partially be based on spurious correlations, this property of the Jaccard index can be considered an advantage in this case. The argument is illustrated with empirical data.
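As a rough illustration of the normalization described above, the following Python sketch computes the standard Jaccard index from a co-occurrence matrix whose main diagonal holds each author's total number of occurrences. It assumes the usual set-based definition, intersection divided by union, so that J(i, j) = c_ij / (c_ii + c_jj - c_ij); the function name `jaccard_from_cooccurrence` and the toy data are hypothetical, and the paper's exact procedure may differ.

```python
import numpy as np

def jaccard_from_cooccurrence(cooc):
    """Jaccard index from a symmetric co-occurrence matrix.

    cooc[i, j] is the number of co-occurrences of i and j; cooc[i, i] is
    assumed to hold the total number of occurrences of i.
    """
    c = np.asarray(cooc, dtype=float)
    if c.ndim != 2 or c.shape[0] != c.shape[1]:
        raise ValueError("cooc must be a square matrix")
    totals = np.diag(c)
    # Size of the union of the two sets: |A| + |B| - |A and B|.
    union = totals[:, None] + totals[None, :] - c
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, c / union, 0.0)

# Toy occurrence matrix: rows are citing documents, columns are three authors.
occurrence = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])
# The product already carries the occurrence totals on its main diagonal.
cooccurrence = occurrence.T @ occurrence
similarity = jaccard_from_cooccurrence(cooccurrence)
```

Only the co-occurrence counts and the diagonal totals enter the calculation, so nothing beyond these numbers has to be retrieved; the cosine and the Pearson correlation, by contrast, are computed over the full distributions.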
________________________________
Loet Leydesdorff
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam
Tel.: +31-20- 525 6598; fax: +31-20- 525 3681
loet at leydesdorff.net ; http://www.leydesdorff.net/
Now available: The Knowledge-Based Economy: Modeled, Measured, Simulated <http://www.universal-publishers.com/book.php?method=ISBN&book=1581129378> . 385 pp.; US$ 18.95
The Self-Organization of the Knowledge-Based Society <http://www.universal-publishers.com/book.php?method=ISBN&book=1581126956> ; The Challenge of Scientometrics <http://www.universal-publishers.com/book.php?method=ISBN&book=1581126816>