Clustering of Distributions: A Case of Patent Citations

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Tue Aug 30 14:29:49 EDT 2011

Author(s): Kejzar, N (Kejzar, Natasa); Korenjak-Cerne, S (Korenjak-Cerne, 
Simona); Batagelj, V (Batagelj, Vladimir)
Source: JOURNAL OF CLASSIFICATION  Volume: 28  Issue: 2  Pages: 156-183  
DOI: 10.1007/s00357-011-9084-x  Published: JUL 2011  

Abstract: Data units are often described with discrete distributions (a work 
described with its citation distribution over time, a population pyramid 
described as an age-sex distribution, etc.). When the set of such units is very 
large, appropriate clustering methods can reveal the typical patterns hidden in 
the data. In this paper we present an adapted leaders method combined with a 
compatible adapted agglomerative hierarchical method, both based on a relative 
error measure between a unit and the corresponding cluster representative 
(leader). The proposed approach is illustrated on citation distributions 
derived from the data set of US patents from 1980 to 1999. These new methods 
were developed because clustering units described with distributions using the 
classical k-means method reveals patterns with single high peaks, each 
corresponding to a single year. These patterns prevail over other distribution 
shapes also present in the data. Compared with the centers in the k-means 
method, the cluster representatives obtained with the proposed new methods 
better detect the typical distribution shapes of units. The main cluster types 
obtained for different sets of units show three main patterns: patents with an 
early or a late peak of importance to the community, and patents whose 
importance increases slowly throughout the time period.
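
Illustrative sketch (not taken from the paper): the abstract does not state the 
exact form of the relative error measure, so the Python below ASSUMES the 
dissimilarity d(x, t) = sum_i ((x_i - t_i) / t_i)^2 between a unit x and a 
leader t. Under that assumption the leader minimizing the total error of a 
cluster is t_i = sum(x_i^2) / sum(x_i). The function names, the eps guard, and 
the random initialization are hypothetical choices made for illustration only.

```python
import numpy as np

EPS = 1e-9  # assumed guard against division by zero in leader components


def relative_error(x, leader):
    """Assumed relative error between a unit x and a leader."""
    t = np.maximum(leader, EPS)
    return float(np.sum(((x - t) / t) ** 2))


def update_leader(members):
    """Leader minimizing the assumed error over the rows of `members`."""
    num = np.sum(members ** 2, axis=0)
    den = np.sum(members, axis=0)
    return np.where(den > EPS, num / np.maximum(den, EPS), EPS)


def leaders_method(X, k, n_iter=100, seed=0):
    """Alternate assignment and leader update until labels stop changing."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct units chosen at random (an assumption).
    leaders = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        dists = np.array([[relative_error(x, t) for t in leaders] for x in X])
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members):  # an empty cluster keeps its previous leader
                leaders[c] = update_leader(members)
    return labels, leaders
```

Each row of X would be one unit's distribution, for example a patent's yearly 
share of citations. Unlike the arithmetic mean used for k-means centers, the 
assumed leader weights each component by its own size.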

Language: English
Document Type: Article
Author Keywords: Clustering; Distribution; Leaders method; k-means method; 
Agglomerative hierarchical clustering method; Temporal citation distribution; 
Citation network; Relative error measure; Patents

Addresses: [Kejzar, N] Univ Ljubljana, Fac Med, Inst Biostat & Med Informat, 
IBMI, Ljubljana 1000, Slovenia
[Korenjak-Cerne, S] Univ Ljubljana, Fac Econ, Dept Stat, Ljubljana 1000, 
Slovenia
[Batagelj, V] Univ Ljubljana, Fac Math & Phys, Dept Math, Ljubljana 1000, 
Slovenia
Reprint Address: Kejzar, N (reprint author), Univ Ljubljana, Fac Med, Inst 
Biostat & Med Informat, IBMI, Vrazov Trg 2, Ljubljana 1000, Slovenia

E-mail Address: natasa.kejzar at, simona.cerne at, 
vladimir.batagelj at
ISSN: 0176-4268
