Clustering of Distributions: A Case of Patent Citations
Eugene Garfield
garfield at CODEX.CIS.UPENN.EDU
Tue Aug 30 14:29:49 EDT 2011
Author(s): Kejzar, N (Kejzar, Natasa); Korenjak-Cerne, S (Korenjak-Cerne,
Simona); Batagelj, V (Batagelj, Vladimir)
Source: JOURNAL OF CLASSIFICATION Volume: 28 Issue: 2 Pages: 156-183
DOI: 10.1007/s00357-011-9084-x Published: JUL 2011
Abstract: Often the data units are described with discrete distributions (work
described with citation distribution over time, population pyramid described as
age-sex distribution etc.). When the set of such units is very large, appropriate
clustering methods can reveal the typical patterns hidden in the data.
In this paper we present an adapted leaders method combined with a
compatible adapted agglomerative hierarchical method that are based on
relative error measure between a unit and the corresponding cluster
representative-leader. The proposed approach is illustrated on citation
distributions derived from the data set of US patents from 1980 to 1999. These
new methods were developed because clustering of units, described with
distributions, with classical k-means method reveals patterns with single high
peaks which correspond to a single year. These patterns prevail over other
distribution shapes also present in the data. Compared with centers in k-means
method, clusters' representatives obtained with the proposed new methods
better detect typical distribution shapes of units. The obtained main cluster
types for different sets of units show three main patterns: patents with early
or late peak of importance to the community, and patents where the
importance is slowly increasing throughout the time period.
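
The abstract names a leaders method driven by a relative error measure but
does not state the formula. The following is a minimal sketch only, assuming
a squared relative error d(x, t) = sum_j ((x_j - t_j) / t_j)^2 between a unit
x and a leader t. Under that assumed measure, the leader component that
minimizes the summed error of a cluster is sum(x_j^2) / sum(x_j). The function
names, the EPS guard, and the toy data are hypothetical, and the agglomerative
hierarchical step is not shown.

```python
EPS = 1e-9  # guard against division by zero; an assumption of this sketch


def relative_error(x, leader):
    """Assumed measure: sum of squared errors relative to the leader."""
    return sum(((xi - ti) / max(ti, EPS)) ** 2 for xi, ti in zip(x, leader))


def update_leader(units):
    """Component-wise minimizer of the assumed measure: sum(x^2) / sum(x)."""
    leader = []
    for j in range(len(units[0])):
        s1 = sum(u[j] for u in units)
        s2 = sum(u[j] ** 2 for u in units)
        leader.append(s2 / s1 if s1 > 0 else 0.0)
    return leader


def leaders_method(units, initial_leaders, max_iter=50):
    """Alternate between assigning units and updating leaders until stable."""
    leaders = [list(t) for t in initial_leaders]
    k = len(leaders)
    assignment = None
    for _ in range(max_iter):
        # Assign every unit to the leader with the smallest relative error.
        new_assignment = [
            min(range(k), key=lambda c: relative_error(u, leaders[c]))
            for u in units
        ]
        if new_assignment == assignment:
            break  # no unit changed cluster
        assignment = new_assignment
        # Recompute the leader of every non-empty cluster.
        for c in range(k):
            members = [u for u, a in zip(units, assignment) if a == c]
            if members:
                leaders[c] = update_leader(members)
    return assignment, leaders


# Toy citation distributions: two with an early peak, two with a late peak.
units = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
assignment, leaders = leaders_method(units, [units[0], units[2]])
print(assignment)
```

As with k-means, the result depends on the initial leaders, so in practice
several starts would normally be compared.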
Language: English
Document Type: Article
Author Keywords: Clustering; Distribution; Leaders method; k-means method;
Agglomerative hierarchical clustering method; Temporal citation distribution;
Citation network; Relative error measure; Patents
Addresses: [Kejzar, N] Univ Ljubljana, Fac Med, Inst Biostat & Med Informat,
IBMI, Ljubljana 1000, Slovenia
[Korenjak-Cerne, S] Univ Ljubljana, Fac Econ, Dept Stat, Ljubljana 1000,
Slovenia
[Batagelj, V] Univ Ljubljana, Fac Math & Phys, Dept Math, Ljubljana 1000,
Slovenia
Reprint Address: Kejzar, N (reprint author), Univ Ljubljana, Fac Med, Inst
Biostat & Med Informat, IBMI, Vrazov Trg 2, Ljubljana 1000, Slovenia
E-mail Address: natasa.kejzar at mf.uni-lj.si, simona.cerne at ef.uni-lj.si,
vladimir.batagelj at fmf.uni-lj.si
ISSN: 0176-4268
URL: http://www.springerlink.com/content/3k32475jk1003801/