Maitra R. "Clustering massive datasets with applications in software metrics and tomography" TECHNOMETRICS 43 (3): 336-346 AUG 2001

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Wed Mar 20 14:00:39 EST 2002


Ranjan Maitra: e-mail maitra at math.umbc.edu.


TITLE   Clustering massive datasets with applications in software metrics
        and tomography
AUTHOR  Maitra R
JOURNAL TECHNOMETRICS   43 (3): 336-346 AUG 2001

Document type: Article
Language: English
Cited References: 43
Times Cited: 0


Abstract:
Clustering datasets is not an easy problem in general, and the difficulty is
compounded for massive datasets. This article develops, under Gaussian
assumptions, a multistage algorithm that clusters an initial sample, filters
out observations that can be reasonably classified by these clusters, and
iterates the preceding procedure on the remainder. A final step uses the
estimated class probabilities and dispersions to classify each observation
in the dataset. Results on test experiments indicate good performance.
Application to datasets from software metrics and positron emission
tomography required no more than five stages each, suggesting that the
procedure is practical to implement.
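The multistage procedure summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the author's implementation, and several pieces are assumptions. It assumes spherical Gaussian clusters. It clusters each stage's sample with k-means using farthest-point initialization. In place of the paper's likelihood ratio test, it uses a plain standardized-distance cutoff (`threshold`). For two dimensions, a value of about 9.21 is the 99% chi-square quantile. The names `multistage_cluster`, `kmeans` and `fit_spherical`, their parameters, and the five-stage default are hypothetical. The sketch also assumes the dataset has at least `k` observations.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=25):
    """Lloyd's k-means with deterministic farthest-point initialisation.
    Returns the non-empty groups from the final assignment step."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
            groups[nearest].append(p)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def fit_spherical(group, floor=1e-9):
    """Mean and per-coordinate variance of a group (assumed spherical Gaussian)."""
    mu = mean(group)
    s2 = sum(sqdist(p, mu) for p in group) / (len(group) * len(mu))
    return mu, max(s2, floor)


def multistage_cluster(data, k, sample_size, threshold, max_stages=5, seed=0):
    """Hypothetical sketch: cluster a sample, filter out well-classified points,
    repeat on the remainder, then classify every point in a final pass."""
    rng = random.Random(seed)
    remaining = list(range(len(data)))
    clusters, counts = [], []
    stages = 0
    while remaining and len(remaining) >= k and stages < max_stages:
        stages += 1
        idx = rng.sample(remaining, min(sample_size, len(remaining)))
        new = [fit_spherical(g) for g in kmeans([data[i] for i in idx], k)]
        start = len(clusters)
        clusters.extend(new)
        counts.extend([0] * len(new))
        still = []
        for i in remaining:
            # Standardised squared distance to each of this stage's clusters;
            # this cutoff stands in for the paper's likelihood ratio test.
            z = [sqdist(data[i], mu) / s2 for mu, s2 in new]
            j = min(range(len(new)), key=z.__getitem__)
            if z[j] <= threshold:
                counts[start + j] += 1   # reasonably classified: filter it out
            else:
                still.append(i)          # left for the next stage
        remaining = still

    n, d = len(data), len(data[0])

    def log_score(x, j):
        # Spherical Gaussian log-density (up to a constant) plus log class weight;
        # clusters that captured nothing get a floor count of 1.
        mu, s2 = clusters[j]
        weight = max(counts[j], 1) / n
        return math.log(weight) - 0.5 * d * math.log(s2) - sqdist(x, mu) / (2 * s2)

    # Final step: assign every observation, including any never filtered out.
    labels = [max(range(len(clusters)), key=lambda j: log_score(x, j)) for x in data]
    return labels, clusters, stages
```

For example, you could call `multistage_cluster(data, k=2, sample_size=20, threshold=9.21)` on two well-separated groups of 2-D points. It should give the two groups disjoint sets of labels within a stage or two. A general-covariance model and a proper test would be needed to come closer to the procedure described in the article.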

Author Keywords:
Gaussian distribution, likelihood ratio test, multistage procedure, sample

KeyWords Plus:
CRITERIA, MODELS

Addresses:
Maitra R, Univ Maryland Baltimore Cty, Dept Math & Stat, Baltimore, MD 21250 USA

Publisher:
AMER STATISTICAL ASSOC, ALEXANDRIA

IDS Number:
491TR

ISSN:
0040-1706

Cited Author            Cited Work                Volume      Page      Year

 ARABIE P              J MATH PSYCHOL                10       148      1973
 BANFIELD JD           BIOMETRICS                    49       803      1993
 BECKETT J             P SOC STAT SECT AM S                   983      1977
 BROSSIER G            J CLASSIF                      7       197      1990
 CAN F                 J AM SOC INFORM SCI           35       268      1984
 CELEUX G              PATTERN RECOGN                28       781      1995
 CHENG L               MAR FISH REV                  36         1      1974
 CORMACK RM            J ROYAL STATISTICAL          134       321      1971
 BRUYNOOGHE M          COMPSTAT 1978                          239      1978
 EDDY WF               COMPUT STAT DATA AN           23        29      1996
 EVERITT B             CLUSTER ANAL                                    1974
 EVERITT BS            BIOMETRICS                    35       169      1979
 EVERITT BS            STATISTICS PROBABILI           6       305      1988
 FAYYAD U              P WORKSH MASS DAT NA                            1996
 FOWLKES EB            J CLASSIF                      5       205      1988
 FRIEDMAN HP           J AM STAT ASSOC               62      1159      1967
 GANESALINGAM S        STAT NEERL                    33        81      1979
 GNANADESIKAN R        METHODS STAT DATA AN                            1977
 GOOD IJ               J STAT COMPUTING SIM           9       241      1979
 GORDON AD             COMPSTAT 1986                          149      1986
 HARTIGAN J            CLUSTERING ALGORITHM                            1975
 HARTIGAN JA           J CLASSIF                      2        63      1985
 MAITRA R              J AM STAT ASSOC               93      1340      1998
 MAITRA R              J COMPUT GRAPH STAT            6         1      1997
 MARDIA KV             MULTIVARIATE ANAL                               1979
 MARRIOTT FH           BIOMETRICS                    27       501      1971
 MAZZIOTTA JC          POSITRON EMISSION TO                            1986
 MCQUITTY LL           EDUC PSYCHOL MEAS             35       239      1975
 MIRKIN BG             AUTOMAT REM CONTR             31       786      1970
 MOJENA R              COMPSTAT 1980                          454      1980
 MURTAGH F             MULTIDIMENSIONAL CLU                            1985
 MYERS GJ              COMPOSITE STRUCTURED                            1978
 OSULLIVAN F           STAT METHODS MED RES           3        87      1994
 PHELPS ME             ANN NEUROL                     6       371      1979
 POLLARD D             ANN STAT                       9       135      1981
 RAMEY DB              ENCY STAT SCI                  6       318      1985
 RAND WM               J AM STAT ASSOC               66       846      1971
 RIPLEY BD             ANAL MODELLING DATA                     85      1991
 SCOTT AJ              BIOMETRICS                    27       387      1971
 SYMONS MJ             BIOMETRICS                    37        35      1981
 VANRYZIN J            CLASSIFICATION CLUST                            1977
 YOURDON E             TECHNIQUES PROGRAM S                            1975
 ZUPAN J               CLUSTERING LARGE DA                             1982



More information about the SIGMETRICS mailing list