Maitra R. "Clustering massive datasets with applications in software metrics and tomography" TECHNOMETRICS 43 (3): 336-346 AUG 2001
Eugene Garfield
garfield at CODEX.CIS.UPENN.EDU
Wed Mar 20 14:00:39 EST 2002
Ranjan Maitra (e-mail: maitra at math.umbc.edu)
TITLE Clustering massive datasets with applications in software metrics
and tomography
AUTHOR Maitra R
JOURNAL TECHNOMETRICS 43 (3): 336-346 AUG 2001
Document type: Article
Language: English
Cited References: 43
Times Cited: 0
Abstract:
Clustering datasets is not an easy problem in general, and the difficulty is
compounded for massive datasets. This article develops, under Gaussian
assumptions, a multistage algorithm that clusters an initial sample, filters
out observations that can be reasonably classified by these clusters, and
iterates the preceding procedure on the remainder. A final step uses the
estimated class probabilities and dispersions to classify each observation
in the dataset. Results on test experiments indicate good performance.
Application to datasets from software metrics and positron emission
tomography required no more than five stages each, suggesting that the
procedure is practical to implement.
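
The multistage idea in the abstract can be illustrated with a minimal sketch.
This is not the author's implementation, and it makes several simplifying
assumptions: the data are one-dimensional, plain k-means stands in for the
clustering step, and a z-score cutoff stands in for the likelihood ratio test
named in the keywords. All names (multistage_cluster, z_cut, min_remainder,
and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans_1d(xs, k, iters=25):
    """Plain Lloyd's k-means on scalars (an assumed stand-in clustering step)."""
    ordered = sorted(xs)
    # Start the centers at evenly spaced sample quantiles.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    # Drop groups too small to give a dispersion estimate.
    return [g for g in groups if len(g) >= 2]


def summarize(group):
    """Return the (mean, standard deviation) of one cluster."""
    mu = sum(group) / len(group)
    var = sum((x - mu) ** 2 for x in group) / (len(group) - 1)
    return mu, max(math.sqrt(var), 1e-6)


def log_density(x, mu, sd):
    """Gaussian log-density, up to an additive constant."""
    return -math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def multistage_cluster(data, k=2, sample_size=200, z_cut=3.0,
                       min_remainder=30, max_stages=5, seed=0):
    rng = random.Random(seed)
    remainder = list(data)
    clusters = []  # one (mean, sd) pair per cluster found so far
    for _ in range(max_stages):
        if len(remainder) < min_remainder:
            break
        # Cluster a random sample of the still-unexplained observations.
        sample = rng.sample(remainder, min(sample_size, len(remainder)))
        clusters += [summarize(g) for g in kmeans_1d(sample, k)]
        # Filter out observations that some current cluster explains well
        # (assumed rule: within z_cut standard deviations of a cluster mean).
        remainder = [x for x in remainder
                     if all(abs(x - mu) > z_cut * sd for mu, sd in clusters)]
    if not clusters:
        raise ValueError("too few observations to form a cluster")

    n_clusters = len(clusters)

    def best(x, log_prior):
        return max(range(n_clusters),
                   key=lambda j: log_prior[j] + log_density(x, *clusters[j]))

    # Final step: estimate class probabilities from a provisional assignment,
    # then classify every observation by prior times Gaussian density.
    provisional = [best(x, [0.0] * n_clusters) for x in data]
    log_prior = [math.log((provisional.count(j) + 1) / (len(data) + n_clusters))
                 for j in range(n_clusters)]
    labels = [best(x, log_prior) for x in data]
    return clusters, labels


# Usage on synthetic data: two well-separated Gaussian groups.
gen = random.Random(1)
data = ([gen.gauss(0, 1) for _ in range(300)]
        + [gen.gauss(20, 1) for _ in range(300)])
clusters, labels = multistage_cluster(data)
print(len(clusters), labels[0], labels[-1])
```

The loop stops once the unexplained remainder falls below min_remainder or
max_stages is reached, which mirrors the abstract's observation that only a
few stages were needed. A multivariate version would replace the scalar mean
and standard deviation with a mean vector and a dispersion matrix.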
Author Keywords:
Gaussian distribution, likelihood ratio test, multistage procedure, sample
KeyWords Plus:
CRITERIA, MODELS
Addresses:
Maitra R, Univ Maryland Baltimore Cty, Dept Math & Stat, Baltimore, MD 21250
USA
Univ Maryland Baltimore Cty, Dept Math & Stat, Baltimore, MD 21250 USA
Publisher:
AMER STATISTICAL ASSOC, ALEXANDRIA
IDS Number:
491TR
ISSN:
0040-1706
Cited Author Cited Work Volume Page Year
ARABIE P J MATH PSYCHOL 10 148 1973
BANFIELD JD BIOMETRICS 49 803 1993
BECKETT J P SOC STAT SECT AM S 983 1977
BROSSIER G J CLASSIF 7 197 1990
CAN F J AM SOC INFORM SCI 35 268 1984
CELEUX G PATTERN RECOGN 28 781 1995
CHENG L MAR FISH REV 36 1 1974
CORMACK RM J ROYAL STATISTICAL 134 321 1971
CRUYNOOGHE M COMPSTAT 1978 239 1978
EDDY WF COMPUT STAT DATA AN 23 29 1996
EVERITT B CLUSTER ANAL 1974
EVERITT BS BIOMETRICS 35 169 1979
EVERITT BS STATISTICS PROBABILI 6 305 1988
FAYYAD U P WORKSH MASS DAT NA 1996
FOWLKES EB J CLASSIF 5 205 1988
FRIEDMAN HP J AM STAT ASSOC 62 1159 1967
GANESALINGAM S STAT NEERL 33 81 1979
GNANADESIKAN R METHODS STAT DATA AN 1977
GOOD IJ J STAT COMPUTING SIM 9 241 1979
GORDON AD COMPSTAT 1986 149 1986
HARTIGAN J CLUSTERING ALGORITHM 1975
HARTIGAN JA J CLASSIF 2 63 1985
MAITRA R J AM STAT ASSOC 93 1340 1998
MAITRA R J COMPUT GRAPH STAT 6 1 1997
MARDIA KV MULTIVARIATE ANAL 1979
MARRIOTT FH BIOMETRICS 27 501 1971
MAZZIOTTA JC POSITRON EMISSION TO 1986
MCQUITTY LL EDUC PSYCHOL MEAS 35 239 1975
MIRKIN BG AUTOMAT REM CONTR 31 786 1970
MOJENA R COMPSTAT 1980 454 1980
MURTAGH F MULTIDIMENSIONAL CLU 1985
MYERS GJ COMPOSITE STRUCTURED 1978
OSULLIVAN F STAT METHODS MED RES 3 87 1994
PHELPS ME ANN NEUROL 6 371 1979
POLLARD D ANN STAT 9 135 1981
RAMEY DB ENCY STAT SCI 6 318 1985
RAND WM J AM STAT ASSOC 66 846 1971
RIPLEY BD ANAL MODELLING DATA 85 1991
SCOTT AJ BIOMETRICS 27 387 1971
SYMONS MJ BIOMETRICS 37 35 1981
VANRYZIN J CLASSIFICATION CLUST 1977
YOURDON E TECHNIQUES PROGRAM S 1975
ZUPAN J CLUSTERING LARGE DA 1982