Aphinyanaphongs Y, Statnikov A, Aliferis CF "A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents" Journal of the American Medical Informatics Association 13(4):446-455 July-August 2006.
Eugene Garfield
garfield at CODEX.CIS.UPENN.EDU
Wed Sep 27 14:34:30 EDT 2006
E-mail Addresses: C.F. Aliferis : constantin.aliferis at vanderbilt.edu
Title: A comparison of citation metrics to machine learning filters for the
identification of high quality MEDLINE documents
Author(s): Aphinyanaphongs Y, Statnikov A, Aliferis CF
Source: JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION 13 (4): 446-
455 JUL-AUG 2006
Document Type: Article
Language: English
Cited References: 37 Times Cited: 0
Abstract:
Objective: The present study explores the discriminatory performance of
existing and novel gold-standard-specific machine learning (GSS-ML) focused
filter models (i.e., models built specifically for a retrieval task and a
gold standard against which they are evaluated) and compares their
performance to citation count and impact factors, and non-specific machine
learning (NS-ML) models (i.e., models built for a different task and/or
different gold standard).
Design: Three gold standard corpora were constructed using the SSOAB
bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited
etiology articles. Citation counts and impact factors were obtained for
each article. Support vector machine models were used to classify the
articles using combinations of content, impact factors, and citation counts
as predictors.
Measurements: Discriminatory performance was estimated using the area under
the receiver operating characteristic curve and n-fold cross-validation.
Results: For all three gold standards and tasks, GSS-ML filters
outperformed citation count, impact factors, and NS-ML filters.
Combinations of content with impact factor or citation count produced no or
negligible improvements to the GSS machine learning filters.
Conclusions: These experiments provide evidence that when building
information retrieval filters focused on a retrieval task and corresponding
gold standard, the filter models have to be built specifically for this
task and gold standard. Under those conditions, machine learning filters
outperform standard citation metrics. Furthermore, citation counts and
impact factors add marginal value to discriminatory performance. Previous
research that claimed better performance of citation metrics than machine
learning in one of the corpora examined here is attributed to using machine
learning filters built for a different gold standard and task.
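
As an illustration of the measurement named in the abstract, the sketch below
computes the area under the ROC curve (AUC) for two rankings of the same
articles. It is a minimal sketch, not the authors' code: the function name
`auc`, the labels, the citation counts, and the classifier scores are all
made-up assumptions, and the cross-validation and support vector machine
training steps are not shown.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive item scores higher.
    Ties count as half a correctly ordered pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = article is in the gold standard, 0 = it is not.
labels = [1, 1, 1, 0, 0, 0, 0]
# Hypothetical citation counts used directly as a ranking score.
citation_counts = [12, 3, 40, 5, 0, 7, 1]
# Hypothetical outputs of a task-specific classifier for the same articles.
classifier_scores = [0.9, 0.6, 0.8, 0.4, 0.1, 0.5, 0.2]

print("AUC, citation count:", auc(citation_counts, labels))
print("AUC, classifier:    ", auc(classifier_scores, labels))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places
every gold-standard article above every other article, so the two printed
values can be compared directly.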
KeyWords Plus: DETECTING CLINICALLY SOUND; OPTIMAL SEARCH STRATEGIES; TEXT
CATEGORIZATION; RETRIEVAL
Addresses: Aliferis CF (reprint author), Vanderbilt Univ, Dept Biomed
Informat, Eskind Biomed Lib, Discovery Syst Lab, Room 412, 2209 Garland Ave,
Nashville, TN 37232 USA
Vanderbilt Univ, Dept Biomed Informat, Eskind Biomed Lib, Discovery Syst
Lab, Nashville, TN 37232 USA
E-mail Addresses: constantin.aliferis at vanderbilt.edu
Publisher: ELSEVIER SCIENCE INC, 360 PARK AVE SOUTH, NEW YORK, NY 10010-
1710 USA
Subject Category: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE,
INTERDISCIPLINARY APPLICATIONS; INFORMATION SCIENCE & LIBRARY SCIENCE;
MEDICAL INFORMATICS
IDS Number: 064EU
ISSN: 1067-5027
CITED REFERENCES :
ACP J 131 : A15 1999
LIBSVM LIB SUPPORT V : 2005
PUBMED : 2005
ALIFERIS C
P AMIA S WASH DC : 2003
ALIFERIS CF
METMBS : 371 1908
APHINYANAPHONGS Y
Text categorization models for high-quality article retrieval in internal
medicine
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION 12 : 207 2005
APHINYANAPHONGS Y
MEDINFO : 2004
BAEZAYATES R
MODERN INFORMATION R : 1999
BERNSTAM EV
J AM MED INFORM ASS : 2005
DELONG ER
COMPARING THE AREAS UNDER 2 OR MORE CORRELATED RECEIVER OPERATING
CHARACTERISTIC CURVES - A NONPARAMETRIC APPROACH
BIOMETRICS 44 : 837 1988
DUDA S
AMIA S WASH D C : 2005
DUDOIT S
126 UC BERK DIV BIOS : 2003
DUMAIS S
P ACM CIKM98 NOV : 1998
FAWCETT T
HPL20034 : 2003
GARFIELD E
CAN CITATION INDEXIN : 1965
GARFIELD E
INT J CLIN HLTH PSYC 3 : 363 2003
GARFIELD E
SCI PUBL POLICY 19 : 321 1992
GUYON I
Gene selection for cancer classification using support vector machines
MACHINE LEARNING 46 : 389 2002
HAND DJ
A simple generalisation of the area under the ROC curve for multiple class
classification problems
MACHINE LEARNING 45 : 171 2001
HAYNES RB
DEVELOPING OPTIMAL SEARCH STRATEGIES FOR DETECTING CLINICALLY SOUND STUDIES
IN MEDLINE
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION 1 : 447 1994
HSU CW
PRACTICAL GUIDE SUPP : 2005
JENKINS M
HLTH INFO LIB J 21 : 148 2004
JOACHIMS T
LEARNING CLASSIFY TE : 2002
KLEINBERG
P ACM SIAM S DISCR A : 1997
LEOPOLD E
Text categorization with support vector machines. How to represent texts in
input space ?
MACHINE LEARNING 46 : 423 2002
PAGANO M
PRINCIPLES BIOSTATIS : 2000
PAGE L
PAGERANK CITATION RA : 1998
PORTER MF
AN ALGORITHM FOR SUFFIX STRIPPING
PROGRAM-AUTOMATED LIBRARY AND INFORMATION SYSTEMS 14 : 130 1980
PROVOST F
ICML 98 15 INT C MAC : 1998
SALTON G
TERM-WEIGHTING APPROACHES IN AUTOMATIC TEXT RETRIEVAL
INFORMATION PROCESSING & MANAGEMENT 24 : 513 1988
SCHEFFER T
ERROR ESTIMATION MOD : 1999
SUN A
ICDM : 2001
TSAMARDINOS I
AI STAT : 2003
VAPNIK V
STAT LEARNING THEORY : 1998
WEISS S
COMPUTER SYSTEMS LEA : 1991
WILCZYNSKI NL
Optimal search strategies for detecting clinically sound prognostic studies
in EMBASE: An analytic survey
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION 12 : 481 2005
YANG Y
22 ANN ACM C RES DEV : 1999