Elmer V. Bernstam MD, MSE, Jorge R. et al., "Using citation data to improve retrieval from MEDLINE" published in JAMIA October 12, 2005
Eugene Garfield
garfield at CODEX.CIS.UPENN.EDU
Wed Dec 14 16:41:38 EST 2005
TITLE: Using citation data to improve retrieval from MEDLINE
AUTHORS: Elmer V. Bernstam MD, MSE (1,*); Jorge R. Herskovic MD, MS (1);
Yindalon Aphinyanaphongs MS (2); Constantin F. Aliferis MD, PhD (2);
Madurai G. Sriram (1); and William R. Hersh MD (3)
Affiliation of the authors: (1) School of Health Information Sciences, The
University of Texas Health Science Center at Houston, Houston, TX; (2)
Department of Biomedical Informatics, Vanderbilt University, Nashville, TN;
(3) Department of Medical Informatics and Clinical Epidemiology, Oregon
Health & Science University, Portland, OR
* To whom correspondence should be addressed.
Objective: To determine whether algorithms developed for the World Wide Web
can be applied to the biomedical literature in order to identify articles
that are important as well as relevant.
Design and Measurements: A direct comparison of eight algorithms: simple
PubMed queries, clinical queries (sensitive and specific versions), vector
cosine comparison, citation count, PageRank and machine learning based on
polynomial support vector machines. The objective was to prioritize
important articles, defined as being included in a pre-existing
bibliography of important literature in surgical oncology.
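The two citation-based rankings named above can be illustrated on a toy citation graph. This is only a minimal sketch, not the authors' implementation: the graph `toy_cites`, the function names, the damping factor of 0.85 and the fixed iteration count are all assumptions made for illustration, and every cited article is assumed to appear as a key in the graph.

```python
def citation_counts(cites):
    """Count how many times each article is cited."""
    counts = {a: 0 for a in cites}
    for src, targets in cites.items():
        for t in targets:
            counts[t] += 1
    return counts


def pagerank(cites, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    `cites` maps each article ID to the list of article IDs it cites.
    Articles that cite nothing spread their score evenly over all articles.
    """
    nodes = list(cites)
    n = len(nodes)
    rank = {a: 1.0 / n for a in nodes}
    for _ in range(iterations):
        # Score held by articles with no outgoing citations.
        dangling = sum(rank[a] for a in nodes if not cites[a])
        new_rank = {a: (1.0 - damping) / n + damping * dangling / n for a in nodes}
        for src in nodes:
            for t in cites[src]:
                new_rank[t] += damping * rank[src] / len(cites[src])
        rank = new_rank
    return rank


# Hypothetical toy graph: article -> articles it cites.
toy_cites = {
    "A": [],
    "B": ["A"],
    "C": ["A", "B"],
    "D": ["A", "C"],
}

counts = citation_counts(toy_cites)
ranks = pagerank(toy_cites)
by_count = sorted(toy_cites, key=lambda a: counts[a], reverse=True)
by_pagerank = sorted(toy_cites, key=lambda a: ranks[a], reverse=True)
```

In this toy graph both orderings put article "A" first. The two methods can diverge on larger graphs, because PageRank weights a citation by the score of the citing article while a simple count treats all citations equally.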
Results: Citation-based algorithms were more effective than non-citation-based
algorithms at identifying important articles. The most effective
strategies were simple citation count and PageRank, which on average
identified over six important articles in the first 100 results, compared with
0.85 for the best non-citation-based algorithm (p < 0.001). We saw similar
differences between citation-based and non-citation-based algorithms at 10,
20, 50, 200, 500 and 1000 results (p < 0.001). Citation lag affects the
performance of PageRank more than that of simple citation count. However, in
spite of citation lag, citation-based algorithms remain more effective than
non-citation-based algorithms.
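The measure reported above, the number of important articles among the first N results, can be sketched as follows. The function name `important_in_top_k`, the article IDs and the cutoffs are hypothetical and chosen only to keep the example small. The study itself used cutoffs from 10 to 1000 and a pre-existing surgical oncology bibliography as the gold standard.

```python
def important_in_top_k(ranked_ids, important_ids, k):
    """Count how many of the first k ranked articles are in the important set."""
    important = set(important_ids)
    return sum(1 for article in ranked_ids[:k] if article in important)


# Hypothetical ranked output of one algorithm and a gold-standard bibliography.
ranked = ["p7", "p2", "p9", "p4", "p1", "p3"]
bibliography = {"p2", "p4", "p3"}

# Number of important articles found at each (illustrative) cutoff.
hits_by_cutoff = {k: important_in_top_k(ranked, bibliography, k) for k in (2, 4, 6)}
```

Averaging this count over many queries at a fixed cutoff gives a figure comparable in kind to the "over six important articles in the first 100 results" reported for the citation-based algorithms.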
Conclusion: Algorithms that have proven successful on the World Wide Web
can be applied to biomedical information retrieval. Citation-based
algorithms can help identify important articles within large sets of
relevant results. Further studies are needed to determine whether
citation-based algorithms can effectively meet actual user information needs.