Ruths, D; Al Zamal, F. 2010. A Method for the Automated, Reliable Retrieval of Publication-Citation Records. PLOS ONE 5 (8): art. no.-e12133

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Tue Sep 21 13:23:54 EDT 2010

Author Full Name(s): Ruths, Derek; Al Zamal, Faiyaz
Language: English
Document Type: Article
KeyWords Plus: INDEX

Abstract: Background: Publication records and citation indices often are used 
to evaluate academic performance. For this reason, obtaining or computing 
them accurately is important. This can be difficult, largely due to a lack of 
complete knowledge of an individual's publication list and/or lack of time 
available to manually obtain or construct the publication-citation record. While 
online publication search engines have somewhat addressed these problems, 
using raw search results can yield inaccurate estimates of publication-citation 
records and citation indices.
Methodology: In this paper, we present a new, automated method that 
produces estimates of an individual's publication-citation record from an 
individual's name and a set of domain-specific vocabulary that may occur in the 
individual's publication titles. Because this vocabulary can be harvested directly 
from a research web page or online (partial) publication list, our method delivers 
an easy way to obtain estimates of a publication-citation record and the 
relevant citation indices. Our method works by applying a series of stringent 
name and content filters to the raw publication search results returned by an 
online publication search engine. In this paper, our method is run using Google 
Scholar, but the underlying filters can be easily applied to any existing 
publication search engine. When compared against a manually constructed data 
set of individuals and their publication-citation records, our method provides 
significant improvements over raw search results. The estimated publication-
citation records returned by our method have an average sensitivity of 98% 
and specificity of 72% (in contrast to raw search result specificity of less than 
10%). When citation indices are computed from these records, the estimated 
indices are within 10% of the true value, whereas indices computed from raw 
search results overestimate it by 75% on average.
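The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the name and content filters actually work, so the `Record` type, the stopword list, the surname-plus-first-initial name rule, and the `min_overlap` threshold are all hypothetical choices made here. Search results are assumed to have already been parsed into records; no Google Scholar access is shown.

```python
import re
from dataclasses import dataclass


# Hypothetical shape of one parsed search-engine hit (not from the paper).
@dataclass
class Record:
    title: str
    authors: list[str]  # names as "Initial Surname", e.g. "D Ruths"
    citations: int


# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> set[str]:
    """Lower-case alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def harvest_vocabulary(known_titles: list[str]) -> set[str]:
    """Collect domain vocabulary from titles on a (partial) publication list."""
    vocab: set[str] = set()
    for title in known_titles:
        vocab |= tokenize(title)
    return vocab


def name_matches(author: str, first: str, last: str) -> bool:
    """Hypothetical strict name rule: the surname (possibly multi-word) must
    match exactly, and the first given-name initial must agree."""
    parts = author.replace(".", " ").lower().split()
    surname = last.lower().split()
    n = len(surname)
    if len(parts) <= n or parts[-n:] != surname:
        return False
    return parts[0][0] == first[0].lower()


def filter_records(records: list[Record], first: str, last: str,
                   vocab: set[str], min_overlap: int = 1) -> list[Record]:
    """Keep records that pass both the name filter and the content filter."""
    kept = []
    for rec in records:
        if not any(name_matches(a, first, last) for a in rec.authors):
            continue  # name filter: no listed author matches the individual
        if len(tokenize(rec.title) & vocab) < min_overlap:
            continue  # content filter: title shares too little vocabulary
        kept.append(rec)
    return kept


if __name__ == "__main__":
    vocab = harvest_vocabulary(
        ["A method for the automated, reliable retrieval of publication-citation records"])
    raw = [
        Record("Automated retrieval of publication-citation records",
               ["D Ruths", "F Al Zamal"], 10),
        Record("Protein folding kinetics", ["D Ruths"], 50),     # fails content filter
        Record("Citation records in practice", ["J Smith"], 5),  # fails name filter
    ]
    for rec in filter_records(raw, "Derek", "Ruths", vocab):
        print(rec.title)  # prints only the first record's title
```

Requiring both filters to pass is what makes the sketch "stringent": a shared surname alone, or topical overlap alone, is not enough to keep a hit. Raising `min_overlap` trades sensitivity for specificity.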
Conclusions: These results confirm that our method provides significantly 
improved estimates over raw search results, and these can either be used 
directly for large-scale (departmental or university) analysis or further refined 
manually to quickly give accurate publication-citation records.
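The evaluation figures quoted in the abstract (sensitivity, specificity, and the error of an estimated citation index) reduce to simple set and list arithmetic, sketched below on toy data. Two assumptions are made here. First, "specificity" is taken to mean the fraction of returned records that truly belong to the individual (what information retrieval calls precision); the abstract does not define the term, so check the paper before reusing this. Second, the h-index stands in as an example citation index, since the abstract does not name which indices were computed.

```python
def sensitivity(returned: set[str], true_pubs: set[str]) -> float:
    """Fraction of the individual's true publications that were returned."""
    return len(returned & true_pubs) / len(true_pubs) if true_pubs else 0.0


def specificity(returned: set[str], true_pubs: set[str]) -> float:
    """ASSUMED definition: fraction of returned records that really belong
    to the individual (precision, in information-retrieval terms)."""
    return len(returned & true_pubs) / len(returned) if returned else 0.0


def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def relative_error(estimate: float, truth: float) -> float:
    """Signed relative error; positive values are overestimates."""
    return (estimate - truth) / truth


if __name__ == "__main__":
    true_pubs = {"p1", "p2", "p3", "p4"}
    returned = {"p1", "p2", "p3", "p4", "x1", "x2"}  # two false positives
    print(sensitivity(returned, true_pubs))           # 1.0
    print(f"{specificity(returned, true_pubs):.2f}")  # 0.67
    true_h = h_index([10, 8, 5, 4, 3])                # 4
    est_h = h_index([12, 10, 9, 8, 5, 4, 3])          # 5: extra papers inflate it
    print(relative_error(est_h, true_h))              # 0.25
```

The toy run shows the mechanism behind the abstract's comparison: misattributed papers leave sensitivity untouched but lower specificity and push the citation index upward.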

Addresses: [Ruths, Derek; Al Zamal, Faiyaz] McGill Univ, Dept Comp Sci, 
Montreal, PQ, Canada

Reprint Address: Ruths, D, McGill Univ, Dept Comp Sci, Montreal, PQ, Canada.
E-mail Address: druths at
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0012133

More information about the SIGMETRICS mailing list