use of statistical sampling to prepare reference sets for calculation of percentile rank of times cited

Wed Feb 18 11:18:17 EST 2015

This question concerns the preparation of reference sets that serve as baseline data for calculating the percentile rank of times cited, adjusted for field, article type, and year of publication. Conventionally, every article in a reference set is used to calculate percentile ranks. However, database lookups take considerable time when reference sets number in the tens of thousands of articles, so we have been considering a statistical sampling approach. The method would be to draw a small random sample of articles from each reference set and use it to estimate the percentile ranks that the complete set would yield. This approach cannot resolve fractional percentile ranks at the high end of the distribution, but it appears that approximately 200 articles are sufficient to approximate whole-number percentile ranks with a small margin of error. I would like to solicit feedback on whether this method has been tried before, and whether it would be viewed as acceptable and statistically sound.
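
To make the proposal concrete, here is a minimal Python sketch of the idea, not our production code. It assumes each reference set is available as a list of citation counts, that the percentile rank is defined as the percentage of articles cited at most as often as the article of interest (other tie-handling conventions exist), and that the sample is a simple random sample without replacement. The function names and the binomial approximation of the standard error are illustrative assumptions.

```python
import math
import random
from bisect import bisect_right


def percentile_rank(citation_counts, times_cited):
    """Percentage of articles in citation_counts cited at most times_cited times.

    Assumed definition; other conventions treat ties differently.
    """
    ordered = sorted(citation_counts)
    return 100.0 * bisect_right(ordered, times_cited) / len(ordered)


def sampled_percentile_rank(reference_set, times_cited, sample_size=200, seed=None):
    """Estimate the percentile rank from a simple random sample of the reference set.

    reference_set is assumed to be a list of citation counts. If it holds fewer
    than sample_size articles, the whole set is used.
    """
    rng = random.Random(seed)
    sample = rng.sample(reference_set, min(sample_size, len(reference_set)))
    return percentile_rank(sample, times_cited)


def standard_error_points(true_rank, sample_size):
    """Approximate standard error of the sampled estimate, in percentile points.

    Uses the binomial approximation sqrt(p * (1 - p) / n) and ignores the
    finite-population correction, so it is slightly conservative.
    """
    p = true_rank / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / sample_size)
```

Under this approximation, a sample of 200 articles gives a standard error of about 3.5 percentile points for an article at the 50th percentile and about 0.7 points for an article at the 99th percentile.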

Michael Bales
Digital Curation Fellow
Samuel J. Wood Library & C.V. Starr Biomedical Information Center
Weill Cornell Medical College
1300 York Avenue, Room D-120C
New York, NY 10065
(P) 646-962-2552
(C) 646-331-0016
(F) 212-746-8364