Web robot accuracy analysis: suggestions invited
David Goodman
dgoodman at LIU.EDU
Sat Dec 17 17:02:33 EST 2005
The posting from SH should be set in the context of our joint
related posting to another list, which I copy below for convenience:
*********************************************
"We have just posted the results
from our cooperative project:
Antelman, K., Bakkalbasi, N., Goodman, D., Hajjem, C. and Harnad,
S. (2005) Evaluation of Algorithm Performance on Identifying
OA. Technical Report, North Carolina State University Libraries, North
Carolina State University. http://eprints.ecs.soton.ac.uk/11689/
ABSTRACT: This is a second signal-detection analysis of the accuracy
of a robot in detecting open access (OA) articles (by checking by
hand how many of the articles the robot tagged OA were really OA,
and vice versa). We found that the robot significantly overcodes for
OA.
In our Biology sample, 40% of identified OA was in fact OA. In
our Sociology sample, only 18% of identified OA was in fact OA.
Missed OA was lower: 12% in Biology and 14% in Sociology.
The sources of the error are impossible
to determine from the present data, since the algorithm
did not capture URLs for documents identified as OA. In conclusion,
the robot is not yet performing at a desirable level, and future work
may be needed to determine the causes, and improve the algorithm.
(in alphabetical order)
Kristin Antelman, North Carolina State University Libraries
<kristin_antelman at ncsu.edu>
Nisa Bakkalbasi, Yale University Library
<nisa.bakkalbasi at yale.edu>
David Goodman, Palmer School of Library and Information Science,
Long Island University <dgoodman at liu.edu>
Chawki Hajjem, Institut des sciences cognitives, Université du Québec
à Montréal <Hajjem at vif.com>
Stevan Harnad, Institut des sciences cognitives, Université du Québec
à Montréal <harnad at ecs.soton.ac.uk> "
************************************************************
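For readers less familiar with the measures quoted in the abstract, here is a minimal sketch of how such figures can be computed from a hand-checked sample. The counts are invented for illustration and are not taken from the report, and the names (HandCheck, precision, miss_rate, false_omission_rate) are hypothetical. The posting does not say which denominator "missed OA" uses, so both common readings are shown.

```python
from dataclasses import dataclass


@dataclass
class HandCheck:
    """Counts from comparing robot tags against a hand check (hypothetical)."""
    tp: int  # robot tagged OA, hand check confirmed OA
    fp: int  # robot tagged OA, hand check found it was not OA
    fn: int  # robot tagged non-OA, hand check found it was OA
    tn: int  # robot tagged non-OA, hand check confirmed non-OA


def precision(c: HandCheck) -> float:
    """Share of robot-identified OA that was in fact OA."""
    return c.tp / (c.tp + c.fp)


def miss_rate(c: HandCheck) -> float:
    """Reading 1 of 'missed OA': share of true OA the robot failed to tag."""
    return c.fn / (c.tp + c.fn)


def false_omission_rate(c: HandCheck) -> float:
    """Reading 2 of 'missed OA': share of robot-tagged non-OA that was OA."""
    return c.fn / (c.fn + c.tn)


# Invented counts, NOT the Biology or Sociology data from the report.
sample = HandCheck(tp=30, fp=20, fn=10, tn=40)

if __name__ == "__main__":
    print(f"identified OA that was in fact OA: {precision(sample):.0%}")
    print(f"missed OA (of all true OA):        {miss_rate(sample):.0%}")
    print(f"missed OA (of tagged non-OA):      {false_omission_rate(sample):.0%}")
```

A low value for the first figure with a comparatively low value for the other two is the pattern the abstract describes as overcoding for OA.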