Powley, B; Dale, R High accuracy citation extraction and named entity recognition for a heterogeneous corpus of academic papers PROC OF THE 2007 IEEE INTL CONF ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07) 119-124, 2007

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Tue Feb 19 11:44:43 EST 2008


Email address: bpowley at comp.mq.edu.au 

Author(s): Powley, B (Powley, Brett); Dale, R (Dale, Robert) 

Title: High accuracy citation extraction and named entity recognition for 
a heterogeneous corpus of academic papers 

Source: PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL 
LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07) 119-124, 2007 

Language: English 

Document Type: Article 

Conference Title: International Conference on Natural Language Processing 
and Knowledge Engineering 

Conference Date: AUG 30-SEP 01, 2007 

Conference Location: Beijing, PEOPLES R CHINA
 
Conference Sponsors: IEEE Signal Proc Soc, Chinese Assoc Artificial 
Intelligence, Chinese Informat Proc Soc China, IEEE Beijing Sect, Beijing 
Univ Posts & Telecommun 

Abstract: Citation indices are increasingly being used not only as 
navigational tools for researchers, but also as the basis for measurement 
of academic performance and research impact. This means that the 
reliability of tools used to extract citations and construct such indices 
is becoming more critical; however, existing approaches to citation 
extraction still fall short of the high accuracy required if critical 
assessments are to be based on them. In this paper, we present techniques 
for high accuracy extraction of citations from academic papers, designed 
for applicability across a broad range of disciplines and document styles. 
We integrate citation extraction, reference parsing, and author named 
entity recognition to significantly improve performance in citation 
extraction, and demonstrate this performance on a cross-disciplinary 
heterogeneous corpus. Applying our algorithm to previously unseen 
documents, we demonstrate high F-measure performance of 0.98 for author 
named entity recognition and 0.97 for citation extraction. 

Reprint Address: Powley, B, Macquarie Univ, Ctr Language Technol, Sydney, 
NSW 2109 Australia. 

Publisher Name: IEEE 

Publisher Address: 345 E 47TH ST, NEW YORK, NY 10017 USA 

ISBN: 978-1-4244-1610-3 

Cited Reference Count: 8 

BERGMARK D
CSTR20001821 : 2000 

BERGMARK D
SIGIR FORUM 35 : 2001 

BESAGNI D
DOCUMENT ANAL RECOGN : 84 2003 

GARFIELD E
CITATION INDEXES FOR SCIENCE - NEW DIMENSION IN DOCUMENTATION THROUGH 
ASSOCIATION OF IDEAS
SCIENCE 122 : 108 1955 

GIUFFRIDA G
DL 00 : 77 2000 

POWLEY B
P 8 RIAO INT C LARG : 2007 

SEYMORE K
AAAI 99 WORKSH MACH : 1999 

TAKASU A
P 3 ACM IEEE CS JOIN : 2003 


