Sitas, A (Sitas, Anestis); Kapidakis, S (Kapidakis, Sarantos) Duplicate detection algorithms of bibliographic descriptions LIBRARY HI TECH, 26 (2): 287-301 2008

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Mon Aug 18 12:24:29 EDT 2008


E-mail Address: sitas at lit.auth.gr

Author(s): Sitas, A (Sitas, Anestis); Kapidakis, S (Kapidakis, Sarantos) 

Title: Duplicate detection algorithms of bibliographic descriptions 

Source: LIBRARY HI TECH, 26 (2): 287-301 2008 

Language: English 

Document Type: Article 

Author Keywords: cataloguing; algorithms; bibliographic systems; records management 

Keywords Plus: ISRAELS UNION LIST; DATABASE; RECORDS 

Abstract: Purpose - The purpose of this paper is to focus on duplicate 
record detection algorithms used in bibliographic databases.
Design/methodology/approach - Individual algorithms, their application 
process for duplicate detection and their results are described based on 
available literature (published articles), information found at various 
library web sites and follow-up e-mail communications.
Findings - Algorithms are categorized according to their application as a 
process of a single step or two consecutive steps. The results of 
deletion, merging, and temporary and virtual consolidation of duplicate 
records are studied.
Originality/value - The paper presents an overview of duplicate 
detection algorithms and the current state of their application in 
different library systems. 

Addresses: Aristotle Univ Thessaloniki, Sch Philosophy, Thessaloniki, 
Greece; Technol Inst Thessaloniki, Sch Lib Sci, Thessaloniki, Greece; 
Ionian Univ, Arch & Lib Sci Dept, Paleo Anaktoro, Greece 

Reprint Address: Sitas, A, Aristotle Univ Thessaloniki, Sch Philosophy, 
Thessaloniki, Greece. 

Cited Reference Count: 14 

Times Cited: 0 

Publisher: EMERALD GROUP PUBLISHING LIMITED
 
Publisher Address: HOWARD HOUSE, WAGON LANE, BINGLEY BD16 1WA, W YORKSHIRE, ENGLAND 

ISSN: 0737-8831 

DOI: 10.1108/07378830810880379 

29-char Source Abbrev.: LIBR HI TECH 

ISO Source Abbrev.: Libr. Hi Tech 

Source Item Page Count: 15 

Subject Category: Information Science & Library Science 

ISI Document Delivery No.: 329YX 

*ILCSO
US OCLC ILLINET ONL : 2004 

COUSINS S
COPAC SERVICE : 2006 

COUSINS SA
Duplicate detection and record consolidation in large bibliographic 
databases: the COPAC database experience 
JOURNAL OF INFORMATION SCIENCE 24 : 231 1998 

COYLE K
6 U CAL DLA 1992 

COYLE K
ASIS P 4 : 77 1985 

HICKEY TB
J LIB AUTOMATION 2 : 125 1979 

HUNSTAD S
CATALOGING CLASSIFIC 8 : 239 1988 

LAZINGER SS
TO MERGE AND NOT TO MERGE - ISRAELS UNION LIST OF MONOGRAPHS IN THE 
CONTEXT OF MERGING ALGORITHMS 
INFORMATION TECHNOLOGY AND LIBRARIES 13 : 213 1994 

MEIR DD
Measuring the performance of a merging algorithm: Mismatches, 
missed-matches, and overlap in Israel's Union List 
INFORMATION TECHNOLOGY AND LIBRARIES 17 : 116 1998 

ONEILL E
DUPLICATE RECORDS ON : 1990 

TONEY SR
CLEANUP AND DEDUPLICATION OF AN INTERNATIONAL BIBLIOGRAPHIC DATABASE 
INFORMATION TECHNOLOGY AND LIBRARIES 11 : 19 1992 

VOUGIOUKLIS G
ELIDOC : 2007 

WANNINGER PD
IS THE OCLC DATABASE TOO LARGE - A STUDY OF THE EFFECT OF DUPLICATE 
RECORDS IN THE OCLC SYSTEM 
LIBRARY RESOURCES & TECHNICAL SERVICES 26 : 353 1982 

WILLIAMS ME
AUTOMATIC MERGING OF MONOGRAPHIC DATA-BASES - IDENTIFICATION OF DUPLICATE 
RECORDS IN MULTIPLE FILES - IUCS SCHEME
JOURNAL OF LIBRARY AUTOMATION 12 : 156 1979 


