Citation Matching in Sanskrit Corpora by AS Prasad

Eugene Garfield eugene.garfield at THOMSONREUTERS.COM
Thu Mar 10 16:02:30 EST 2011


TITLE:          Citation Matching in Sanskrit Corpora Using Local
                Alignment (Article, English)

AUTHOR:         Prasad, AS; Rao, S

E-MAIL:         abhinandan.sp at iiitb.net

 

SOURCE:         SANSKRIT COMPUTATIONAL LINGUISTICS 6465. 2010. p.124-136
                SPRINGER-VERLAG BERLIN, BERLIN

KEYWORDS:       citation matching; local alignment; Smith-Waterman-Gotoh
                algorithm; Sanskrit; Mahabharata;
                Mahabharata-Tatparyanirnaya

 

ABSTRACT:       Citation matching is the problem of finding which
citation occurs in a given textual corpus. Most existing citation
matching work is done on scientific literature. The goal of this paper
is to present methods for performing citation matching on Sanskrit
texts.

Exact matching and approximate matching are the two methods for
performing citation matching. The exact matching method checks for exact
occurrence of the citation with respect to the textual corpus.
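
As a rough illustration of the exact-matching idea, the sketch below
reports which corpus lines contain the citation verbatim. It is not the
authors' implementation: the function names, the whitespace and case
normalization, and the sample line (a romanized opening verse of the
Bhagavad Gita, used only as test data) are all assumptions made for the
example.

```python
from __future__ import annotations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace runs (an assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match_lines(citation: str, corpus_lines: list[str]) -> list[int]:
    """Return indices of corpus lines that contain the citation verbatim."""
    needle = normalize(citation)
    if not needle:
        return []  # an empty citation matches nothing
    return [
        index
        for index, line in enumerate(corpus_lines)
        if needle in normalize(line)
    ]


# Illustrative data only; not taken from the paper's corpus.
corpus = [
    "dharmakshetre kurukshetre samaveta yuyutsavah",
    "an unrelated line of text",
]
hits = exact_match_lines("kurukshetre   samaveta", corpus)
```

A single changed character in the citation makes this method miss the
line entirely, which is the gap the approximate method addresses.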

Approximate matching is a fuzzy string-matching method which computes a
similarity score between an individual line of the textual corpus and
the citation. The Smith-Waterman-Gotoh algorithm for local alignment,
which is generally used in bioinformatics, is used here for calculating
the similarity score. This similarity score is a measure of the
closeness between the text and the citation. The exact- and
approximate-matching methods are evaluated and compared. The methods
presented can be easily applied to corpora in other Indic languages like
Kannada, Tamil, etc. The approximate-matching method can in particular
be used in the compilation of critical editions and plagiarism detection
in a literary work.
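
The approximate-matching step can be sketched with a small local
alignment scorer using affine gap penalties, in the style of
Smith-Waterman-Gotoh. This is a minimal sketch, not the authors' code:
the scoring parameters, the character-level comparison, the
normalization by the shorter string's length, and all function names are
assumptions chosen for illustration.

```python
from __future__ import annotations

MATCH = 2.0        # assumed reward for identical characters
MISMATCH = -1.0    # assumed penalty for differing characters
GAP_OPEN = 2.0     # assumed cost of the first character of a gap
GAP_EXTEND = 0.5   # assumed cost of each further gap character


def swg_score(a: str, b: str) -> float:
    """Best local alignment score of a and b with affine gap costs."""
    neg_inf = float("-inf")
    n = len(b)
    h_prev = [0.0] * (n + 1)      # best score ending at (i-1, j)
    f_prev = [neg_inf] * (n + 1)  # best score ending in a vertical gap
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf               # best score ending in a horizontal gap
        for j in range(1, n + 1):
            e = max(h_cur[j - 1] - GAP_OPEN, e - GAP_EXTEND)
            f_cur[j] = max(h_prev[j] - GAP_OPEN, f_prev[j] - GAP_EXTEND)
            step = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            # Local alignment: never let a cell drop below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + step, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def similarity(citation: str, line: str) -> float:
    """Score scaled to [0, 1] by the best score the shorter string allows."""
    shorter = min(len(citation), len(line))
    if shorter == 0:
        return 0.0
    return swg_score(citation, line) / (MATCH * shorter)


def best_line(citation: str, corpus_lines: list[str]) -> tuple[int, float]:
    """Index and similarity of the closest line in a non-empty corpus."""
    scores = [similarity(citation, line) for line in corpus_lines]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Illustrative data only; the citation drops two letters from the line.
corpus = [
    "dharmakshetre kurukshetre samaveta yuyutsavah",
    "an unrelated line of text",
]
index, score = best_line("dharmaksetre kuruksetre samaveta", corpus)
```

Scoring each corpus line separately mirrors the line-by-line comparison
described above; a threshold on the similarity value would then decide
whether a line counts as a match.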

 

AUTHOR ADDRESS: AS Prasad, Int Inst Informat Technol, Bangalore,
                Karnataka, India

ISSN:           0302-9743

 

------------------------------------------------------------------------

Eugene Garfield, PhD. email: garfield at codex.cis.upenn.edu
home page: www.eugenegarfield.org
Tel: 610-525-8729 Fax: 610-560-4749

Chairman Emeritus, ThomsonReuters Scientific (formerly ISI)
1500 Spring Garden Street, Philadelphia, PA 19130-4067

Editor Emeritus, The Scientist LLC. www.the-scientist.com
400 Market St. Suite 330 Philadelphia, PA 19106-2535

Past President, American Society for Information Science and Technology
(ASIS&T) www.asist.org

 
