Citation Matching in Sanskrit Corpora by AS Prasad
Eugene Garfield
eugene.garfield at THOMSONREUTERS.COM
Thu Mar 10 16:02:30 EST 2011
TITLE: Citation Matching in Sanskrit Corpora Using Local
Alignment (Article, English)
AUTHOR: Prasad, AS; Rao, S
E-mail : abhinandan.sp at iiitb.net
SOURCE: SANSKRIT COMPUTATIONAL LINGUISTICS 6465. 2010. p.124-136
SPRINGER-VERLAG BERLIN, BERLIN
KEYWORDS: citation matching; local alignment; Smith-Waterman-Gotoh
algorithm; Sanskrit; Mahabharata; Mahabharata-Tatparyanirnaya
ABSTRACT: Citation matching is the problem of finding which
citation occurs in a given textual corpus. Most existing citation
matching work is done on scientific literature. The goal of this paper
is to present methods for performing citation matching on Sanskrit
texts.
Exact matching and approximate matching are the two methods for
performing citation matching. The exact matching method checks for exact
occurrence of the citation with respect to the textual corpus.
Approximate matching is a fuzzy string-matching method which computes a
similarity score between an individual line of the textual corpus and
the citation. The Smith-Waterman-Gotoh algorithm for local alignment,
which is generally used in bioinformatics, is used here for calculating
the similarity score. This similarity score is a measure of the
closeness between the text and the citation. The exact- and
approximate-matching methods are evaluated and compared. The methods
presented can be easily applied to corpora in other Indic languages like
Kannada, Tamil, etc. The approximate-matching method can in particular
be used in the compilation of critical editions and plagiarism detection
in a literary work.
AUTHOR ADDRESS: AS Prasad, Int Inst Informat Technol, Bangalore,
Karnataka, India
ISSN: 0302-9743
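
The two methods named in the abstract can be illustrated with a minimal
Python sketch. This is not the authors' implementation: the function
names, the scoring parameters (match, mismatch, gap_open, gap_extend),
the normalization by the shorter string, and the sample strings are all
assumptions chosen for illustration. The recurrence is the standard
Smith-Waterman local alignment with Gotoh's affine gap penalties.

```python
def exact_match(citation, corpus_lines):
    """Return indices of corpus lines that contain the citation verbatim."""
    return [i for i, line in enumerate(corpus_lines) if citation in line]


def swg_score(a, b, match=2.0, mismatch=-1.0, gap_open=2.0, gap_extend=0.5):
    """Best local-alignment score of a against b with affine gaps.

    gap_open is the cost of the first character of a gap and gap_extend
    the cost of each further character (assumed values, not the paper's).
    """
    neg_inf = float("-inf")
    n = len(b)
    h_prev = [0.0] * (n + 1)      # H values for the previous row
    f_prev = [neg_inf] * (n + 1)  # best scores ending in a vertical gap
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf               # best score ending in a horizontal gap
        for j in range(1, n + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 is what makes the alignment local.
            h_cur[j] = max(0.0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def similarity(citation, line, match=2.0):
    """Scale the score to [0, 1] by the best score the shorter string allows."""
    shorter = min(len(citation), len(line))
    if shorter == 0:
        return 0.0
    return swg_score(citation, line, match=match) / (match * shorter)


def best_match(citation, corpus_lines):
    """Return (similarity, line index) of the closest line; corpus must be non-empty."""
    return max((similarity(citation, line), i)
               for i, line in enumerate(corpus_lines))


# Illustrative romanized strings, not data from the paper.
corpus = ["xyz", "dharmakshetre kurukshetre"]
print(exact_match("kurukshetre", corpus))
print(best_match("dharmaksetre", corpus))
```

Exact matching misses the variant spelling in the last call, while the
alignment score still ranks the intended line first; a threshold on the
similarity value would be needed to decide whether a line counts as a
match at all.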
------------------------------------------------------------------------
Eugene Garfield, PhD. email: garfield at codex.cis.upenn.edu
home page: www.eugenegarfield.org
Tel: 610-525-8729 Fax: 610-560-4749
Chairman Emeritus, ThomsonReuters Scientific (formerly ISI)
1500 Spring Garden Street, Philadelphia, PA 19130-4067
Editor Emeritus, The Scientist LLC. www.the-scientist.com
400 Market St. Suite 330 Philadelphia, PA 19106-2535
Past President, American Society for Information Science and Technology
(ASIS&T) www.asist.org