Sriphaew, K; Theeramunkong, T Quality evaluation for document relation discovery using citation information IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E90D (8): 1225-1234 AUG 200

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Wed Jun 25 10:51:03 EDT 2008


E-mail Address: thanaruk at siit.tu.ac.th

Author(s): Sriphaew, K (Sriphaew, Kritsada); Theeramunkong, T 
(Theeramunkong, Thanaruk) 

Title: Quality evaluation for document relation discovery using citation 
information 

Source: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E90D (8): 1225-1234 
AUG 2007 

Language: English 

Document Type: Article 

Author Keywords: document relations; frequent itemset mining; citation 
matrix; quality evaluation; document relation evaluation 

Abstract: Assessment of discovered patterns is an important issue in the 
field of knowledge discovery. This paper presents an evaluation method 
that utilizes citation (reference) information to assess the quality of 
discovered document relations. With the concept of transitivity as 
direct/indirect citations, a series of evaluation criteria is introduced 
to define the validity of discovered relations. Two kinds of validity, 
called soft validity and hard validity, are proposed to express the 
quality of the discovered relations. For the purpose of impartial 
comparison, the expected validity is statistically estimated based on the 
generative probability of each relation pattern. The proposed evaluation 
is investigated using more than 10,000 documents obtained from a research 
publication database. With frequent itemset mining as a process to 
discover document relations, the proposed method was shown to be a 
powerful way to evaluate the relations in four aspects: soft/hard scoring, 
direct/indirect citation, relative quality over the expected value, and 
comparison to human judgment. 

Addresses: Thammasat Univ, Sirindhorn Int Inst Technol, Sch Informat & 
Comp Technol, Bangkok 10200, Thailand 

Reprint Address: Sriphaew, K, Thammasat Univ, Sirindhorn Int Inst Technol, 
Sch Informat & Comp Technol, 2 Prachan Rd, Bangkok 10200, Thailand. 

Cited Reference Count: 19 

Times Cited: 0 

Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG 

Publisher Address: KIKAI-SHINKO-KAIKAN BLDG MINATO-KU SHIBAKOEN 3 CHOME, 
TOKYO, 105, JAPAN 

ISSN: 0916-8532 

29-char Source Abbrev.: IEICE TRANS INFORM SYST 

ISO Source Abbrev.: IEICE Trans. Inf. Syst. 

Source Item Page Count: 10 

Subject Category: Computer Science, Information Systems; Computer Science, 
Software Engineering 

ISI Document Delivery No.: 202WE 

GANIZ M
LUCSE05027 : 2005 

GORDON MD
Using latent semantic indexing for literature based discovery 
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE 49 : 674 1998 

HAN J
2000 ACM SIGMOD INT 2000 1 

KESSLER MM
BIBLIOGRAPHIC COUPLING BETWEEN SCIENTIFIC PAPERS 
AMERICAN DOCUMENTATION 14 : 10 1963 

KLEINBERG J
ACM 46 : 604 1999 

LINDSAY RK
Literature-based discovery by lexical statistics 
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE 50 : 574 1999 

MCCALLUM AK
BOW TOOLKIT STAT LAN : 1996 

NANBA H
11 SIG CLASS RES WOR 2000 117 

PAGE L
PAGERANK CITATION RA : 1998 

PRATT W
P 16 NAT C ART INT : 80 1999 

ROSCH E
PRINCIPLES CATEGORIZ : 27 1978 

ROUSSEAU R
A classification of author co-citations: Definitions and search strategies 
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY 
55 : 513 DOI 10.1002/asi.10401 2004 

SALTON G
INTRO MODERN INFORM : 1986 

SALTON G
INTRO MODERN INFORM : 1983 

SMALL H
J AM SOC INFORM SCI 42 : 676 1973 

SRIPHAEW K
P 23 INT C ART INT A : 112 2005 

SWANSON DR
MEDICAL LITERATURE AS A POTENTIAL SOURCE OF NEW KNOWLEDGE 
BULLETIN OF THE MEDICAL LIBRARY ASSOCIATION 78 : 29 1990 

SWANSON DR
PERSPECTIVES BIOL ME 30 : 1 1986 

WHITE H
BIBLIOMETRICS ANN RE : 119 1989 
   


