[Asis-l] JASIST Volume 54, Number 7 TOC. Special Issue, Web Retrieval & Mining

Mon Apr 7 12:47:56 EDT 2003

Journal of the American Society for Information Science and
Technology
JASIST
VOLUME 54, NUMBER 7

[Note: at the end of this message are URLs for viewing contents of
JASIST from past issues.  Below, the contents of Bert Boyce's "In
this Issue" and portions of Hsinchun Chen's Introduction to the
JASIST Special Topic Section on Web Retrieval and Mining: A
Machine Learning Perspective have been cut into the Table of
Contents.]

IN THIS ISSUE
Bert L. Boyce p. 593

RESEARCH

The Connection Between the Research of a University and Counts
of Links to Its Web Pages: An Investigation Based upon a
Classification of the Relationships of Pages to the Research of the
Host University
Mike Thelwall and Gareth Harries pp. 594-602
Published online 13 March 2003
         Given evidence that patterns of web linking between
Universities can be strongly associated with research productivity,
Thelwall and Harries attempt to identify reasons for this
association by looking at characterization (the nature of linked to
pages) and the effect of different link counting models.
Using Thelwall's database of the link structures of 108 UK universities,
a file for each university was created containing all pages in its web
site that were targets of links from other universities in the
database. All pages that had at least 5% as many links as the home
page were categorized into a four part classification: not locally
created, not academic content, high link pages like databases or
gateway pages, and other. From this grouping new link structure
databases for each category were constructed, and 56 lists of link
counts were created and correlated with published research
productivity scores for 108 UK universities using several different
counting models. Both the models and the categorization affect the
correlation coefficients, and it appears that choosing categories
most related to a university's research will result in stronger
associations.

Type/Token-Taken Informetrics
Leo Egghe
Published online 24 March 2003 pp. 603-610
         Egghe terms the study of the relationship of items to
sources as, first, dual informetrics, and then, after the linguistic
usage, type/token informetrics. If one studies the use of a type, one
finds that typically high token types are chosen over low token
types, and one can make assertions about this use relative to
average (expected) use. Such study is termed type/token-taken (or
T/TT) informetrics as opposed to the dual (T/T) informetrics
exemplified by Lotka's law and (T/TT) describes the source/item
relationship as it is experienced by users. The searcher will find
more hits than could be expected from the database, and will find
that more books have been checked out than could be expected
from the database, since it can be shown that the average number
of user observed items per source is larger than the real existing
number. It can also be shown that for each fixed Lotkaian exponent
the T/TT mean is an increasing function of the T/T mean.
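
The gap between the two means can be illustrated with a small
sketch. It is not taken from Egghe's paper: it assumes that a user
picks an item (token) uniformly at random and then observes the size
of the source (type) that item belongs to, so that the T/TT mean is a
size-weighted average. The function names and the sample data are
hypothetical.

```python
from fractions import Fraction


def tt_mean(items_per_source):
    """T/T mean: the plain average number of items per source."""
    return Fraction(sum(items_per_source), len(items_per_source))


def ttt_mean(items_per_source):
    """T/TT mean under the assumption stated above: pick an item
    uniformly at random and report the size of its source, so each
    source is weighted by its own item count."""
    total_items = sum(items_per_source)
    return Fraction(sum(n * n for n in items_per_source), total_items)


# Hypothetical data: five sources holding 1, 1, 1, 2 and 5 items.
sizes = [1, 1, 1, 2, 5]
print(tt_mean(sizes))   # 2
print(ttt_mean(sizes))  # 16/5
```

With these numbers the user-observed mean (16/5) exceeds the real
mean (2), which matches the direction of the inequality described
above.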

Adapting Measures of Clumping Strength to Assess Term-Term
Similarity
Abraham Bookstein, Vladimir Kulyukin, Timo Raita and John
Nicholson
Published online 13 March 2003 pp. 611-620
         Bookstein, et alia, construct measures of semantic term
association based upon a statistical model of language and the
capture of the peculiarities to be found in text generation. The
theory is that terms that share the same content will be found
together, or clump within a document in the places where that
semantic content is discussed, and that variation from random term
distribution will indicate such text segments. If one computes a
clumping measure for a term only over those portions of text where
a second term is present, it is likely to differ from that same
measure computed for the first term over the whole text. In fact, if
they carry the same content when measured together, the first term
may appear to occur at random by the clumping measure in the
context of the second term, even though it is strongly clumped
outside this context, and thus a comparison of the two
measurements should indicate term association. A basic
association is shown in this manner which takes into account not
only the number of documents in which a term occurs, but also the
number of occurrences, although it is also possible to design a
measure that takes into account the clumping measure that is
generated when the second term is specifically excluded. This
would cover the case where the first term's clumping strength is
dependent upon the second term's strength. An experiment using
twenty content terms from the Columbia Encyclopedia data base
found their association scores with all other terms and the 100 pairs
with the highest scores. Judges then ranked the term associations as
"successes," "failures," or "can't says." Precision type measures
were then computed both with "can't says" not counted, and
counted as failures and were quite high. It appears to be possible to
distinguish between symmetric and asymmetric associations purely
on a statistical basis since each term in a pair may either influence
the other's clumping behavior in the same manner or one may
influence the other but not the reverse.

SPECIAL TOPIC SECTION: WEB RETRIEVAL AND MINING
Guest Editor: Hsinchun Chen

Introduction to the JASIST Special Topic Section on Web
Retrieval and Mining: A Machine Learning Perspective
Hsinchun Chen
Published online 13 March 2003 pp. 621-624
         This special issue consists of six papers that report research
in web retrieval and mining. Most papers apply or adapt various
pre-web retrieval and analysis techniques to other interesting and
challenging web-based applications.
         The Web has become the world's largest knowledge
repository. Extracting knowledge from the Web efficiently and
effectively is becoming increasingly important for various Web
applications.  The current Web still consists of more information
than knowledge. Also, most of the Web mining activities are still
in their early stages and will continue to develop as the Web
evolves.  We hope this collection of research papers will help
advance our knowledge and understanding of this fascinating and
evolving field of web retrieval and mining.

Client-Side Monitoring for Web Mining
Kurt D. Fenstermacher and Mark Ginsburg pp. 625-637
Published online 17 March 2003
         Client-Side Monitoring for Web Mining, by Fenstermacher
and Ginsburg, proposes a client-side monitoring system that is
unobtrusive and supports flexible data collection. Moreover, the
proposed framework encompasses client-side applications (such as
standard office productivity tools) beyond the Web browser.

Relevant Term Suggestion in Interactive Web Search Based on
Contextual Information in Query Session Logs
Chien-Kang Huang, Lee-Feng Chien and Yen-Jen Oyang
pp. 638-649
Published online 13 March 2003
         Relevant Term Suggestion in Interactive Web Search Based
on Contextual Information in Query Session Logs, by Huang,
Chien, and Oyang, proposes a query log-based term suggestion
approach to interactive Web search. Using this approach, relevant
terms suggested for a user query are those that co-occur in similar
query sessions from search engine logs, rather than in the retrieved
documents. Their experiments showed that the proposed approach
can exploit the contextual information in a user query session to
make useful suggestions.
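
The general idea of suggesting terms from query sessions rather than
from retrieved documents can be sketched as follows. This is a
minimal illustration, not the method of Huang, Chien, and Oyang: it
assumes each session is simply a list of query terms and ranks
candidates by how many sessions they share with the user's term. The
function name and the session data are hypothetical.

```python
from collections import Counter


def suggest_terms(sessions, query, top_k=3):
    """Rank terms by the number of sessions in which they co-occur
    with `query` (a simplifying assumption, not the paper's model)."""
    counts = Counter()
    for session in sessions:
        terms = set(session)
        if query in terms:
            counts.update(terms - {query})
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical query sessions from a search engine log.
sessions = [
    ["jaguar", "car", "dealer"],
    ["jaguar", "car", "price"],
    ["jaguar", "cat", "habitat"],
    ["python", "snake"],
]
print(suggest_terms(sessions, "jaguar"))  # ['car', 'cat', 'dealer']
```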

DocCube: Multi-Dimensional Visualisation and Exploration of
Large Document Sets
Josiane Mothe, Claude Chrisment, Bernard Dousset, and Joel Alaux pp. 650-659
Published online 13 March 2003
         DocCube: Multi-Dimensional Visualisation and
Exploration of Large Document Sets, by Mothe, Chrisment, Dousset,
and Alaux, presents a novel user interface that provides
global visualization of large document sets to help users formulate
queries and access documents. Concept hierarchies are introduced to
facilitate browsing.

A Novel Method for Discovering Fuzzy Sequential Patterns Using
the Simple Fuzzy Partition Method
Ruey-Shun Chen and Yi-Chung Hu
Published online 28 March 2003 pp. 660-670
         A Novel Method for Discovering Fuzzy Sequential Patterns
Using the Simple Fuzzy Partition Method, by Chen and Hu,
proposes a fuzzy data mining technique to discover fuzzy
sequential patterns.

Automatic Generation of English/Chinese Thesaurus Based on a
Parallel Corpus in Laws
Christopher C. Yang and Johnny Luk
Published online 28 March 2003 pp. 671-682
         Automatic Generation of English/Chinese Thesaurus Based
on a Parallel Corpus in Laws, by Yang and Luk, describes a project
that aims to address cross-lingual semantic interoperability by
developing a cross-lingual thesaurus based on an English/Chinese
parallel corpus. Their experiments showed that such a thesaurus is
useful in suggesting relevant terms in a different language.

HelpfulMed: Intelligent Searching for Medical Information over
the Internet
Hsinchun Chen, Ann M. Lally, Bin Zhu, and Michael Chau
Published online 13 March 2003 pp. 683-694
         HelpfulMed: Intelligent Searching for Medical Information
over the Internet, by Chen, Lally, Zhu, and Chau, describes an
intelligent, web-based medical portal that supports meta searching,
vertical search engine creation, term suggestion, and knowledge
map browsing, all in an integrated web-based architecture. Initial
user evaluations of the system were promising in comparison to
other traditional medical search engines.

BOOK REVIEW

Looking for Information: A Survey of Research on Information
Seeking, Needs and Behavior, by Donald O. Case
Reviewed by Reijo Savolainen
Published online 24 March 2003 pp. 695-697

CALLS FOR PAPERS
Published online 24-28 March 2003 pp. 698-699

------------------------------------------------------
The ASIS web site <http://www.asis.org/Publications/JASIS/tocs.html>
contains the Table of Contents and brief abstracts as above from January
1993 (Volume 44) to date.

The John Wiley Interscience site <http://www.interscience.wiley.com>
includes issues from 1986 (Volume 37) to date.  Guests have access only
to tables of contents and abstracts.  Registered users of the interscience
site have access to the full text of these issues and to preprints.

Executive Director
American Society for Information Science and Technology
1320 Fenwick Lane, Suite 510
Silver Spring, MD  20910
FAX: (301) 495-0810
PHONE: (301) 495-0900

http://www.asis.org