[Asis-l] TOC, JASIST Vol. 54 # 11
Richard Hill
rhill at asis.org
Tue Aug 12 15:08:27 EDT 2003
Journal of the American Society for Information Science and Technology
Volume 54, Issue 11, 2003.
[Note: at the end of this message are URLs for viewing contents of JASIST
from past issues. Below, the contents of Bert Boyce's "In this Issue" has
been cut into the Table of Contents.]
Editorial
987
In this issue
Bert R. Boyce
Published Online: 6 Aug 2003
Research Article
989
Quality Control in Scholarly Publishing: a New Proposal
Stefano Mizzaro
Published Online: 4 Jun 2003
Mizzaro presents a model for scholarly communication that permits
the use of electronic journals and removes the reviewing process
while maintaining the quality of papers and measuring the quality of
researchers' contributions. Journal subscribers, both as authors and as
readers, have scores associated with them, as do contributed papers. An
author's score increases with the publication of papers judged positively
by readers, a reader's score decreases when a judgement highly at
variance with the mean judgement is expressed, and papers' scores depend
upon cumulated readers' judgements. A steadiness score is associated with
each of the other scores. A judgement on a paper updates the
paper's score, and thus the scores of its authors and readers. A paper's
score is the mean of the judgements of its readers, each weighted by that
reader's score. An author's score is the weighted mean of the scores of papers
previously published, and a reader's the weighted mean of the goodness of
previously expressed judgements.
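As a rough illustration of the weighted mean described in this abstract, the following Python sketch computes a paper's score from reader judgements. The function name, the 0-to-1 judgement scale, and the sample values are assumptions made for illustration and are not taken from Mizzaro's paper; steadiness scores and the author and reader updates are left out.

```python
def paper_score(judgements):
    """Weighted mean of reader judgements.

    judgements: list of (judgement, reader_score) pairs. Each judgement is
    weighted by the score of the reader who expressed it.
    """
    total_weight = sum(weight for _, weight in judgements)
    if total_weight == 0:
        raise ValueError("at least one reader must have a non-zero score")
    return sum(value * weight for value, weight in judgements) / total_weight


# Hypothetical data: three readers with different scores judge one paper.
sample = [(0.9, 0.8), (0.7, 0.5), (0.2, 0.1)]
print(round(paper_score(sample), 3))
```

In this sketch the low-scoring reader's dissenting judgement barely moves the result, which is the behavior the abstract attributes to weighting by reader score.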
1006
Peripheral Social Awareness Information in Collaborative Work
Michael B. Spring, Vichita Vathanophas
Published Online: 12 Jun 2003
Spring and Vathanophas investigate the effect on productivity of
team members' awareness of the work of other members of their team.
Sixty undergraduates were assigned to twenty groups of three, all using the
CASCADE collaborative authoring system. Each subject in a group worked in
a different location on assigned tasks, communicating with the team only by
e-mail. Information on the number of actions taken by a team member, the
percentage of required minutes actually worked, and a measure of commitment to
the project was collected and made available to half the participant teams. The
use of the awareness tool was associated with a decrease in work quality and
intergroup communication. It is possible that the tool reduced the need for
communication and that it negatively influenced the effort of some subjects.
1014
Performance measurement framework for hierarchical text classification
Aixin Sun, Ee-Peng Lim, Wee-Keong Ng
Published Online: 4 Jun 2003
The evaluation of automatic classification of documents normally
has taken place in flat schemes where hierarchical structure is not taken
into account. Since partial success is possible if a document is classified
correctly at a high level but mis-classified at a lower level, new measures
should reflect hierarchical information. The traditional recall and
precision based measures will not indicate that classification into classes
similar to the correct ones is superior to classification into totally
unrelated groupings. Sun, Lim, and Ng advocate maintenance of pair-wise
category similarity values and an average category similarity. If a wrong
assignment occurs, the values in the contingency table for recall and
precision are modified using the similarity values but limited to a zero to
one range. Category similarity can be replaced with number of links between
categories in the hierarchical tree if an acceptable distance is specified
by a user. Since in a hierarchical classification, mis-categorization at a
higher level will lead to mis-categorization by a lower level classifier,
the number of such documents blocked as a proportion of those that should
be classified at a low level is termed the blocking factor for the higher
level. This value can provide valuable information on the performance of
subtree classifiers. Using the Reuters 21,578-document news collection,
which is organized into 135 categories, three category trees were manually
derived. Binary classifiers were trained at each level of the hierarchy, and
when they were run on the test portion of the collection, the new measures
were computed. Support Vector Machine classifiers outperformed Naive Bayes
classifiers.
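The blocking factor described in this abstract can be sketched in a few lines of Python. This is one reading of the abstract's wording, not the formula from Sun, Lim, and Ng's paper; the function name, the set-based representation, and the document ids are hypothetical.

```python
def blocking_factor(should_reach_subtree, passed_by_parent):
    """Share of documents that belong below a parent category but were
    rejected by the parent-level classifier, so the lower-level
    classifiers never see them.

    should_reach_subtree: set of ids of documents whose correct category
        lies below the parent category.
    passed_by_parent: set of ids of documents the parent-level classifier
        passed down to its subtree.
    """
    if not should_reach_subtree:
        raise ValueError("no documents belong to this subtree")
    blocked = should_reach_subtree - passed_by_parent
    return len(blocked) / len(should_reach_subtree)


# Hypothetical example: four documents belong under the parent category,
# but the parent classifier passes down only two of them (plus one stray).
should_reach = {"d1", "d2", "d3", "d4"}
passed_down = {"d1", "d2", "d5"}
print(blocking_factor(should_reach, passed_down))  # 0.5
```

A high value for a parent category suggests that poor results in its subtree may come from the parent-level classifier rather than from the subtree classifiers themselves.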
1029
A Comparison of Youngsters' Use of CD-ROM and the Internet as Information
Resources
Andrew K. Shenton, Pat Dixon
Published Online: 12 Jun 2003
Shenton and Dixon draw a sample from six high-performing English
schools in the town of Whitley Bay. Three were first schools, two middle
schools, and one a high school. Choosing at random from one class in each
year group, 188 subjects were selected, all of whom had been exposed to
CD-ROM searching and some to Internet search. Twelve focus groups and 121
individual interviews were utilized to gather subjects' articulations on
their own information behavior. Subjects generally attempted to converge
upon a particular article of interest or, even more specifically, material
in such an item. The target item in CD-ROMs was often an encyclopedia
entry, and with the Internet, a web page. Subjects often had favorite
encyclopedias or search engines which they used repeatedly, and often had a
favorite website for awareness of developments in an area of interest.
Single word or short phrase searches were the norm without Boolean
operations in either medium. There was an expectation of quick satisfaction
and little concern for accuracy or authority of retrieved sources. Home use
of CD-ROM files was common, while many children had no home Internet access
or had such access restricted by their parents. Internet use increased with
respondent age, but older subjects found it slow, noisy, and less than user
friendly. CD-ROM usage decreased with age.
1050
Relevance Data for Language Models Using Maximum Likelihood
David Bodoff, Bin Wu, K. Y. Michael Wong
Published Online: 12 Jun 2003
Bodoff, Wu, and Wong use a relevance feedback model that requires
the searcher to establish hypothetical distributions for the relevance
assessments for each document-query pair, the hypothetical distribution of
documents in the true document vector, and the distribution of queries in
the true query vector. They then use a maximum likelihood estimation to
find optimized document and query representations and thus adjust both
document and query vectors. One such model might use cosine(D,Q) for
relevant documents and 1 - cosine(D,Q) for non-relevant documents, while
assuming normal distributions for document and query error and using
maximum likelihood to minimize the angles between document vectors, query
vectors, and between document and query vectors, with the resulting new
values used for later queries. It would also be possible to assume both
true and observed vectors to be of unit length, so that the distributions
all depend upon the angle between observation and mean, resulting in a
(cosine, cosine, cosine) model rather than a (cosine, normal, normal) model,
which would result in a maximum likelihood function similar to the
traditional Rocchio heuristic. Five vector space methods were compared
(tf*idf plus four feedback methods: the Rocchio heuristic, Bartell, maximum
likelihood, and an alpha-beta heuristic, which adjusts documents toward
adjusted rather than original queries) on the Cranfield and CISI data. Two
thirds of the queries were randomly chosen for training, the document
indexes were trained for each method, and the remaining one third were
tested. Both maximum likelihood models ran rapidly and resulted in highly
significant improvement in average precision over the baseline and both
heuristics.
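To make the relevance part of such a model concrete, here is a minimal Python sketch that scores a set of relevance judgements using cosine(D,Q) for relevant pairs and 1 - cosine(D,Q) for non-relevant pairs. It covers only that one component: the distributions for document and query error and the optimization step are omitted. The function names, the clamping constant, and the toy vectors are assumptions for illustration, and the sketch assumes non-zero vectors with non-negative term weights so that the cosine lies between 0 and 1.

```python
import math


def cosine(d, q):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm


def relevance_log_likelihood(pairs):
    """Log-likelihood of relevance judgements.

    pairs: list of (doc_vector, query_vector, is_relevant) triples.
    A relevant pair contributes log cosine(D,Q); a non-relevant pair
    contributes log(1 - cosine(D,Q)).
    """
    eps = 1e-12  # illustrative clamp to keep log() away from zero
    total = 0.0
    for d, q, is_relevant in pairs:
        c = min(max(cosine(d, q), eps), 1.0 - eps)
        total += math.log(c if is_relevant else 1.0 - c)
    return total


# Toy judgements: the first document is close to the query and judged
# relevant; the second is nearly orthogonal and judged non-relevant.
judgements = [
    ([1.0, 0.0], [1.0, 0.1], True),
    ([0.0, 1.0], [1.0, 0.1], False),
]
print(relevance_log_likelihood(judgements))
```

Under these assumptions, vectors that place relevant documents near their queries and non-relevant documents far from them yield a higher log-likelihood, which is the quantity a maximum likelihood procedure would try to increase by adjusting the vectors.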
1062
An IP-level Analysis of Usage Statistics for Electronic Journals in
Chemistry: Making Inferences about User Behavior
Philip M. Davis, Leah R. Solla
Published Online: 4 Jun 2003
Davis and Solla study downloads of 29 ACS electronic journals at
Cornell University during a three-month period by individual IP addresses
rather than unidentifiable individual users. Chemistry and Chemical
Engineering accounted for 42% of downloads, followed by other Engineering
departments at 12.5%, Medical College at 6.5%, Food Science at 4.9%, and
Molecular Biology at 2.6%. Libraries accounted for 3.4% and the remote
modem pool only 1.5%. Three percent of users downloaded more than 100
articles, 14% more than 20, and 38% downloaded 1 or 2 articles during the
sample period. With the exception of two outliers, JACS and Biochemistry,
the relationship between number of downloads and number of IP addresses is
linear. A thousand downloads will lead to an expectation of 114 using
addresses. The relationship between number of journals consulted and number
of articles downloaded is quadratic and outliers are heavy users of one or
two journals. The number of journals consulted per IP address seems to fit
a Lotka distribution. The system appears to be used heavily for
print-on-demand copies.
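The linear relationship reported in this abstract can be illustrated with a small Python sketch. The abstract gives only one point on the line (1,000 downloads, 114 addresses), so the sketch assumes a line through the origin; that assumption, the function name, and the constant names are not from Davis and Solla's paper, whose fitted line may include an intercept.

```python
# Single data point stated in the abstract.
REFERENCE_DOWNLOADS = 1000
REFERENCE_ADDRESSES = 114


def expected_addresses(downloads):
    """Expected number of IP addresses using a journal, assuming the
    relationship is proportional (a line through the origin)."""
    return REFERENCE_ADDRESSES * downloads / REFERENCE_DOWNLOADS


print(expected_addresses(1000))  # 114.0
print(expected_addresses(500))   # 57.0
```

The two outliers named in the abstract, JACS and Biochemistry, would not be expected to follow this line.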
1069
Greeklish: an Experimental Interface for Automatic Transliteration
Alexandros Karakos
Published Online: 12 Jun 2003
In script transliteration the generated character string may not
always be pronounced as was the source character string, since the phonetic
habits of those using the alphabet of the generated string will govern.
Since the Internet normally uses ASCII and thus is restricted to the Roman
alphabet, transliteration is a problem for users of non-Roman alphabets, but
string conversion is nonetheless useful. Greeklish is the expression of
Greek words in the Roman alphabet and, in this paper, also the name of a
C++ Windows application by Karakos that transcribes the characters of any
text on the Windows clipboard from Greek to English or vice
versa.
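Character-by-character transliteration of the kind described in this abstract might look like the following Python sketch. It is not Karakos's C++ application: the mapping table is one hypothetical choice among several Greeklish conventions, only the Greek-to-Roman direction is shown, and the function name is an assumption.

```python
import unicodedata

# Hypothetical mapping; real Greeklish usage varies (for example, "η" is
# written as "i" or "h", and "ω" as "o" or "w").
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def to_greeklish(text):
    """Transliterate Greek letters to Roman letters, one character at a time."""
    # Decompose accented letters and drop the combining accent marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    out = []
    for ch in stripped:
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # leave non-Greek characters unchanged
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(to_greeklish("Καλημέρα κόσμος"))  # Kalimera kosmos
```

The reverse direction is harder because the mapping is not one-to-one: "i" could stand for "η" or "ι", and "o" for "ο" or "ω", so a Roman-to-Greek converter needs extra rules or a fixed convention.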
Letter to the Editor
1075
The Sample Size Dependency of Statistical Measures and Synchronic
Potentiality in Informetrics. Some Comments on Some Comments by Professor
Burrell
Fuyuki Yoshikane, Kyo Kageura, Keita Tsuji
Published Online: 25 Jun 2003
1076
The Sample Size Dependency of Statistical Measures in Informetrics? Some
Comments
Quentin L. Burrell
Published Online: 12 Jun 2003
------------------------------------------------------
The ASIS web site <http://www.asis.org/Publications/JASIS/tocs.html>
contains the Table of Contents and brief abstracts as above from January
1993 (Volume 44) to date.
The John Wiley Interscience site <http://www.interscience.wiley.com>
includes issues from 1986 (Volume 37) to date. Guests have access only to
tables of contents and abstracts. Registered users of the interscience
site have access to the full text of these issues and to preprints.
Executive Director
American Society for Information Science and Technology
1320 Fenwick Lane, Suite 510
Silver Spring, MD 20910
FAX: (301) 495-0810
PHONE: (301) 495-0900
http://www.asis.org