[Asis-l] TOC, JASIST Vol. 54 # 11

Tue Aug 12 15:08:27 EDT 2003

Journal of the American Society for Information Science and Technology
Volume 54, Issue 11, 2003.

[Note: at the end of this message are URLs for viewing contents of JASIST
from past issues.  Below, the contents of Bert Boyce's "In this Issue" have
been cut into the Table of Contents.]

Editorial

987
In this issue
Bert R. Boyce
Published Online: 6 Aug 2003

Research Article

989
Quality Control in Scholarly Publishing: a New Proposal
Stefano Mizzaro
Published Online: 4 Jun 2003
         Mizzaro presents a model for scholarly communication that permits
the use of electronic journals, removes the reviewing process while
maintaining the quality of papers, and measures the quality of
researchers' contributions. Journal subscribers, both as authors and as
readers, have scores associated with them, as do contributed papers.  An
author's score increases with the publication of papers judged positively
by readers, a reader's score decreases when a judgement highly at
variance with the mean judgement is expressed, and a paper's score depends
upon cumulated readers' judgements. A steadiness score is associated with
each of the other scores. Judgements on papers lead to an update of the
paper's score, and thus of the scores of its authors and readers. A paper's
score is the mean of the judgements of its readers, each weighted by that
reader's score. An author's score is the weighted mean of the scores of
papers previously published, and a reader's the weighted mean of the
goodness of previously expressed judgements.
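
As a rough illustration of the weighted-mean rule described above, the
following Python sketch computes a paper's score from reader judgements,
each weighted by the reader's current score. The function name, the 0-to-1
scale, and the sample values are assumptions made for illustration only;
the author, reader, and steadiness score updates of the model are not shown.

```python
def paper_score(judgements):
    """Return the mean of reader judgements, weighted by reader score.

    `judgements` is a list of (reader_score, judgement) pairs.
    Both values are assumed to lie in [0, 1]; that scale is hypothetical.
    """
    total_weight = sum(weight for weight, _ in judgements)
    if total_weight == 0:
        raise ValueError("at least one reader must have a positive score")
    return sum(weight * value for weight, value in judgements) / total_weight


# Hypothetical example: the judgement of a high-scoring reader counts for
# more than that of a low-scoring reader.
example = [(0.9, 0.8), (0.3, 0.2)]
score = paper_score(example)
```

With equal reader scores the function reduces to the ordinary mean of the
judgements.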

1006
Peripheral Social Awareness Information in Collaborative Work
Michael B. Spring, Vichita Vathanophas
Published Online: 12 Jun 2003
         Spring and Vathanophas investigate the effect on productivity of
team members' awareness of the work of other members of their team.
Sixty undergraduates were assigned to twenty groups of three, all using the
CASCADE collaborative authoring system.  Each subject in a group worked in
a different location on assigned tasks, communicating with the team only by
e-mail. Information on the number of actions taken by a team member, the
percentage of required minutes actually worked, and a measure of commitment
to the project was collected and made available to half the participant
teams. The use of the awareness tool is associated with a decrease in work
quality and intergroup communication. It is possible that the tool reduced
the need for communication and that it negatively influenced the effort of
some subjects.

1014
Performance measurement framework for hierarchical text classification
Aixin Sun, Ee-Peng Lim, Wee-Keong Ng
Published Online: 4 Jun 2003
         The evaluation of automatic classification of documents normally
has taken place in flat schemes where hierarchical structure is not taken
into account. Since partial success is possible if a document is classified
correctly at a high level but misclassified at a lower level, new measures
should reflect hierarchical information. The traditional recall and
precision based measures will not indicate that classification into classes
similar to the correct ones is superior to classification into totally
unrelated groupings. Sun, Lim, and Ng advocate maintenance of pair-wise
category similarity values and an average category similarity. If a wrong
assignment occurs, the values in the contingency table for recall and
precision are modified using the similarity values but limited to a zero to
one range. Category similarity can be replaced with the number of links
between categories in the hierarchical tree if an acceptable distance is
specified by a user. Since in a hierarchical classification
miscategorization at a higher level will lead to miscategorization by a
lower level classifier, the number of such documents blocked, as a
proportion of those that should be classified at a low level, is termed the
blocking factor for the higher level. This value can provide valuable
information on the performance of subtree classifiers.  Using the
Reuters-21578 news collection, which is organized into 135 categories,
three category trees were manually derived.  Binary classifiers were
trained at each level of the hierarchy, and when they were run on the test
portion of the collection, the new measures were computed. Support Vector
Machine classifiers outperformed Naive Bayes classifiers.
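
The blocking factor described above can be sketched in a few lines of
Python. This is only a minimal reading of the one-sentence definition in
this summary, not the authors' code; the function name, the argument names,
and the document identifiers are hypothetical.

```python
def blocking_factor(should_reach, accepted_by_parent):
    """Fraction of documents that belong in a subtree but were rejected
    by the higher-level classifier, so the lower level never saw them.

    should_reach       -- ids of documents whose true category is in the subtree
    accepted_by_parent -- ids of documents the higher-level classifier passed down
    """
    should_reach = set(should_reach)
    if not should_reach:
        return 0.0  # nothing could be blocked
    blocked = should_reach - set(accepted_by_parent)
    return len(blocked) / len(should_reach)


# Hypothetical example: four documents belong below this node, and the
# higher-level classifier passes three of them down (plus one stray, d9).
factor = blocking_factor({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d3", "d9"})
```

Documents passed down in error (such as d9 here) do not affect the value,
since the measure counts only documents that should have reached the lower
level.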

1029
A Comparison of Youngsters' Use of CD-ROM and the Internet as Information
Resources
Andrew K. Shenton, Pat Dixon
Published Online: 12 Jun 2003
         Shenton and Dixon draw a sample from six high-performing English
schools in the town of Whitley Bay. Three were first schools, two were
middle schools, and one was a high school. Choosing at random from one
class in each year group, 188 subjects were selected, all of whom had been
exposed to CD-ROM searching and some to Internet search. Twelve focus
groups and 121 individual interviews were utilized to gather subjects'
articulations on their own information behavior. Subjects generally
attempted to converge upon a particular article of interest or, even more
specifically, material in such an item. The target item in CD-ROMs was
often an encyclopedia entry, and with the Internet, a web page. Subjects
often had favorite encyclopedias or search engines which they used
repeatedly, and often had a favorite website for awareness of developments
in an area of interest. Single word or short phrase searches were the norm,
without Boolean operations in either medium. There was an expectation of
quick satisfaction and little concern for accuracy or authority of
retrieved sources. Home use of CD-ROM files was common, while many children
had no home Internet access or had such access restricted by their
parents. Internet use increased with respondent age, but older subjects
found it slow, noisy, and less than user friendly. CD-ROM usage decreased
with age.

1050
Relevance Data for Language Models Using Maximum Likelihood
David Bodoff, Bin Wu, K. Y. Michael Wong
Published Online: 12 Jun 2003
         Bodoff, Wu, and Wong use a relevance feedback model that requires
the searcher to establish hypothetical distributions for the relevance
assessments for each document-query pair, the hypothetical distribution of
documents in the true document vector, and the distribution of queries in
the true query vector. They then use maximum likelihood estimation to
find optimized document and query representations and thus adjust both
document and query vectors. One such model might use cosine(D,Q) for
relevant documents and 1 - cosine(D,Q) for non-relevant documents, while
assuming normal distributions for document and query error and using
maximum likelihood to minimize the angles between document vectors, between
query vectors, and between document and query vectors, with the resulting
new values used for later queries.  It would also be possible to assume
both true and observed vectors to be of unit length, so that the
distributions all depend upon the angle between observation and mean,
resulting in a (cosine, cosine, cosine) model rather than a (cosine,
normal, normal) model, which would result in a maximum likelihood function
similar to the traditional Rocchio heuristic. Using five vector space
models (tf*idf plus four feedback methods: the Rocchio heuristic, Bartell,
maximum likelihood, and an alpha-beta heuristic, which adjusts documents
toward adjusted rather than original queries) with the Cranfield and CISI
data, two thirds of the queries were randomly chosen for training, the
document indexes were trained for each method, and the remaining one third
were tested. Both maximum likelihood models ran rapidly and resulted in
highly significant improvement in average precision over the baseline and
both heuristics.
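
To make the relevance part of such a model concrete, the Python sketch
below scores a set of judged document-query pairs using cosine(D,Q) for
relevant pairs and 1 - cosine(D,Q) for non-relevant ones. It covers only
that one term; the document and query error distributions and the
optimization step are omitted. The function names, the small floor value,
and the assumption of non-negative term weights (so that the cosine lies
between 0 and 1) are illustrative choices, not taken from the paper.

```python
import math


def cosine(d, q):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    if norm == 0:
        raise ValueError("zero-length vector")
    return dot / norm


def relevance_log_likelihood(judged_pairs, floor=1e-12):
    """Sum of log-probabilities of the observed relevance judgements.

    judged_pairs -- iterable of (doc_vector, query_vector, is_relevant)
    floor        -- hypothetical lower bound that keeps log() defined
    """
    total = 0.0
    for d, q, is_relevant in judged_pairs:
        c = cosine(d, q)
        p = c if is_relevant else 1.0 - c
        total += math.log(max(p, floor))
    return total


# Hypothetical two-term vectors: one relevant pair that points the same
# way as the query, one non-relevant pair that is orthogonal to it.
pairs = [([1, 0], [1, 0], True), ([1, 0], [0, 1], False)]
log_likelihood = relevance_log_likelihood(pairs)
```

A maximum likelihood procedure would adjust the vectors to increase this
quantity together with the terms for document and query error.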

1062
An IP-level Analysis of Usage Statistics for Electronic Journals in 
Chemistry: Making Inferences about User Behavior
Philip M. Davis, Leah R. Solla
Published Online: 4 Jun 2003
         Davis and Solla study downloads of 29 ACS electronic journals at
Cornell University during a three-month period by individual IP addresses
rather than unidentifiable individual users. Chemistry and Chemical
Engineering accounted for 42% of downloads, followed by other Engineering
departments at 12.5%, Medical College at 6.5%, Food Science at 4.9%, and
Molecular Biology at 2.6%.  Libraries accounted for 3.4% and the remote
modem pool only 1.5%. Three percent of users downloaded more than 100
articles, 14% more than 20, and 38% downloaded 1 or 2 articles during the
sample period. With the exception of two outliers, JACS and Biochemistry,
the relationship between number of downloads and number of IP addresses is
linear. A thousand downloads corresponds to an expected 114 addresses
using the journal. The relationship between number of journals consulted
and number of articles downloaded is quadratic, and outliers are heavy
users of one or two journals. The number of journals consulted per IP
address seems to fit a Lotka distribution. The system appears to be used
heavily for print on demand copies.

1069
Greeklish: an Experimental Interface for Automatic Transliteration
Alexandros Karakos
Published Online: 12 Jun 2003
         In script transliteration the generated character string may not
always be pronounced as was the source character string, since the phonetic
habits of those using the alphabet of the generated string will govern.
Since the Internet normally uses ASCII and thus is restricted to the Roman
alphabet, transliteration is a problem for users of non-Roman alphabets,
but string conversion is nonetheless useful. Greeklish is the expression of
Greek words in the Roman alphabet and, in this paper, also the name of a
C++ Windows application described by Karakos that transcribes the
characters of any text on the Windows clipboard from Greek to English or
vice versa.
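
The kind of character-by-character conversion described above can be
sketched in Python as follows. This is not the application's code (which
is a C++ Windows program working on the clipboard); the mapping table is
one hypothetical Greeklish convention among several in use, the function
name is invented, and only the Greek-to-Roman direction is shown.

```python
# Hypothetical Greek-to-Roman table; real Greeklish conventions vary.
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def to_greeklish(text):
    """Replace each unaccented Greek letter with its Roman counterpart.

    Characters that are not in the table (accented letters, digits,
    punctuation, Roman letters) are copied through unchanged.
    """
    out = []
    for ch in text:
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


greeting = to_greeklish("καλημερα")
```

Because several Greek letters share one Roman equivalent in this table
(for example eta and iota both become "i"), the reverse direction cannot
be recovered from a simple inverse lookup and would need its own rules.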

Letter to the Editor

1075
The Sample Size Dependency of Statistical Measures and Synchronic 
Potentiality in Informetrics. Some Comments on Some Comments by Professor 
Burrell
Fuyuki Yoshikane, Kyo Kageura, Keita Tsuji
Published Online: 25 Jun 2003

1076
The Sample Size Dependency of Statistical Measures in Informetrics? Some 
Comments
Quentin L. Burrell
Published Online: 12 Jun 2003

------------------------------------------------------
The ASIS web site <http://www.asis.org/Publications/JASIS/tocs.html> 
contains the Table of Contents and brief abstracts as above from January 
1993 (Volume 44) to date.

The John Wiley Interscience site <http://www.interscience.wiley.com> 
includes issues from 1986 (Volume 37) to date.  Guests have access only to 
tables of contents and abstracts.  Registered users of the Interscience
site have access to the full text of these issues and to preprints.

Executive Director
American Society for Information Science and Technology
1320 Fenwick Lane, Suite 510
Silver Spring, MD  20910
FAX: (301) 495-0810
PHONE: (301) 495-0900

http://www.asis.org