[Asis-l] TOC, JASIST, Volume 53, # 13

Fri Oct 11 09:13:51 EDT 2002

Journal of the American Society for Information Science and Technology
JASIST
VOLUME 53, NUMBER 13

[Note: URLs for viewing contents of JASIST from past issues are at the 
bottom.  Immediately below, the contents of Bert Boyce's "In This Issue" 
have been cut into the Table of Contents.]

  EDITORIAL

  In This Issue
  Bert R. Boyce
  1083

  RESEARCH
  State Digital Library Usability: Contributing Organizational Factors
  Hong (Iris) Xie and Dietmar Wolfram
  Published online 18 September 2002
  1085
   In this issue Xie and Wolfram study the Wisconsin state digital library 
BadgerLink to determine the organizational factors that lead to different 
use requirements and the degree to which these are met, as well as the 
impact on physical libraries. To this end, usage data from EBSCOhost and 
ProQuest logs for BadgerLink were analyzed, and 313 Wisconsin libraries of 
all types were surveyed (76% response rate); the results were analyzed 
along with 81 responses to a voluntary web survey of end users. Heaviest 
users were K-12 schools and 
institutions of higher education. Heaviest use sites were the two largest 
state universities and the state's largest public library. Small libraries 
were infrequent users. Web survey respondents were mature working 
professionals. Sixty percent searched for specific information, but 46% 
reported browsing in subject areas. Libraries with dedicated Internet 
access reported more frequent usage than those with dial-up connection. 
Those who accessed from libraries reported more frequent use than those at 
work or at home. Libraries that trained end users reported more use, but 
the majority of the web survey respondents reported themselves as 
self-taught. Logs confirm reported subject interests. Three surrogates were 
requested for every full text document but full text availability is 
reported as the reason for use by 30% of users. Availability has led to the 
cancellation of subscriptions in many libraries that are important 
promoters of the service. A model will need to include interactions based 
upon the influence of each involved participant on the others. It will also 
need to include the extension of the activities of one participant to other 
participant organizations and the communication among these organizations.

  Unfounded Attribution of the "Half-Life" Index-Number of Literature 
Obsolescence to Burton and Kebler: A Literature Science Study
  Endre Szava-Kovats
  Published online 21 August 2002
  1098
   Szava-Kovats demonstrates that the common attribution of the origin of 
the concept of half-life in subject-oriented journal literatures to the 
1960 Burton and Kebler article in American Documentation is not 
correct.  The first use appears to be in C. R. Gosnell's 1944 paper in 
College and Research Libraries. It was later discussed by J. D. Bernal at 
the 1958 International Conference on Scientific Information in Washington, 
DC. While Burton and Kebler do solve some of the theoretical problems by 
redefining half-life, they do not express confidence in the use of 
half-life in this milieu, and in 1961 Burton advocates the term 
"median age", which was introduced by Broadus in this context in 1953.

  Is the Relationship Between Numbers of References and Paper Lengths the 
Same for All Sciences?
  Helmut A. Abt and Eugene Garfield
  Published online 19 September 2002
  1106
   It has been shown in the physical sciences that a paper's length is 
related to its number of references in a linear manner. Abt and Garfield 
here look at the life and social sciences with the thought that, if the 
relation holds, citation counts will provide a measure of relative 
importance across these disciplines. In the life sciences 200 research 
papers from 1999-2000 were scanned in each of 10 journals to produce counts 
of 1000-word normalized pages. In the social sciences an average of 70 
research papers in nine journals were scanned for the two-year period. 
Papers of average length in the various sciences have the same average 
number of references within plus or minus 17%. A look at the 30 to 60 
papers over the two years in 18 review journals indicates twice the 
references of research papers of the same length.

  Algorithmic Procedure for Finding Semantically Related Journals
  Alexander I. Pudovkin and Eugene Garfield
  Published online 3 September 2002
  1113
   Journal Citation Reports provides a classification of journals most 
heavily cited by a given journal and which most heavily cite that journal, 
but size variation is not taken into account. Pudovkin and Garfield suggest 
a procedure for meeting this difficulty. The relatedness of journal i to 
journal j is determined by the number of citations from journal i to 
journal j in a given year normalized by the product of the papers published 
in the j journal in that year times the number of references cited in the i 
journal in that year. A multiplier of ten to the sixth is suggested to 
bring the values into an easily perceptible range. While citations received 
depend upon the overall cumulative number of papers published by a journal, 
the current year is utilized since that data is available in JCR. Citations 
to current year papers would be quite low in most fields and thus not 
included. To produce the final index, the maximum of the A citing B value, 
and the B citing A value is chosen and used to indicate the closeness of 
the journals. The procedure is illustrated for the journal Genetics.
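
The index described above can be sketched in a few lines of Python. This is 
only an illustration of the description in this abstract, not the authors' 
own code; the function names, parameter names, and all counts are invented 
for the example.

```python
def relatedness(citations_i_to_j, papers_j, references_i, scale=1_000_000):
    """Directed relatedness of journal i to journal j for one year.

    citations_i_to_j: citations from journal i to journal j in the year
    papers_j:         papers published in journal j in that year
    references_i:     references cited in journal i in that year
    scale:            the suggested multiplier of ten to the sixth
    """
    if papers_j <= 0 or references_i <= 0:
        raise ValueError("papers_j and references_i must be positive")
    return citations_i_to_j * scale / (papers_j * references_i)


def closeness(a_citing_b, b_citing_a):
    """Final index: the larger of the two directed values."""
    return max(a_citing_b, b_citing_a)


# Invented counts for two hypothetical journals A and B.
a_citing_b = relatedness(citations_i_to_j=120, papers_j=400, references_i=30_000)
b_citing_a = relatedness(citations_i_to_j=45, papers_j=250, references_i=12_000)

print(a_citing_b)                         # 10.0
print(b_citing_a)                         # 15.0
print(closeness(a_citing_b, b_citing_a))  # 15.0
```

In this made-up case journal B cites A less often in absolute terms, yet 
its normalized value is higher because B has fewer references and A fewer 
papers, which is the size effect the normalization is meant to remove.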

  Using Graded Relevance Assessments in IR Evaluation
  Jaana Kekalainen and Kalervo Jarvelin
  Published online 3 September 2002
  1120
   Kekalainen and Jarvelin use what they term generalized, nonbinary recall 
and precision measures. Recall is the sum of the relevance scores of the 
retrieved documents divided by the sum of the relevance scores of all 
documents in the data base; precision is the sum of the relevance scores of 
the retrieved documents divided by the number of retrieved documents. The 
relevance scores are real numbers between zero and one. Using the 
In-Query system and a text data base of 53,893 newspaper articles with 30 
queries selected from those for which four relevance categories to provide 
recall measures were available, search results were evaluated by four 
judges. Searches were done by average key term weight, Boolean expression, 
and by average term weight where the terms are grouped by a synonym 
operator, and for each case with and without expansion of the original 
terms. Use of higher standards of relevance appears to increase the 
superiority of the best method. Some methods do a better job of getting the 
highly relevant documents but do not increase retrieval of marginal ones. 
There is evidence that generalized precision provides more equitable 
results, while binary precision provides undeserved merit to some methods. 
Generally, graded relevance measures seem to provide additional insight 
into IR 
evaluation.
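
The two generalized measures defined above can be sketched as follows. This 
is a minimal illustration, assuming that precision is averaged over the 
retrieved documents; the function names, document identifiers, and 
relevance scores are invented for the example.

```python
def generalized_precision(retrieved_scores):
    """Sum of relevance scores of retrieved documents / number retrieved."""
    if not retrieved_scores:
        return 0.0
    return sum(retrieved_scores) / len(retrieved_scores)


def generalized_recall(retrieved_scores, all_scores):
    """Sum of relevance scores of retrieved documents / sum over all documents."""
    total = sum(all_scores)
    if total == 0:
        return 0.0
    return sum(retrieved_scores) / total


# Invented graded relevance scores in [0, 1] for one query.
relevance = {"d1": 1.0, "d2": 0.5, "d3": 0.5, "d4": 0.0, "d5": 1.0}
retrieved = ["d1", "d2", "d4"]

retrieved_scores = [relevance[doc] for doc in retrieved]
gp = generalized_precision(retrieved_scores)
gr = generalized_recall(retrieved_scores, list(relevance.values()))

print(gp)  # 0.5
print(gr)  # 0.5
```

With binary judgments (every score either 0 or 1) both functions reduce to 
the usual precision and recall.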

  Automatic Thesaurus Generation for Chinese Documents
  Yuen-Hsien Tseng
  Published online 19 September 2002
  1130
   Tseng constructs a word co-occurrence based thesaurus by means of the 
automatic analysis of Chinese text. Words are identified by a longest 
dictionary match supplemented by a key word extraction algorithm that 
merges back nearby tokens and accepts shorter strings of characters if they 
occur more often than the longest string. Single character auxiliary words 
are a major source of error but this can be greatly reduced with the use of 
a 70-character 2680 word stop list.
   Extracted terms with their associated document weights are sorted by 
decreasing frequency and the top of this list is associated using a Dice 
coefficient modified to account for longer documents on the weights of term 
pairs. Co-occurrence is counted not in the document as a whole but in 
paragraph or sentence size sections in order to reduce computation time. A 
window of 29 characters or 11 words was found to be sufficient. A thesaurus 
was produced from 25,230 Chinese news articles, and judges were asked to 
review the top 50 terms associated with each of 30 single word query terms. 
They determined 
69% to be relevant.
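
For orientation, a plain Dice coefficient over co-occurrence windows can be 
sketched as below. This is not Tseng's modified coefficient: the weighting 
and the adjustment for longer documents are not specified in this abstract 
and are not reproduced here. The function name and the toy English terms 
are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def dice_associations(windows):
    """Unweighted Dice coefficient for every co-occurring term pair.

    windows: list of sets, each holding the terms found in one
             sentence- or paragraph-sized section of text.
    Returns {(term_a, term_b): score} with term_a < term_b.
    """
    occurrences = defaultdict(set)
    for index, terms in enumerate(windows):
        for term in terms:
            occurrences[term].add(index)

    scores = {}
    for a, b in combinations(sorted(occurrences), 2):
        shared = len(occurrences[a] & occurrences[b])
        if shared:
            scores[(a, b)] = 2 * shared / (len(occurrences[a]) + len(occurrences[b]))
    return scores


# Toy windows standing in for extracted terms per section.
windows = [
    {"bank", "loan"},
    {"bank", "river"},
    {"bank", "loan", "rate"},
    {"rate"},
]
scores = dice_associations(windows)

print(scores[("bank", "loan")])   # 0.8
print(scores[("bank", "river")])  # 0.5
```

Pairs that never share a window are simply absent from the result, so only 
co-occurring terms become thesaurus candidates.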

  On Bidirectional English-Arabic Search
  M. Aljlayl, O. Frieder, and D. Grossman
  Published online 19 September 2002
  1139
   Aljlayl, Frieder, and Grossman review methodologies for machine 
translation of queries and apply them to English-Arabic/Arabic-English 
Cross-Language Information Retrieval. In the dictionary method, replacement 
of each term with all possible equivalents in the target language results 
in considerable ambiguity, while taking the first term in the dictionary 
list reduces the ambiguity but may fail to capture the meaning. A Two-Phase 
method takes all possible equivalents and translates them back, retaining 
only those that generate the original term. It results in an average query 
length of six terms in TREC7 and 12 in TREC9. Arabic to English 
translations consistently performed below the original English queries, and 
the Two-Phase method consistently performed at the highest level and 
significantly better than the Every-Match method.
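
The back-translation filter of the Two-Phase method, next to the Every-Match 
baseline, can be sketched as follows. This is an illustration of the 
description above only; the function names are invented, and the two toy 
dictionaries use placeholder tokens (T1 to T4) rather than real Arabic 
entries.

```python
def every_match(query_terms, forward):
    """Replace each term with all of its dictionary equivalents."""
    return [cand for term in query_terms for cand in forward.get(term, [])]


def two_phase(query_terms, forward, backward):
    """Keep only equivalents that translate back to the original term."""
    kept = []
    for term in query_terms:
        for cand in forward.get(term, []):
            if term in backward.get(cand, []):
                kept.append(cand)
    return kept


# Hypothetical bilingual dictionaries; T1..T4 stand in for target-language terms.
forward = {"bank": ["T1", "T2"], "interest": ["T3", "T4"]}
backward = {
    "T1": ["bank"],
    "T2": ["shore", "riverside"],
    "T3": ["interest", "benefit"],
    "T4": ["attention"],
}

query = ["bank", "interest"]
print(every_match(query, forward))          # ['T1', 'T2', 'T3', 'T4']
print(two_phase(query, forward, backward))  # ['T1', 'T3']
```

The filtered query is shorter and less ambiguous than the Every-Match one, 
while still allowing more than one equivalent per term when several of them 
translate back to the original.
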
   Machine translation using other techniques is economical for queries but 
not likely so for documents. Using ALKAFI, a commercial translation system 
from Arabic to English and the Al-Mutarjim Al-Arabey system for English to 
Arabic, nearly 60% of monolingual retrievals were generated going from 
Arabic to English. Smaller numbers of terms in the source query improve 
performance, and these systems require syntactically well-formed queries 
for good performance.

  The Influence of Mental Models and Goals on Search Patterns During Web 
Interaction
  Debra J. Slone
  Published online 19 September 2002
  1152
   Thirty-one patrons, who were selected by Slone to provide a range of age 
and experience, agreed when approached while using the catalog of the Wake 
County library system to try searching via the Internet.  Fifteen searched 
the Wake County online catalog in this manner and 16 searched the World 
Wide Web, including that catalog. They were subjected to brief 
pre-structured taped interviews before and after their searches and were 
observed during the searching process, resulting in a log of behaviors, 
comments, pages accessed, and time spent. Data were analyzed across 
participants and categories. Web searches were characterized as linking, 
URL, search engine, within a site domain, and searching a web catalog; and 
participants by the number of these techniques used. Four used only one, 13 
used two, 11 used three, two used four, and one all five.
  Participant experience was characterized as never used, used search 
engines, browsing experience, email experience, URL experience, catalog 
experience, and finally chat room/newsgroup experience.  Sixteen percent of 
the participants had never used the Internet, 71% had used search engines, 
65% had browsed, 58% had used email, 39% had used URLs, 39% had used online 
catalogs, and 32% had used chat rooms. The catalog was normally consulted 
before the web, where both were used, and experience with an online catalog 
assists in web use. Scrolling was found to be unpopular and practiced 
halfheartedly.

  Children's Use of the Yahooligans! Web Search Engine. III. Cognitive and 
Physical Behaviors on Fully Self-Generated Search Tasks
  Dania Bilal
  Published online 19 September 2002
  1170
   Bilal, in this third part of her Yahooligans! study, looks at children's 
performance with self-generated search tasks, as compared to previously 
assigned search tasks, looking for differences in success, cognitive 
behavior, physical behavior, and task preference. Lotus ScreenCam was used 
to record interactions and post-search interviews to record impressions. 
The subjects, the same 22 seventh grade children as in the previous 
studies, generated topics of interest that were mediated with the 
researcher into more specific topics where necessary. Fifteen usable 
sessions form the 
basis of the study. Eleven children were successful in finding information, 
a rate of 73% compared to 69% in assigned research questions, and 50% in 
assigned fact-finding questions.
   Eighty-seven percent began using one or two keyword searches. Spelling 
was a problem. Successful children made fewer keyword searches and the 
number of search moves averaged 5.5 as compared to 2.4 on the research 
oriented task and 3.49 on the factual. Backtracking and looping were 
common. The self-generated task was preferred by 47% of the subjects.

  Book Reviews
  Usability Testing for Library Web Sites: A Hands-On Guide, by Elaina 
Norlin and CM Winters
  Matt Jones
  Published online 7 August 2002
  1184

  Accessing and Browsing Information and Communication, by Ronald E. Rice, 
Maureen McCreadie, and Shan-Ju L. Chang
  Robert J. Sandusky
  Published online 29 August 2002
  1185

  Strategies for Electronic Commerce and the Internet, by Henry C. Lucas, Jr.
  Roisin Faherty
  Published online 22 August 2002
  1187

  CALL FOR PAPERS
  1189

----------
[Note: The ASIST home page 
<http://www.asis.org/Publications/JASIS/tocs.html> contains the Table of 
Contents and abstracts from Bert Boyce's "In This Issue" from January 1993 
(Volume 44) to date.

The John Wiley Interscience site <http://www.interscience.wiley.com> 
includes issues from 1986 (Volume 37) to date.  Guests have access only to 
tables of contents and abstracts.  Registered users of the interscience 
site have access to the full text of these issues and to preprints.]

Executive Director
American Society for Information Science and Technology
1320 Fenwick Lane, Suite 510
Silver Spring, MD  20910
FAX: (301) 495-0810
PHONE: (301) 495-0900

http://www.asis.org