[Asis-l] JASIST TOC, Vol 54, Number 6

Mon Mar 3 14:52:20 EST 2003

Journal of the American Society for Information Science and Technology
JASIST
VOLUME 54, NUMBER 6

[Note: at the end of this message are URLs for viewing contents of JASIST 
from past issues.  The contents of Bert Boyce's "In this Issue" have been 
cut into the Table of Contents.]

CONTENTS

IN THIS ISSUE
Bert L. Boyce
471

RESEARCH

Web Search Strategies and Approaches to Studying
Nigel Ford, David Miller, and Nicola Moss
473
Published online 25 February 2003
         In this issue Ford, Miller, and Moss use 68 volunteers from a 
population of 250 Master's students to complete three web search tasks 
with clear fact-based goals and three or fewer facets. One task 
required broadening the search concepts from those given, a second provided 
specific terminology for one facet but required a second facet that would 
need translation, and the third required general-to-specific 
transformation. The students were measured on Entwistle's Revised 
Inventory of Approaches to Studying, which provides values for ten study 
variables, and were asked to assess their experience with the Internet, 
with Alta Vista, and with Boolean search. Searches were conducted 
on Alta Vista using Netscape Navigator 4, with participants free to choose 
and switch among Boolean, best match, or combined search modes at will 
while a front-end script recorded all submitted searches and help access. 
Search-related variables were extracted from Boolean-only queries, 
best-match-only queries, and combined queries. Factor analyses were 
conducted on all variables for each search mode for each search. In task 1, 
Boolean is differentiated from best match search by sharing high loadings 
on active interest, intention to reproduce, fear of failure, and relating 
ideas. The combined searcher is linked with the best match searcher, with 
low active interest, low intention to reproduce, and low fear of failure. 
In task 2, Boolean is differentiated from best match search by loading 
high on intention to reproduce and low on intention to understand. Best 
match loads positively with intention to understand and negatively with 
intention to reproduce. Combined searching is linked with both good and 
poor time management. In task 3 the loadings mimic task 1. It seems Boolean 
is consistently linked to a reproductive rather than a meaning-seeking 
approach, but also with high levels of interest and fear of failure. Best 
match associates with the converse of these measures.

Three Target Document Range Metrics for University Web Sites
Mike Thelwall and David Wilkinson
489
Published online 25 February 2003
          Thelwall and Wilkinson use crawls of university web sites in the 
UK, Australia, and New Zealand to collect all links that target university 
web sites in the same country, which they then use to create a graph 
structure for study. Using Broder's study as a model, they identify a 
strongly connected component (SCC), in which one could start anywhere in 
the set and reach every other page, and an Out component, whose pages can 
be reached from all strongly connected pages but provide no link back to 
that set. The other components in the Broder model are not accessible 
without access to a major search engine database. In-link and out-link 
counts for all three university systems, in both the Out and SCC 
components, display when graphed logarithmically the linear form that 
would indicate that power laws, and a success-breeds-success phenomenon, 
are generally in effect. However, automatically generated pages, non-HTML 
web pages, and large resource-driven sites were all associated with 
anomalies in this observation.
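
As a rough illustration of the two components named above, the following 
Python sketch finds the largest strongly connected component of a small 
link graph and the Out set reachable from it. The toy graph `links` and 
the function names are hypothetical and are not taken from the paper; 
Kosaraju's algorithm is used here only as one standard way to find such 
components, not as the authors' method.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps a page to the pages it links to."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    reverse = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].add(src)

    # First pass: record pages in order of depth-first finishing time.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(sorted(graph.get(start, ()))))]
        while stack:
            node, targets = stack[-1]
            for nxt in targets:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(sorted(graph.get(nxt, ())))))
                    break
            else:
                order.append(node)
                stack.pop()

    # Second pass: walk the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, todo = set(), [start]
        assigned.add(start)
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


def out_component(graph, scc):
    """Pages reachable from the SCC that are not themselves in it."""
    seen, todo = set(scc), list(scc)
    while todo:
        node = todo.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen - set(scc)


# Hypothetical toy graph: a, b, c link in a cycle; d and e hang off it;
# f links into the cycle but cannot be reached from it.
links = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": [], "f": ["a"]}
scc = max(strongly_connected_components(links), key=len)
out = out_component(links, scc)
print(sorted(scc), sorted(out))
```

In this toy graph page f belongs to neither set: it links into the cycle 
but no page in the cycle links back to it.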

Searching for Images: The Analysis of Users' Queries for Image Retrieval in 
American History
Youngok Choi and Edie M. Rasmussen
497
Published online 25 February 2003
         Choi and Rasmussen collect queries to the Library of Congress's 
American Memory photo archive from 48 scholars in American History by way 
of interviews and pre- and post-search questionnaires. Their interest is in 
the types of information need common in the visual domain, and the 
categories of terms most often used or indicated as appropriate for the 
description of image contents. Each search resulted in the provision of 20 
items for evaluation by the searcher. Terms in queries and acceptable 
retrievals were categorized by a who, what, when, where faceted 
classification, and queries were placed into four needs categories: 
specific, general, abstract, and subjective. At least two of the three 
analysts assigned each of the 38 requests to the same one of the four 
categories, and in 19 cases all three agreed. General/nameable needs 
accounted for 60.5%, specific needs for 26.3%, general/abstract needs for 
7.9%, and subjective needs for 5.3%. The facet analysis indicated most 
content was of the form person/thing or event/condition limited by 
geography or time.

Information as Commodity and Economic Sector: Its Emergence in the 
Discourse of Industrial Classification
Cheryl Knott Malone and Fernando Elichirigoity
511
Published online 25 February 2003
         Malone and Elichirigoity review the concept of "information" as it 
exists in the North American Industry Classification System (NAICS), 
implemented in 1997, the current scheme for the organization of governmental 
data about the economies of the U.S., Canada, and Mexico. The term 
represents one of 20 major economic sectors based upon processes of 
production and upon which data may be reported. It also represents a 
measurable commodity based upon the concept of copyright. A review of the 
background studies and reports which document the development of NAICS 
shows the desire for a single underlying principle, similarity of 
production processes rather than a marketing approach, and the construction 
of the information sector within the context of globalization and the 
internet. The three nations agreed in 1996 that the information sector 
should consist of industries engaged in the "transformation of information 
into a commodity that is produced, manipulated and distributed...," or as 
the NAICS manual states, industries that "primarily create and disseminate 
a product subject to copyright." However, industries that transfer or 
transport such products are also included, which seems inconsistent with 
the production principle. In 2002 the category was modified to separate 
internet publishing and broadcasting from these subcategories and to create 
an internet services category.

A Method for the Comparative Analysis of Concentration of Author 
Productivity, Giving Consideration to the Effect of Sample Size Dependency 
of Statistical Measures
Fuyuki Yoshikane, Kyo Kageura, and Keita Tsuji
511
Published online 25 February 2003
         Studies of the concentration of author productivity based upon 
counts of papers by individual authors will produce measures that change 
systematically with sample size. Yoshikane, Kageura, and Tsuji seek a 
statistical framework which will avoid this scale effect problem. Using the 
number of authors in a field as an absolute concentration measure, and 
Gini's index as a relative concentration measure, they describe 
four literatures from both viewpoints with measures insensitive to one 
another. Both measures will increase with sample size. They then plot 
profiles of the two measures on the basis of a Monte-Carlo simulation of 
1000 trials for 20 equally spaced intervals and compare the characteristics 
of the literatures. Using data from conferences hosted by four academic 
societies between 1992 and 1997, they find a coefficient of loss exceeding 
0.15 indicating measures will depend highly on sample size. The simulation 
shows that a larger sample size leads to lower absolute concentration and 
higher relative concentration. Comparisons made at the same sample size 
yield results quite different from those of the original data and allow 
direct comparison of population characteristics.
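
The sample size dependence described above can be sketched in a few lines 
of Python. This is only an illustration under stated assumptions: the 
author data in `tokens` are invented, the function names are hypothetical, 
subsamples are drawn without replacement, and 200 trials are used instead 
of the 1000 reported, so the sketch should not be read as the authors' 
procedure.

```python
import random


def gini(counts):
    """Gini index of non-negative paper counts (0 means perfectly even)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def profile(author_tokens, steps=20, trials=200, seed=0):
    """Mean number of authors and mean Gini index at equally spaced sizes."""
    rng = random.Random(seed)
    n = len(author_tokens)
    rows = []
    for k in range(1, steps + 1):
        size = max(1, n * k // steps)
        authors_sum = gini_sum = 0.0
        for _ in range(trials):
            # Assumption: each trial is a random subsample without replacement.
            sample = rng.sample(author_tokens, size)
            counts = {}
            for author in sample:
                counts[author] = counts.get(author, 0) + 1
            authors_sum += len(counts)
            gini_sum += gini(list(counts.values()))
        rows.append((size, authors_sum / trials, gini_sum / trials))
    return rows


# Invented data: one token per authorship, with a few prolific authors
# and fifteen authors who appear once.
tokens = ["A"] * 30 + ["B"] * 10 + ["C"] * 5 + [f"x{i}" for i in range(15)]
rows = profile(tokens)
for size, mean_authors, mean_gini in rows:
    print(size, round(mean_authors, 2), round(mean_gini, 3))
```

With these invented data, both the mean number of authors and the mean 
Gini index are larger at the full sample than at the smallest subsample, 
which is why comparisons are better made at a common sample size.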

Incorporating User Search Behavior into Relevance Feedback
Ian Ruthven, Mounia Lalmas, and Keith van Rijsbergen
528
Published online 25 February 2003
         Ruthven, Lalmas, and van Rijsbergen rank and select terms for 
query expansion using information gathered on searcher evaluation behavior. 
Using the TREC Financial Times and Los Angeles Times collections and search 
topics from TREC-6 placed in simulated work situations, six student 
subjects each performed three searches on an experimental system and three 
on a control system, with instructions to search by natural language 
expression in any way they found comfortable. Searching was analyzed for 
behavior differences between experimental and control situations, and for 
effectiveness and perceptions. In three experiments paired t-tests were the 
analysis tool, with the controls being a system with no relevance feedback, 
a system using standard ranking for automatic expansion, and a system using 
standard ranking for interactive expansion, while the experimental systems 
based ranking upon user information on temporal relevance and partial 
relevance. Two further experiments compare using user behavior (number 
assessed relevant and similarity of relevant documents) to choose a query 
expansion technique against a non-selective technique, and finally examine 
the effect of providing the user with knowledge of the process. When partial 
relevance data and time of assessment data were incorporated in term 
ranking, more relevant documents were recovered in fewer iterations; 
however, retrieval effectiveness overall was not improved. The subjects 
nonetheless rated the suggested terms as more useful and used them more 
heavily. Explanations of what the feedback techniques were doing led to 
higher use of the techniques.

Requirements for a Cocitation Similarity Measure, with Special Reference to 
Pearson's Correlation Coefficient
Per Ahlgren, Bo Jarneving, and Ronald Rousseau
549
Published online 25 February 2003
         Ahlgren, Jarneving, and Rousseau review accepted procedures for 
author co-citation analysis, first pointing out that since in the raw data 
matrix the row and column values are identical, i.e., the co-citation count 
of two authors, there is no clear choice for diagonal values. They suggest 
the number of times an author has been co-cited with himself, excluding 
self-citation, rather than the common treatment as zeros or as missing 
values. When the matrix is converted to a similarity matrix, the normal 
procedure is to create a matrix of Pearson's r coefficients between data 
vectors. Ranking by r, by co-citation frequency, and by intuition can 
easily yield three different orders. It would seem necessary that adding 
zeros to the matrix should not affect the value or the relative order of 
similarity measures, but it is shown that this is not the case with 
Pearson's r. Using 913 bibliographic descriptions from the Web of Science 
of articles from JASIS and Scientometrics, authors' names were extracted 
and edited, and 12 information retrieval authors and 12 bibliometric 
authors, each from the top 100 most cited, were selected. Co-citation and 
r value (diagonal elements treated as missing) matrices were constructed, 
and then reconstructed in expanded form. Adding zeros can change both the 
r value and the ordering of the authors based upon that value. A 
chi-squared distance measure would not violate these requirements, nor 
would the cosine coefficient. It is also argued that co-citation data is 
ordinal data since there is no assurance of an absolute zero number of 
co-citations, and thus Pearson is not appropriate. The number of ties in 
co-citation data makes the use of the Spearman rank order coefficient 
problematic.
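
The effect of added zeros can be seen in a small Python sketch. The two 
co-citation vectors `x` and `y` are invented for illustration and do not 
come from the paper's data; the padding simulates adding authors who are 
never co-cited with either of the two authors being compared.

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cosine(x, y):
    """Cosine coefficient between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Invented co-citation counts of two authors with four other authors.
x = [5, 1, 3, 2]
y = [1, 4, 2, 5]

# Expanded form: ten further authors never co-cited with either author.
x_padded = x + [0] * 10
y_padded = y + [0] * 10

print("Pearson r:", pearson(x, y), "->", pearson(x_padded, y_padded))
print("Cosine:   ", cosine(x, y), "->", cosine(x_padded, y_padded))
```

In this toy case Pearson's r is negative before the zeros are added and 
positive afterwards, while the cosine coefficient is unchanged, since 
zero entries contribute nothing to the dot product or to either norm.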

Modeling the Information-Seeking Behavior of Social Scientists: Ellis's 
Study Revisited
Lokman I. Meho and Helen R. Tibbo
569
Published online 25 February 2003
         Meho and Tibbo show that the Ellis model of information seeking 
applies to a web environment by way of a replication of his study, in this 
case using the behavior of social science faculty studying stateless 
nations, a group diverse in skills, origins, and research specialities. 
Data were collected by way of e-mail interviews. Material on stateless 
nations was limited to papers in English on social science topics published 
between 1998 and 2000. Of these, 251 had 212 unique authors identified as 
academic scholars, with sufficient information to provide e-mail addresses. 
Of the 139 whose addresses were located, 9 who were physically close were 
reserved for face-to-face interviews, and of the remainder 60 agreed to 
participate and responded to the interview of 25 open-ended questions. 
Follow-up questions generated a 75% response. Of the possible face-to-face 
interviews, five agreed to participate and provided 26 thousand words, as 
opposed to 69 thousand by the 45 e-mail participants. The activities of the 
Ellis model are confirmed, but four additional activities are also 
identified. These are: accessing, i.e. finding the material identified in 
indirect sources of information; networking, or the maintaining of close 
contacts with a wide range of colleagues and other human sources; 
verifying, i.e. checking the accuracy of new information; and information 
managing, the filing and organizing of collected information. All 
activities are grouped into four stages: searching, accessing, processing, 
and ending.

BOOK REVIEWS

Electronic Collection Development: A Practical Guide, by Stuart D. Lee
487
Reviewed by Marianne Afifi
Published online 25 February 2003

Beyond Our Control? Confronting the Limits of Our Legal System in the Age 
of CyberSpace, by Stuart Biegel
588
Reviewed by Kenneth Einar Himma
Published online 25 February 2003

Economic Growth in the Information Age, by Dale W. Jorgenson
591
Reviewed by John Cullen
Published online 25 February 2003

------------------------------------------------------
The ASIS web site <http://www.asis.org/Publications/JASIS/tocs.html> 
contains the Table of Contents and brief abstracts as above from January 
1993 (Volume 44) to date.

The John Wiley Interscience site <http://www.interscience.wiley.com> 
includes issues from 1986 (Volume 37) to date.  Guests have access only to 
tables of contents and abstracts.  Registered users of the interscience 
site have access to the full text of these issues and to preprints.

Executive Director
American Society for Information Science and Technology
1320 Fenwick Lane, Suite 510
Silver Spring, MD  20910
FAX: (301) 495-0810
PHONE: (301) 495-0900

http://www.asis.org