[Asis-l] JASIST TOC, Vol 54, Number 6
Richard Hill
rhill at asis.org
Mon Mar 3 14:52:20 EST 2003
Journal of the American Society for Information Science and Technology
JASIST
VOLUME 54, NUMBER 6
[Note: at the end of this message are URLs for viewing contents of JASIST
from past issues. The contents of Bert Boyce's "In this Issue" have been
cut into the Table of Contents.]
CONTENTS
IN THIS ISSUE
Bert L. Boyce
471
RESEARCH
Web Search Strategies and Approaches to Studying
Nigel Ford, David Miller, and Nicola Moss
473
Published online 25 February 2003
In this issue Ford, Miller, and Moss use 68 volunteers from a
population of 250 Master's students to complete three web search
tasks with clear fact-based goals and three or fewer facets. One task
required broadening the search concepts from those given, a second provided
a specific terminology for one facet but required a second facet that would
require translation, and the third required general to specific
transformation. The students were measured as to their performance on
Entwistle's Revised Inventory of Approaches to Studying providing values
for ten study variables and asked to assess their experience on the
Internet, with Alta Vista, and with Boolean search. Searches were conducted
on Alta Vista using Netscape Navigator 4 with participants free to choose
and switch Boolean, best match or combined search modes at will while a
front end script recorded all submitted searches and help access. Search-
related variables were extracted from Boolean-only queries, best-match-
only queries, and combined queries. Factor analyses were conducted on all
variables for each search mode for each search. In task one Boolean is
differentiated from best match search by sharing high loads on active
interest, intention to reproduce, fear of failure, and relating ideas. The
combined searcher is linked with the best match searcher with low active
interest, low intention to reproduce and low fear of failure. In task
2 Boolean is differentiated from best match search by sharing high loads
on intention to reproduce and low on intention to understand. Best match
loads positively with intention to understand and negatively with intention
to reproduce. Combined searching is linked with both good and poor time
management. In task 3 the loads mimic task 1. It seems Boolean is
consistently linked to a reproductive rather than a meaning seeking
approach, but also with high levels of interest and fear of failure. Best
match associates with the converse of these measures.
Three Target Document Range Metrics for University Web Sites
Mike Thelwall and David Wilkinson
489
Published online 25 February 2003
Thelwall and Wilkinson use crawls of university web sites in the
UK, Australia, and New Zealand to generate all links targeted at same
country university web sites which they then use to create a graph
structure for study. Using Broder's study as a model they identify a
strongly connected component, SCC, where one could start anywhere in the
set and reach every other page, and an Out component whose pages can be
reached from all strongly connected pages but provide no link back to that
set. The other components in the Broder model are not accessible except
with access to a major search engine database. In-link and out-link counts
for all three university systems in both the Out and SCC components, when
graphed logarithmically, display the linear nature which would indicate that
power laws, and a success-breeds-success phenomenon, are generally in
effect. However, automatically generated pages, non-HTML web pages, and
large resource-driven sites all were associated with anomalies in this
observation.
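The SCC and Out components described above can be illustrated on a toy link graph. This is a minimal sketch, not the authors' procedure: the page names, the edge list, and the choice of a seed page assumed to lie inside the strongly connected component are all hypothetical.

```python
from collections import defaultdict

def reachable(graph, start):
    """Return the set of nodes reachable from start, including start."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def scc_and_out(edges, seed):
    """Split pages into the seed's SCC and the Out component it links to."""
    forward = defaultdict(list)   # page -> pages it links to
    backward = defaultdict(list)  # page -> pages linking to it
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)
    descendants = reachable(forward, seed)   # pages the seed can reach
    ancestors = reachable(backward, seed)    # pages that can reach the seed
    scc = descendants & ancestors            # mutually reachable pages
    out = descendants - scc                  # reached from the SCC, no path back
    return scc, out

# Hypothetical links: a, b, c form a cycle; d and e are only linked to;
# x links in but is never linked back.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("x", "a")]
scc, out = scc_and_out(edges, "a")
print(sorted(scc))  # ['a', 'b', 'c']
print(sorted(out))  # ['d', 'e']
```

Page x can reach the cycle but cannot be reached from it, so it belongs to neither set, which matches the remark that the other Broder components are not found by crawling alone.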
Searching for Images: The Analysis of Users' Queries for Image Retrieval in
American History
Youngok Choi and Edie M. Rasmussen
497
Published online 25 February 2003
Choi and Rasmussen collect queries to the Library of Congress's
American Memory photo archive from 48 scholars in American History by way
of interviews and pre and post search questionnaires. Their interest is in
the types of information need common in the visual domain, and the
categories of terms most often used or indicated as appropriate for the
description of image contents. Each search resulted in the provision of 20
items for evaluation by the searcher. Terms in queries and acceptable
retrievals were categorized by a who, what, when, where faceted
classification and queries into four needs categories: specific, general,
abstract, and subjective. At least two of three analysts assigned each of
the 38 requests to the same one of the four categories, and in 19 cases all three
agreed. General/nameable needs accounted for 60.5%, specific needs 26.3%,
7.9% for general/abstract, and 5.3% for subjective needs. The facet
analysis indicated most content was of the form person/thing or
event/condition limited by geography or time.
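The reported shares can be checked against the 38 requests. The per-category counts below are not stated in the abstract; they are an inference, being the whole numbers that reproduce the published percentages.

```python
# Inferred (not reported) request counts per needs category; total must be 38.
counts = {
    "general/nameable": 23,
    "specific": 10,
    "general/abstract": 3,
    "subjective": 2,
}
total = sum(counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```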
Information as Commodity and Economic Sector: Its Emergence in the
Discourse of Industrial Classification
Cheryl Knott Malone and Fernando Elichirigoity
511
Published online 25 February 2003
Malone and Elichirigoity review the concept of "information" as it
exists in the 1997 implemented North American Industry Classification
System (NAICS), the current scheme for the organization of governmental
data about the economies of the U.S., Canada, and Mexico. The term
represents one of 20 major economic sectors based upon processes of
production and upon which data may be reported. It also represents a
measurable commodity based upon the concept of copyright. A review of the
background studies and reports which document the development of NAICS
shows the desire for a single underlying principle, similarity of
production processes rather than a marketing approach, and the construction
of the information sector within the context of globalization and the
internet. The three nations agreed in 1996 that the information sector
should consist of industries engaged in the "transformation of information
into a commodity that is produced, manipulated and distributed...," or as
the NAICS manual states, industries that "primarily create and disseminate
a product subject to copyright." However, industries that transfer or
transport such products are also included which seems inconsistent with the
production principle. In 2002 the category was modified to separate
internet publishing and broadcasting from these subcategories and to create
an internet services category.
A Method for the Comparative Analysis of Concentration of Author
Productivity, Giving Consideration to the Effect of Sample Size Dependency
of Statistical Measures
Fuyuki Yoshikane, Kyo Kageura, and Keita Tsuji
511
Published online 25 February 2003
Studies of the concentration of author productivity based upon
counts of papers by individual authors will produce measures that change
systematically with sample size. Yoshikane, Kageura, and Tsuji seek a
statistical framework which will avoid this scale effect problem. Using the
number of authors in a field as an absolute concentration measure, and
Gini's index as a relative concentration measure, they describe
four literatures from both viewpoints with measures insensitive to one
another. Both measures will increase with sample size. They then plot
profiles of the two measures on the basis of a Monte-Carlo simulation of
1000 trials for 20 equally spaced intervals and compare the characteristics
of the literatures. Using data from conferences hosted by four academic
societies between 1992 and 1997, they find a coefficient of loss exceeding
0.15 indicating measures will depend highly on sample size. The simulation
shows that a larger sample size leads to lower absolute concentration and
higher relative concentration. Comparisons made at the same sample size
present quite different results than the original data and allow direct
comparison of population characteristics.
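The sample size dependency can be sketched with a small Monte Carlo simulation. This is an illustration under stated assumptions, not the authors' framework: the productivity data are synthetic, the sampling unit is taken to be a single authorship, and the function names are hypothetical.

```python
import random

def gini(counts):
    """Gini index of non-negative counts (0 = equal; larger = more concentrated)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * v for i, v in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n

def profile(authorships, sample_size, trials=1000, seed=0):
    """Mean number of distinct authors and mean Gini over random subsamples."""
    rng = random.Random(seed)
    authors_sum = 0
    gini_sum = 0.0
    for _ in range(trials):
        sample = rng.sample(authorships, sample_size)
        per_author = {}
        for author in sample:
            per_author[author] = per_author.get(author, 0) + 1
        authors_sum += len(per_author)
        gini_sum += gini(list(per_author.values()))
    return authors_sum / trials, gini_sum / trials

# Synthetic literature: a few prolific authors and many one-paper authors.
papers_per_author = [20, 10, 5, 5] + [1] * 60
authorships = [a for a, k in enumerate(papers_per_author) for _ in range(k)]

# 20 equally spaced sample sizes, 1000 trials each, as in the abstract.
step = len(authorships) // 20
for size in range(step, len(authorships) + 1, step):
    mean_authors, mean_gini = profile(authorships, size)
    print(size, round(mean_authors, 1), round(mean_gini, 3))
```

Comparing two literatures at the same row of such a profile, rather than at their full and unequal sizes, is the kind of same-sample-size comparison the abstract describes.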
Incorporating User Search Behavior into Relevance Feedback
Ian Ruthven, Mounia Lalmas, and Keith van Rijsbergen
528
Published online 25 February 2003
Ruthven, Lalmas, and van Rijsbergen rank and select terms for
query expansion using information gathered on searcher evaluation behavior.
Using the TREC Financial Times and Los Angeles Times collections and search
topics from TREC-6 placed in simulated work situations, six student
subjects each performed three searches on an experimental system and three
on a control system with instructions to search by natural language
expression in any way they found comfortable. Searching was analyzed for
behavior differences between experimental and control situations, and for
effectiveness and perceptions. In three experiments paired t-tests were the
analysis tool with controls being a no relevance feedback system, a
standard ranking for automatic expansion system, and a standard ranking for
interactive expansion while the experimental systems based ranking upon
user information on temporal relevance and partial relevance. Two further
experiments compare using user behavior (number assessed relevant and
similarity of relevant documents) to choose a query expansion technique
against a non-selective technique and finally the effect of providing the
user with knowledge of the process. When partial relevance data and time of
assessment data were incorporated in term ranking, more relevant documents
were recovered in fewer iterations; however, retrieval effectiveness overall
was not improved. The subjects nonetheless rated the suggested terms as
more useful and used them more heavily. Explanations of what the feedback
techniques were doing led to higher use of the techniques.
Requirements for a Cocitation Similarity Measure, with Special Reference to
Pearson's Correlation Coefficient
Per Ahlgren, Bo Jarneving, and Ronald Rousseau
549
Published online 25 February 2003
Ahlgren, Jarneving, and Rousseau review accepted procedures for
author co-citation analysis, first pointing out that since in the raw data
matrix the row and column values are identical, i.e., the co-citation count
of two authors, there is no clear choice for diagonal values. They suggest
the number of times an author has been co-cited with himself excluding self
citation rather than the common treatment as zeros or as missing values.
When the matrix is converted to a similarity matrix the normal procedure is
to create a matrix of Pearson's r coefficients between data vectors.
Ranking by r and by co-citation frequency and by intuition can easily yield
three different orders. It would seem necessary that the adding of zeros
to the matrix will not affect the value or the relative order of similarity
measures but it is shown that this is not the case with Pearson's r. Using
913 bibliographic descriptions from the Web of Science of articles from
JASIS and Scientometrics, authors' names were extracted, edited and 12
information retrieval authors and 12 bibliometric authors each from the top
100 most cited were selected. Co-citation and r value (diagonal elements
treated as missing) matrices were constructed, and then reconstructed in
expanded form. Adding zeros can change both the r value and the ordering of
the authors based upon that value. A chi-squared distance measure would not
violate these requirements, nor would the cosine coefficient. It is also
argued that co-citation data is ordinal data since there is no assurance of
an absolute zero number of co-citations, and thus Pearson is not
appropriate. The number of ties in co-citation data makes the use of the
Spearman rank order coefficient problematic.
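The effect of added zeros can be seen in a small numerical sketch. The two co-citation vectors below are invented for illustration and are not taken from the study; the appended zeros stand for additional authors with whom neither of the two is co-cited.

```python
import math

def pearson(x, y):
    """Pearson's r between two equal-length vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cosine(x, y):
    """Cosine coefficient between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical co-citation counts of two authors with four other authors.
a = [5, 1, 3, 2]
b = [1, 4, 2, 6]
zeros = [0] * 6  # six further authors, co-cited with neither a nor b

print(pearson(a, b))                  # negative for these vectors
print(pearson(a + zeros, b + zeros))  # positive once the zeros are appended
print(cosine(a, b))
print(cosine(a + zeros, b + zeros))   # same as the line above
```

The shared zeros pull both means down and make the vectors look alike to Pearson's r, whereas they add nothing to the dot product or the norms, which is why the cosine coefficient is unaffected.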
Modeling the Information-Seeking Behavior of Social Scientists: Ellis's
Study Revisited
Lokman I. Meho and Helen R. Tibbo
569
Published online 25 February 2003
Meho and Tibbo show that the Ellis model of information seeking
applies to a web environment by way of a replication of his study in this
case using behavior of social science faculty studying stateless nations, a
group diverse in skills, origins, and research specialities. Data were
collected by way of e-mail interviews. Material on stateless nations was
limited to papers in English on social science topics published between
1998 and 2000. Of these 251 had 212 unique authors identified as academic
scholars and had sufficient information to provide e-mail addresses. Of the
139 whose addresses were located, 9 who were physically close were reserved
for face to face interviews, and of the remainder 60 agreed to participate
and responded to the 25 open ended question interview. Follow up questions
generated a 75% response. Of the possible face to face interviews five
agreed to participate and provided 26 thousand words as opposed to 69
thousand by the 45 e-mail participants. The activities of the Ellis model
are confirmed but four additional activities are also identified. These
are: accessing, i.e. finding the material identified in indirect sources of
information; networking, or the maintaining of close contacts with a wide
range of colleagues and other human sources; verifying, i.e. checking the
accuracy of new information; and information managing, the filing and
organizing of collected information. All activities are grouped into four
stages: searching, accessing, processing, and ending.
BOOK REVIEWS
Electronic Collection Development: A Practical Guide, by Stuart D. Lee
487
Reviewed by Marianne Afifi
Published online 25 February 2003
Beyond Our Control? Confronting the Limits of Our Legal System in the Age
of CyberSpace, by Stuart Biegel
588
Reviewed by Kenneth Einar Himma
Published online 25 February 2003
Economic Growth in the Information Age, by Dale W. Jorgenson
591
Reviewed by John Cullen
Published online 25 February 2003
------------------------------------------------------
The ASIS web site <http://www.asis.org/Publications/JASIS/tocs.html>
contains the Table of Contents and brief abstracts as above from January
1993 (Volume 44) to date.
The John Wiley Interscience site <http://www.interscience.wiley.com>
includes issues from 1986 (Volume 37) to date. Guests have access only to
tables of contents and abstracts. Registered users of the interscience
site have access to the full text of these issues and to preprints.
Executive Director
American Society for Information Science and Technology
1320 Fenwick Lane, Suite 510
Silver Spring, MD 20910
FAX: (301) 495-0810
PHONE: (301) 495-0900
http://www.asis.org