Eric G. Ackermann "The Laws of the Web: Patterns in the Ecology of Information" by Bernardo A. Huberman. Cambridge, MA. MIT Press, 2002. 105 pp

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Tue Dec 17 14:48:14 EST 2002


Eric G. Ackermann: E-mail: eackerma at vt.edu




Title     The laws of the Web: Patterns in the ecology of information.
           by Huberman BA
Author    Ackermann EG
Journal   JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND
          TECHNOLOGY  53 (11): 969-970 SEP 2002

 Document type: Book Review  Language: English
 Cited References: 5         Times Cited: 0


Addresses:
Ackermann EG, Virginia Polytech Inst & State Univ, Blacksburg, VA 24061 USA
Virginia Polytech Inst & State Univ, Blacksburg, VA 24061 USA

Publisher:
JOHN WILEY & SONS INC, NEW YORK

IDS Number:
587TE

ISSN:
1532-2882

 Cited Author            Cited Work                Volume      Page    Year

 BRODER A              GRAPH STRUCTURE WEB                             2000
 GARFIELD E            ESSAYS INFORMATION S           1       222      1971
 HUBERMAN BA           LAWS WEB PATTERNS EC                            2001
 LOTKA AJ              J WASHINGTON ACADEMY          16       317      1926
 SEGLEN PO             J AM SOC INFORM SCI           43       628      1992


FULL TEXT OF REVIEW:

The Laws of the Web: Patterns in the Ecology of Information.  Bernardo A.
Huberman.  Cambridge, MA: The MIT Press, 2001; 105 pp.  Price: $24.95
(ISBN: 0-262-08303-5)

Bernardo Huberman has written what amounts to an extended explanatory essay
summarizing the extensive research that he and his colleagues conducted over
a thirteen-year period into the structure of the World Wide Web and the
regularities that underlie it. It is not a scholarly work in the traditional
sense, as it has few citations, and most of those only to the empirical
studies upon which it is based. Huberman's goal is instead to communicate in
plain language the concepts and findings originally expressed in the
technical jargon and mathematical formalism of the original studies, as well
as their practical implications for Web site design, server technology, and
e-commerce.

Huberman is currently an HP Fellow and the Director of the Information
Dynamics Laboratory at the Hewlett-Packard Laboratories in Palo Alto,
California. He is both a Consulting Professor in the Department of Applied
Physics and a faculty member in the Symbolic Systems Program at Stanford
University. Huberman has a Ph.D. in Physics from the University of
Pennsylvania, and has published in the areas of condensed matter physics,
chaos in physical systems, non-linear dynamical systems, artificial
intelligence, large distributed systems, and the dynamics of the growth and
use of the World Wide Web (for more detail, see
http://www.hpl.hp.com/shl/people/huberman/).

The book is published in a relatively small format with only ninety-nine
pages of text. It is organized into eight chapters and an epilogue: Chapter
1 ("E-cology"), Chapter 2 ("The Phenomena of the Web"), Chapter 3
("Evolution and Structure"), Chapter 4 ("Small Worlds"), Chapter 5 ("As We
Surf"), Chapter 6 ("Social Dilemmas and Internet Congestion"), Chapter 7
("Downloading Information"), and Chapter 8 ("Markets and the Web").
Following the Epilogue, there is a short list of references and a brief
index.

The book is well illustrated with graphs and charts that are clearly
presented and well labeled. The text is written very clearly and cleanly,
with concepts fully explained using no discernible jargon or quantification.
Only two inaccuracies were detected. On page ten, the book states that
telephony started to spread throughout the United States at the beginning of
the twenty-first century rather than the twentieth, and on page eleven that
Moore's Law "states that silicon chips double in complexity every two years"
rather than every eighteen months (see
http://www.webopedia.com/TERM/M/Moores_Law.html). Though annoying, neither
error adversely affects the quality of the work.

In this book, Huberman maintains that "in spite of its haphazard growth the
Web hides powerful underlying regularities" (p. viii). These regularities or
laws were first predicted by using statistical mechanics and non-linear
dynamics to develop theoretical models for the study of human behavior
acting in "large distributed systems, ranging from economic systems to the
Internet at large" (p. viii). His methodology focuses on the study of an
aggregate system's behavior, in this case the Web, rather than on individual
actions, using statistical models created by physicists to explain "the
behavior of matter in terms of its constituent components, such as atoms and
molecules" (p.23). This methodology allows investigators to bridge the gap
between individual actions and systemic behavior.

The data were gathered not from the Web itself but from the Internet Archive,
a group located in San Francisco that periodically captures and stores "the
entire textual content of the World Wide Web" (p. 1). Huberman characterizes
this effort as the equivalent of "a giant ecological survey" of the Internet
(p.3), itself an ecology of knowledge and information composed of a rich
array of dynamic interaction and structural complexity. Huberman finds the
Web a perfect laboratory in which to study human behavior and information
foraging "with a precision and on a scale never possible before" (p. 16).

The complexity of the Web and its distributed nature make the Internet
behave as a non-linear system. According to Huberman, a non-linear system is
one "whose behaviors cannot be explained by just adding all the partial
actions" of its components (p. 21). Such a system exhibits erratic behavior,
is extremely sensitive to initial conditions, and produces systemic results
that often seem to imply no direct connection between "the well-defined
behavior of the components and the global outcome that one observes" (p.
21).

Huberman, along with other researchers (e.g., Broder et al., 2000),
discovered an underlying structure to the seeming chaos of the
Internet. The structure is composed of a number of regularities or laws
known as power law distributions. Distributions are mathematical entities
that describe or quantify "how many instances of a given size" occur in the
system under study, in this case "the patterns observed on the Web" by
researchers (p. 25). A power law distribution then describes in a
probabilistic fashion phenomena "where large events are rare, but small ones
quite common" (p. 31). For example, "the probability of finding a Web site
with a given number of pages, n, is proportional to 1/n^x, where x is a
number greater than or equal to 1" (p. 25; see also Broder et al., 2000).
This characteristic is shared by other informetric and bibliometric
distributions, such as Zipf's Law (rank distribution of word use) (p. 31;
Broder et al., 2000), Bradford's Law of Scattering (distribution of journal
use) (Garfield, 1971), Lotka's Law (productivity distribution of scientific
papers) (Lotka, 1926), and the distribution of citations to scientific
papers (Seglen, 1992). Power law distributions are independent of scale: the
distribution remains the same regardless of the range of instances (e.g.,
100 to 1000, or 50,000 to 100,000) of the phenomena (e.g., Web pages) being
studied. These power law
distributions form Huberman's laws of the Web: the Law of Link Structure,
Law of Surfing, Law of Congestion, and Law of Web Site Visitation.
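The power-law form just quoted can be sketched in a few lines of Python. The
exponent x = 2 below is purely hypothetical, chosen only to illustrate the
scale-independence property the review describes:

```python
# Hypothetical illustration of the power law described in the review:
# the probability of a site having n pages is proportional to 1/n^x.
x = 2.0  # illustrative exponent; any x >= 1 behaves the same way

def p(n):
    """Unnormalized probability of finding a site with n pages."""
    return 1.0 / n ** x

# Scale independence: rescaling n by a factor c multiplies every value
# by the same constant c**(-x), so the distribution's shape is identical
# over 100-1000 pages and over 50,000-100,000 pages.
c = 10.0
for n in (100.0, 500.0, 1000.0, 50_000.0):
    assert abs(p(c * n) / p(n) - c ** (-x)) < 1e-12
```

Because the ratio p(c*n)/p(n) does not depend on n, no range of sizes looks
special: the same curve appears at every scale.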

The Law of Link Structure is based on the finding that an average of four
clicks (or links) separates any two randomly chosen Web sites, while an
average of nineteen clicks separates any two randomly chosen Web pages. This
law implies the presence of what Huberman calls "small world networks" (p.
36). Small world networks are communities of "common affinities" arranged
around a core of several sites containing most of the relevant pages, plus
"hubs," or "pages that contain links to many other good pages" (p. 39). The
power law-like distribution is found in the pattern
of the Web's link structure and small world communities where a few users
(and Web sites) have the most links to others, while most users (and Web
sites) have only a few links to others. The practical implications of the
Law of Link Structure will result, according to Huberman, in better search
engine design and better electronic "marketing of specific products" (p.
37).
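The "clicks apart" figures are averages of shortest-path lengths in the
Web's link graph. A breadth-first search over a toy graph (the five pages
and links below are invented for illustration; the real measurements ran
over crawled Web data) shows how such separations are computed:

```python
from collections import deque

# Hypothetical toy link graph, as adjacency lists.
links = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["E"],
    "E": ["A"],
}

def clicks(start, goal):
    """Smallest number of clicks (links) from start to goal, via BFS."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        if page == goal:
            return dist
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal unreachable from start

# Average separation over all ordered pairs of distinct pages.
pairs = [(a, b) for a in links for b in links if a != b]
avg = sum(clicks(a, b) for a, b in pairs) / len(pairs)
```

Averaging this quantity over sampled pairs of real pages or sites yields the
nineteen-click and four-click figures cited above.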

The study of surfing behavior, or how Web users move from link to link,
reveals a previously unknown relationship with the physical concept of
Brownian motion. Brownian motion describes the average "behavior of
particles executing random motions as opposed to the exact wandering of a
single one" (p. 44). The Law of Surfing "determines the number of users who
will surf to a given depth" within a given Web site (p. 47, 53). According
to Huberman, the information value a surfer finds fluctuates from page to
page; when that value crosses a certain threshold, surfing stops. What
matters, then, is the number of clicks a user is willing to make before
ending the session, or the "average number of clicks per session" (p. 45).
This finding can be used to design better Web sites and commercial portals
by helping to determine at what point a site must provide surfers with a
sufficient amount of good-quality information, or other incentives, to keep
them going to further pages.
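The stopping rule described above can be simulated as a random walk. The
sketch below uses invented parameters (it is not Huberman's published
model): each click perturbs the perceived value of continuing, and the
session ends once that value drops below a threshold.

```python
import random

random.seed(1)

# Hypothetical sketch: the value of continuing to surf fluctuates like a
# random walk with a slight downward drift; the surfer quits once it
# falls below a threshold. All parameters are invented for illustration.
THRESHOLD = 0.0
START, DRIFT, NOISE = 1.0, -0.1, 1.0

def session_clicks():
    value, n = START, 0
    while value > THRESHOLD:
        value += DRIFT + random.gauss(0.0, NOISE)
        n += 1
    return n

sessions = [session_clicks() for _ in range(10_000)]
avg_clicks = sum(sessions) / len(sessions)
median_clicks = sorted(sessions)[len(sessions) // 2]
# Most sessions end after a few clicks, but rare long sessions pull the
# average well above the median -- many small events, rare large ones.
```

The skewed distribution of clicks per session that this kind of walk
produces is what makes the average number of clicks per session a usable
design target for a site.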

The Law of Congestion is an example of what Huberman calls "social
dilemmas", which are a "class of problems that are pervasive in society and
difficult to resolve" (p. 56). For the Web, it involves greedy consumption
of bandwidth encouraged by the flat fee structure governing Internet access.
If everyone indulges in such greedy behavior, the overall performance of the
Internet is degraded. Yet to Huberman, for an individual to exercise
restraint in order to keep the Web free of undue congestion does not seem
like rational economic behavior. Actual observed Web use, however, does not
conform to this model. Most of the time greedy behavior does not consume all
the available bandwidth. Instead the Internet experiences sudden "spikes of
congestion" or "Internet storms" that quickly subside as congestion becomes
unacceptable and users cut back or stop their surfing (p. 61).

The Law of Congestion can have serious e-commerce implications, especially
for on-line banks with many complicated transactions to process and foreign
currency traders who execute on-line trades in a rapidly changing
information environment. Based on financial portfolio theory, Huberman
devised and tested a group of strategies analogous to asset diversification
that continuously sought the "most efficient trade-off between the average
time a request will take and the variance or risk in that time" (p. 76).

Computer simulations also showed that even if everyone used a portfolio
strategy, congestion would be minimal because all the action is
asynchronous, that is, not happening at the same time. In the worst-case
scenario, when everyone is using the optimal strategy, "the situation is no
worse than when no one used the strategy" (p. 81).
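One concrete strategy in this family is to abandon and reissue any request
that exceeds a cutoff. The simulation below is a sketch with invented
parameters, not Huberman's published algorithm: when raw download times are
heavy-tailed, restarting bounds the worst cases and sharply reduces the
variance ("risk") of the total waiting time.

```python
import random
import statistics

random.seed(7)

def latency():
    """Heavy-tailed raw download time (hypothetical Pareto model)."""
    return random.paretovariate(1.5)

def with_restart(cutoff=3.0):
    """Abandon and reissue any request that exceeds the cutoff."""
    waited = 0.0
    while True:
        t = latency()
        if t <= cutoff:
            return waited + t
        waited += cutoff  # give up on this attempt and retry

plain = [latency() for _ in range(20_000)]
restart = [with_restart() for _ in range(20_000)]

plain_sd = statistics.pstdev(plain)      # dominated by rare huge delays
restart_sd = statistics.pstdev(restart)  # bounded, far smaller spread
```

The cutoff plays the role of a portfolio weight: varying it traces out the
trade-off between average waiting time and its variance that the text
describes.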

The Law of Web Site Visitation was discovered by an analysis of the pattern
of commercial transactions on the Internet. It turns out that for all the
Web sites examined, as well as for sites "in specific categories like sex,
travel, or education, the distribution of visitors per site follows a
universal power law" (p. 88). According to Huberman, "a small number of
sites command the traffic of a large segment of the Web population, a
signature of winner-take-all markets" (p. 89). The implication for Web site
development and e-commerce is clear. There is a very low probability that
any given new Web site will "capture a significant number of users," whereas
it is far more likely to become just one more Web site that attracts only a
few users a day (p. 89).
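The winner-take-all claim can be made concrete with a back-of-the-envelope
computation. The Zipf-like 1/rank visitation law and the site count below
are hypothetical, chosen only to show the concentration effect:

```python
# Hypothetical back-of-the-envelope: if visits per site fall off as a
# Zipf-like power law, visits(rank) ~ 1/rank, a handful of top-ranked
# sites command most of the traffic -- the "winner-take-all" signature
# the review describes.
N = 100_000                          # invented number of sites
visits = [1.0 / rank for rank in range(1, N + 1)]
total = sum(visits)

top_share = sum(visits[: N // 100]) / total   # share held by the top 1%
# Under this distribution the top 1% of sites capture the majority of
# all visits, leaving only a thin tail of traffic for everyone else.
```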

With the discovery and commercial exploitation of the laws or regularities
in the structure of the Web, Huberman predicts the continued growth of
e-commerce and the development in the future of frictionless markets
characterized by "strong price competition, ease of search for best values,
and low margins for the producers" (p. 86), and free of government
regulation. Huberman briefly discusses the problem of privacy and the need
to reconcile the two main approaches taken to address its abuses: a
libertarian one (favored by the United States and Huberman), in which
"individuals and companies, rather than the government," resolve any
problems, and a governmental one (favored by the Europeans) (p. 98). He
touches briefly on the problem of applying conflicting local, national, and
regional laws to the Internet, and predicts a "brave new world" of ongoing
"interplay between new laws and their enforcement, and the ingenuity of
those who can always device (sic) novel [technological and legal] ways of
bypassing those restrictions" (e.g., Napster and Gnutella) (p. 99).

Huberman has successfully packed quite a bit of interesting information into
a relatively short book. Those looking for a more in-depth or technical
treatment of the nature and structure of the Web and its underlying power
law distributions will have to look elsewhere. But for those satisfied with
a more superficial treatment packaged in a stimulating and thought-provoking
manner, or for the interested reader looking for a good introduction to the
subject, this book is for you.

Eric G. Ackermann
University Libraries
Virginia Polytechnic Institute and State University
P.O. Box 90001
Blacksburg, VA  24062-9001
E-mail: eackerma at vt.edu

References
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata,
R., Tomkins, A., & Wiener, J. (2000). Graph structure in the web [On-line].
Available: http://www.almaden.ibm.com/cs/k53/www9.final

Garfield, E. (1971). The mystery of the transposed journal lists, wherein
Bradford's Law of Scattering is generalized according to Garfield's Law of
Concentration. In E. Garfield, Essays of an Information Scientist, vol. 1
(pp. 222-223). Philadelphia: ISI Press.

Lotka, A. (1926). The frequency distribution of scientific productivity.
Journal of the Washington Academy of Science, 16, 317-323.

Seglen, P. (1992). The skewness of science. Journal of the American Society
for Information Science, 43(9), 628-638.


_______________________________________________________________________
Eugene Garfield, PhD.  email: garfield at codex.cis.upenn.edu
home page: www.eugenegarfield.org
Tel: 215-243-2205 Fax 215-387-1266
President, The Scientist LLC. www.the-scientist.com
Chairman Emeritus, ISI www.isinet.com
Past President, American Society for Information Science and Technology
(ASIS&T)  www.asis.org
_______________________________________________________________________


