[Sigia-l] Information Visualization
Karl Fast
karl.fast at pobox.com
Tue Nov 11 10:51:45 EST 2003
> Likewise, IV has a few technical barriers, chief among them is
> scalability.....Unfortunately, IV feedback on large data sets is
> impossible now and will remain so in the near future.
Here is a counterexample to the claim that scaling is impossible.
A recent paper in IPM (Information Processing & Management)
describes a visualization system that generates author co-citation
maps in real time.
- the database has 1.26 million citation records supplied by ISI
(i.e.: Web of Science, the world's largest scientific citation
database).
- maps are generated using either Kohonen feature maps or
Pathfinder networks, the two most common algorithms used for
visualizing co-citation networks
- it is all done in real time and works over the web. It's not
public (you need a password), but I have used it and it spits
back new maps as fast as Google returns results.
- I have heard that this paper actually represents work completed
two years ago, and they're way beyond this now
Certainly scaling is a challenge in many cases, but it's been
vanquished in others.
Xia Lin pioneered the use of self-organizing Kohonen maps in
information science. In his JASIS paper ('97, I think) he required a
Cray to generate maps for a few hundred documents. Now he's dealing
with over a million records. That's a huge leap forward.
For those who want the gory details:
Lin, X., White, H. D., & Buzydlowski, J. (2003). Real-time author
co-citation mapping for online searching. Information Processing &
Management, 39, 689-706.
Author searching is traditionally based on the matching of name
strings. Special characteristics of authors as personal names and
subject indicators are not considered. This makes it difficult to
identify a set of related authors or to group authors by subjects
in retrieval systems. In this paper, we describe the design and
implementation of a prototype visualization system to enhance
author searching. The system, called AuthorLink, is based on
author co-citation analysis and visualization mapping algorithms
such as Kohonen's feature maps and Pathfinder networks. AuthorLink
produces interactive author maps in real time from a database of
1.26 million records supplied by the Institute for Scientific
Information. The maps show subject groupings and more fine-grained
intellectual connections among authors. Through the interactive
interface the user can take advantage of such information to refine
queries and retrieve documents through point-and-click
manipulation of the authors' names.
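
For anyone unfamiliar with author co-citation analysis, the counting
step that the abstract builds on can be sketched roughly as below.
This is only a toy illustration and not AuthorLink's code: the
function name cocitation_counts and the sample data are made up, and
the real system presumably does far more (name normalization,
indexing, and the Kohonen or Pathfinder layout on top of the counts).

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of authors is cited by the same paper.

    reference_lists: one list of cited author names per citing paper.
    (Hypothetical input shape, chosen for this sketch.)
    """
    counts = Counter()
    for cited_authors in reference_lists:
        # De-duplicate and sort so each unordered pair has one canonical key.
        unique = sorted(set(cited_authors))
        # Every pair of distinct cited authors is co-cited once by this paper.
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


# Made-up citing papers, each reduced to the authors it cites.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author C"],
    ["Author B", "Author D"],
]
counts = cocitation_counts(papers)
```

The resulting pair counts form a symmetric author-by-author matrix,
which is the kind of input a mapping algorithm would then lay out.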
> I've written on this here before, but bandwidth, database access and
> app server latency are insurmountable barriers when serving IV in
> large numbers, with no solution in sight.
It's not so insurmountable. At least not in this case.
That doesn't mean that the scalability problem has been universally
licked, but the problem isn't universally intractable either.
--karl