[Sigia-l] Information Visualization

Karl Fast karl.fast at pobox.com
Tue Nov 11 10:51:45 EST 2003


> Likewise, IV has a few technical barriers, chief among them is
> scalability.....Unfortunately, IV feedback on large data sets is
> impossible now and will remain so in the near future. 

Here is a counterexample to the claim that scaling is impossible. 

A recent paper in IPM (Information Processing & Management)
describes a visualization system that generates author co-citation
maps in real time. 

 - the database has 1.26 million citation records supplied by ISI
   (i.e., Web of Science, the world's largest scientific citation
   database).

 - maps are generated using either Kohonen feature maps or
   Pathfinder networks, the two most common algorithms used for
   visualizing co-citation networks.

 - it is all done in real-time and works over the web. It's not
   public (you need a password), but I have used it and it spits
   back new maps as fast as Google returns results.

 - I have heard that this paper actually represents work completed
   two years ago, and that they're way beyond this now.


Certainly scaling is a challenge in many cases, but it's been
vanquished in others. 

Xia Lin pioneered the use of self-organizing Kohonen maps in
information science. In his JASIS paper ('97, I think) he required a
Cray to generate maps for a few hundred documents. Now he's dealing
with over a million records. That's a huge leap forward. 

   
For those who want the gory details:
   
Lin, X., White, H. D., & Buzydlowski, J. (2003). Real-time author
  co-citation mapping for online searching. Information Processing &
  Management, 39, 689-706.

  Author searching is traditionally based on the matching of name
  strings. Special characteristics of authors as personal names and
  subject indicators are not considered. This makes it difficult to
  identify a set of related authors or to group authors by subjects
  in retrieval systems. In this paper, we describe the design and
  implementation of a prototype visualization system to enhance
  author searching. The system, called AuthorLink, is based on
  author co-citation analysis and visualization mapping algorithms
  such as Kohonen's feature maps and Pathfinder networks. AuthorLink
  produces interactive author maps in real time from a database of
  1.26 million records supplied by the Institute for Scientific
  Information. The maps show subject groupings and more fine-grained
  intellectual connections among authors. Through the interactive
  interface the user can take advantage of such information to refine
  queries and retrieve documents through point-and-click
  manipulation of the authors' names.
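
For anyone unfamiliar with author co-citation analysis, here is a
rough Python sketch of the counting step that the maps are built on:
two authors are co-cited each time a single document cites both of
them. This is NOT AuthorLink's code. The function names
(cocitation_counts, related_authors) and the sample records are
made up for illustration, and the mapping step (Kohonen or
Pathfinder) is not shown.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of authors is cited together.

    reference_lists: one entry per citing document, each a list of
    the author names that document cites (hypothetical input format).
    """
    counts = Counter()
    for cited in reference_lists:
        # De-duplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


def related_authors(counts, seed, top_n=5):
    """Rank other authors by how often they are co-cited with seed."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == seed:
            scores[b] += n
        elif b == seed:
            scores[a] += n
    return scores.most_common(top_n)


# Made-up sample data: each inner list is one citing document.
records = [
    ["Salton", "Kohonen", "White"],
    ["Salton", "White"],
    ["Kohonen", "Lin"],
    ["White", "Lin", "Salton"],
]

counts = cocitation_counts(records)
top = related_authors(counts, "Salton")
print(top)
```

A real system would presumably precompute or index these counts
rather than scan every record per query, which is where the
scaling work comes in.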

> I've written on this here before, but bandwidth, database access and
> app server latency are insurmountable barriers when serving IV in
> large numbers, with no solution in sight.

It's not so insurmountable. At least not in this case. 

That doesn't mean that the scalability problem has been universally
licked, but the problem isn't universally intractable either.


--karl


