[Sigia-l] "Automatic Meaning Discovery Using Google"

Sun Jan 30 01:56:48 EST 2005

Wouldn't that be great? Well, it may be closer than we think:

The "Google distance"

... Paul Vitanyi and Rudi Cilibrasi of the National Institute for
Mathematics and Computer Science in Amsterdam, the Netherlands, realised
that a Google search can be used to measure how closely two words relate to
each other. For instance, imagine a computer needs to understand what a hat
is.

To do this, it needs to build a word tree - a database of how words relate
to each other. It might start with any two words to see how they relate to
each other. For example, if it googles "hat" and "head" together it gets
nearly 9 million hits, compared to, say, fewer than half a million hits for
"hat" and "banana". Clearly "hat" and "head" are more closely related than
"hat" and "banana".

To gauge just how closely, Vitanyi and Cilibrasi have developed a
statistical indicator based on these hit counts that gives a measure of a
logical distance separating a pair of words. They call this the normalised
Google distance, or NGD. The lower the NGD, the more closely the words are
related.
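
A minimal sketch of how such an indicator can be computed from hit counts,
assuming the NGD formula as given in the linked paper:
NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))),
where f(x) is the hit count for x alone, f(x, y) the hit count for both terms
together, and N the number of pages indexed. The function name `ngd`, the
value of N, and the single-word counts below are illustrative assumptions;
only the two joint counts are taken (rounded) from the article.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalised Google distance computed from raw hit counts.

    fx, fy -- hits for each term searched alone
    fxy    -- hits for both terms searched together
    n      -- assumed total number of indexed pages
    """
    if min(fx, fy, fxy) <= 0:
        # log(0) is undefined; treat terms that never occur as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Assumed index size and single-word counts (placeholders, not measured values).
N = 8_000_000_000
HAT, HEAD, BANANA = 40_000_000, 200_000_000, 30_000_000

# Joint counts from the article: "nearly 9 million" and "fewer than half a million".
hat_head = ngd(HAT, HEAD, 9_000_000, N)
hat_banana = ngd(HAT, BANANA, 450_000, N)

print("hat/head closer than hat/banana:", hat_head < hat_banana)
```

With these placeholder counts the "hat"/"head" distance comes out lower than
the "hat"/"banana" distance, which matches the article's reading of the raw
hit counts. Real values would depend on the actual counts and index size.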

Automatic meaning extraction

By repeating this process for lots of pairs of words, it is possible to
build a map of their distances, indicating how closely related the meanings
of the words are. From this a computer can infer meaning, says Vitanyi.
"This is automatic meaning extraction. It could well be the way to make a
computer understand things and act semi-intelligently," he says.
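
The "map of distances" idea can be sketched as a table of pairwise NGD
values. This is only an illustration under stated assumptions: the helper
`distance_map`, the index size N, and every count except the "hat"/"head" and
"hat"/"banana" joint counts (rounded from the article) are made up, and a real
version would fetch counts from a search engine rather than from a dictionary.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalised Google distance from hit counts (all counts assumed > 0)."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def distance_map(single, joint, n):
    """Return {(a, b): NGD} for every unordered pair of terms.

    single -- term -> hits for that term alone
    joint  -- frozenset({a, b}) -> hits for both terms together
    """
    return {
        (a, b): ngd(single[a], single[b], joint[frozenset((a, b))], n)
        for a, b in combinations(sorted(single), 2)
    }


# Placeholder counts; only hat+head and hat+banana echo the article.
SINGLE = {"hat": 40_000_000, "head": 200_000_000, "banana": 30_000_000}
JOINT = {
    frozenset(("hat", "head")): 9_000_000,
    frozenset(("hat", "banana")): 450_000,
    frozenset(("head", "banana")): 2_000_000,
}
N = 8_000_000_000  # assumed number of indexed pages

dmap = distance_map(SINGLE, JOINT, N)
closest = min(dmap, key=dmap.get)  # pair with the smallest distance

for pair, dist in sorted(dmap.items(), key=lambda item: item[1]):
    print(pair, round(dist, 3))
```

A map like this could then be fed to any standard clustering routine to group
related words; which one the authors use is not stated in the excerpt above.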

The technique has managed to distinguish between colours, numbers, different
religions and Dutch painters based on the number of hits they return.

NewScientist: Google's search for meaning
<http://www.newscientist.com/article.ns?id=dn6924>

Paper (PDF)
<http://www.arxiv.org/pdf/cs.CL/0412098>

Are people from Google allowed to comment on this?

Ziya
Nullius in Verba