[Sigia-l] Tilde for synonym search in Google
James Aylett
james.aylett at tangozebra.com
Fri Sep 3 09:42:58 EDT 2004
Ed Housman wrote:
> The way the expand operator SHOULD work is for the system to
> look at all the words in the "top" hits, then find among
> these a small set of other words that occur many
> times in those hits, focusing on rare words. (So entering
> Russia should return Moscow, Putin, Kremlin, etc.) Then,
> somehow you have to pick the hottest of those new words
> and add them to the search. The ranked results bring up
> new possibly relevant information, more wide-ranging than
> the original search. Of course if the user enters several
> words in the search, the very top hits should be the ones that
> have ALL those words.
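
For concreteness, here is a rough sketch of the expand step described above.
It is only an illustration under my own assumptions: the toy corpus, the
function names (search, expand_terms) and the "count in top hits times
rarity" weighting are all made up for the example, and nothing here is known
to reflect how Google or any other engine actually works.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, for illustration only).
DOCS = [
    "russia moscow kremlin putin talks",
    "putin speaks at the kremlin in moscow russia",
    "russia and moscow weather report",
    "weather report for london",
    "london talks on trade",
]


def tokenize(text):
    return text.lower().split()


def search(query_terms, docs):
    """Rank documents by how many distinct query terms they contain.

    Documents containing ALL the terms therefore come out on top.
    Returns a list of document indexes, best first.
    """
    scored = []
    for i, doc in enumerate(docs):
        words = set(tokenize(doc))
        hits = sum(1 for t in set(query_terms) if t in words)
        if hits:
            scored.append((hits, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


def expand_terms(query_terms, docs, top_k=3, n_terms=3, min_hits=2):
    """Pick new terms that recur in the top hits, favouring rare words."""
    top = search(query_terms, docs)[:top_k]

    # Document frequency of every word across the whole corpus.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))

    # How many of the top hits each word appears in.
    hit_df = Counter()
    for i in top:
        hit_df.update(set(tokenize(docs[i])))

    scores = {}
    for term, count in hit_df.items():
        if term in query_terms or count < min_hits:
            continue
        # Frequent in the top hits, rare in the corpus overall.
        scores[term] = count * math.log(len(docs) / df[term])
    return sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]


new_terms = expand_terms(["russia"], DOCS)
print(new_terms)
print(search(["russia"] + new_terms, DOCS))
```

Note that this costs a second search for every expanded query, and that
the new terms depend entirely on what happens to be in the top hits.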
Is Google's tilde a true expand operator? I'd have thought it's more likely
to use a thesaurus to do query expansion, rather than using the result set
of the query to generate an expand set of possible new terms and then throw
them back through the search (i.e. doing an extra search for each tilde-word).
Of course, the thesaurus could be built up by running single word queries
against Google's entire database and taking the most promising expand terms
for each. However, that's not really what synonym searching would be expected
to do, and as you point out there's a statistical basis behind this that
means you'll get weird expand words sometimes. With a thesaurus approach you
only need to worry about homonyms (and only the spelling-based ones, i.e.
homographs, rather than sound-based ones, at that).
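
As a minimal sketch of the thesaurus approach, assuming a small hand-built
synonym table: the THESAURUS entries and the function names (expand_query,
to_boolean) are hypothetical, and this is not a description of what Google
actually does with the tilde.

```python
# Hypothetical hand-built thesaurus; the entries are illustrative only.
THESAURUS = {
    "car": ["automobile", "auto"],
    "cheap": ["inexpensive", "affordable"],
    # A homograph: both senses of "bank" end up in the same entry.
    "bank": ["lender", "riverbank"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Turn each ~word into a group of alternatives; leave other words alone.

    Returns one list of alternatives per query token.
    """
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:].lower()
            groups.append([word] + thesaurus.get(word, []))
        else:
            groups.append([token.lower()])
    return groups


def to_boolean(groups):
    """Render the groups as a boolean query string with OR-ed alternatives."""
    parts = []
    for group in groups:
        if len(group) == 1:
            parts.append(group[0])
        else:
            parts.append("(" + " OR ".join(group) + ")")
    return " ".join(parts)


print(to_boolean(expand_query("~cheap flights")))
print(to_boolean(expand_query("~bank loans")))
```

The expansion is a plain lookup done before the query runs, so it needs no
extra search per tilde-word. The "~bank" case shows the homonym problem: the
riverbank sense gets pulled in even though the query is about loans.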
It's not clear from the Google documentation, but I'd expect it to be doing
synonyms, not expand set generation, partly because synonym matching is more
readily understood by people, which tends to help in improving the quality of
results ...
James
--
James Aylett
Chief Technical Architect, Tangozebra
t 020 7535 9850 f 020 7535 9900
w http://tangozebra.com/