[Sigia-l] "Search Inside the Music"

Ziya Oz listera at earthlink.net
Wed Jul 12 17:07:14 EDT 2006


I have received a ton of email on this, asking for further explanation,
questioning the merits, etc. Sorry for grouping my replies into one message
here.

As you may recall, I have been posting a bunch of interface research links
on music search and discovery here for some time. This is important not
just because we all need better tools but also because some of the
fundamental problems in indexing, analyzing, filtering and classifying music
show up in much the same form in other fields.

This particular Sun Labs research is significant because:

"[it] is exploring new methods of searching music by its acoustic content
and context. This project is aimed at helping people find and organize their
music based on the properties of the music itself: lyrics, musical theme,
melody, tempo, rhythm, and instrumentation."

How?

"Every song is really a series of acoustic features and characteristics that
can be measured, analyzed, tracked, and compared," he said. "So the first
thing we do is generate metadata directly from the audio content." A few of
the features that can be extracted and analyzed include pitch, harmony, key,
timbre, instrumentation, tempo, rhythm patterns, and intensity or energy
level.
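
As a rough illustration of what "generating metadata directly from the
audio content" can mean in practice, here is a minimal Python/NumPy sketch
(my own toy example, not Sun's actual pipeline) that pulls two of those
features, energy level and a crude timbre proxy (the spectral centroid),
straight out of a frame of audio samples:

    import numpy as np

    def frame_features(frame, sample_rate):
        """Two simple acoustic features from one frame of audio samples."""
        # Intensity / energy level: root-mean-square amplitude.
        rms_energy = np.sqrt(np.mean(frame ** 2))
        # Crude timbre proxy: the spectral centroid, i.e. the "center
        # of mass" of the magnitude spectrum, in Hz.
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
        return rms_energy, centroid

Pitch, key, rhythm and the rest need more sophisticated signal processing,
but the principle is the same: numbers in, numbers out, no human listening
required.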

Others have tried similar approaches, but the scale of Sun Labs' effort is
pretty awesome, as it:

"analyzes the features of the music frame by frame, measuring multiple
attributes such as pitch, beat, instrumentation, and so on. Each frame
represents a 40-millisecond slice of the music. In an average 200-second
song there are 5,000 frames...²
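
That framing arithmetic checks out: 200 seconds / 0.040 seconds per frame =
5,000 frames. A sketch of the slicing itself, assuming CD-quality audio at
44,100 samples per second (again my toy numbers, not Sun's code):

    import numpy as np

    SAMPLE_RATE = 44100                      # CD-quality sampling rate
    FRAME_LEN = int(SAMPLE_RATE * 0.040)     # 40 ms = 1,764 samples

    def frames(signal):
        """Yield consecutive, non-overlapping 40 ms slices of a signal."""
        for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN):
            yield signal[start:start + FRAME_LEN]

    song = np.zeros(200 * SAMPLE_RATE)       # a silent 200-second "song"
    print(sum(1 for _ in frames(song)))      # prints 5000

Each of those 5,000 slices then gets its own little feature vector.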

According to Sun, a single PC could take over six years to analyze the 2-3
million songs in Apple's iTMS, whereas they aim to use grid computing to
reduce that to a mere weekend.
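
The back-of-the-envelope math (mine, not Sun's) shows why this is a natural
grid problem: each song can be analyzed independently, so the wall-clock
time divides roughly by the node count:

    cpu_hours = 6 * 365 * 24      # ~six years of single-PC work
    weekend_hours = 60            # Friday evening to Monday morning
    print(round(cpu_hours / weekend_hours))  # ~876 machines, ignoring overhead

So on the order of a thousand commodity machines working in parallel.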

So the difference here is one of absolute granularity. Not songs, not
passages, not themes, not genres, not collaborative filtering, not human
classification... just pure quantitative analysis every 40 milliseconds for
every song in collections of millions of songs.

This is like identifying animal species not by their external looks but by
their DNA, as it were. Or searching through a billion fingerprints not by
sequentially comparing their bitmap likenesses, but by sheer computation
over their vector representations. And so on.
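
Put differently: once every song is boiled down to a numeric feature
vector, "sounds like" becomes a distance computation. A minimal sketch of
that idea (cosine similarity over made-up vectors; Sun's actual matching
method may be entirely different):

    import numpy as np

    def most_similar(query, catalog, top_n=5):
        """Rank catalog rows by cosine similarity to a query vector."""
        q = query / np.linalg.norm(query)
        c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
        scores = c @ q                        # one dot product per song
        return np.argsort(scores)[::-1][:top_n]

    catalog = np.random.rand(100_000, 64)     # 100,000 songs, 64 features each
    print(most_similar(catalog[0], catalog))  # top hit is song 0 itself

That is the whole point of the granularity: comparing two songs becomes
arithmetic, and arithmetic scales.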

I'm not in a position to say if the claims are accurate or if their goals
are attainable, but I sure can see the logic and novelty therein.

Remember also that Sun sells hardware and systems for large
computation-intensive businesses. How they may be able to commercialize this
is an open question.

Also remember that Google built a $100 billion business by disintermediating
human classification with algorithmic relationship analysis and applying
that notion to a bunch of fields from search to news to advertising, etc.

Those who can solve the "If you liked that, you'll also like these" problem
with much greater accuracy than, say, Amazon.com will surely be our heroes,
won't they?

----
Ziya

Usability >  Simplify the Solution
Design >  Simplify the Problem






