[Sigia-l] Turning URLs Into Directories

Troy Winfrey twinfrey at gmail.com
Sat Jun 30 10:28:39 EDT 2007


Quoting Ziya Oz:


> The basic idea behind standard URLs is simple - given a type of object,
> like a book or a movie or a music album, create a URL schema that can be
> used by any site.
> What architectural issues do you foresee for this?
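
For concreteness, a "standard URL" schema of the sort described above might
look something like this rough Python sketch (the /book/isbn/ path and the
hosts are hypothetical illustrations, not part of any actual proposal):

    # Hypothetical sketch of the "standard URL" idea: one shared path schema
    # for a type of object (here, books keyed by ISBN), reused by any site.
    def universal_book_url(host: str, isbn: str) -> str:
        return f"https://{host}/book/isbn/{isbn}"

    # The same object gets the same path everywhere; only the host differs.
    for host in ("amazon.com", "barnesandnoble.com", "ralphsbooks.example"):
        print(universal_book_url(host, "9780747532699"))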


The problem with this idea is that the Internet, many IAs' opinions aside,
is not a library except in increasingly rare and marginal cases. Instead,
it's a department store...whether or not you are actually selling anything
(and chances are, if you're going to get paid to worry about this five years
from now, you will be selling things). Therefore, URLs, as a crucial part of
search engine indexing, should be seen as store signage, not catalog index
cards. To put it in more fancy-pants terms, it's the difference between
names (now) and labels (Universal URL).

By commoditizing the signifier-signified relationship, the Universal URL
removes the URL's usefulness as an indexing tool; imagine thousands of
results from a Google search all ending with /harrypotter. Owing to other
factors (link weight, mainly), it's guaranteed that Amazon and B & N, etc.,
will be at the top of that list, and Ralph's Books in Peoria will be at the
bottom. He's at the bottom now, too...but under the current system, if he's
savvy and hand-polishes his pages, he might be able to rank highly in
several niches (Harry Potter ephemera, Harry Potter accessories, whatever),
which encourages competition; makes him money, some of which he'll plow back
into the Web; and strengthens the Web itself, per Metcalfe's Law. If he is
forced to use a Universal URL, he'll either give up or refuse to use the
URL, which is the real issue. Obviously, the standard must be accepted in
order to work...ask the US Metric By '84! folks.

Secondly, Google would never accept this, since it would destroy their
business model. Google is responsible for the current health of the Web,
and, indirectly, for all of our jobs. If you don't know how their long-tail
revenue model works, I exhort you to learn as soon as possible.
Increasingly, it *is* the Web. I will also do my best personally and
professionally to defend and promote this model, since it is based on the
continuing health of small publishers and the encouragement of specialist,
human-not-spam-machine-created production, both of which I like a lot.

Thirdly, all of this aside, like most semantic Web schemes this one ignores
human evil. To spell it out: suppose I use /harrypotter to index my page
full of porn links? What's to stop me? Your outrage? I'm still making
money. Not all that much money per page, so what I do is make five or ten
thousand copies of the page under slightly different domains and dump them
into the environment. I have automated tools that let me do this in about an
hour. Now we're talking ROI! (Yahoo!'s inability to deal with this is
directly responsible for their continuing slide.) So as a search engine, in
essence I have to ignore those Universal URLs...or weight other page
elements so heavily in the equation (i.e., do exhaustive relevance checks)
that the original concept becomes meaningless. This is in fact what happened
to the Keywords meta tag, which got so stuffed with spam (free naked
celebrities!) that it stopped being relevant sometime in the late 90s. The
search engines explicitly ignore it now. Sadly, plenty of people still
believe that's how you "index" your pages.

In short, we have enough trouble with the Wal-Mart effect in real life. We
mustn't bring it to our information indexing efforts.


