[Sigia-l] automated site mapping tools?

Andrew McNaughton andrew at scoop.co.nz
Sat Jun 22 14:01:00 EDT 2002


On Sat, 22 Jun 2002, Eric Scheid wrote:

> I'm presented with a site with 7,000+ static HTML files which has grown,
> um, organically over the years.
>
> What are the recommendations for software which will crawl the site and
> produce a list of all pages and all links on those pages? It doesn't
> necessarily need to produce pretty hierarchical diagrams (we're not even
> certain if the site is truly hierarchical).

FWIW, I'd probably do it in Perl.  A recursive web crawler takes about 20
lines of glue code using LWP and HTML::LinkExtor.  Another 10 lines would
get you a database of all the links.  That gives you a lot of flexibility
to look at the data however you like, but it's dependent on available
skills.  It would be quicker for me to write something like that than to
find out about existing tools.
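
Roughly, the crawler half of that looks like the sketch below.  It's
untested and just a starting point: the example.com URL is a placeholder
for your own front page, and instead of writing to a real database it
just dumps every link it finds as tab-separated "from to" pairs, which
you can load into whatever database or spreadsheet you prefer.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my $start = 'http://www.example.com/';   # placeholder: your front page
my $ua    = LWP::UserAgent->new;

my %seen;                                # pages already fetched
my @queue = ($start);
my @edges;                               # [from, to] pairs, one per link

while (my $page = shift @queue) {
    next if $seen{$page}++;

    my $resp = $ua->get($page);
    next unless $resp->is_success and $resp->content_type eq 'text/html';

    # Pull every href/src out of the fetched page.
    my $extor = HTML::LinkExtor->new;
    $extor->parse($resp->decoded_content);
    $extor->eof;

    for my $link ($extor->links) {
        my ($tag, %attrs) = @$link;
        for my $url (values %attrs) {
            my $abs = URI->new_abs($url, $resp->base);
            next unless $abs->scheme =~ /^https?$/;
            $abs->fragment(undef);       # drop #anchors
            push @edges, [$page, "$abs"];
            # Only crawl further within the site itself.
            push @queue, "$abs" if index("$abs", $start) == 0;
        }
    }
}

# Dump the link graph as tab-separated "from to" lines.
print "$_->[0]\t$_->[1]\n" for @edges;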

Hierarchy is generally something which is imposed rather than inherent.
While it would certainly be useful to have lists of which pages link to
a given page, you'll almost certainly need to impose some sort of
structure in order to make sense of a map of that many pages.  Mapping
your pages by shortest path from the front page is likely to be a good
starting point.
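
As a sketch of that last idea: feed the crawler's output into something
like the breadth-first search below (again untested; it assumes
tab-separated from/to pairs on STDIN and the same placeholder front-page
URL) and you get each page's distance in clicks from the front page.

#!/usr/bin/perl
use strict;
use warnings;

# Read tab-separated "from to" link pairs on STDIN and print each
# page's shortest distance in clicks from the front page.
my $front = 'http://www.example.com/';   # placeholder front page

my %out;                                 # page => [pages it links to]
while (<STDIN>) {
    chomp;
    my ($from, $to) = split /\t/;
    next unless defined $to and length $to;
    push @{ $out{$from} }, $to;
}

# Breadth-first search out from the front page.
my %depth = ($front => 0);
my @queue = ($front);
while (my $page = shift @queue) {
    for my $next (@{ $out{$page} || [] }) {
        next if exists $depth{$next};
        $depth{$next} = $depth{$page} + 1;
        push @queue, $next;
    }
}

print "$depth{$_}\t$_\n"
    for sort { $depth{$a} <=> $depth{$b} } keys %depth;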

Andrew McNaughton



