[Sigia-l] automated site mapping tools?

karl fast karl.fast at pobox.com
Sat Jun 22 09:06:19 EDT 2002


> I'm presented with a site with 7,000+ static HTML files which has grown, 
> um, organically over the years.
> 
> What are the recommendations for software which will crawl the site and 
> produce a list of all pages and all links on those pages? It doesn't 
> necessarily need to produce pretty hierarchical diagrams (we're not even 
> certain if the site is truly hierarchical).

You can always buy something, but there are lots of open source
tools that will do the same thing. I haven't used many of them but
here are a few that I found with a few minutes of digging:

The ODP (Open Directory Project) has lists of these tools:

  Site Management Tools
  http://directory.google.com/Top/Computers/Software/Internet/Site_Management/

  Link Management Tools  
  http://directory.google.com/Top/Computers/Software/Internet/Site_Management/Link_Management/


Freshmeat.net lists open source tools under Link Checking. There are
34, but only a few will meet your needs.

  http://freshmeat.net/browse/244/?topic_id=244

Some possibilities include:

SiteMapper (PHP)
  http://agent-source.com/sitemapper/

  SiteMapper.php was created to build a "site map" of a web site. It
  takes a given URL and spiders/crawls the local links found from
  there to build a single HTML page listing all links found. The
  resultant page is useful in the following ways:

Sitemapper.pl (Perl)
  http://www.cpan.org/modules/by-module/LWP/sitemapper-1.019.readme
  http://www.cpan.org/modules/by-module/LWP/sitemapper-1.019.tar.gz
  
  sitemapper.pl is a simple Perl script that generates an HTML site
  map from a given URL. It does this by traversing the site, getting
  the home page, extracting links from it, getting all the pages
  linked, and so on.
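
If none of these fit, the traversal described above (get the home page,
extract its links, get the linked pages, and so on) is small enough to
sketch yourself. What follows is a rough Python illustration of that
idea, not the code of sitemapper.pl or of any tool listed here. The
names LinkExtractor, fetch, crawl, fetch_page and max_pages are all made
up for this example, and it assumes the pages are plain static HTML
reachable over HTTP on a single host. It returns every page found and
the links on each page, which is the list you asked for.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download one page and return its text (illustrative default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(root, fetch_page=fetch, max_pages=10000):
    """Breadth-first crawl from root.

    Returns a dict mapping each fetched page URL to the sorted list of
    absolute URLs it links to. Only pages on the root's host are fetched;
    links to other hosts are recorded but not followed.
    """
    host = urlparse(root).netloc
    queue = deque([root])
    site = {}
    while queue and len(site) < max_pages:
        url = queue.popleft()
        if url in site:
            continue  # already visited
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            site[url] = []  # unreachable page: record it with no links
            continue
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links and drop #fragments so each page is listed once.
        links = sorted({urldefrag(urljoin(url, h)).url for h in parser.hrefs})
        site[url] = links
        for link in links:
            if urlparse(link).netloc == host and link not in site:
                queue.append(link)
    return site
```

Calling crawl("http://www.example.com/") (a placeholder address) gives a
dictionary you can dump as a flat text list, one page per entry followed
by its links. Since the site may not be truly hierarchical, this sketch
keeps the result as a flat page-to-links mapping and does not try to
build a tree.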

nSite (Perl?)
  http://www.horsburgh.com/h_nsite.html
 
  nSite generates site maps for a given WWW site. It walks a site
  from the root URL and generates an HTML, TEXT, or XML link page
  which illustrates the structure of the site.


Hope this helps....  
  
--karl  
  


