[Sigia-l] site spidering tools for content inventories

Jess McMullin jess.lists at nform.ca
Wed Jul 14 17:59:31 EDT 2004


Hi all,

I'm looking for site spidering tools to help automate content inventories.

I'd like to get a hierarchical list of URLs, along with page titles and
existing metadata, into an Excel sheet. The tool should be aware of
hierarchy, and particularly of duplicate links, so that a page linked from
the global navigation doesn't show up as a child of every page that carries
the global navigation.
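The duplicate-link problem has a fairly standard workaround: walk the site breadth-first and file each page under the first page from which it is reached, ignoring every later link to it. Because the start page is expanded first, anything in the global navigation gets filed directly under it rather than under every page that repeats the nav. A minimal Python sketch, assuming the crawl has already produced a map from each URL to the URLs it links to (the `links` dict, `inventory_rows`, and `rows_to_csv` are hypothetical names, not features of any tool listed here):

```python
import csv
import io
from collections import deque


def inventory_rows(start_url, links):
    """Return (depth, url, parent) rows in tree order, one row per page.

    `links` is a hypothetical pre-crawled map: URL -> list of linked URLs.
    Each page is filed under the first page that reached it in a
    breadth-first walk, so repeated global-navigation links are ignored.
    """
    parent = {start_url: None}
    children = {start_url: []}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target in parent:
                continue  # already placed: a duplicate link (e.g. global nav)
            parent[target] = url
            children[url].append(target)
            children[target] = []
            queue.append(target)

    # Depth-first pass so each parent is immediately followed by its children.
    rows = []
    stack = [(start_url, 0)]
    while stack:
        url, depth = stack.pop()
        rows.append((depth, url, parent[url] or ""))
        for child in reversed(children[url]):
            stack.append((child, depth + 1))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that Excel can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["depth", "url", "parent"])
    writer.writerows(rows)
    return buf.getvalue()


# Toy site: every page repeats the same global navigation.
nav = ["/", "/about", "/products", "/contact"]
links = {
    "/": nav,
    "/about": nav + ["/about/team"],
    "/products": nav + ["/products/widget"],
    "/contact": nav,
    "/about/team": nav,
    "/products/widget": nav + ["/about/team"],
}
for depth, url, parent in inventory_rows("/", links):
    print("  " * depth + url)
```

On the toy site, each URL appears exactly once, and `/about/team` sits only under `/about` even though `/products/widget` also links to it. Breadth-first placement puts each page at its shallowest link depth, which usually, but not always, matches the intended hierarchy.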

What tools have you used to ease the mind-numbing monotony of inventories?
What kind of experiences have you had? And for the 'look at the archives'
folks - been there, done that.

Things I've tried so far:

- Visio Professional produces visual sitemaps, but doesn't scale well and
can't export to Excel without a VBA guru. Expensive, but we already have it.

- Xenu is free and fast, but doesn't handle duplicates well at all, so it
doesn't produce a useful hierarchy. I have to copy the HTML into Excel and
then spend a lot of time eliminating dupes.
http://home.snafu.de/tilman/xenulink.html

- PowerMapper has a good algorithm for understanding hierarchy, but the demo
doesn't seem to export to Excel. $299US for the Pro version.
http://www.powermapper.com

- Xtreeme SiteXpert - seems to have an OK hierarchy algorithm, but again no
export in the demo. $69US
http://www.xtreeme.com/sitexpert/

- TheBrain can spider sites and understand hierarchy, but can't export to
Excel. Its site seems to be AWOL for now.

Things I've found via list archives, Google, but have yet to try:

- LinkViewer - no export, $97US, and they want to charge $37 to upgrade from
3.0 to 3.1. Limited formatting options. Ugly.
http://www.gradetools.com/linkviewer/lv31help.htm

- The NIST WebMetrics tools are free, but require configuration on a server
and would need some Perl hacking to generate something importable into Excel
http://zing.ncsl.nist.gov/WebTools/tech.html (specifically linklint and
convert-ll to spider an existing site).

- SiteMapper claims to offer export to HTML, CSV, and XML, but I'm unsure how
well it understands hierarchy and duplicates - will try the demo. $40US
http://trellian.com/mapper/overview.htm

- Extract URL - no hierarchy, but grabs all URLs on a site and can export to
Excel. $49US
http://www.spadixbd.com/extracturl/
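Several of these options (the NIST tools in particular, and anything that only grabs a flat URL list) leave the title and metadata columns to a script. A rough sketch of that step in Python, standard library only, assuming the page's HTML has already been fetched as a string. `PageInfoParser` and `parse_page` are hypothetical names, and reading only `description` and `keywords` from `<meta name=...>` is an assumption about which metadata matters:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class PageInfoParser(HTMLParser):
    """Collect <title> text, <meta name=... content=...> pairs and <a href> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links and drop #fragments so /page and /page#top match.
            self.links.append(urldefrag(urljoin(self.base_url, attrs["href"]))[0])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Return ([url, title, description, keywords], outgoing_links) for one page."""
    parser = PageInfoParser(url)
    parser.feed(html)
    parser.close()
    row = [
        url,
        " ".join(parser.title.split()),  # collapse stray whitespace in titles
        parser.meta.get("description", ""),
        parser.meta.get("keywords", ""),
    ]
    return row, parser.links
```

Each row can be handed to a `csv.writer` for the spreadsheet, and the returned links are the outgoing URLs a crawler would queue next. `html.parser` is tolerant of sloppy markup but does not build a document tree, so pages with unusual markup may still need spot checks.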

Anything else you'd suggest?

thanks

Jess


----------------------
Jess McMullin
nForm User Experience Consulting, Inc.
780.709.9396
www.nform.ca



