[Sigia-l] automated site mapping tools?

Andrew McNaughton andrew at scoop.co.nz
Mon Jun 24 08:12:15 EDT 2002


The returns from trying to extract anything more than the most obvious
links from JavaScript are likely to be poor.
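To illustrate the "most obvious links" case, here is a minimal Python
sketch that pulls quoted string literals ending in .htm or .html out of a
script.  The pattern and the name obvious_js_links are hypothetical, not
any particular tool's behaviour.  Anything assembled at run time, such as
base + page + ".html", slips straight past it, which is why the returns
are poor.

```python
import re

# Hypothetical pattern: quoted string literals that look like page URLs
# ending in .htm/.html, optionally followed by a query or fragment.
# URLs built at run time (concatenation, variables) are not found.
LINK_LITERAL = re.compile(
    r"""["']((?:https?://[^"'\s]+|[^"'\s:]+)\.html?(?:[?#][^"'\s]*)?)["']""",
    re.IGNORECASE,
)


def obvious_js_links(js_source: str) -> list[str]:
    """Return URL-like string literals found in JavaScript, in first-seen order."""
    found = []
    for match in LINK_LITERAL.finditer(js_source):
        url = match.group(1)
        if url not in found:  # drop duplicates, keep order
            found.append(url)
    return found


js = ('var menu = ["about/index.html", "news/2002.htm"]; '
      'window.location = base + page + ".html";')
print(obvious_js_links(js))  # the concatenated link is missed
```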

Another approach is to gather information on the links in the material
from the referer logs.  Not all links which exist in the pages will
appear in the logs, but that may be a good thing.  The referer data will
give you information weighted by actual use, which may help you make
sense of what's important.
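As a rough sketch, assuming an Apache-style server writing the standard
"combined" log format (which records the Referer header), the referer
pairs could be counted like this.  SITE_HOST and referer_links are
hypothetical names; set SITE_HOST to your own host name.  Note that
referers using an alias host name (e.g. without the www) are dropped by
this simple check.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"'
)

SITE_HOST = "www.example.org"  # hypothetical: host name of the site being mapped


def referer_links(log_lines):
    """Count (from_page, to_page) pairs seen in a combined-format access log."""
    links = Counter()
    for line in log_lines:
        m = COMBINED.match(line)
        if not m:
            continue  # malformed line, or a method other than GET/HEAD
        target, status, referer = m.groups()
        if not status.startswith(("2", "3")):
            continue  # skip failed requests (4xx/5xx)
        ref = urlsplit(referer)
        if ref.netloc.lower() != SITE_HOST:
            continue  # "-" (no referer) or a link from another site
        links[(ref.path or "/", urlsplit(target).path)] += 1
    return links
```

Sorting the result with links.most_common() then lists the most-used
internal links first.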

And of course there's no reason why you can't merge data from multiple
sources.
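A minimal Python sketch of such a merge, assuming each source has already
been reduced to (from_page, to_page) pairs.  The function names and the
source labels ("crawl", "referer-log") are hypothetical.  Tagging each
link with the sources that reported it keeps the provenance visible.

```python
from collections import defaultdict


def merge_link_sources(sources):
    """Merge link lists from several sources.

    sources maps a source name to an iterable of (from_page, to_page)
    pairs.  Returns a dict mapping each pair to the sorted list of
    source names that reported it.
    """
    merged = defaultdict(set)
    for name, pairs in sources.items():
        for pair in pairs:
            merged[tuple(pair)].add(name)
    return {pair: sorted(names) for pair, names in merged.items()}


def only_in(merged, name):
    """Links reported by exactly one source, the one called `name`."""
    return sorted(pair for pair, names in merged.items() if names == [name])
```

Links reported only by a crawler may be ones nobody follows, while links
seen only in the referer logs are good candidates for the JavaScript or
DHTML links a crawler missed.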

Andrew McNaughton


On Sun, 23 Jun 2002, Cathy Caron wrote:

> What you'll be able to use for this will depend partly on how this site is coded
> and how "all" you need your "all pages" to be.  When searching for a link
> checker over the last few months, I found that there are lots of programs,
> including open-source ones, that will find most links, but very, very few that
> can properly spider complex JavaScript and DHTML links.  If this site has links
> in arrays in JS files, DHTML menus, etc., it is very unlikely you will get a
> complete list from any of the open-source programs, or in fact from most of the
> commercial ones.  If you need to be certain you have all the pages, and if you only
> need a single shot at the contents of the site, and not ongoing link checking, I
> would download a trial version of Watchfire (www.watchfire.com) which is very
> aggressive about searching complex links and allows the use of regular
> expressions to define what a link is.  I am not connected with this company in
> any way of course, well, except for having bought their product.
>
> Cathy Caron
> The VIA Group
>
> > I'm presented with a site with 7,000+ static HTML files which has grown,
> > um, organically over the years.
> >
> > What are the recommendations for software which will crawl the site and
> > produce a list of all pages and all links on those pages? It doesn't
> > necessarily need to produce pretty hierarchical diagrams (we're not even
> > certain if the site is truly hierarchical).
>
>
> ------------
> When replying, please *trim your post* as much as possible.
>
> *Plain text, please; NO Attachments
>
> ASIST SIG IA website: http://www.asis.org/SIG/SIGIA/index.html
> _______________________________________________
> Sigia-l mailing list -- post to: Sigia-l at asis.org
> Changes to subscription: http://mail.asis.org/mailman/listinfo/sigia-l
>
