[Sigia-l] Scaling content inventories

Margaret Hanley mairead at yahoo.com
Tue Oct 29 08:29:25 EST 2002


I feel very well qualified to talk about content
inventories and analysis at the moment.

I am managing the content audit, analysis and modelling of
the BBC web site. It contains (we think) about 1M pages and 1000
web sites, and it is still growing.

We are handling the sampling issue by doing a site audit
first - looking at each site's main sections for normal and
unusual content. By identifying this content, we can then
choose to do a really detailed content analysis of the
content that is different or slightly different to the norm
to add to our ever-increasing "content object" library.

To cover ourselves, and to identify content that's not linked
or hard to find, we also do a disk scrape (for unlinked
content) and a web crawl (for linked content) so we can see if
we are missing vital info. We actually have really large
amounts of content that are not linked, sometimes up to
two-thirds of the total content.
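
For anyone wanting to try the same comparison, here is a minimal
sketch in Python of how a disk scrape and a web crawl could be
compared to find unlinked content. It is not the tooling we use;
the function names, the sample paths and the rule that a directory
URL maps to index.html are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def normalise(path: str) -> str:
    """Reduce a URL or file path to a comparable site-relative path."""
    path = urlsplit(path).path or "/"
    if path.endswith("/"):
        # Assumption: a directory URL is served from its index.html file.
        path += "index.html"
    return path.lower()


def find_unlinked(disk_paths, crawled_urls):
    """Return (unlinked paths, share of disk content that is unlinked)."""
    on_disk = {normalise(p) for p in disk_paths}
    linked = {normalise(u) for u in crawled_urls}
    # Anything on disk that the crawler never reached counts as unlinked.
    unlinked = sorted(on_disk - linked)
    share = len(unlinked) / len(on_disk) if on_disk else 0.0
    return unlinked, share


# Hypothetical sample data: three files on disk, two reached by the crawl.
disk = ["/news/index.html", "/news/old/1999.html", "/sport/archive.html"]
crawl = ["http://example.org/news/", "http://example.org/sport/archive.html"]
unlinked, share = find_unlinked(disk, crawl)
print(unlinked, round(share, 2))
```

The set difference gives the files that exist on disk but were never
reached by following links, and the share shows how much of the
total they represent.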

It seems to be working.

Mags

--- Donna Maurer <donna at maadmob.net> wrote:
> I have a feeling (and only a bit of data to support it), that once
> you get over some number, the content is likely to have more
> consistency than my 5000 different page Intranet, and may be able to
> be listed out as a block (eg if there are minutes for weekly meetings
> for the past 3 years, you probably don't need to list all of them).
> 
> There are some times when you just don't need to know every page -
> in my case I did because it all has to either move to a new system
> or be deleted - I can't miss anything.
> 
> Boy this was a time consuming process, but boy it was worthwhile.
> 
> Donna
> (of the famous content inventory)
> 
> 
> 
> On 28 Oct 2002 at 11:54, Peter VanDijck wrote:
> 
> > Content inventories are time intensive (DonnaM says 500 pages a day:
> > http://www.maadmob.net/donna/blog/archive/000035.html#000035)
> > and nessecary (no spell checking on this machine). What are your
> > strategies for scaling them up? What if you have not 5000 but 50.000
> > pages? At 500 pages/day/person, that would take 4 people a full
> > month. What elements of the CI can be automated? For what parts do
> > you *need* IA's to look at it? How does the client fit in? What bits
> > can be done by temps? How do you assure accuracy?
> > PeterV
> > http://poorbuthappy.com/ease



