[Sigia-l] Scaling content inventories (long)

Mike.Steckel at SEMATECH.Org
Mon Oct 28 14:38:17 EST 2002


We just worked on one of these. I asked my co-worker if she could come up with
some ideas; they are below, and I hope they are helpful:

A couple of suggestions, based on my experience of recently completing an
inventory of about 3000 pages.

1.  Decide what information you want to know about every page before you ever
start the inventory.  

This isn't as simple as it seems.  You need to look at quite a few pages for
variants before you can make up your mind, or consult with someone who does
have a good grasp of all the page types in the system.  Even then you may change
your mind.  I didn't realize I was going to need the functional owner of the
pages until late in the process -- I had recorded the person who was the author
of the page, but not the department or division that actually "owned" the page
(or the content).  I had to go back through the list and figure that out.
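
One way to pin the fields down before starting is to write them out as a record
definition.  The following is only a minimal sketch: the field names, the
choice of which fields are required, and the example URL are all assumptions
made for illustration, not the spreadsheet layout actually used for this
inventory.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class PageRecord:
    """One row of a content inventory (hypothetical field set)."""
    url: str
    title: str
    author: str          # person who wrote the page
    owner: str           # department or division that "owns" the content
    last_updated: str    # kept as text, e.g. "2002-10-01"
    broken_links: int = 0
    notes: str = ""


# Fields assumed to be mandatory for every page; adjust to your own purpose.
REQUIRED = ("url", "title", "author", "owner", "last_updated")


def missing_fields(record):
    """Return the names of required fields that were left empty."""
    return [name for name in REQUIRED if not getattr(record, name)]


def write_inventory(records, out):
    """Write the records as CSV (which Excel can open) to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(PageRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
```

Running missing_fields over the rows recorded so far is a cheap way to notice a
gap such as an empty owner column early, rather than late in the process.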

2.  Look for blocks of batch-posted pages.  Ignore all but one and use it as the
example. 

Although I had about 3K total, I only had to actually review about 1.5K by hand.
If I had wanted, I could have pointed our link checking software against only
those directories to make sure there weren't a bunch of broken links inside
those spaces.  If I found a page (or several pages) that worked as an index to a
large set, I counted the number of entries and recorded that instead, for
example, "TOC for Preventive Maintenance Procedures, links to 75 procedures".
I checked the links, but didn't record every URL.
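
Counting the entries on an index page is one step that could be scripted.  The
sketch below uses Python's standard html.parser module to count the links in a
page's HTML and produce a one-line summary in the style of the example above.
The function name and the wording of the summary are assumptions; it counts
every link on the page, so navigation links would need to be filtered out by
hand.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def summarize_index(title, html):
    """Return a single inventory line for an index page instead of one per link."""
    collector = LinkCollector()
    collector.feed(html)
    return f"{title}, links to {len(collector.hrefs)} pages"
```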

3.  Decide what the purpose of the inventory is.

In our case the inventory is serving a large "clean up the internal site"
effort, so it was important to see every page--to note the last update date, to
note the author of the page, and to note any broken links on the page.  Sections
of the completed inventory were then given to the owners so that they could see
redundancy, outdated information, and broken links.  If you have other purposes,
then you may be able to stop at a fourth or fifth level page and not continue
down the tree.  I have to admit though that some interesting messes are
discovered only by going on an exhaustive search.  One of our big problems was
that people "retired" pages from the system, but never really deleted the files.
They swore that there were no links left to these pages.  However, the pages
kept turning up in the results of a search.  The inventory discovered the
obscure links that had been missed in the removal of those pages.
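
The check for leftover links to "retired" pages can be sketched as follows,
assuming the inventory (or a crawler) has already produced a mapping from each
page to the links found on it.  The data layout and the function name are
hypothetical.

```python
def find_links_to_retired(outlinks, retired):
    """Map each retired URL to the live pages that still link to it.

    outlinks: dict of page URL -> list of URLs linked from that page
    retired:  set of URLs that were supposed to be removed
    """
    hits = {}
    for page, targets in outlinks.items():
        if page in retired:
            continue  # links between retired pages are not the problem
        for target in targets:
            if target in retired:
                hits.setdefault(target, []).append(page)
    return hits
```

Any retired URL that appears as a key in the result is still reachable from a
live page, which is the kind of obscure link an exhaustive pass turns up.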

4.  I would like to meet a person who could do 500 pages a day.

I could not possibly have done 500 pages in a single day.  This number may be
physically possible, but mentally I couldn't have kept at the task that long.
The longest I think I ever spent at this was about 4 hours straight.  The task
is horrendously tedious, and my inventory included copying and pasting the link
to each inventoried page into an Excel file, because I wanted to be able to
find that page again.  This takes a lot of flipping between screens
and quite a bit of keyboard work.  In our case we had lots of pages that were
actually links to Word or Excel files.  The URL for these can't just be copied
from the browser.  You either have to key it in again or open the source and
copy from there (after finding it).
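
Pulling the Word and Excel URLs out of the page source is another candidate for
a small script.  This is a rough sketch using only the Python standard library;
the list of extensions and the example host name in the usage note are
assumptions, and it only finds links written as ordinary <a href="..."> tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions treated as office documents (an assumption; extend as needed).
DOC_EXTENSIONS = (".doc", ".xls")


class DocLinkParser(HTMLParser):
    """Collect absolute URLs of links that point at Word or Excel files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.doc_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the page being inventoried.
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path.lower().endswith(DOC_EXTENSIONS):
            self.doc_urls.append(absolute)


def document_links(base_url, html):
    """Return the document URLs found in one page's HTML source."""
    parser = DocLinkParser(base_url)
    parser.feed(html)
    return parser.doc_urls
```

For a page at http://intranet.example/pm/index.html (a made-up address), the
returned list can be pasted straight into the spreadsheet instead of keying
each URL in again.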

Other thoughts about the questions:

Involving clients:  I can't imagine doing this without, at the minimum, having
someone available to answer questions.  In our case I had someone else on my
staff do about a third of the inventory.  This person already had a lot of
background knowledge about the company but didn't know the website very well.  I
answered a lot of relevant questions at the beginning. 



-----Original Message-----
From: Peter VanDijck [mailto:pvandijck at lds.com]
Sent: Monday, October 28, 2002 10:54 AM
To: 'sigia-l at asis.org'
Subject: [Sigia-l] Scaling content inventories


Content inventories are time intensive (DonnaM says 500 pages a day:
http://www.maadmob.net/donna/blog/archive/000035.html#000035) and nessecary (no
spell checking on this machine). What are your strategies for scaling them up?
What if you have not 5000 but 50.000 pages? At 500 pages/day/person, that would
take 4 people a full month. What elements of the CI can be automated? For what
parts do you *need* IA's to look at it? How does the client fit in? What bits
can be done by temps? How do you assure accuracy?
PeterV
http://poorbuthappy.com/ease



