[Sigia-l] maintaining a content inventory

Wed Feb 15 21:36:37 EST 2006

On 16/2/06 12:06 PM, "Seth Earley" <seth at earley.com> wrote:
> Managing content data for a migration would likely require integration with
> the content management application.  It's not just the metadata that you are
> tracking but the actual content assets, correct?

Just the locations of the rendered content assets, i.e. the web pages. Any
attempt at migration magic would be a nightmare: there are a dozen or so
sites, running on about four different CMSs, and there are even a couple of
sites which are not CMS-driven. The situation is that the state government
has rolled five or so depts into one new dept, and now needs to do the same
with their legacy websites.

Developing the new IA is a separate project (although taking some input from
the inventory). Migration would be a whole new project too.

> Schemalogic Schemaserver would not have this functionality off the shelf.  I
> would also say that Factiva's Synaptica product, though a very good taxonomy
> management tool would not necessarily have this functionality out of the
> box.  Both tools have strengths and weaknesses and are good at certain
> aspects of taxonomies and thesauri but it sounds more like you are tracking
> assets and metadata and requiring a transformation from one repository to
> another.

Almost. One of the things the client wants out of this exercise is a
comparison between the existing content and the IA of the new site. For the
non-allergic: one of the deliverables is a gap analysis. Additionally,
knowing that this migration will likely take months, they want to be able to
update the content inventory in track with any changes to the legacy
websites.
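
To make the gap analysis concrete, here is a rough sketch in Python of the
kind of comparison I have in mind. Everything in it is an assumption for
illustration: the inventory record layout (a "url" plus a list of
"subjects"), the function name gap_analysis, and the idea that IA categories
can be matched to subject terms by exact string. The real mapping will
almost certainly need a lookup table between legacy terms and new IA labels.

```python
def gap_analysis(inventory, ia_categories):
    """Compare inventoried pages against the categories of the new IA.

    inventory: iterable of dicts with "url" and "subjects" keys
               (hypothetical record layout, not a real tool's format).
    ia_categories: set of category labels in the new IA.
    """
    covered = set()       # IA categories that at least one page maps to
    unplaced_pages = []   # pages with no home in the new IA

    for page in inventory:
        matched = set(page["subjects"]) & ia_categories
        if matched:
            covered |= matched
        else:
            unplaced_pages.append(page["url"])

    return {
        # categories in the new IA with no existing content behind them
        "empty_categories": sorted(ia_categories - covered),
        "unplaced_pages": unplaced_pages,
    }


if __name__ == "__main__":
    sample = [
        {"url": "http://site-a.example/air/", "subjects": ["air quality"]},
        {"url": "http://site-b.example/misc.html", "subjects": ["history"]},
    ]
    print(gap_analysis(sample, {"air quality", "water", "waste"}))
```

Re-running this against a refreshed inventory is one cheap way to keep the
gap report in step with the legacy sites while the migration drags on.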

They also want to be able to generate some reports based on the metadata,
such as a subject index. They probably could make good use of a weighted
site map too, i.e. a high level overview of the logical structure, but with
indicators as to how many pages actually exist in various branches.
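
By way of illustration, the weighted site map could be approximated from
nothing more than the list of URLs in the inventory. This is only a sketch:
the function name branch_counts and the depth parameter are made up, and it
assumes the URL path reflects the logical structure, which will not hold for
every one of the legacy sites.

```python
from collections import Counter
from urllib.parse import urlparse


def branch_counts(urls, depth=2):
    """Count pages under each branch of each site, down to `depth` levels.

    Assumption: a path ending in "/" is a directory; otherwise the last
    segment is the page itself and is not treated as a branch.
    """
    counts = Counter()
    for url in urls:
        parsed = urlparse(url)
        segments = [s for s in parsed.path.split("/") if s]
        dirs = segments if parsed.path.endswith("/") else segments[:-1]

        # Every page counts towards the site root...
        counts[(parsed.netloc, "/")] += 1
        # ...and towards each ancestor branch down to the requested depth.
        for d in range(1, min(depth, len(dirs)) + 1):
            counts[(parsed.netloc, "/" + "/".join(dirs[:d]))] += 1
    return counts


if __name__ == "__main__":
    pages = [
        "http://site-a.example/air/quality/index.html",
        "http://site-a.example/air/noise.html",
        "http://site-a.example/water/",
    ]
    for (site, branch), n in sorted(branch_counts(pages).items()):
        print(f"{site}{branch}: {n} page(s)")
```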

> A company that I was following a couple of years ago is Cambridge Docs
> (http://www.cambridgedocs.com/) I just took a look at their web site and it
> looks like they are evolving their product.
> 
> Their approach is to convert documents to XML and tag them during the
> conversion process.  These tagged documents can then be imported into your
> content management system with metadata applied.
> 
> It sounds like you are inventorying content and tagging for a later
> conversion or import.  It may make sense to convert the content and apply
> metadata prior to the migration.

Yep, doing an inventory to assist a later migration.

Converting the content into another repository is sadly not an option here
for various client-driven reasons.

> Are you also trying to manage metadata terms independently of content?

Not particularly. The thesauri and classification schemes are already well
established and in use in the organisation or externally, e.g. ORG thesaurus,
waste data classification, information line categories, web and intranet
categorisation systems, image classification scheme (photo library), APT,
POEO activity types, internal business systems, AGLS schema, natural
resources data directory, ANZLIC, DEH subject classification, ASIC industry
codes etc.

Not every site would make use of all those thesauri and schemes. Some are
specific to just one of the sites. It's likely they've settled on a selected
subset of terms for some of those thesauri. Many are already encoded into
the rendered HTML.
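
Since the terms are sitting in the rendered pages, they can be harvested
straight into the inventory. A minimal sketch, using only the Python
standard library: the class name MetaCollector and the helper extract_meta
are made up, and the sample meta element name (DC.Subject) is just my
assumption about what the pages carry. The actual names would need checking
against each site.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        # Skip http-equiv and other meta elements without a name/content pair.
        if name and content is not None:
            # Names are lower-cased so DC.Subject and dc.subject group together.
            self.meta.setdefault(name.lower(), []).append(content)


def extract_meta(html):
    """Return {meta name: [content values]} for one rendered page."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


if __name__ == "__main__":
    page = (
        '<html><head><meta name="DC.Subject" content="air quality">'
        '<meta name="DC.Subject" content="noise" /></head></html>'
    )
    print(extract_meta(page))
```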

e.