[Sigia-l] Proper HTML structure and IA (was Word HTML)

Wed Mar 5 23:22:08 EST 2003

On 3/5/03 2:45 PM, "Andrew H Otwell" <andrew at heyotwell.com> wrote:

> 
>> RE: [Sigia-l] Word HTML - money were my mouth is (was WhenShoulda Manual be
>> Web-based?)
> 
> This thread has degenerated into a pretty uninteresting discussion of HTML
> and petty arguing. Can you guys perhaps take it off list?  Or at least stop
> including such long and nested quotes from previous posts? Or perhaps you
> could stop and restate how the conversation relates to IA?
> 
> andrew

Andrew, you ask a very good question about how well-structured HTML relates
to IA.

Some of this discussion is relevant.  As IAs we depend on
information structure to help the user find and make use of the information
we are working with.  Many IAs work on a site-level information structure
(micro IA), some move higher to Enterprise IA that incorporates information
throughout an organization on intranets/extranets, and some are Macro IAs
that work on information structures between companies.

One of the common elements in all of these is information structure.  There
is a level of abstraction below even the micro IA's level; call it
atomic IA.  Many information repositories house HTML documents.
Documents that validate to a standard (the W3C's is the only one around
for HTML that I am aware of) and use the semantic markup that HTML is
intended to have offer many easy methods of finding and accessing micro
content.

Information not only has its intended use, but lives other lives.  A
document that uses the blockquote element properly and includes the *cite*
attribute to link to the quote's source has a richer framework for that
information.  The blockquote is a wonderful tag that is often misused
(many designers use it just to indent text), but a little scripting can
quickly turn properly used blockquotes into a repository of quotes and
annotations of those quotes.
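
As an illustration of the kind of scripting meant here, below is a minimal
sketch in Python using only the standard library's html.parser module.  The
class name BlockquoteCollector, the function extract_quotes, and the sample
markup are made up for this example; it assumes blockquotes are closed
properly and treats nested blockquotes as part of the outer quote.

```python
from html.parser import HTMLParser


class BlockquoteCollector(HTMLParser):
    """Collect the cite attribute and text of every top-level blockquote."""

    def __init__(self):
        super().__init__()
        self.quotes = []   # finished (cite, text) pairs
        self._depth = 0    # greater than 0 while inside a blockquote
        self._cite = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            if self._depth == 0:
                # Remember the source URL, if the author supplied one.
                self._cite = dict(attrs).get("cite")
                self._parts = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the markup.
                text = " ".join("".join(self._parts).split())
                self.quotes.append((self._cite, text))

    def handle_data(self, data):
        if self._depth > 0:
            self._parts.append(data)


def extract_quotes(html):
    """Return a list of (cite, text) pairs found in an HTML string."""
    parser = BlockquoteCollector()
    parser.feed(html)
    parser.close()
    return parser.quotes


# Hypothetical sample markup, for illustration only.
sample = """
<p>As one report put it:</p>
<blockquote cite="http://example.com/report.html">
  <p>Structure makes information reusable.</p>
</blockquote>
<blockquote><p>No source given here.</p></blockquote>
"""

for cite, text in extract_quotes(sample):
    print(cite, "|", text)
```

A blockquote used only for indentation would show up here with no cite
value, which is one quick way to spot the misuse described above.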

A well-formatted HTML data table can be highlighted, copied, and pasted
into Microsoft Excel for a user's own manipulations and
calculations.  Proper structuring lends this information to
reuse.  A little scripting can grab all the headers in an HTML
document and create a table of contents with the hierarchy in
place, which can serve as a reference to a repository of documents.  The
scraped information can easily be put into a database and served back out
through a content management system after it has been cataloged with an
algorithm.
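
The header-grabbing step can be sketched as follows.  This is a rough
Python illustration using the standard html.parser module; HeadingCollector
and table_of_contents are names invented for the example, and it assumes
the document uses h1 through h6 in a sensible order.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Record (level, text) for each heading element in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading being read, if any
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)


def table_of_contents(html):
    """Return indented outline lines, two spaces per heading level."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return ["  " * (level - 1) + text for level, text in parser.headings]


# Hypothetical sample page, for illustration only.
page = (
    "<h1>Annual Report</h1><p>Intro.</p>"
    "<h2>Findings</h2><h3>Usage <em>Trends</em></h3>"
    "<h2>Recommendations</h2>"
)

for line in table_of_contents(page):
    print(line)
```

The (level, text) pairs are also a natural shape for rows in a database
table, should the outline need to be stored rather than printed.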

The best thing is that HTML facilitates free interaction with the
information and eases its reuse.  A browser is all that is needed, on any
operating system and platform (handheld, tablet, mobile phone, laptop,
desktop, audio reader, or even plain text interface).  HTML has a standard
and a free tool that lets users validate the structure (the markup may
still be inconsistent, but validation is a starting place that puts the
information on a usable playing field).  XHTML enforces stricter,
better-structured markup, which greatly eases the scraping of information.
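
One reason XHTML eases scraping is that a well-formed XHTML page can be read
by an ordinary XML parser.  The sketch below uses Python's standard
xml.etree.ElementTree module; the sample page and the helper is_well_formed
are invented for illustration.  Note that it checks only well-formedness,
which is a weaker test than validating against the W3C DTD, and that this
parser will reject named entities such as &nbsp; unless they are declared.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so tag names must be qualified.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample page, for illustration only.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Report</title></head>
  <body>
    <h1>Report</h1>
    <h2>Background</h2>
    <p>Some text.</p>
    <h2>Results</h2>
  </body>
</html>"""

root = ET.fromstring(page)

# Pull out every second-level heading in document order.
subheads = [h.text for h in root.iter(XHTML_NS + "h2")]
print(subheads)


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An unclosed <br> is tolerated by browsers but is not well-formed XML.
print(is_well_formed("<p>one<br>two</p>"))
```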

A couple of years ago I was able to scrape a small site with 20 or so
reports on it, each about 10 to 25 printed pages.  Since the HTML pages were
well structured and used heading tags properly, I was able not only to build
a hierarchy of the reports, but also to build a catalog that
cross-referenced subjects across the reports.  The script was written in
about 20 minutes.  The results were tucked into a database, which eased
access to the micro content (the sets of paragraphs under the headers) and
made it possible to create hierarchical relationships with other
information.  We quickly had a resource that eased analysis of the various
views on the subjects in the reports.  It helped greatly that the documents
were well edited and made use of headers and subheaders.  Seeing similar
discussions from different reports next to each other on a page that
repurposed that information let the researchers find relationships to
explore that would have been more obscured in the reports' original formats.
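
The original script is not reproduced here, but the general idea can be
sketched in Python with the standard library.  Everything below is an
assumption made for illustration: the names SectionSplitter, split_sections
and cross_reference, the sample reports, and above all the simplification
that two sections cover the same subject when their headings match exactly
(ignoring case).  A real catalog would likely need a smarter match.

```python
from collections import defaultdict
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionSplitter(HTMLParser):
    """Split a page into sections: a heading plus the text that follows it."""

    def __init__(self):
        super().__init__()
        self.sections = []        # each entry is [heading_text, body_parts]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self.sections.append(["", []])

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return                # text before the first heading is ignored
        if self._in_heading:
            self.sections[-1][0] += data
        else:
            self.sections[-1][1].append(data)


def split_sections(html):
    """Return (heading, body) pairs with whitespace normalized."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [
        (" ".join(head.split()), " ".join(" ".join(body).split()))
        for head, body in parser.sections
    ]


def cross_reference(reports):
    """Map each lowercased heading to the (report, body) pairs under it."""
    index = defaultdict(list)
    for name, html in reports.items():
        for heading, body in split_sections(html):
            index[heading.lower()].append((name, body))
    return dict(index)


# Hypothetical sample reports, for illustration only.
reports = {
    "report-a": "<h1>Report A</h1><h2>Funding</h2><p>Grants fell.</p>",
    "report-b": "<h1>Report B</h1><h2>Funding</h2><p>Fees rose.</p>"
                "<h2>Staffing</h2><p>Two hires.</p>",
}

index = cross_reference(reports)
for name, body in index["funding"]:
    print(name, ":", body)
```

Printing the entries for one subject side by side is the small-scale
version of the repurposed page described above.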

I hope this helps.

All the best,
Thomas