[Sigia-l] Word HTML - money where my mouth is (was When Should a Manual be Web-based?)

Thomas Vander Wal list at vanderwal.net
Mon Mar 3 08:24:16 EST 2003


On 2/28/03 8:15 PM, "Boniface Lau" <boniface_lau at compuserve.com> wrote:
> It is one thing to generate web pages without HTML error. But it is
> quite another to have the generated HTML deemed "valid" with regard to
> a published version of HTML.

True, the best test is the W3C validator (http://validator.w3.org/).

> The Word-produced HTML does not have DTD declaration. That means its
> HTML is not targeted at the published versions of HTML. So, I do not
> expect the Word-produced HTML to be "valid" with regard to the
> published HTML versions.

Word-generated Web pages do not come close to validating against any (X)HTML
DTD.  I know quite a few folks who believe that what Word generates is not
(X)HTML because of this.
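A quick way to see the DTD point for yourself is to check whether a saved page opens with a DOCTYPE declaration at all. The sketch below is a minimal Python illustration under that assumption; `has_doctype` is a hypothetical helper, and it only detects the declaration. It does not check the markup against the DTD, which is what the W3C validator does.

```python
import re

# Matches a document type declaration at the start of a page, e.g.
# <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "...">
# Leading whitespace, an optional <?xml ...?> declaration, and any
# comments before the DOCTYPE are skipped.
_PROLOG = re.compile(
    r'\A\s*(?:<\?xml[^>]*\?>\s*)?(?:<!--.*?-->\s*)*<!DOCTYPE\s',
    re.IGNORECASE | re.DOTALL,
)


def has_doctype(html: str) -> bool:
    """Return True if the document begins with a DOCTYPE declaration.

    Hypothetical helper: detection only, not validation against the DTD.
    """
    return _PROLOG.match(html) is not None
```

A page that opens directly with `<html ...>` returns False here, so no validator could know which published version of (X)HTML it is meant to conform to.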

> Mind you, web browsers do not require web pages with DTD declaration.
> Unless Word acted up, Internet Explorer, Netscape Navigator, and Opera
> have no problem with the Word-produced HTML.

Ah, now you are hitting on the problem with Word.  Word concerns itself with
visual presentation.  It has done an admirable job building good visual
presentations of the information.  A Web browser can show a visual
presentation well, but that is far from being a good Web page or good
(X)HTML.

HTML contains a lot of semantic markup that helps convey the meaning of the
information, for more than just visual presentation.  The heading,
blockquote, cite, paragraph, ordered list, unordered list, table, etc.
tags have semantic meaning that helps the user understand the information.
Those viewing the Web visually do not necessarily see a difference.

Why is this important?  One reason is accessibility: people who cannot see
the pages, or who are having pages read to them while they drive, can have
the distinctions in the markup announced, or hear tonal differences driven
by the semantic markup.

The other advantage is information reuse.  An (X)HTML document that has
proper markup can be scraped to pull the headings for a table of contents,
build a quote library (from blockquotes), collapse the document into an
expandable hierarchy (using the proper heading tags), or parse the document
for reuse on a mobile device.  These are just a handful of examples of what
becomes possible when using not only valid (X)HTML, but also proper semantic
markup.
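To make the reuse point concrete, here is a minimal Python sketch, using only the standard library's `html.parser`, that pulls `<h1>` through `<h6>` headings into an indented table of contents and collects `<blockquote>` text into a quote list. The names `OutlineScraper` and `scrape` are hypothetical, and the sketch assumes the page uses real heading and blockquote elements; it is an illustration, not a production scraper.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def _squash(parts):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class OutlineScraper(HTMLParser):
    """Hypothetical scraper: collects headings (with level) and blockquotes."""

    def __init__(self):
        super().__init__()
        self.headings = []       # list of (level, text) pairs
        self.quotes = []         # text of each outermost <blockquote>
        self._level = None       # level of the heading currently open, if any
        self._heading_buf = []
        self._quote_depth = 0    # nesting depth of open <blockquote> elements
        self._quote_buf = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._heading_buf = []
        elif tag == "blockquote":
            if self._quote_depth == 0:
                self._quote_buf = []
            self._quote_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, _squash(self._heading_buf)))
            self._level = None
        elif tag == "blockquote" and self._quote_depth > 0:
            self._quote_depth -= 1
            if self._quote_depth == 0:
                self.quotes.append(_squash(self._quote_buf))

    def handle_data(self, data):
        # Text inside a heading or blockquote is buffered for that element.
        if self._level is not None:
            self._heading_buf.append(data)
        if self._quote_depth > 0:
            self._quote_buf.append(data)


def scrape(html):
    """Return (table_of_contents, quotes) for an HTML string."""
    scraper = OutlineScraper()
    scraper.feed(html)
    scraper.close()
    toc = ["  " * (level - 1) + text for level, text in scraper.headings]
    return toc, scraper.quotes
```

The design leans entirely on the semantic elements: a page whose "headings" are only styled paragraphs (for example `<p class="big">Intro</p>`) yields an empty table of contents, even though it may look identical in a browser.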

Word (like the other MS Office products, of which Word is by far the best)
is not built to do this at this time.  From rumors, it does not seem that
the next version will hit the mark with valid (X)HTML either, even though it
is purportedly built on XML.

> Regardless of how much people may want to bash Word, I hope they can
> be fair-minded.

I think most are being fair to Word; it is a product built to create
documents that are conveyed visually.  We cannot and should not expect Word
or any other application to output information in a format that its
underlying structure does not support well.  Microsoft has done a solid job
in trying to get Word to build (X)HTML, but in reality they are not even
close.  As stated (or intimated) above, (X)HTML is not about the visual
presentation of the information; it is about the structure of the
information.  The job of presentation belongs to CSS, which can handle
screen, print, audible, tty, etc. presentations.  Word does not come close
to getting the structure right yet, which is the problem.

All the best,
Thomas
-- 
www.vanderwal.net

The future is mine, not Microsoft's