[Sigia-l] Word HTML - money were my mouth is (was When Should a Manual be Web-based?)

Thomas Vander Wal list at vanderwal.net
Sat Mar 1 16:47:26 EST 2003


On 3/1/03 1:27 PM, "Chris Chandler" <chrischandler67 at earthlink.net> wrote:

> 
> "list" (listera's baby brother?) wrote:
> 
>> This is fine if you did not care about the headers or the proper bullets
>> or outlines being converted over.  We have tried these tools and the output
>> still needs touching up by hand to reapply the information structure.
>> 
>> I manage a team that deals with this problem on a daily basis.  We have
>> clients that develop information documents in MS Word and have finely
>> sculpted information for print.  Saving to HTML turns this into an
>> information blob that needs somebody to go into the HTML and add the
>> proper structures so that the information is properly structured and
>> many of the lines of the information sculpture are seen again.  We
>> figure between 7 developers that perform this work there are many $100k
>> spent that should not need to be spent if Word did the conversion well.
> 
> 
> Just out of curiosity -- do your content developers use styles properly when
> they create a word document? i.e. are all
> the top level headings marked as Heading One (etc,) in Word? Or is every style
> in the document "normal style" with hard
> coded formatting?

The developers only convert the Word documents from the 4,000-plus people in
the organization.  Most users don't use the styles well in Word, even after
training.  One of the problems we run into is a default Word style being
used in two different ways.  Word will only convert one of them as a
header.  Let's say you have a Heading 2 that is italic by default, and you
modify it later in the document so that Heading 2 is not italic.  In the
conversion to HTML only one of the two will be converted as a heading; the
other becomes a paragraph with styles.  Making any style modification
inside a heading in Word will cause it not to be converted as a heading.

Word is in the hands of the content/subject area experts.  Most of them are
not Word experts and many have only modified Word to turn off functionality
that annoys them, like how Word creates outlines.  Many, 75%, treat Word as
a typewriter and use tab and bullet image rather than

Word flat out fails with tables and nested outlines.  These must be recreated
by hand in HTML.  The conversions are a large problem when you have to meet
accessibility requirements and the semantic markup conveys information that
can be used by people with disabilities.  Roughly 60% of our work includes
tables and outlines.  Most folks in the organization use Office 2k, which is
better than Office XP in converting.

Word is not a tagged or structured document creation application.  Word is a
style-based application that places style by location and not by starting and
ending tags.  This approach worked very well for those who only cared about
style on a printed page, but it is a large task to assign semantic meaning
to styles when the tool does not permit that functionality.  Conversely,
WordPerfect is/was a tagged document creation tool.  This approach did not
work so well with a perfect print style environment.  WordPerfect, however,
does get tagging, much like HTML markup, correct to a larger degree than
Word.  Unfortunately WP changed to more of a Word approach in version 9 and
now produces HTML as poor as Word's.

> I recently had to write six functional specifications for related, but
> slightly different sites. I discovered that the
> secret to making Word behave is to use styles, and only styles, when
> formatting documents. It is for example, the ONLY
> way to get word to properly do outline numbering.


I did the same thing three to five years ago and it worked well
in a group of 20 people who were interested in learning and doing things
properly.  In large organizations it does not pass the laugh test; it is
much like waking a sleeping giant.  Word is a tool that creates printed
documents well, but does not have the capability to enforce structure.

> As with most things in our business, it takes a little time to properly set
> things up, but it makes the production
> process much easier. I haven't really experimented with the HTML side yet, but
> I hope that using styles properly in word
> will allow me to use styles properly in my HTML produced from that Word
> document (maybe this weekend I'll take a stab at
> that $80)

Hah, good luck.  Word's styles are not only poorly created, but there is a
huge amount of bloat.  A structured two-page document with seven styles in
Word (no tables or outlines) creates an 18 to 35 KB file with 7 KB being just
style.  Now strip the document to its proper bare essential tags (usually
done with regular expressions in HomeSite or BBEdit) and create the same
styles and you have a 6 to 8 KB file with 1 KB or less of styles.  Not only
have you saved two-thirds of the bandwidth required (which could be a nice
raise), but your HTML markup is not a joke (you can get another job showing
the wonderful markup) and the content is easy to maintain.
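
As a rough illustration of the regular-expression stripping described above,
here is a minimal Python sketch.  It is not the actual HomeSite or BBEdit
workflow; the function name strip_word_html and the specific patterns are
assumptions chosen for illustration, and real Word output (tables and nested
outlines in particular) would still need work by hand.

```python
import re

# Hypothetical helper: reduce Word-exported HTML toward bare structural tags.
# The patterns below are illustrative assumptions, not a complete cleaner.
STYLE_BLOCK = re.compile(r"<style\b.*?</style>", re.DOTALL | re.IGNORECASE)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
OFFICE_TAG = re.compile(r"</?o:[^>]*>", re.IGNORECASE)
PRESENTATION_ATTR = re.compile(
    r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)
WRAPPER_TAG = re.compile(r"</?(?:span|font)\b[^>]*>", re.IGNORECASE)


def strip_word_html(html: str) -> str:
    """Remove style blocks, comments, Office tags, and inline presentation."""
    html = STYLE_BLOCK.sub("", html)        # embedded <style> definitions
    html = COMMENT.sub("", html)            # comments, incl. conditional ones
    html = OFFICE_TAG.sub("", html)         # <o:p> and similar namespace tags
    html = PRESENTATION_ATTR.sub("", html)  # class=, style=, lang= attributes
    html = WRAPPER_TAG.sub("", html)        # unwrap <span> and <font>
    return html


sample = (
    "<p class=MsoNormal style='margin:0'>"
    '<span lang=EN-US style="font-size:12.0pt">Hello<o:p></o:p></span></p>'
)
print(strip_word_html(sample))
```

The attribute pattern is deliberately blunt and would also remove text such
as " class=foo" appearing in body copy, so treat it as a starting point.  The
semantic structure (which paragraphs are really headings or list items)
still has to be reapplied afterwards.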

Good luck at the 80 Pounds (your monetary conversion is as good as Word's
HTML conversion ;-) ).  Give it a try and see if you can do what PhDs have
not been able to crack among the thousands of others trying to do the same.
Word creates a mess in its HTML conversions (Microsoft employees have
admitted as much and admit it will take a rewrite from the bottom up to fix
the problem).

> As one of the Word MVP's says: "Word is set up to enable the simplest fastest
> way to produce a document if you have no
> idea of what you want or what you are doing."

Yes, if you want to print it or if you don't give a darn about the quality of
the information's structure.  Word is just another piece of mediocre
software (you thought MS stood for something else) for the masses.  I know
that folks at MS have been working to change the garbage HTML they output
from Word, but it often competes with other functionality and the burden of
having to rewrite much of how Word works from the bottom up.

All the best,
Thomas
-- 
www.vanderwal.net

The future is mine, not Microsoft's






