[Sigia-l] Converting static pages in a large intranet site to CM
Mark Burgess
mark-lists at pbdh.com
Tue Feb 10 21:28:05 EST 2004
At 1:36 PM -0500 2/10/04, Listera wrote:
>Since we still don't know what "converting" specifically means in this case,
>generically speaking, someone with full command of regular expressions
>and/or Perl should be able to automate the vast majority of all manual work
>here. You'd be amazed just how much you can pattern match and transform.
I had a similar task for a 700-page site, converting crufty
old-fashioned HTML into XHTML, structural markup, etc. At first I
tried multi-file search and replace, but the pages were just too
inconsistent, and to do the structural markup I needed to get into
each page one by one anyway. The replacing was somewhat slow too,
never letting me get into a good rhythm.
So I ended up doing the opposite -- multi-search/replace, one file at
a time. I used HTML Tidy to do the initial cleanup and then some Perl
filters (~90 regexp substitutions) to pick up where Tidy left off.
That part was all automated with AppleScript and BBEdit, so for most
pages I had all the bad code stripped/fixed/reformatted within the
first 10 seconds, leaving 5-10 minutes per page for markup and
re-templating, and another few minutes to fix links. For a while I
was adjusting the scripts and stylesheets for every page's new
exceptions, but eventually I got it mostly covered. If I'd had a hundred
more pages I would have automated a few more things that I was
spot-scripting at the end.
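
To give a rough idea of the "many substitutions, one file at a time"
step, here is a minimal sketch in Python rather than Perl. The rules
below are hypothetical examples of typical cleanup, not the actual ~90
substitutions, and the function name clean is made up for illustration.
It assumes the input has already been through an initial tidy pass.

```python
import re

# Hypothetical cleanup rules, applied in order. The real filter set is
# not shown in this post, so these are illustrative assumptions only.
RULES = [
    # Drop <font ...> and </font> tags but keep their contents.
    (re.compile(r"</?font\b[^>]*>", re.IGNORECASE), ""),
    # Swap presentational <b>/<i> for structural <strong>/<em>.
    (re.compile(r"<(/?)b>", re.IGNORECASE), r"<\1strong>"),
    (re.compile(r"<(/?)i>", re.IGNORECASE), r"<\1em>"),
    # Remove bgcolor attributes (quoted or unquoted values).
    (re.compile(r"\s+bgcolor\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)",
                re.IGNORECASE), ""),
    # Write <br> in its XHTML self-closing form.
    (re.compile(r"<br\s*/?>", re.IGNORECASE), "<br />"),
]


def clean(html: str) -> str:
    """Run every substitution over one page's markup, in rule order."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    print(clean('<FONT face="Arial"><b>Hi</b></FONT><br>'))
```

Keeping the rules in one ordered list makes it cheap to add a new
substitution whenever a page turns up a new exception. Regexps do not
really parse HTML, so a chain like this is only safe on markup that has
already been normalized, and each page still wants a quick look by eye.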
At 9:02 PM -0600 2/9/04, Vic Case wrote:
>Does anyone have any tips or guidance to share about such a conversion effort?
Find yourself an experienced scripter. We think this kind of thing is *fun*!
--
Mark