[Sigia-l] Converting static pages in a large intranet site to CM

Mark Burgess mark-lists at pbdh.com
Tue Feb 10 21:28:05 EST 2004


At 1:36 PM -0500 2/10/04, Listera wrote:
>Since we still don't know what "converting" specifically means in this case,
>generically speaking, someone with full command of regular expressions
>and/or Perl should be able to automate the vast majority of all manual work
>here. You'd be amazed just how much you can pattern match and transform.

I had a similar task for a 700-page site, converting crufty
old-fashioned HTML into XHTML, structural markup, etc. At first I
tried multi-file search and replace, but the pages were just too
inconsistent, and to do the structural markup I needed to get into
each page one by one anyway. The replacing was somewhat slow too,
so I never got into a good rhythm.

So I ended up doing the opposite -- multi-search/replace, one file at
a time. I used HTML Tidy to do the initial cleanup and then some Perl
filters (~90 regexp substitutions) to pick up where Tidy left off.
That part was all automated with AppleScript and BBEdit, so for most
pages I had all the bad code stripped/fixed/reformatted within the
first 10 seconds, leaving 5-10 minutes per page for markup and
re-templating, and another few minutes to fix links. For a while I
was adjusting the scripts and stylesheets for every page's new
exceptions, but eventually I got it mostly covered. If I'd had a
hundred more pages I would have automated a few more of the things
that I was spot-scripting at the end.
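
To give a rough idea of what the filter stage looks like, here is a
minimal sketch of an ordered list of regexp substitutions applied to
one file at a time. It is written in Python rather than Perl, and the
rules, the names RULES, apply_filters and clean_file, and the UTF-8
encoding are all assumptions made up for illustration; they are not
the actual ~90 substitutions. It assumes Tidy has already been run on
the page.

```python
import re

# Hypothetical rules, for illustration only. Order matters: each rule
# sees the output of the one before it.
RULES = [
    # Drop <font> wrappers but keep whatever they contain.
    (re.compile(r"</?font\b[^>]*>", re.IGNORECASE), ""),
    # Swap presentational tags for structural equivalents.
    (re.compile(r"<(/?)b>", re.IGNORECASE), r"<\1strong>"),
    (re.compile(r"<(/?)i>", re.IGNORECASE), r"<\1em>"),
    # Self-close bare <br> tags for XHTML.
    (re.compile(r"<br\s*>", re.IGNORECASE), "<br />"),
    # Collapse runs of spaces and tabs left behind by the removals.
    (re.compile(r"[ \t]{2,}"), " "),
]


def apply_filters(html, rules=RULES):
    """Apply each (pattern, replacement) pair in order; return the result."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


def clean_file(path, rules=RULES):
    """Filter a single file in place. Returns True if the file changed."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    cleaned = apply_filters(original, rules)
    if cleaned != original:
        with open(path, "w", encoding="utf-8") as f:
            f.write(cleaned)
    return cleaned != original
```

Keeping the rules in a plain list makes it cheap to append a new
substitution whenever a page turns up a fresh exception, which is
roughly the adjust-as-you-go workflow described above.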


At 9:02 PM -0600 2/9/04, Vic Case wrote:
>Does anyone have any tips or guidance to share about such a conversion effort?

Find yourself an experienced scripter. We think this kind of thing is *fun*!


-- 
Mark


