On Sun, 28 Dec 2003, someone wrote:
> A friend just sent me a text translation in Norwegian, that she saved
> with WORD 9, as an html file
Its similarity to HTML is misleading. Most of the rubbish is put
there (as I understand it) so that the file can be round-tripped back
to Word format.
In addition to the options mentioned by others, there _might_ be some
mileage in reading it back into Word, re-saving it as RTF, and then
using an RTF-to-HTML converter.
The value of doing that would be chiefly if the original had been
based on a meaningful template, with named styles which mean something
to HTML (heading-N, body text, bulleted-list, and so on) rather than
being "make it this big with that font" kind of DTP rubbish. Word has
been able to do _this_ job (stylesheeted logical markup) for at least
a decade, but most of its users haven't caught up with it yet: they
still use the damned thing as if it were an electric typewriter rather
than a real word processor. So, as I say, whether this kind of
approach makes any sense depends on the technique of the person who
used Word in the first place.
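
To make the point about named styles concrete, here is a rough sketch in
Python of what a converter gets to do when the styles mean something. It
is only an illustration: the style names ("Heading 1", "Body Text",
"List Bullet"), the mapping table and the function paragraphs_to_html
are my own assumptions, not the behaviour of any particular RTF-to-HTML
converter.

```python
from html import escape

# Hypothetical mapping from Word style names to HTML elements.
# A real template may well use different names.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Heading 3": "h3",
    "Body Text": "p",
    "List Bullet": "li",
}


def paragraphs_to_html(paragraphs):
    """Turn (style_name, text) pairs into HTML.

    Consecutive list items are wrapped in a single <ul>.
    """
    out = []
    in_list = False
    for style, text in paragraphs:
        # Styles we do not recognise fall back to a plain paragraph.
        tag = STYLE_TO_TAG.get(style, "p")
        if tag == "li" and not in_list:
            out.append("<ul>")
            in_list = True
        elif tag != "li" and in_list:
            out.append("</ul>")
            in_list = False
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

If every paragraph arrives with nothing but "Normal" plus a font size,
the table has nothing to work with and everything comes out as <p>,
which is the whole problem.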
If there's no logical structure, then the options suggested by other
folks, such as Tidy or the Office cleanup tool, are surely less effort
- and the result will be no worse.
(Sometimes it's best to toss all the original formatting, and just
copy/paste the content into an appropriate template. Look, e.g., for
postings on the topic by Eric Jarvis for advice on how best to
organise the authoring of content in multiple languages - the secret
is to set out the method at the outset, rather than trying to re-work
arbitrary formats sent in by diverse contributors.)
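
For the "toss the formatting" route, something along these lines would
do as a starting point. It is a minimal sketch using only Python's
standard html.parser module; the class name TextOnly, the function
extract_blocks and the choice of which elements count as blocks are my
own assumptions, and real Word output may need more cases than this
handles.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text of block-level elements, ignoring all formatting."""

    SKIP = {"head", "style", "script"}  # content of these is discarded
    BLOCKS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.blocks = []
        self.current = []

    def _flush(self):
        # Collapse runs of whitespace and keep only non-empty blocks.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCKS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCKS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def extract_blocks(word_html):
    """Return the plain text of each block in the given HTML string."""
    parser = TextOnly()
    parser.feed(word_html)
    parser.close()
    parser._flush()  # pick up any trailing text outside a block
    return parser.blocks
```

The resulting list of plain strings can then be dropped into whatever
template you have settled on, with the markup decided by you rather
than by Word.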