Going from a code-heavy Word 9 HTML doc in Norwegian to normal HTML

hi

A friend just sent me a text translation in Norwegian that she saved
with Word 9 as an HTML file.

It's loaded with Microsoft code like this:

<p class=MsoNormal><span
style='font-size:10.0pt;mso-bidi-font-size:7.5pt;
font-family:"Courier New"'>send dine opplevelser og tanker om fred til
<o:p></o:p></span></p>

I want to get rid of that bloated Microsoft stuff.

Is there a stripper tool, or a web page somewhere, where I can get the
raw text with the right HTML for Norwegian characters (i.e. in the form
&entityname;)?

I just need it to be as simple as possible, as I will use my own
stylesheet on this text for my formatting.

thanks.

Richard

Jul 20 '05 #1
4 Replies


someone wrote:
> I want to get rid of that bloated Microsoft stuff.
>
> Is there a stripper tool, or a web page somewhere, where I can get the
> raw text with the right HTML for Norwegian characters (i.e. in the form
> &entityname;)?


http://tidy.sf.net/ should be able to cope with it.

Make sure you RTFM to enable the extra powerful Word fixing routines.
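
For anyone wanting to script this, here is a minimal Python sketch of calling Tidy with its Word clean-up options switched on. It assumes a `tidy` binary is on the PATH and that the Word file is encoded as Windows-1252 (a guess; check the file's meta tag). The file names are made up. The option names (`word-2000`, `bare`, `input-encoding`, `output-encoding`) come from Tidy's configuration reference, but verify them against `tidy -help-config` for your version.

```python
import shutil
import subprocess


def build_tidy_command(src: str, dst: str) -> list[str]:
    """Build a Tidy command line for cleaning up Word-generated HTML."""
    return [
        "tidy",
        "-quiet",                          # suppress the summary chatter
        "--word-2000", "yes",              # strip Word 2000 surplus markup
        "--bare", "yes",                   # drop more Microsoft-specific HTML
        "--input-encoding", "win1252",     # assumption: typical Word export
        "--output-encoding", "ascii",      # non-ASCII is written as entities
        "--output-file", dst,
        src,
    ]


def run_tidy(src: str, dst: str) -> int:
    """Run Tidy and return its exit code (0 = clean, 1 = warnings, 2 = errors)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy is not on PATH")
    return subprocess.run(build_tidy_command(src, dst)).returncode


# Hypothetical usage:
# run_tidy("translation.html", "translation-clean.html")
```

With ASCII output and Tidy's default entity settings, characters such as å and ø should come out as named entities like `&aring;` and `&oslash;`.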

--
David Dorward <http://dorward.me.uk/>
Jul 20 '05 #2

ry***@yahooyahoo.com (someone) wrote:
> I want to get rid of that bloated Microsoft stuff.


You could use Tidy, which was mentioned here, and which is available as
part of the HTML-Kit software too.

Alternatively, you could get "Office 2000 HTML Filter 2.0", a free and
not too big (250 kB) addition to Office, available from
http://www.microsoft.com (sorry, no direct URL, since the site is a
mess, but try to use the software name in a site search there).
Then you can open the file and "Export To Compact HTML" from Word.
It removes most of the nonsense.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #3

On Sun, 28 Dec 2003 16:55:05 GMT, someone declared in
comp.infosystems.www.authoring.html:
> hi

G'day.

> A friend just sent me a text translation in Norwegian that she saved
> with Word 9 as an HTML file.
>
> I want to get rid of that bloated Microsoft stuff.


Try
http://www.microsoft.com/downloads/d...DBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN
It doesn't do quite as good a job as Tidy, but it is easier to use.

Alternatively, Tidy is available as part of HTML-Kit:
http://www.chami.com/html-kit/

HTH

--
Mark Parnell
http://www.clarkecomputers.com.au
Jul 20 '05 #4

On Sun, 28 Dec 2003, someone wrote:
> A friend just sent me a text translation in Norwegian that she saved
> with Word 9 as an HTML file.


Its similarity to HTML is misleading. Most of the rubbish is put
there (as I understand it) in order to be able to round-trip to Word
format.

In addition to the options mentioned by others, there _might_ be some
mileage in reading it back into Word, re-saving it as RTF, and then
using an RTF-to-HTML converter.

The value of doing that would be chiefly if the original had been
based on a meaningful template, with named styles which mean something
to HTML (heading-N, body text, bulleted-list, and so on) rather than
being "make it this big with that font" kind of DTP rubbish. Word has
been able to do _this_ job (stylesheeted logical markup) for at least
a decade, but most of its users haven't caught up with it yet: they
still use the damned thing as if it was an electric typewriter rather
than a real word processor. So, as I say, it depends on the technique
of the person who used Word in the first place, as to whether this
kind of approach makes any sense.

If there's no logical structure, then the options other folks
suggested, such as Tidy or the Office cleanup tool, are surely less
effort - and the result will be no worse.

(Sometimes it's best to toss all the original formatting, and just
copy/paste the content into an appropriate template. Look for e.g.
postings on the topic by Eric Jarvis for advice on how best to
organise the authoring of content in multiple languages - the secret
is to set out the method at the outset, rather than trying to re-work
arbitrary formats sent in by diverse contributors.)
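
The "toss the formatting and keep the content" approach can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not a replacement for Tidy: it keeps only the text of paragraphs and headings, drops every attribute and inline tag (so bold, links and lists are lost), and writes non-ASCII characters as named entities. The names `WordTextExtractor`, `clean_word_html` and `to_named_entities` are invented for this example.

```python
from html.entities import codepoint2name
from html.parser import HTMLParser


def to_named_entities(text: str) -> str:
    """Escape &, <, > and write non-ASCII characters as named entities
    (falling back to numeric references when HTML defines no name)."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<>" or (code > 127 and code in codepoint2name):
            out.append(f"&{codepoint2name[code]};")
        elif code > 127:
            out.append(f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)


class WordTextExtractor(HTMLParser):
    """Collect the text of each paragraph or heading, ignoring other markup."""

    BLOCKS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []    # finished (tag, text) pairs
        self._tag = None    # block element currently open, if any
        self._parts = []    # text chunks seen inside that element

    def handle_starttag(self, tag, attrs):
        # Attributes (class=MsoNormal, style=...) are simply never copied.
        if tag in self.BLOCKS:
            self.finish_block()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.finish_block()

    def handle_data(self, data):
        if self._tag:
            self._parts.append(data)

    def finish_block(self):
        # Collapse runs of whitespace, including non-breaking spaces.
        text = " ".join("".join(self._parts).split())
        if self._tag and text:
            self.blocks.append((self._tag, text))
        self._tag, self._parts = None, []


def clean_word_html(source: str) -> str:
    """Return bare <p>/<hN> elements, one per line, with entity-encoded text."""
    parser = WordTextExtractor()
    parser.feed(source)
    parser.close()
    parser.finish_block()   # pick up a trailing block that was never closed
    return "\n".join(
        f"<{tag}>{to_named_entities(text)}</{tag}>" for tag, text in parser.blocks
    )
```

Fed the snippet from the original question, `clean_word_html` returns `<p>send dine opplevelser og tanker om fred til</p>`, and `to_named_entities("blå")` gives `bl&aring;`. The result is plain enough to drop into a template and style with your own stylesheet.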
Jul 20 '05 #5
