Going from a code-heavy Word 9 HTML doc in Norwegian to normal HTML

hi

A friend just sent me a text translation in Norwegian that she saved
with Word 9 as an HTML file.

It's loaded with Microsoft code like this:

<p class=MsoNormal><span
style='font-size:10.0pt;mso-bidi-font-size:7.5pt;
font-family:"Courier New"'>send dine opplevelser og tanker om fred til
<o:p></o:p></span></p>

I want to get rid of that overbloat Microsoft stuff.

Is there a stripper tool, or a web page somewhere, where I can get the
raw text with the right HTML for Norwegian characters (i.e. in the form
&entityname;)?

I just need it to be as simple as possible, as I will use my own
stylesheet on this text for my formatting.

thanks.

Richard

Jul 20 '05 #1
someone wrote:
> I want to get rid of that overbloat Microsoft stuff.
>
> Is there a stripper tool, or a web page somewhere, where I can get the
> raw text with the right HTML for Norwegian characters (i.e. in the form
> &entityname;)?


http://tidy.sf.net/ should be able to cope with it.

Make sure you RTFM to enable the extra powerful Word fixing routines.

--
David Dorward <http://dorward.me.uk/>
Jul 20 '05 #2
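
For readers wondering which switches the "Word fixing routines" in post #2 refer to, here is a rough sketch of a Tidy invocation wrapped in Python. The option names (`word-2000`, `bare`, `drop-proprietary-attributes`, `input-encoding`, `output-encoding`) are given as I recall them from Tidy's documentation, so check them against `tidy -help-config` for your build. The helper names `tidy_command` and `run_tidy`, the file names, and the `win1252` input encoding are assumptions for illustration only. Setting the output encoding to `ascii` is what should make Tidy write æ, ø and å as entities.

```python
import shutil
import subprocess


def tidy_command(src, dest, input_encoding="win1252"):
    """Build a Tidy command line with its Word clean-up options switched on.

    Hypothetical helper: the option names should be verified with
    `tidy -help-config`; the input encoding is an assumption about
    how Word 9 saved the file.
    """
    return [
        "tidy",
        "-q",                                    # suppress non-essential output
        "--word-2000", "yes",                    # strip Word's surplus markup
        "--bare", "yes",                         # drop Microsoft-specific HTML
        "--drop-proprietary-attributes", "yes",  # remove non-standard attributes
        "--input-encoding", input_encoding,
        "--output-encoding", "ascii",            # non-ASCII is written as entities
        "-o", dest,
        src,
    ]


def run_tidy(src, dest):
    """Run Tidy if it is installed and return its exit status."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on PATH")
    # Tidy exits with 1 for warnings and 2 for errors, so check=True is avoided.
    return subprocess.run(tidy_command(src, dest)).returncode
```

Something like `run_tidy("translation.htm", "clean.html")` would then write the cleaned file next to the original, leaving the Word version untouched.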
ry***@yahooyahoo.com (someone) wrote:
> I want to get rid of that overbloat Microsoft stuff.


You could use Tidy, which was mentioned here, and which is available as
part of the HTML-Kit software too.

Alternatively, you could get "Office 2000 HTML Filter 2.0", a free and
not too big (250 kB) addition to Office, available from
http://www.microsoft.com (sorry, no direct URL, since the site is a
mess, but try to use the software name in a site search there).
Then you can open the file and "Export To Compact HTML" from Word.
It removes most of the nonsense.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #3
On Sun, 28 Dec 2003 16:55:05 GMT, someone declared in
comp.infosystems.www.authoring.html:
> hi

G'day.

> A friend just sent me a text translation in Norwegian that she saved
> with Word 9 as an HTML file.
>
> I want to get rid of that overbloat Microsoft stuff.


Try
http://www.microsoft.com/downloads/d...DBEE-3FBD-482C
-83B0-96FB79B74DED&displaylang=EN (watch wrapping). Doesn't do quite as
good a job as Tidy, but is easier to use.

Alternatively, Tidy is available as part of HTML-Kit,
http://www.chami.com/html-kit/

HTH

--
Mark Parnell
http://www.clarkecomputers.com.au
Jul 20 '05 #4
On Sun, 28 Dec 2003, someone wrote:
> A friend just sent me a text translation in Norwegian that she saved
> with Word 9 as an HTML file.


Its similarity to HTML is misleading. Most of the rubbish is put
there (as I understand it) in order to be able to round-trip to Word
format.

In addition to the options mentioned by others, there _might_ be some
mileage in reading it back into Word, re-saving it as RTF, and then
using an RTF-to-HTML converter.

The value of doing that would be chiefly if the original had been
based on a meaningful template, with named styles which mean something
to HTML (heading-N, body text, bulleted-list, and so on) rather than
being "make it this big with that font" kind of DTP rubbish. Word has
been able to do _this_ job (stylesheeted logical markup) for at least
a decade, but most of its users haven't caught up with it yet: they
still use the damned thing as if it was an electric typewriter rather
than a real word processor. So, as I say, it depends on the technique
of the person who used Word in the first place, as to whether this
kind of approach makes any sense.

If there's no logical structure, then the options suggested by other
folks, such as Tidy or the Office cleanup tool, are surely less effort
- and the result will be no worse.

(Sometimes it's best to toss all the original formatting, and just
copy/paste the content into an appropriate template. Look for e.g.
postings on the topic by Eric Jarvis for advice on how best to
organise the authoring of content in multiple languages - the secret
is to set out the method at the outset, rather than trying to re-work
arbitrary formats sent in by diverse contributors.)
Jul 20 '05 #5
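
The "toss all the original formatting" route from post #5, combined with the entity output asked for in post #1, can be sketched with nothing but the Python standard library. This is a minimal illustration, not a replacement for Tidy or the Office filter: the `KEEP_TAGS` and `SKIP_TAGS` sets and the names `to_entities`, `WordStripper` and `strip_word_html` are assumptions made up for this example. It drops every attribute, discards tags such as `span` and `o:p` while keeping their text, and writes non-ASCII characters as named entities where HTML defines one.

```python
from html.entities import codepoint2name
from html.parser import HTMLParser

# Assumption: only these structural tags are worth keeping.
KEEP_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li", "br"}
# Assumption: everything inside these elements is discarded, text included.
SKIP_TAGS = {"head", "style", "script", "xml"}


def to_entities(text):
    """Escape &, < and >, and write non-ASCII characters as entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128 and ch not in "&<>":
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # e.g. &aring;
        else:
            out.append("&#%d;" % cp)  # no named entity: numeric reference
    return "".join(out)


class WordStripper(HTMLParser):
    """Keep a small set of tags without attributes, plus the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            self.parts.append("<%s>" % tag)  # attributes are dropped

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP_TAGS and tag != "br":
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(to_entities(data))


def strip_word_html(source):
    """Return the source reduced to bare tags and entity-encoded text."""
    parser = WordStripper()
    parser.feed(source)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Comments, including Word's conditional comments, are dropped because the parser's default comment handler does nothing. The input must already be decoded to a Python string, so read the file with the encoding Word used when saving it.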
