Sascha wrote:
i've got kind of a strange problem i'm struggeling with here. this
page:
http://www.pohflepp.de/eavesdripping4.html
won't be recognized as html/xml by firefox.
You've already heard what you're doing wrong. Here's one way to fix it.
Start out by reading Joel Spolsky's great article, "The Absolute
Minimum Every Software Developer Absolutely, Positively Must Know About
Unicode and Character Sets (No Excuses!)". Get it here:
http://www.joelonsoftware.com/articles/Unicode.html
The other people who posted are correct: you have mixed 3 different
character encodings in one file and have not applied any of them consistently.
View the source in Firefox ("View -> Page Source") to see what's going
on.
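The basic problem is that a web document has to use one character
encoding from start to finish, and it has to be the encoding you declare
in the head portion of the document (ISO-8859-1, UTF-8 or whatever). In
the ASCII-compatible encodings, such as ISO-8859-1 and UTF-8, the HTML
tags themselves are plain 7-bit ASCII bytes. Text that shows up in the
source
<p>L.i.k.e. .t.h.i.s.</p>
with an extra byte after every character is two-byte (UTF-16) text
sitting in a file that is otherwise being read one byte at a time.
Taking a web page and artificially doing "save as UTF-16" converts
everything, tags included, to two bytes per character, which is not what
you want when the rest of the page and its declared encoding say
otherwise.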
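To see where the extra bytes come from, here is a minimal sketch using only standard Unix tools. The file names utf16_sample.bin and ascii_sample.txt are made up for illustration, and the sample assumes the little-endian flavour of UTF-16, where each ASCII letter is followed by a NUL byte.

```shell
# Hypothetical sample: the word "Like" as UTF-16LE stores it,
# with a NUL byte (octal 000) after each ASCII letter.
printf 'L\000i\000k\000e\000' > utf16_sample.bin

# Byte count: 8 bytes for 4 letters.
wc -c < utf16_sample.bin

# Dump the bytes; the \0 entries are the NULs.
od -c utf16_sample.bin

# Removing the NULs leaves plain ASCII.
tr -d '\000' < utf16_sample.bin > ascii_sample.txt
cat ascii_sample.txt
```

A viewer that expects one byte per character has nothing sensible to show for those NULs, which is why the text looks as if it had a stray character between every letter.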
Since the text of your document is actually English, you probably
should be using the ISO-8859-1 encoding throughout. And as was already
pointed out, don't use an <?xml ...?> declaration unless you are actually
using XML, which you are not (at least, not in this document!). Here's how I
would fix it quickly:
sed "1d; s/\x00//g" eavesdripping4.html | tidy >eaves_test.html
Then edit eaves_test.html in a good HTML editor. You can get sed (the
stream editor) from any good source of Unix text tools (in Windows, try
http://gnuwin32.sourceforge.net or
http://unxutils.sourceforge.net).
And you can get HTML-Tidy, which cleans up and reformats your HTML,
from
http://tidy.sourceforge.net .
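If you want to see what the sed step does before running it on the real page, here is a small sketch on a throwaway file. In the command above, "1d" deletes the first line and "s/\x00//g" removes every NUL byte. The file names sample.html and sample_clean.html and their contents are invented for the demonstration, and the sketch uses tr -d '\000' in place of the \x00 escape, on the assumption that your sed may not be GNU sed. The tidy step is left out here.

```shell
# Hypothetical stand-in for the real page: an XML declaration on line 1,
# then a paragraph whose text has a NUL byte after each letter.
printf '<?xml version="1.0"?>\n<p>L\000i\000k\000e\000</p>\n' > sample.html

# Strip every NUL byte, then delete line 1 (the XML declaration).
tr -d '\000' < sample.html | sed '1d' > sample_clean.html

# Show the result, then count the NUL bytes left over (should be none).
cat sample_clean.html
tr -cd '\000' < sample_clean.html | wc -c
```

Once the output looks right on the sample, the same two steps can be pointed at the real file and piped into tidy as shown above.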
Kind regards,
Eric Pement