
what's wrong with this page?

hi

i've got kind of a strange problem i'm struggling with here. this
page:

http://www.pohflepp.de/eavesdripping4.html

won't be recognized as html/xml by firefox. i've tried everything that
i could think of but still it doesn't work. looks great in safari,
though.

please have a look at it and tell me what i'm doing wrong!
thanks a lot
sascha

Aug 11 '05 #1


On 11/08/2005 11:28, pl*****@gmail.com wrote:
> i've got kind of a strange problem i'm struggling with here. this
> page:
>
> http://www.pohflepp.de/eavesdripping4.html


This is an encoding problem. Part of the document is encoded as
something compatible with US-ASCII, but other parts are trying to be
UTF-16. Add to this the fact that the XML prolog states the character
encoding is UTF-8, the server sends no character encoding information at
all, and a META element claims ISO-8859-1 (if the browser even gets that
far), and you have a real mess.

Choose an encoding (ISO-8859-1 is probably OK), stick to it, and fix
your server to send the charset parameter in the Content-Type header.
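
As a rough illustration of what "send the charset parameter" means, here is a minimal sketch using Python's standard http.server module. The names ENCODING, content_type, PageHandler and make_server are made up for this example, and the original site's server software is unknown, so treat this as a sketch of the idea rather than a fix for that particular server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: ISO-8859-1 is the single encoding chosen for the whole page.
ENCODING = "iso-8859-1"


def content_type(encoding: str = ENCODING) -> str:
    """Build a Content-Type header value that names the charset explicitly."""
    return f"text/html; charset={encoding}"


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one small page in a single encoding."""

    def do_GET(self):
        # Encode the body with the same encoding the header announces.
        body = "<p>caf\u00e9</p>".encode(ENCODING)
        self.send_response(200)
        self.send_header("Content-Type", content_type())
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a local test server on the given port."""
    return HTTPServer(("127.0.0.1", port), PageHandler)
```

Calling make_server().serve_forever() would start it locally. The point is only that the header and the bytes actually sent agree on one encoding.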

[snip]

On a different note, remove the XML prolog, the xmlns and xml:lang
attributes in the HTML start tag, and change your DOCTYPE to HTML 4.01
Strict. You aren't writing XHTML, so there's no point claiming
otherwise. Furthermore, replace the BR elements in your markup with CSS
margins on the relevant elements.

Mike

--
Michael Winter
Prefix subject with [News] before replying by e-mail.
Aug 11 '05 #2

> http://www.pohflepp.de/eavesdripping4.html
> won't be recognized as html/xml by firefox


I don't really blame it! It's a right old mess.

How are you producing this page? It looks like automatically generated
code coming through UTF-16 at some points and UTF-8 (?) at others -
then just concatenated together.
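
One way to see where the concatenated pieces are is to look for NUL bytes, which appear in UTF-16 text for ASCII-range characters but not in ordinary UTF-8 or ISO-8859-1 markup. The following is only a diagnostic sketch; find_nul_lines and the sample bytes are invented for illustration and are not taken from the actual page.

```python
def find_nul_lines(data: bytes) -> list[int]:
    """Return 1-based numbers of the lines that contain NUL (0x00) bytes.

    Lines are split on the single byte 0x0A, so this is a rough
    diagnostic, not a real encoding detector.
    """
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if b"\x00" in line
    ]


# Hypothetical sample: two single-byte lines followed by a UTF-16 fragment.
ascii_part = b"<html>\n<head></head>\n"
utf16_part = "<p>Like this</p>".encode("utf-16-le")
page = ascii_part + utf16_part

print(find_nul_lines(page))  # [3]
```

Any line this reports is a candidate for having been saved as UTF-16 and pasted into an otherwise single-byte file.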

I haven't seen a mess like that since I last used ObTree!

Aug 11 '05 #3

Sascha wrote:
> i've got kind of a strange problem i'm struggling with here. this
> page:
>
> http://www.pohflepp.de/eavesdripping4.html
>
> won't be recognized as html/xml by firefox.


You've already heard what you're doing wrong. Here's one way to fix it.
Start out by reading Joel Spolsky's great article, "The Absolute
Minimum Every Software Developer Absolutely, Positively Must Know About
Unicode and Character Sets (No Excuses!)". Get it here:

http://www.joelonsoftware.com/articles/Unicode.html

The other people who posted are correct that you have intermixed three
different character encodings and have not been consistent about it.
View the source in Firefox ("View -> Page Source") to see what's going
on.

The basic problem you are having is realizing that HTML tags themselves
must be in 7-bit ASCII, while the content between the tags

<p>L.i.k.e. .t.h.i.s.</p>

should be in the encoding set that you have defined in the head portion
of the web document (UTF-8, UTF-16 or whatever). Taking a web page, and
artificially doing "save as UTF-16" will convert the tags themselves to
UTF-16, which is not what you want to do.

Since the text of your document is actually English, you probably
should be using the ISO-8859-1 encoding throughout. And as was already
pointed out, don't use an <?xml ...?> declaration unless you are actually using
XML, which you are not (at least, not in this document!). Here's how I
would fix it quickly:

sed "1d; s/\x00//g" eavesdripping4.html | tidy >eaves_test.html

Then edit eaves_test.html in a good HTML editor. You can get sed (the
stream editor) from any good source of Unix text tools (in Windows, try
http://gnuwin32.sourceforge.net or http://unxutils.sourceforge.net).
And you can get HTML-Tidy, which cleans up and reformats your HTML,
from http://tidy.sourceforge.net .
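
If sed is not at hand, the same two steps (drop the first line, delete every NUL byte) can be sketched in Python. The function name strip_first_line_and_nuls is made up here, and the tidy step is left out. Note that deleting NUL bytes only recovers text in the ASCII range; any non-ASCII character that was stored as UTF-16 would be mangled, so check the result by eye.

```python
def strip_first_line_and_nuls(data: bytes) -> bytes:
    """Rough Python equivalent of: sed "1d; s/\\x00//g"."""
    # Like "1d": discard everything up to and including the first newline.
    _, _, rest = data.partition(b"\n")
    # Like "s/\x00//g": remove every NUL byte that remains.
    return rest.replace(b"\x00", b"")


# Hypothetical sample: an XML declaration line, then a UTF-16 fragment.
sample = b'<?xml version="1.0" encoding="UTF-8"?>\n' + "<p>Like this</p>".encode("utf-16-le")
print(strip_first_line_and_nuls(sample))  # b'<p>Like this</p>'
```

To apply it to a file, read the file in binary mode ("rb"), pass the bytes through the function, and write the result to a new file in binary mode ("wb").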

Kind regards,

Eric Pement

Aug 11 '05 #4

In <11**********************@g47g2000cwa.googlegroups.com>, on 08/11/2005
at 07:26 AM, pe*****@northpark.edu said:
> The basic problem you are having is realizing that HTML tags
> themselves must be in 7-bit ASCII, while the content between the tags
>
> <p>L.i.k.e. .t.h.i.s.</p>
>
> should be in the encoding set that you have defined in the head
> portion of the web document (UTF-8, UTF-16 or whatever).


That's possible for UTF-8, since it encodes ASCII as itself, but how
could you encode the tags as ASCII and the remaining text as UTF-16?
Doesn't UTF-16 encode each character in one or more 16-bit bytes?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to sp******@library.lspace.org

Aug 12 '05 #5

On 11/08/2005 15:26, pe*****@northpark.edu wrote:

[snip]
> The basic problem you are having is realizing that HTML tags
> themselves must be in 7-bit ASCII [...]


As someone else has noted, this cannot be true. UTF-16 is a perfectly
legitimate character encoding for transmitting HTML (as long as the
/server/ indicates this), and one certainly cannot represent tags in
7-bit ASCII using this scheme.
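
The byte-level difference is easy to check. This short sketch encodes the same tag three ways; the variable names are only illustrative, and "utf-16-le" is picked arbitrarily to avoid a byte order mark in the output.

```python
tag = "<p>"

ascii_bytes = tag.encode("ascii")
utf8_bytes = tag.encode("utf-8")
utf16_bytes = tag.encode("utf-16-le")  # little-endian, no byte order mark

print(ascii_bytes)  # b'<p>'
print(utf8_bytes)   # b'<p>'
print(utf16_bytes)  # b'<\x00p\x00>\x00'

# A character outside the Basic Multilingual Plane takes two 16-bit
# code units (a surrogate pair), i.e. four bytes, in UTF-16.
smiley = "\U0001F600"
print(len(smiley.encode("utf-16-le")))  # 4
```

So UTF-8 leaves ASCII-range markup byte-for-byte identical to ASCII, while UTF-16 spends at least two bytes on every character, tags included.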

You also seem to be advocating mixing encoding schemes. Part of the
original problem was that the OP was doing just this, as well as giving
no reliable indication as to how the document was encoded.

I believe what you're confusing is the section in the HTML specification
that suggests what to do when character encoding information must be
obtained from the document itself, rather than the server. In this case,
"ASCII-valued bytes [must] stand for ASCII characters (at least until
the META element is parsed)." (5.2.2) In other words, until the user
agent encounters a META element that indicates the true encoding, it
should not be presented with anything but 7-bit ASCII. However, I would
not recommend this.
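
To make the point concrete, here is a deliberately simplified sketch of reading the encoding from a META element. It is not the algorithm any browser or specification defines; the function name, the regular expression and the 1024-byte limit are all assumptions chosen for illustration.

```python
import re

# Assumption: a loose pattern for charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset=[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def sniff_meta_charset(data: bytes, limit: int = 1024):
    """Return the charset named in a META element, or None if not found.

    Only the first `limit` bytes are searched, and they are matched as
    raw ASCII-valued bytes without decoding anything first.
    """
    match = META_CHARSET.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None


markup = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head>'

print(sniff_meta_charset(markup.encode("ascii")))      # iso-8859-1
print(sniff_meta_charset(markup.encode("utf-16-le")))  # None
```

The second call shows why the bytes before the META element have to be ASCII-valued for this to work: once the same markup is stored as UTF-16, a byte-level search for the tag finds nothing.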

[snip]

Mike

--
Michael Winter
Prefix subject with [News] before replying by e-mail.
Aug 12 '05 #6
