#1
August 11th, 2005, 11:35 AM
plugimi@gmail.com
what's wrong with this page?

hi

i've got kind of a strange problem i'm struggling with here. this
page:

http://www.pohflepp.de/eavesdripping4.html

won't be recognized as html/xml by firefox. i've tried everything that
i could think of but still it doesn't work. looks great in safari,
though.

please have a look at it and tell me what i'm doing wrong!


thanks a lot
sascha

#2
August 11th, 2005, 11:55 AM
Michael Winter
Re: what's wrong with this page?

On 11/08/2005 11:28, plugimi@gmail.com wrote:

> i've got kind of a strange problem i'm struggling with here. this
> page:
>
> http://www.pohflepp.de/eavesdripping4.html

This is an encoding problem. Part of the document is encoded as
something compatible with US-ASCII, but other parts are trying to be
UTF-16. Add to this the fact that the XML prolog states the character
encoding is UTF-8, the server sends no character encoding information at
all, and a META element claims ISO-8859-1 (if the browser even gets that
far), and you have a real mess.

Choose an encoding (ISO-8859-1 is probably OK), stick to it, and fix
your server to send the charset parameter in the Content-Type header.
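
To see which charset, if any, a server declares, one can parse the value of its Content-Type header. A minimal Python sketch, assuming the header string has already been read from the response; the function name declared_charset is made up for illustration:

```python
from email.message import Message

def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None if absent
    return msg.get_content_charset()

print(declared_charset("text/html"))                      # server sent no charset
print(declared_charset("text/html; charset=ISO-8859-1"))  # charset declared
```

A result of None corresponds to the situation described above, where the browser is left to guess.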

[snip]

On a different note, remove the XML prolog, the xmlns and xml:lang
attributes in the HTML start tag, and change your DOCTYPE to HTML 4.01
Strict. You aren't writing XHTML, so there's no point claiming
otherwise. Furthermore, replace the BR elements in your markup with CSS
margins on the relevant elements.

Mike

--
Michael Winter
Prefix subject with [News] before replying by e-mail.

#3
August 11th, 2005, 01:15 PM
dingbat@codesmiths.com
Re: what's wrong with this page?

> http://www.pohflepp.de/eavesdripping4.html
> won't be recognized as html/xml by firefox

I don't really blame it! It's a right old mess.

How are you producing this page? It looks like automatically generated
code coming through UTF-16 at some points and UTF-8 (?) at others -
then just concatenated together.

I haven't seen a mess like that since I last used ObTree!
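
One rough way to spot this kind of concatenation is to check whether some lines of the raw file contain NUL bytes (typical of UTF-16 text in the ASCII range) while others do not. The following Python sketch is only a heuristic under that assumption, and the name looks_mixed is hypothetical:

```python
def looks_mixed(data: bytes) -> bool:
    """Heuristic: True if some non-empty lines contain NUL bytes and others don't."""
    # Ignore lines that are empty once stray NULs and CRs are removed.
    lines = [ln for ln in data.split(b"\n") if ln.strip(b"\x00\r")]
    with_nul = sum(1 for ln in lines if b"\x00" in ln)
    return 0 < with_nul < len(lines)

ascii_part = b"<html>\n"
utf16_part = "<p>hi</p>\n".encode("utf-16-le")
print(looks_mixed(ascii_part + utf16_part))  # one ASCII line, one UTF-16 line
print(looks_mixed(ascii_part))               # ASCII only
print(looks_mixed(utf16_part))               # UTF-16 only
```

It will miss UTF-16 text outside the ASCII range, so treat a False result as "nothing obvious" rather than proof of consistency.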

#4
August 11th, 2005, 03:35 PM
pemente@northpark.edu
Re: what's wrong with this page?

Sascha wrote:

> i've got kind of a strange problem i'm struggling with here. this
> page:
>
> http://www.pohflepp.de/eavesdripping4.html
>
> won't be recognized as html/xml by firefox.

You've already heard what you're doing wrong. Here's one way to fix it.
Start out by reading Joel Spolsky's great article, "The Absolute
Minimum Every Software Developer Absolutely, Positively Must Know About
Unicode and Character Sets (No Excuses!)". Get it here:

http://www.joelonsoftware.com/articles/Unicode.html

The other people who posted are correct that you have intermixed 3
different kinds of character encodings, and not been consistent at it.
View the source in Firefox ("View -> Page Source") to see what's going
on.

The basic problem you are having is realizing that HTML tags themselves
must be in 7-bit ASCII, while the content between the tags

<p>L.i.k.e. .t.h.i.s.</p>

should be in the encoding set that you have defined in the head portion
of the web document (UTF-8, UTF-16 or whatever). Taking a web page, and
artificially doing "save as UTF-16" will convert the tags themselves to
UTF-16, which is not what you want to do.

Since the text of your document is actually English, you probably
should be using the ISO-8859-1 encoding throughout. And as was already
pointed out, don't use an <?xml ...?> declaration unless you are actually
using XML, which you are not (at least, not in this document!). Here's
how I would fix it quickly:

sed "1d; s/\x00//g" eavesdripping4.html | tidy >eaves_test.html

Then edit eaves_test.html in a good HTML editor. You can get sed (the
stream editor) from any good source of Unix text tools (in Windows, try
http://gnuwin32.sourceforge.net or http://unxutils.sourceforge.net).
And you can get HTML-Tidy, which cleans up and reformats your HTML,
from http://tidy.sourceforge.net .
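
For readers without sed at hand, the first half of that pipeline (delete line 1, strip every NUL byte) can be approximated in Python. This is only a sketch: the function name is hypothetical, it does not replace the tidy step, and like the sed command it assumes the only damage is stray NUL bytes from UTF-16 text in the ASCII range.

```python
def drop_first_line_and_nuls(data: bytes) -> bytes:
    """Mimic sed '1d; s/\\x00//g': drop the first line, remove all NUL bytes."""
    _first, _sep, rest = data.partition(b"\n")
    return rest.replace(b"\x00", b"")

# A small stand-in for the broken page: an ASCII prolog line followed by
# UTF-16 (little-endian) markup.
sample = b'<?xml version="1.0" encoding="UTF-8"?>\n' + "<p>hi</p>\n".encode("utf-16-le")
print(drop_first_line_and_nuls(sample))
```

To apply it to a real file, read the file in binary mode (open(path, "rb")) so that no decoding happens before the NUL bytes are removed.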

Kind regards,

Eric Pement

#5
August 12th, 2005, 01:35 AM
Shmuel (Seymour J.) Metz
Re: what's wrong with this page?

In <1123770397.849263.173700@g47g2000cwa.googlegroups.com>, on
08/11/2005 at 07:26 AM, pemente@northpark.edu said:

>The basic problem you are having is realizing that HTML tags
>themselves must be in 7-bit ASCII, while the content between the tags
>
> <p>L.i.k.e. .t.h.i.s.</p>
>
>should be in the encoding set that you have defined in the head
>portion of the web document (UTF-8, UTF-16 or whatever).

That's possible for UTF-8, since it encodes ASCII as itself, but how
could you encode the tags as ASCII and the remaining text as UTF-16?
Doesn't UTF-16 encode each character in one or more 16-bit code units?
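
The difference is easy to see by encoding the same tag both ways. A short Python illustration (little-endian UTF-16 without a byte order mark is chosen here purely for the example):

```python
tag = "<p>"

as_utf8 = tag.encode("utf-8")
as_utf16 = tag.encode("utf-16-le")  # little-endian, no byte order mark

print(as_utf8)   # b'<p>'
print(as_utf16)  # b'<\x00p\x00>\x00'

# Showing each NUL byte as a dot gives the "L.i.k.e. .t.h.i.s." look
# quoted above.
print(as_utf16.replace(b"\x00", b".").decode("ascii"))  # <.p.>.
```

In UTF-8 the tag is byte-for-byte ASCII; in UTF-16 every character, tag or not, takes two bytes.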

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org

#6
August 12th, 2005, 12:35 PM
Michael Winter
Re: what's wrong with this page?

On 11/08/2005 15:26, pemente@northpark.edu wrote:

[snip]

> The basic problem you are having is realizing that HTML tags
> themselves must be in 7-bit ASCII [...]

As someone else has noted, this cannot be true. UTF-16 is a perfectly
legitimate character encoding for transmitting HTML (as long as the
/server/ indicates this), and one certainly cannot represent tags in
7-bit ASCII using this scheme.

You also seem to be advocating mixing encoding schemes. Part of the
original problem was that the OP was doing just this, as well as giving
no reliable indication as to how the document was encoded.

I believe what you're confusing is the section in the HTML specification
that suggests what to do when character encoding information must be
obtained from the document itself, rather than the server. In this case,
"ASCII-valued bytes [must] stand for ASCII characters (at least until
the META element is parsed)." (5.2.2) In other words, until the user
agent encounters a META element that indicates the true encoding, it
should not be presented with anything but 7-bit ASCII. However, I would
not recommend this.
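
To illustrate why that rule exists, here is a rough Python sketch of a scanner that looks for a META charset declaration by treating the first bytes of the document as ASCII. The regular expression, the 1024-byte limit and the name sniff_meta_charset are all assumptions made for this example; real user agents use more elaborate rules.

```python
import re

# Matches <meta ... charset=VALUE, with VALUE optionally quoted.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

def sniff_meta_charset(data: bytes, limit: int = 1024):
    """Return the charset named in a META element near the start, or None."""
    match = META_CHARSET.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None

ascii_doc = (b'<html><head><meta http-equiv="Content-Type" '
             b'content="text/html; charset=ISO-8859-1"></head>')
utf16_doc = ascii_doc.decode("ascii").encode("utf-16-le")

print(sniff_meta_charset(ascii_doc))  # declaration is readable as ASCII
print(sniff_meta_charset(utf16_doc))  # same markup in UTF-16: not found
```

The second call finds nothing because the UTF-16 bytes no longer spell "<meta" as ASCII, which is why a META declaration alone cannot announce UTF-16.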

[snip]

Mike

 
