
unprintable characters in a javascript produced msgbox

I am wondering a bit about what I should see in a message box (or in a
webpage, for that matter) when I include an unprintable ASCII
character, say ASCII 255, in there. I experimented a bit on my PC
running Traditional Chinese Windows 98SE and found that the following
javascript code produced a message that seemed to have ASCII 255
represented as "y".

alert('the following char is ASCII FF: \xff. So what does it look like to you?');

I had this line in the <HEAD> section of the relevant HTML file where
I put that javascript code:

<meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>

But even if I try to figure that into the picture, I still can't see
why it should come out as "y".

Can anybody please enlighten this thick mind?
Jul 2 '08 #1


emrefan wrote:
> I am wondering a bit about what I should see in a message box (or in a
> webpage, for that matter) when I include an unprintable ASCII character,
> say ASCII 255, in there.

The (7-bit US-)ASCII character set ranges from code points 0 (0x00) to 127
(0x7F). Everything else is _not_ part of (US-)ASCII code:

<http://en.wikipedia.org/wiki/ASCII>
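
To make that range concrete, here is a minimal sketch that tests whether every code unit of a string lies between 0 and 127. The helper name isAscii is made up for illustration and is not part of any standard API.

```javascript
// Hypothetical helper: true if every UTF-16 code unit of the string
// falls inside the 7-bit US-ASCII range (0x00 to 0x7F).
function isAscii(text) {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0x7f) {
      return false;
    }
  }
  return true;
}

console.log(isAscii('plain text')); // true
console.log(isAscii('\xff'));       // false: code 255 is outside ASCII
```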

> I experimented a bit on my PC running Traditional Chinese Windows 98SE
> and found that the following javascript code produced a message that
> seemed to have ASCII 255 represented as "y".

You are getting the LATIN SMALL LETTER Y WITH DIAERESIS character ("ÿ"; note
that there are two dots in the ascent) because this is the character at code
point U+00FF in the Unicode character set as defined in the Unicode
Standard, versions 2.1 and later (a conforming implementation of ECMAScript
Edition 3 must implement the latter), and at code point 255 (0xFF) of
several other character sets, most notably ISO/IEC 8859-1 and Windows-1252:

<http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Related_character_maps>
<http://unicode.org/>
<http://www.ecmascript.org/>
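
In a current, spec-conforming engine this is easy to check. The sketch below assumes such an engine (for example a recent browser console or Node.js); older or non-conforming implementations may behave differently.

```javascript
// '\xff' and '\u00ff' denote the same single code unit, 255 (U+00FF).
const ch = '\xff';

console.log(ch.length);                        // 1
console.log(ch.charCodeAt(0));                 // 255
console.log(ch === '\u00ff');                  // true
console.log(ch === String.fromCharCode(0xff)); // true
```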

> alert('the following char is ASCII FF: \xff. So what does it look like to you?');

Should be window.alert(...) so as to rely less on the UA's scope chain.

> I had this line in the <HEAD> section of the relevant HTML file where I
> put that javascript code:
>
> <meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>
>
> But even if I try to figure that into the picture, I still can't see why
> it should come out as "y".

The display behavior for the code point 0xFF of the *proposed* character
encoding Big5-HKSCS (which uses the Big5 Character Set with Hong Kong
Supplementary Character Set), even if written properly, is undefined:

<http://en.wikipedia.org/wiki/Big5#HKSCS>
<http://www.iana.org/assignments/charset-reg/>

You should also check the HTTP response message's headers for a
`Content-Type' header that says differently, for it takes precedence then:

<http://www.w3.org/TR/1999/REC-html401-19991224/charset.html#h-5.2.2>
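
A rough sketch of that precedence rule, assuming the two declarations are already available as strings. The function names charsetParam and effectiveCharset are invented here, and real user agents apply further steps (user overrides, heuristics) that are left out.

```javascript
// Extract the charset parameter from a Content-Type value, or null if absent.
function charsetParam(contentType) {
  if (!contentType) {
    return null;
  }
  const match = /;\s*charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType);
  return match ? match[1] : null;
}

// The HTTP header is consulted first; the META declaration is only a fallback.
function effectiveCharset(httpContentType, metaContent) {
  return charsetParam(httpContentType) || charsetParam(metaContent);
}

// Header declares a charset, so the META value is ignored.
console.log(effectiveCharset('text/html; charset=ISO-8859-1',
                             'text/html; charset=Big5-HKSCS'));
// Header has no charset parameter, so the META value is used.
console.log(effectiveCharset('text/html', 'text/html; charset=Big5-HKSCS'));
```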
HTH

PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Jul 2 '08 #2

Bart Van der Donck wrote:
> emrefan wrote:
>> I am wondering a bit about what I should see in a message box (or in a
>> webpage, for that matter) ...
>
> Character encoding in message boxes or web pages are two totally
> different things.

Not true.

>> alert('the following char is ASCII FF: \xff. So what does it look like to you?');
>
> This always looks the same for everyone, namely a y with an umlaut on.
> No other display is possible here.

You are mistaken. The \x string literal escape sequence may or may not
specify a Unicode character, depending on the ECMAScript implementation.

>> I had this line in the <HEAD> section of the relevant HTML file where
>> I put that javascript code:
>>
>> <meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>
>
> That line does not affect javascript's internal code point table (like
> eg. \xff).

It could affect it if there was no corresponding HTTP header present that
says otherwise. There is no "javascript", BTW.

> It defines which character set must be used on the web page.

Unless a corresponding HTTP header is present that says otherwise. There
are no "web pages", BTW.

PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
Jul 2 '08 #3

Thomas 'PointedEars' Lahn wrote:
> Bart Van der Donck wrote:
>> Character encoding in message boxes or web pages are two totally
>> different things.
>
> Not true.

It is true, because the character encoding is done at a different
level. Message boxes -like in this example- are actually much easier.
There can only be one possible representation. But when trying to
write y-umlaut in a web page, you have a bunch of possibilities, off
the top of my head, at least 10 - of which of course some are more
preferred than others.

>>> alert('the following char is ASCII FF: \xff. So what does it look like to you?');
>>
>> This always looks the same for everyone, namely a y with an umlaut on.
>> No other display is possible here.
>
> You are mistaken. The \x string literal escape sequence may or may not
> specify a Unicode character, depending on the ECMAScript implementation.

But I was only saying that alert('\xff') always shows y-umlaut in any
browser. y-umlaut is the character that is tied to code point 255 in
any ECMAScript implementation.

>>> <meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>
>>
>> That line does not affect javascript's internal code point table (like
>> eg. \xff).
>
> It could affect it if there was no corresponding HTTP header present that
> says otherwise.

Untrue. The display of \x.. (and \u....) can never be influenced by
any HTTP-header. The notation is ASCII-safe, and is passed to the
javascript engine to tie it to a fixed character. I think you're
mixing up the character set of a web page with javascript's consistent
internal code point table.

> There is no "javascript", BTW.

Is that so.

>> It defines which character set must be used on the web page.
>
> Unless a corresponding HTTP header is present that says otherwise.

That is far from sure, and could easily vary from browser to browser.
Anyway - it would be unwise to specify a charset on the web page that
contradicts the HTTP header (coder's fault, not browser's fault).

> There are no "web pages", BTW.

Is that so :-)

--
Bart
Jul 2 '08 #4

Bart Van der Donck wrote:
Thomas 'PointedEars' Lahn wrote:
>Bart Van der Donck wrote:
>>Character encoding in message boxes or web pages are two totally
different things.
Not true.

It is true, because the character encoding is done at a different level.
Message boxes -like in this example- are actually much easier. There can
only be one possible representation.
You are mistaken. It depends on the user agent which characters are
supported in a message box. However, it has been observed that message
boxes use the character set of their document, regardless of the encoding
that the ECMAScript implementation supports. We have discussed this here
before.
But when trying to write y-umlaut in a web page, you have a bunch of
possibilities, on the top of my head, at least 10 - for which of course
some are more preferred than others.
I don't think the OP wanted to write "y-umlaut" at all.
>>>alert('the following char is ASCII FF: \xff. So what does it look like to you?');
This always looks the same for everyone, namely a y with an umlaut
on. No other display is possible here.
You are mistaken. The \x string literal escape sequence may or may not
specify a Unicode character, depending on the ECMAScript
implementation.

But I was only saying that alert('\xff') always shows y-umlaut in any
browser.
But you are dead wrong.
y-umlaut is the character that is tied to code point 255 in any
ECMAScript implementation.
However, there are implementations that do not support Unicode.
>>><meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>
That line does not affect javascript's internal code point table (like
>>eg. \xff).
It could affect it if there was no corresponding HTTP header present
that says otherwise.

Untrue. The display of \x.. (and \u....) can never be influenced by any
HTTP-header.
\x definitely can. Obviously, \u cannot.
The notation is ASCII-safe,
\x cannot be ASCII-safe, as it allows characters to be represented that
are outside the range of the ASCII character set.
>There is no "javascript", BTW.

Is that so.
Yes, there are different ECMAScript implementations (some of which don't
even deserve that designation), and versions thereof.
>>It defines which character set must be used on the web page.
Unless a corresponding HTTP header is present that says otherwise.

That is far from sure, and could easily vary from browser to browser.
It has been observed that user agents honor the Specification in that
regard. This was the reason why AddDefaultCharset was disabled in newer
Apache versions.
Anyway - it would be unwise to specify a charset on the web page that
contradicts the HTTP header (coder's fault, not browser's fault).
Nowadays, no argument there.
PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
Jul 2 '08 #5

Thomas 'PointedEars' Lahn wrote:
Bart Van der Donck wrote:
>Thomas 'PointedEars' Lahn wrote:
>>Bart Van der Donck wrote:
Character encoding in message boxes or web pages are two totally
different things.
Not true.
>It is true, because the character encoding is done at a different level.
Message boxes -like in this example- are actually much easier. There can
only be one possible representation.

You are mistaken. It depends on the user agent which characters are
supported in a message box. However, it has been observed that message
boxes use the character set of their document, regardless of the encoding
that the ECMAScript implementation supports. We have discussed this here
before.
That is not the point here. It is clear that the original poster was
talking about alert('\xff') versus the encoding of y-umlaut in an HTML-
document. In that regard the representation of \xff has nothing to do
with the representation of y-umlaut outside javascript.

[...]
>But I was only saying that alert('\xff') always shows y-umlaut in any
browser.

But you are dead wrong.
Well, let's see then. Could you show a case where alert('\xff') does
not show y-umlaut ?
>y-umlaut is the character that is tied to code point 255 in any
ECMAScript implementation.

However, there are implementations that do not support Unicode.
Irrelevant. y-umlaut does not need Unicode at all.
>The display of \x.. (and \u....) can never be influenced by any
HTTP-header.

\x definitely can. Obviously, \u cannot.
Let's see. Could you show an example where \x.. is displayed
differently depending on a varying HTTP-header ?
>The notation is ASCII-safe,

\x cannot be ASCII-safe as if it allows characters to be represented that
are outside the range of the ASCII character set.
That's why I said the *notation* is ASCII-safe. What is *represented*
by that notation, is a different job; that is decided by the
javascript engine.
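
The distinction between the notation and what it represents can be illustrated like this. The sketch assumes a modern engine and uses new Function only as a way to evaluate the literal from its source text; the variable names are arbitrary.

```javascript
// Source text of the literal: six characters, all within 0..127.
const source = "'\\xff'";
const sourceIsAscii = Array.from(source).every((c) => c.charCodeAt(0) <= 0x7f);

// Value the engine produces when it evaluates that source text.
const value = new Function('return ' + source + ';')();

console.log(source.length);       // 6
console.log(sourceIsAscii);       // true
console.log(value.length);        // 1
console.log(value.charCodeAt(0)); // 255
```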
>>There is no "javascript", BTW.
Is that so.

Yes, there are different ECMAScript implementations (some of which don't
even deserve that designation), and versions thereof.
That's like saying that cars don't exist, but only implementations of
fuel engines.

--
Bart
Jul 2 '08 #6

Bart Van der Donck wrote:
Thomas 'PointedEars' Lahn wrote:
>Bart Van der Donck wrote:
>>Thomas 'PointedEars' Lahn wrote:
Bart Van der Donck wrote:
Character encoding in message boxes or web pages are two totally
different things.
Not true.
It is true, because the character encoding is done at a different level.
Message boxes -like in this example- are actually much easier. There can
only be one possible representation.
You are mistaken. It depends on the user agent which characters are
supported in a message box. However, it has been observed that message
boxes use the character set of their document, regardless of the encoding
that the ECMAScript implementation supports. We have discussed this here
before.

That is not the point here. It is clear that the original poster was
talking about alert('\xff') versus the encoding of y-umlaut in an HTML-
document. In that regard the representation of \xff has nothing to do
with the representation of y-umlaut outside javascript.
Yes, it has.
[...]
>>But I was only saying that alert('\xff') always shows y-umlaut in any
browser.
But you are dead wrong.

Well, let's see then. Could you show a case where alert('\xff') does
not show y-umlaut ?
Wasting my time supporting your logical fallacy? I don't think so.

Ask someone living in Bosnia, Croatia, Czech Republic, Hungary, Poland,
Romania, Serbia, Slovakia, Slovenia, Malta, Estonia, Latvia, Lithuania,
Greenland, Bulgaria, Belarus, Russia, Macedonia, Greece, Israel, or any
other country where the character set designed for their main language does
not have "y-umlaut", as you put it (you really don't know what an umlaut
is), at decimal code point 255 (*except* with Unicode support), instead.
>>y-umlaut is the character that is tied to code point 255 in any
ECMAScript implementation.
However, there are implementations that do not support Unicode.

Irrelevant.
Not at all.
y-umlaut does not need Unicode at all.
True, it is also contained in ISO-8859-1. However, as ASCII does not
provide this character, if the \x string escape sequence is used and Unicode
support is not present, the locale encoding (or the encoding of the
document/file) must be used to determine which character to display for
decimal code points beyond 127. (If Unicode is not supported, "\uhhhh" is
interpreted as "uhhhh".)
>>The notation is ASCII-safe,
\x cannot be ASCII-safe as if it allows characters to be represented that
are outside the range of the ASCII character set.

That's why I said the *notation* is ASCII-safe.
It would seem whether that is true depends on how one defines "ASCII-safe".
What is *represented* by that notation, is a different job; that is
decided by the javascript engine.
See?
>>>There is no "javascript", BTW.
Is that so.
Yes, there are different ECMAScript implementations (some of which don't
even deserve that designation), and versions thereof.

That's like saying that cars don't exist, but only implementations of
fuel engines.
As a matter of fact, there are JavaScript and JScript versions that are not
fully ECMAScript-compliant, and therefore do not provide Unicode support.
PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f8*******************@news.demon.co.uk>
Jul 2 '08 #7

Bart Van der Donck wrote:
Thomas 'PointedEars' Lahn wrote:
>Bart Van der Donck wrote:
>>Thomas 'PointedEars' Lahn wrote:
Bart Van der Donck wrote:
Character encoding in message boxes or web pages are two totally
different things.
Not true.
It is true, because the character encoding is done at a different
level. Message boxes -like in this example- are actually much easier.
There can only be one possible representation.
You are mistaken. It depends on the user agent which characters are
supported in a message box. However, it has been observed that message
boxes use the character set of their document, regardless of the
encoding that the ECMAScript implementation supports. We have
discussed this here before.

That is not the point here. It is clear that the original poster was
talking about alert('\xff') versus the encoding of y-umlaut in an HTML-
document. In that regard the representation of \xff has nothing to do
with the representation of y-umlaut outside javascript.
Yes, it has.
[...]
>>But I was only saying that alert('\xff') always shows y-umlaut in any
browser.
But you are dead wrong.

Well, let's see then. Could you show a case where alert('\xff') does not
show y-umlaut ?
Wasting my time supporting your logical fallacy? I don't think so.

Ask someone living in Bosnia, Croatia, Czech Republic, Hungary, Poland,
Romania, Serbia, Slovakia, Slovenia, Malta, Estonia, Latvia, Lithuania,
Greenland, Bulgaria, Belarus, Russia, Macedonia, Greece, Israel, or any
other country where the character set designed for their main language does
not have "y-umlaut", as you put it (you really don't know what an umlaut
is), at decimal code point 255 (*except* with Unicode support), instead.
>>y-umlaut is the character that is tied to code point 255 in any
ECMAScript implementation.
However, there are implementations that do not support Unicode.

Irrelevant.
Not at all.
y-umlaut does not need Unicode at all.
True, it is also contained in ISO-8859-1. However, as ASCII does not
provide this character, if the \x string escape sequence is used and Unicode
support is not present, the locale encoding (or the encoding of the
document/file) must be used to determine which character to display for
decimal code points beyond 127. (If Unicode is not supported, "\uhhhh" is
interpreted as "uhhhh" rather than a single character.)
>>The notation is ASCII-safe,
\x cannot be ASCII-safe as if it allows characters to be represented
that are outside the range of the ASCII character set.

That's why I said the *notation* is ASCII-safe.
It would seem whether that is true depends on how one defines "ASCII-safe".
What is *represented* by that notation, is a different job; that is
decided by the javascript engine.
See?
>>>There is no "javascript", BTW.
Is that so.
Yes, there are different ECMAScript implementations (some of which
don't even deserve that designation), and versions thereof.

That's like saying that cars don't exist, but only implementations of
fuel engines.
As a matter of fact, there are JavaScript and JScript versions that are not
fully ECMAScript-compliant, and therefore do not provide Unicode support.
PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f8*******************@news.demon.co.uk>
Jul 2 '08 #8

Thomas 'PointedEars' Lahn wrote:
Bart Van der Donck wrote:
>Could you show a case where alert('\xff') does
not show y-umlaut ?

Wasting my time supporting your logical fallacy? I don't think so.

Ask someone living in Bosnia, Croatia, Czech Republic, Hungary, Poland,
Romania, Serbia, Slovakia, Slovenia, Malta, Estonia, Latvia, Lithuania,
Greenland, Bulgaria, Belarus, Russia, Macedonia, Greece, Israel, or any
other country where the character set designed for their main language does
not have "y-umlaut", as you put it (you really don't know what an umlaut
is), at decimal code point 255 (*except* with Unicode support), instead.
You are simply wrong; all of those will display y-umlaut with
alert('\xff'). You keep talking about Unicode but it has nothing to do
with it. As I said, just give me one example, and I'll be immediately
convinced of your point. But there is no such example.
>>>y-umlaut is the character that is tied to code point 255 in any
ECMAScript implementation.
However, there are implementations that do not support Unicode.
>Irrelevant.

Not at all.
>y-umlaut does not need Unicode at all.

True, it is also contained in ISO-8859-1. However, as ASCII does not
provide this character, if the \x string escape sequence is used and Unicode
support is not present, the locale encoding (or the encoding of the
document/file) must be used to determine which character to display for
decimal code points beyond 127.
You just wrote the core of your misconception. In the (nowadays highly
unlikely) case that Unicode support would not be present in the
browser's script engine, the locale is NOT used as lookup-table for
\x. It's always the internal lookup table of the script engine. It has
nothing to do with the document or its encoding !

[...]
>That's why I said the *notation* is ASCII-safe.
It would seem whether that is true depends on how one defines "ASCII-safe".
You have the nasty habit of giving a silly twist to a position that you
can no longer hold. ASCII-safe is code-point 0 to 127, as you
perfectly know. There is no room for other interpretations.
>What is *represented* by that notation, is a different job; that is
decided by the javascript engine.

See?
See what then ?
>>>>There is no "javascript", BTW.
Is that so.
Yes, there are different ECMAScript implementations (some of which don't
even deserve that designation), and versions thereof.
That's like saying that cars don't exist, but only implementations of
fuel engines.
As a matter of fact, there are JavaScript and JScript versions that are not
fully ECMAScript-compliant, and therefore do not provide Unicode support.
I'm not going to reply to your arguments like "there is no
javascript", "you don't know what an umlaut is", "web pages don't
exist", etc. I made my point clear enough. You already conveniently
snipped my question "Could you show an example where \x.. is displayed
differently depending on a varying HTTP-header" which was one of your
basic points.

--
Bart
Jul 2 '08 #9

Bart Van der Donck wrote:
Thomas 'PointedEars' Lahn wrote:
>Bart Van der Donck wrote:
>>Could you show a case where alert('\xff') does
not show y-umlaut ?
Wasting my time supporting your logical fallacy? I don't think so.

Ask someone living in Bosnia, Croatia, Czech Republic, Hungary, Poland,
Romania, Serbia, Slovakia, Slovenia, Malta, Estonia, Latvia, Lithuania,
Greenland, Bulgaria, Belarus, Russia, Macedonia, Greece, Israel, or any
other country where the character set designed for their main language does
not have "y-umlaut", as you put it (you really don't know what an umlaut
is), at decimal code point 255 (*except* with Unicode support), instead.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You are simply wrong; all of those will display y-umlaut with
alert('\xff').
No, they won't. ISO-8859-2, for example, does not have y with diaeresis at
code point 255. Neither has ISO-8859-3 or any other ISO-8859-x family
encoding but ISO-8859-11. And I am not even mentioning more exotic
character sets and encodings.
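
For the byte value 0xFF in a document (as opposed to the \xff escape in script source), the dependence on the character encoding can be sketched with the TextDecoder API. This assumes an environment that provides TextDecoder with legacy encodings (current browsers, or a Node.js build with full ICU). The helper name decodeByte is made up, and it returns null where an encoding label is unavailable.

```javascript
// Hypothetical helper: decode a single byte under the given encoding label.
function decodeByte(byte, label) {
  try {
    return new TextDecoder(label).decode(new Uint8Array([byte]));
  } catch (e) {
    return null; // encoding label not supported in this environment
  }
}

// Print the code point that byte 0xFF maps to under each encoding.
for (const label of ['windows-1252', 'iso-8859-2', 'windows-1251']) {
  const decoded = decodeByte(0xff, label);
  const code = decoded === null
    ? 'unsupported'
    : 'U+' + decoded.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(label, code);
}
```

Where all three labels are supported, the same byte should come out as three different characters: ÿ (U+00FF), ˙ (U+02D9) and я (U+044F).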
You keep talking about Unicode but it has nothing to do with it.
You are mistaken, and I'm tired of explaining to you why. There is *nothing*
in the ECMAScript Specification that specifies what should happen with \x
escape sequences if Unicode support is not there, because ECMAScript Ed. 1
already introduced Unicode support. However, as we know that there are
JavaScript and JScript versions that are not ECMAScript-compliant, that
therefore don't have Unicode support or the operating system's API they are
running on is not Unicode-compliant, it is locale/encoding-dependent what
happens with \x80 to \xFF then.
As I said, just give me one example, and I'll be immediately
convinced of your point. But there is no such example.
As I indicated, you are trying to shift the burden of proof and I will not
support that.
>>>>y-umlaut is the character that is tied to code point 255 in any
ECMAScript implementation.
However, there are implementations that do not support Unicode.
Irrelevant.
Not at all.
>>y-umlaut does not need Unicode at all.
True, it is also contained in ISO-8859-1. However, as ASCII does not
provide this character, if the \x string escape sequence is used and Unicode
support is not present, the locale encoding (or the encoding of the
document/file) must be used to determine which character to display for
decimal code points beyond 127.

You just wrote the core of your misconception. In the (nowadays highly
unlikely) case that Unicode support would not be present in the
browser's script engine, the locale is NOT used as lookup-table for
\x. It's always the internal lookup table of the script engine.
There is no "internal lookup table of the script engine", that is a fantasy
of yours. window.alert() especially, is a host object's method whose
behavior is defined by the UA's API.
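
Since alert() is supplied by the host environment rather than by the language itself, a script can check for it before relying on it. A minimal sketch, with the hypothetical helper pickNotifier taking the host's global object as a parameter:

```javascript
// Hypothetical helper: use the host's alert() if it provides one,
// otherwise fall back to console.log().
function pickNotifier(host) {
  if (host && typeof host.alert === 'function') {
    return (text) => host.alert(text);
  }
  return (text) => console.log(text);
}

// In a browser, pass window; elsewhere the global object may have no alert.
const notify = pickNotifier(typeof window !== 'undefined' ? window : globalThis);
notify('the following char is \\xff: \xff');
```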
It has nothing to do with the document or its encoding !
If that were so, it would be *you* who would have to prove *that*, not
vice-versa.
PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
Jul 2 '08 #10
