Bizarre JS brackets bug - mystery solved!

Afternoon,

In an earlier thread (http://tinyurl.com/5v4aa), I described a
problem I was having which was rather bizarrely solved by
changing the line:
"inputbox.value = numq+ag-cw-cc;"
to:
"inputbox.value = numq+(ag)-(cw)-(cc);"

This was needed in IE6 but not in any other browser I tried.
I have now solved the mystery of why inserting the brackets
removed the problem.

I used the age-old technique of removing everything else until
only the error remains. If you're interested in the two files
which eventually helped me to see the error, look at:
http://www.ex.ac.uk/cimt/dev/oddity/...ty-working.htm
http://www.ex.ac.uk/cimt/dev/oddity/...ity-faulty.htm

I will, however, explain the solution here.

IE6 is, I believe, the first version of the IE browser to have
"Auto-Select" for text encoding (character set) turned on by
default. When it loads the first of the above pages, it decides
that the encoding is "Western European (Windows)". When it
loads the second of the above pages, it decides that the
encoding is "Unicode (UTF-7)".

This process (and its arbitrary nature) is rather nicely illustrated
by the three examples below, which are all short. For full effect,
make sure you have Auto-Select turned on for text encoding if
you look at any of the web pages.

(1) http://www.ex.ac.uk/cimt/dev/oddity/...s-oddity-1.htm

<HTML>
<HEAD><TITLE>plus minus oddity 1</TITLE></HEAD>
<BODY>
foo+stuff-bar
</BODY>
</HTML>

This displays:
foo<oriental symbol>bar.
IE has decided that the document is Unicode (UTF-7).

(2) http://www.ex.ac.uk/cimt/dev/oddity/...s-oddity-2.htm

<HTML>
<HEAD><TITLE>plus minus oddity 2</TITLE></HEAD>
<BODY>
foo+stuff-bar<BR>
foo+ stuff -bar
</BODY>
</HTML>

This displays:
foo+stuff-bar
foo+ stuff -bar
IE has decided that this document is Western European (Windows).
How it has decided this is unclear to me. It contains the same first
line as example (1), but something in the second line makes it change
its mind. Perhaps it is the appearance of "stuff" without the "+"
directly in front?

(3) http://www.ex.ac.uk/cimt/dev/oddity/...s-oddity-3.htm

<HTML>
<HEAD><TITLE>plus minus oddity</TITLE></HEAD>
<META HTTP-EQUIV="Content-Type"
CONTENT="text/html; CHARSET=iso-8859-1">
<BODY>
foo+stuff-bar
</BODY>
</HTML>

This displays:
foo+stuff-bar
IE has correctly responded to my suggestion that this document is in
Western European (ISO) as specified in the META tag.

I'm sure that some of you will tell me that I should always have set
the character set for every HTML page I have ever written. If I had
done so, I might never have discovered this IE6 "feature".

Anyway, I have learnt my lesson.

I can see two potential ongoing problems. Firstly, it seems odd (to
me) that the text-encoding has also been used to process the script
within the page. There will be plenty of occasions where a variable
is enclosed between a "+" and a "-", and each of these could
potentially lead to an error. Do people script in non-latin charsets?

What makes the problem worse is that the way in which IE decides
the encoding depends fairly arbitrarily on things which appear *later*
in the code and/or page. Removing a working section of code might
remove the problem, but not because there was a fault in that section
of code.

Anyway, there is an easy solution.
Make sure the text-encoding is specified on every page.

Al

Jul 23 '05 #1
On Thu, 30 Sep 2004 16:00:28 +0100, Al Reynolds <aj******@bat400.com>
wrote:

[snip]
> Do people script in non-latin charsets?


I don't know if they do, but I presume that the potential is there.
Identifiers can legally contain Unicode characters from certain code
groups, and string literals can contain any Unicode character (and I'm not
referring to escape sequences). For them to be properly processed, I
assume that the character set must be set correctly.

[snip]

Mike

--
Michael Winter
Replace ".invalid" with ".uk" to reply by e-mail.
Jul 23 '05 #2
Al Reynolds wrote:
> I can see two potential ongoing problems. Firstly, it seems odd (to
> me) that the text-encoding has also been used to process the script
> within the page.

The script within the page is just part of the page. If the page is
encoded a specific way, then the text between the <script> and </script>
tags will be encoded the same way.

> Anyway, there is an easy solution.
> Make sure the text-encoding is specified on every page.


Indeed.
Anyway, this may be of passing interest to you: <url:
http://zsigri.tripod.com/fontboard/cjk/utf7.html />

Using some guesswork and the URL above, I've arrived at a partial
answer to your question about why IE sometimes Auto-Selects UTF-7
and sometimes does not. Here it is:

If all "+" characters on a page are only followed by characters from the
Base64 alphabet up to the next "-" character, the page is assumed to be
UTF-7. If even a single "+" character on the page is followed by a
character not from the Base64 alphabet, the page is assumed to not be
UTF-7. As a result:

abc ++++- def would be UTF-7; but
abc +<b>+++</b>- would not

However, this does not explain everything. Otherwise,
for (var i = 0; i < length; ++i-b) { ... }
would cause problems (assuming no other occurrences
of "+" on the page), but it does not.
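
The guess described in this post can be written down as a small check. This is only a sketch of the hypothesised rule, not IE's actual detection logic (which nobody in this thread has seen); the names `looksLikeUtf7` and `BASE64_ALPHABET` are made up for illustration.

```javascript
// Base64 alphabet; note that "+" itself is a member, "-" is not.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothesised rule: the text "looks like UTF-7" if it contains at least
// one "+", and every "+" is followed only by Base64 characters up to a "-".
function looksLikeUtf7(text) {
  let sawPlus = false;
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== "+") {
      continue;
    }
    sawPlus = true;
    let j = i + 1;
    while (j < text.length && BASE64_ALPHABET.includes(text[j])) {
      j++;
    }
    if (text[j] !== "-") {
      return false;   // run ended on something other than "-"
    }
    i = j;            // resume scanning after the absorbed "-"
  }
  return sawPlus;
}

console.log(looksLikeUtf7("abc ++++- def"));                  // true
console.log(looksLikeUtf7("abc +<b>+++</b>-"));               // false
console.log(looksLikeUtf7("foo+stuff-bar"));                  // true
console.log(looksLikeUtf7("foo+stuff-bar\nfoo+ stuff -bar")); // false
```

Under this rule, the body of example (1) from the original post is classified as UTF-7 and the body of example (2) is not, because of the "+ " in its second line. The rule also flags the ++i-b loop, which reportedly causes no trouble in practice, so the real heuristic must involve more than this.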

--
Grant Wagner <gw*****@agricoreunited.com>
comp.lang.javascript FAQ - http://jibbering.com/faq

Jul 23 '05 #3
VK
> Anyway, there is an easy solution.
> Make sure the text-encoding is specified on every page.


I don't think it always helps. How about situations when you really need a
script-powered page in Unicode? - Online dictionaries and language lessons
just to name the first.

Also I'm out of any ideas how the "+stuff-" literal might be interpreted as
a Korean syllabic symbol (Unicode value B2DB).

I think this is a bug ("+stuff-" = \u45787) and this is so called "unwanted
behavior" for the whole situation.

IMHO this should be definitely reported to Washington (I mean to the state
of, not DC :-)
Jul 23 '05 #4
On Fri, 1 Oct 2004 15:12:34 +0200, "VK" <sc**********@yahoo.com>
wrote:
>> Anyway, there is an easy solution.
>> Make sure the text-encoding is specified on every page.
>
> I don't think it always helps. How about situations when you really need a
> script-powered page in Unicode? - Online dictionaries and language lessons
> just to name the first.


There is no problem with scripting in UTF-8 in IE or Mozilla; even
scripts using UTF-8 characters in variable names work fine. Older Opera
and others have problems with those, but none with literals.

If the encoding is specified there's no problem at all. Just ensure you
specify an encoding and don't let it be guessed, as IE will guess wrong.

> I think this is a bug ("+stuff-" = \u45787) and this is so called "unwanted
> behavior" for the whole situation.


No. When a browser has to fix up an invalid document, it is a matter of
luck whether the result works or not. Don't trust to luck and you won't
have a problem. If the behaviour above were changed, a legitimate UTF-7
document would have a complementary bug; you can't deal with both.

Just include a proper charset!

Jim.
Jul 23 '05 #5
On Fri, 1 Oct 2004 15:12:34 +0200, VK <sc**********@yahoo.com> wrote:
>> Anyway, there is an easy solution.
>> Make sure the text-encoding is specified on every page.
>
> I don't think it always helps. How about situations when you really need
> a script-powered page in Unicode? - Online dictionaries and language
> lessons just to name the first.


[Theory]
Declare the document with its correct character set and place the script
in a separate file. If necessary, specify the charset attribute on the
SCRIPT element.
[/Theory]

Not having written documents in other character sets, I don't know how
effective that will be. However, it seems to be the technically correct
approach.

> Also I'm out of any ideas how the "+stuff-" literal might be interpreted
> as a Korean syllabic symbol (Unicode value B2DB).

"+stuff-" literal? What are you referring to?

> [...] \u45787 [...]


Unicode escape sequences use hexadecimal, not decimal.

[snip]

Mike

--
Michael Winter
Replace ".invalid" with ".uk" to reply by e-mail.
Jul 23 '05 #6
VK
> [Theory]
> Declare the document with its correct character set and place the script
> in a separate file. If necessary, specify the charset attribute on the
> SCRIPT element.
> [/Theory]

The theory is good and it's the first what came in my head too. But how to
deal with all this inline little onEvent stuff? (like
"...onChange=update(this.form, this.form)"
It looks like in Unicode it may be transformed in a unpredictable way.

> "+stuff-" literal? What are you referring to?


I'm referring to http://www.ex.ac.uk/cimt/dev/oddity/...ty-working.htm
from the original posting.
The character sequence (let's stick to this term) "foo+stuff-bar" has been
transformed into "foo[Korean symbol]bar".
Why? And what else may happen with your script on a unicode page? Maybe
"x+y=z" can become a Japanese text in some circumstances?

>> [...] \u45787 [...]
>
> Unicode escape sequences use hexadecimal, not decimal.


It depends. Unicode consortium publish all its tables in hex values.
Nevertheless if you need to use Unicode chars in non-unicode document (for
scripting for example), you have to use \u-sequences (\u+digital code
value).
Again - I'm not saying it's a crucial default, but it is definitely an issue
to be addressed in new IE releases.
Jul 23 '05 #7
On Fri, 1 Oct 2004 16:29:12 +0200, VK <sc**********@yahoo.com> wrote:
>> [Theory]
>> Declare the document with its correct character set and place the
>> script in a separate file. If necessary, specify the charset attribute
>> on the SCRIPT element.
>> [/Theory]
>
> The theory is good and it's the first what came in my head too. But how
> to deal with all this inline little onEvent stuff? (like
> "...onChange=update(this.form, this.form)"
> It looks like in Unicode it may be transformed in a unpredictable way.


That is a possibility. However, you could add the listeners through the
script itself. The only problem here is that old browsers won't be able to
use such pages as getting a reference to anything other than form controls
depends on getElementById (or similar).
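
A minimal sketch of what adding the listener from the script itself might look like, so that no script text has to appear in the markup. The helper name `attachChangeHandler` and the element id `quantity` are hypothetical, and the document object is passed in as a parameter so that the fallback for browsers without getElementById is explicit.

```javascript
// Attach an onchange handler from script instead of an inline attribute.
// Returns false when the browser or the page cannot support it, so the
// caller can decide how to degrade.
function attachChangeHandler(doc, id, handler) {
  // Older browsers may not provide getElementById at all.
  if (!doc.getElementById) {
    return false;
  }
  var element = doc.getElementById(id);
  if (!element) {
    return false;
  }
  // DOM Level 0 style assignment; it does not rely on addEventListener.
  element.onchange = handler;
  return true;
}

// Possible usage in a page, assuming an <input id="quantity"> exists and
// update() is the page's own function:
// attachChangeHandler(document, "quantity", function () {
//   update(this.form);
// });
```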

>> "+stuff-" literal? What are you referring to?


> I'm referring to
> http://www.ex.ac.uk/cimt/dev/oddity/...ty-working.htm
> from the original posting.
> The character sequence (let's stick to this term) "foo+stuff-bar" has
> been transformed into "foo[Korean symbol]bar".


Oh, I see. I thought you were referring to some strange non-standard
character entity.

> Why?

From the UTF-7 definition in RFC 2152, "UTF-7: A Mail-Safe Transformation
Format of Unicode":

The "+" signals that subsequent octets are to be interpreted as
elements of the Modified Base64 alphabet until a character not in
that alphabet is encountered. Such characters include control
characters such as carriage returns and line feeds; thus, a Unicode
shifted sequence always terminates at the of a line [sic]. As a
special case, if the sequence terminates with the character "-"
(US-ASCII decimal 45) then that character is absorbed; other
terminating characters are not absorbed and are processed normally.

So in the sequence +...-, the entire string is replaced by the characters
that ... decodes to under the Modified Base64 alphabet. The question is why
IE decides the page is UTF-7.
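
To make the RFC rule concrete, here is a small decoder sketch applied to the string from the original post. It is an illustration written for this thread, not IE's implementation: the function name `decodeUtf7Lenient` is invented, and the decoder simply drops any leftover bits at the end of a shifted sequence instead of rejecting them.

```javascript
// Modified Base64 alphabet from RFC 2152 (standard Base64 without "=" padding).
const B64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Lenient UTF-7 decoder sketch: leftover bits at the end of a shifted
// sequence are silently dropped rather than reported as an error.
function decodeUtf7Lenient(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] !== "+") {
      out += text[i++];         // directly encoded character
      continue;
    }
    i++;                        // skip the "+" shift character
    if (text[i] === "-") {      // "+-" stands for a literal "+"
      out += "+";
      i++;
      continue;
    }
    let bits = 0;
    let bitCount = 0;
    while (i < text.length && B64_CHARS.indexOf(text[i]) !== -1) {
      bits = (bits << 6) | B64_CHARS.indexOf(text[i]);
      bitCount += 6;
      if (bitCount >= 16) {     // a full UTF-16 code unit is available
        bitCount -= 16;
        out += String.fromCharCode((bits >> bitCount) & 0xffff);
        bits &= (1 << bitCount) - 1;
      }
      i++;
    }
    if (text[i] === "-") {
      i++;                      // a terminating "-" is absorbed
    }
  }
  return out;
}

const decoded = decodeUtf7Lenient("foo+stuff-bar");
console.log(decoded.length);                      // 7
console.log(decoded.charCodeAt(3).toString(16));  // "b2db"
```

The five Base64 characters in "stuff" carry 30 bits. The first 16 of them form the code unit B2DB (45787 in decimal), which lies in the Hangul Syllables block, and that accounts for the single Korean symbol seen between "foo" and "bar".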

[snip]
>>> [...] \u45787 [...]
>>
>> Unicode escape sequences use hexadecimal, not decimal.
>
> It depends. Unicode consortium publish all its tables in hex values.
> Nevertheless if you need to use Unicode chars in non-unicode document
> (for scripting for example), you have to use
> \u-sequences (\u+digital code value).


A script can be a Unicode document. Though identifiers must come from a
limited alphabet, string literals can contain any Unicode character.

Unicode escape sequences in string literals within scripts *do* require
hexadecimal digits. HTML numeric character references can use either
decimal or hexadecimal (decimal is probably safer).
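
A short illustration of the difference, using the character from this thread (hexadecimal B2DB, decimal 45787). The variable names are only for the example.

```javascript
const codeUnit = 0xb2db;                 // the same number as 45787

// In a script string literal, \u takes exactly four hexadecimal digits.
const fromEscape = "\uB2DB";
const fromNumber = String.fromCharCode(45787);
console.log(fromEscape === fromNumber);  // true

// "\u45787" is NOT the decimal form: it is \u4578 followed by the digit "7".
const misread = "\u45787";
console.log(misread.length);             // 2
console.log(misread.charCodeAt(0));      // 17784 (hexadecimal 4578)

// HTML numeric character references accept either base.
const decimalReference = "&#" + codeUnit + ";";
const hexReference = "&#x" + codeUnit.toString(16).toUpperCase() + ";";
console.log(decimalReference);           // &#45787;
console.log(hexReference);               // &#xB2DB;
```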

> Again - I'm not saying it's a crucial default, but it is definitely an
> issue to be addressed in new IE releases.


However, Microsoft only seem to be issuing security updates. The next full
release will only be available in Longhorn, or so I've read.

Mike

--
Michael Winter
Replace ".invalid" with ".uk" to reply by e-mail.
Jul 23 '05 #8
On Fri, 1 Oct 2004 16:29:12 +0200, "VK" <sc**********@yahoo.com>
wrote:
>> [Theory]
>> Declare the document with its correct character set and place the script
>> in a separate file. If necessary, specify the charset attribute on the
>> SCRIPT element.
>> [/Theory]
>
> The theory is good and it's the first what came in my head too. But how to
> deal with all this inline little onEvent stuff? (like
> "...onChange=update(this.form, this.form)"
> It looks like in Unicode it may be transformed in a unpredictable way.


It isn't. Current browsers have excellent Unicode support; you've just
got to declare the character set so the browser knows!

> Why? And what else may happen with your script on a unicode page? Maybe
> "x+y=z" can become a Japanese text in some circumstances?

No. If you correctly declare the encoding, it simply cannot happen.

> It depends. Unicode consortium publish all its tables in hex values.
> Nevertheless if you need to use Unicode chars in non-unicode document (for
> scripting for example), you have to use \u-sequences (\u+digital code
> value).

Please read the specifications; Michael was entirely correct:

\uhhhh - Unicode character represented by the four-digit hexadecimal
number hhhh.

> Again - I'm not saying it's a crucial default, but it is definitely an issue
> to be addressed in new IE releases.

There's no bug; the bug is in your code.

Jim.
Jul 23 '05 #9
