
Encodings of javascript

Taras_96
#1: Oct 17 '08
Hi everyone,

I have a couple of questions regarding encodings of javascript.

Say I have a function getCodePoints in foo.js:
-----------------------------------------------------------
function getCodePoints(str)
{
  for (var i = 0; i < str.length; i++)
  {
    alert(str.charCodeAt(i));
  }
}
-----------------------------------------------------------

In my html, which is saved in *UCS-4 (4 bytes per character)*, I
write:

<body onload="getCodePoints('hello');">

How does javascript know that the string it is given is encoded using
UCS-4? Is it transcoded by the browser into javascript's native UTF-16
before giving it to the javascript function?

Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:

function insertString()
{
  window.document.getElementById('bar').innerHTML = '¼Ø';
}

And the HTML page which is the target for insertion itself is encoded
in UTF-8:

<script charset="UCS-4" src="theSource.js"></script> <!-- telling the
browser that the JS is encoded in UCS-4 -->
<body>
<div id='bar'>
</div>
</body>

How does the string get inserted into the HTML with UTF-8 encoding? Is
it again something the browser does (i.e., it does something along the
lines of convert the UCS-4 string into javascript's native UTF-16, and
then when writing it into the DOM converts it into UTF-8)?

Thanks

Taras

Thomas 'PointedEars' Lahn
#2: Oct 18 '08

Taras_96 wrote:
> I have a couple of questions regarding encodings of javascript.
>
> Say I have a function getCodePoints in foo.js:
> -----------------------------------------------------------
> function getCodePoints(str)
> {
>   for (var i = 0; i < str.length; i++)
>   {
>     alert(str.charCodeAt(i));
>   }
> }
> -----------------------------------------------------------
>
> In my html, which is saved in *UCS-4 (4 bytes per character)*, I write:
>
> <body onload="getCodePoints('hello');">
>
> How does javascript know that the string it is given is encoded using
> UCS-4?

It does not. The user agent's parser would appear to pass the content of the
`onload' attribute, after recoding, to the script engine as the body of a
function. On the event, that function is called as a method of the object
that caused the creation of the event (with some exceptions).
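
As a rough illustration of that description, the sketch below compiles an attribute value into a function and calls it as a method of an element-like object. It is not how any particular browser is implemented; `fakeBody`, `attributeValue`, and the array-returning variant of `getCodePoints` are stand-ins chosen so the example runs outside a browser.

```javascript
// Variant of getCodePoints that returns the values instead of alerting them.
function getCodePoints(str) {
  var codes = [];
  for (var i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

// By the time the script engine sees the attribute value, it is an ordinary,
// already decoded string; the document's byte encoding is no longer visible.
var attributeValue = "return getCodePoints('hello');";

// Assumption: the helper is passed in as a parameter so that the sketch does
// not depend on a browser's global scope.
var handler = new Function('event', 'getCodePoints', attributeValue);

// Hypothetical stand-in for the element that owns the attribute.
var fakeBody = { tagName: 'BODY' };

// The handler is invoked as a method of the owning object.
var result = handler.call(fakeBody, { type: 'load' }, getCodePoints);
console.log(result); // [ 104, 101, 108, 108, 111 ]
```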

> Is it transcoded by the browser into javascript's native UTF-16 before
> giving it to the javascript function?

In a sense.
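
To make that a little more concrete: whatever encoding the source bytes had, script code only ever sees UTF-16 code units. A minimal sketch, using escape sequences so that the result does not depend on how the example itself is saved; `codeUnits` is a helper name made up here.

```javascript
// Collect the UTF-16 code units of a string (what charCodeAt reports).
function codeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

var ascii = codeUnits('hello');          // one unit per character
var bmp = codeUnits('\u00BC\u03A6');     // 1/4 sign and capital phi
var astral = codeUnits('\uD83D\uDE00');  // U+1F600 stored as a surrogate pair

console.log(ascii);  // [ 104, 101, 108, 108, 111 ]
console.log(bmp);    // [ 188, 934 ]
console.log(astral); // [ 55357, 56832 ]

// codePointAt reassembles the surrogate pair into the single code point.
console.log('\uD83D\uDE00'.codePointAt(0).toString(16)); // 1f600
```

Note that `str.length` counts code units, so the last string has length 2 although it holds one character.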

> Equivalently, say we have the following function in a javascript file
> which itself is encoded in UCS-4:
>
> function insertString()
> {
>   window.document.getElementById('bar').innerHTML = '¼Ø';
                                                       ^^
> }

What character should we observe there?

> And the HTML page

There are no "HTML pages" other than in a page-wise output medium (such as a
presentation or a printout). You are referring to HTML *documents*.

> which is the target for insertion itself is encoded in UTF-8:
>
> <script charset="UCS-4" src="theSource.js"></script> <!-- telling the
> browser that the JS is encoded in UCS-4 -->

Iff supported. You should provide a proper Content-Type header for the
external resource to make sure it is parsed correctly.
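
One way to follow that advice is to have the server state the encoding explicitly. The following is only a sketch using Node's built-in `http` module; `makeScriptHandler` and the port number are assumptions for illustration, and it serves UTF-8 rather than UCS-4 because Node's `Buffer` has no UCS-4 encoder.

```javascript
var http = require('http');

// Returns a request handler that serves `source` as JavaScript and labels
// the bytes with the encoding they were actually written in.
function makeScriptHandler(source) {
  var body = Buffer.from(source, 'utf8');
  return function (req, res) {
    res.writeHead(200, {
      'Content-Type': 'text/javascript; charset=utf-8',
      'Content-Length': body.length
    });
    res.end(body);
  };
}

var scriptHandler = makeScriptHandler("var s = '\\u00BC\\u03A6';");

// Uncomment to actually serve the script (the port is arbitrary):
// http.createServer(scriptHandler).listen(8080);
```

The charset parameter has to match the bytes that are really sent; a label that disagrees with the file's encoding is what produces garbled characters.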

> <body>
> <div id='bar'>
> </div>
> </body>
>
> How does the string get inserted into the HTML with UTF-8 encoding?

It does not. The DOM API uses the DOMString type, which is specified as a
sequence of 16-bit units encoded as UTF-16. Although `innerHTML' is a
proprietary property, I would presume that it has a setter that accepts a
DOMString value and performs recoding according to the detected character
encoding of the document if required.

<http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-2-HTML/>
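
The difference between the string that script code handles and the bytes of a UTF-8 document can be seen without a browser. A hedged sketch: `TextEncoder` (available in current browsers and in Node) always produces UTF-8, and the helper name `toHex` is invented for this example.

```javascript
// The two characters written with escapes: 1/4 sign and capital phi.
var text = '\u00BC\u03A6';

// Format a byte sequence as space-separated, two-digit hexadecimal.
function toHex(bytes) {
  return Array.prototype.map.call(bytes, function (b) {
    return (b < 16 ? '0' : '') + b.toString(16);
  }).join(' ');
}

// In memory: two 16-bit code units, independent of any document encoding.
var units = [text.charCodeAt(0).toString(16), text.charCodeAt(1).toString(16)];

// Serialized as UTF-8: four bytes for the same text.
var utf8 = toHex(new TextEncoder().encode(text));

console.log(units); // [ 'bc', '3a6' ]
console.log(utf8);  // c2 bc ce a6
```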

> Is it again something the browser does

The DOM implementation is a part of the layout engine, which is a part of the
browser.
Quote:
(ie, it does something along the lines of convert the UCS-4 string into
javascript's native UTF-16, and then when writing it into the DOM
converts it into UTF-8)?
Probably, yes. If the browser is Open Source, you can UTSL to confirm that.
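
The first step in that description, turning UCS-4 bytes into a native string, can be sketched by hand. This is an illustration and not any browser's code; `decodeUcs4BE` is a made-up name, and big-endian byte order without a byte order mark is assumed.

```javascript
// Decode big-endian UCS-4 (UTF-32BE) bytes into a JavaScript string.
function decodeUcs4BE(bytes) {
  if (bytes.length % 4 !== 0) {
    throw new RangeError('UCS-4 input must be a multiple of 4 bytes');
  }
  var out = '';
  for (var i = 0; i < bytes.length; i += 4) {
    // Combine four bytes into one code point. The top byte is multiplied
    // rather than shifted, because << 24 could yield a negative number.
    var codePoint = bytes[i] * 0x1000000 +
      ((bytes[i + 1] << 16) | (bytes[i + 2] << 8) | bytes[i + 3]);
    // fromCodePoint emits a surrogate pair for code points above U+FFFF.
    out += String.fromCodePoint(codePoint);
  }
  return out;
}

// 1/4 sign (U+00BC), capital phi (U+03A6), and U+1F600.
var ucs4 = [0x00, 0x00, 0x00, 0xBC,
            0x00, 0x00, 0x03, 0xA6,
            0x00, 0x01, 0xF6, 0x00];
var decoded = decodeUcs4BE(ucs4);

console.log(decoded.length); // 4 (code units, for three characters)
```

Twelve input bytes become four UTF-16 code units: the two characters in the Basic Multilingual Plane take one unit each, and U+1F600 takes two.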


HTH

PointedEars
Taras_96
#3: Oct 24 '08

Hi Thomas,

Sounds like I was on the right track

On Oct 18, 12:04 am, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
> Taras_96 wrote:
> .....
> > Equivalently, say we have the following function in a javascript file
> > which itself is encoded in UCS-4:
> >
> > function insertString()
> > {
> >   window.document.getElementById('bar').innerHTML = '¼Ø';
>                                                        ^^
> > }
>
> What character should we observe there?

It doesn't really matter. The character was supposed to be a 1/4
character followed by a phi; I just wanted some non-ASCII characters.

> > And the HTML page
>
> There are no "HTML pages" other than in a page-wise output medium (such as a
> presentation or a printout). You are referring to HTML *documents*.

Fair enough :)

> Iff supported. You should provide a proper Content-Type header for the
> external resource to make sure it is parsed correctly.

True

> > converts it into UTF-8)?
>
> Probably, yes. If the browser is Open Source, you can UTSL to confirm that.
>
> HTH

Helped a lot... thanks!

Taras