Bytes IT Community

Encodings of javascript

Hi everyone,

I have a couple of questions regarding encodings of javascript.

Say I have a function getCodePoints in foo.js:
-----------------------------------------------------------
function getCodePoints(str)
{
  for (var i = 0; i < str.length; i++)
  {
    alert(str.charCodeAt(i));
  }
}
-----------------------------------------------------------
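Strictly speaking, `charCodeAt` returns UTF-16 code units rather than code points. The two coincide for characters in the Basic Multilingual Plane, such as those in 'hello', but a character above U+FFFF comes back as two surrogate units. The sketch below contrasts the two; it returns arrays instead of calling `alert` so that it also runs outside a browser (for example in Node.js). The names `getCodeUnits` and `getRealCodePoints` are made up for illustration, and `codePointAt` and `for...of` over strings are ES2015 features that postdate this thread.

```javascript
// Same loop as getCodePoints above, but collecting the values instead of
// alert()ing them. Each value is one UTF-16 code unit (0..0xFFFF).
function getCodeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

// for...of walks a string by code point, so a surrogate pair arrives as one
// two-unit string; codePointAt(0) combines it into a single value.
function getRealCodePoints(str) {
  const points = [];
  for (const ch of str) {
    points.push(ch.codePointAt(0));
  }
  return points;
}

console.log(getCodeUnits("hello"));          // [ 104, 101, 108, 108, 111 ]
console.log(getCodeUnits("\u{1F600}"));      // [ 55357, 56832 ] (0xD83D, 0xDE00)
console.log(getRealCodePoints("\u{1F600}")); // [ 128512 ] (0x1F600)
```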

In my html, which is saved in *UCS-4 (4 bytes per character)*, I
write:

<body onload="getCodePoints('hello');">

How does javascript know that the string it is given is encoded using
UCS-4? Is it transcoded by the browser into javascript's native UTF-16
before giving it to the javascript function?

Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:

function insertString()
{
  window.document.getElementById('bar').innerHTML = '¼φ';
}

And the HTML page which is the target for insertion is itself encoded
in UTF-8:

<script charset="UCS-4" src="theSource.js"></script> <!-- telling the
browser that the JS is encoded in UCS-4 -->
<body>
<div id='bar'>
</div>
</body>

How does the string get inserted into the HTML with UTF-8 encoding? Is
it again something the browser does (i.e., does it do something along the
lines of converting the UCS-4 string into JavaScript's native UTF-16 and
then, when writing it into the DOM, converting it into UTF-8)?

Thanks

Taras
Oct 17 '08 #1


Taras_96 wrote:
> I have a couple of questions regarding encodings of javascript.
>
> Say I have a function getCodePoints in foo.js:
> -----------------------------------------------------------
> function getCodePoints(str)
> {
>   for (var i = 0; i < str.length; i++)
>   {
>     alert(str.charCodeAt(i));
>   }
> }
> -----------------------------------------------------------
>
> In my html, which is saved in *UCS-4 (4 bytes per character)*, I write:
>
> <body onload="getCodePoints('hello');">
>
> How does javascript know that the string it is given is encoded using
> UCS-4?
It does not. It would appear that the user agent's parser passes the content
of the `onload' attribute, after recoding, to the script engine as the body of
a function; when the event occurs, that function is called as a method of the
object that caused the event (with some exceptions).
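A rough model of that mechanism, assuming a plain object in place of the real `<body>` element and `new Function` in place of the browser's internal compilation step. Real browsers also place the element, its form and the document on the handler's scope chain, which this sketch ignores. `fakeBody` and the added `return` are hypothetical, there only so the result can be inspected.

```javascript
// Hypothetical stand-in for the <body> element; a plain object, not a DOM node.
const fakeBody = { tagName: "BODY" };

// A global function, as foo.js would define it (returns instead of alert()ing).
globalThis.getCodePoints = function (str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
};

// The attribute value as the parser hands it over: already decoded from the
// document's bytes into an ordinary JavaScript string. The `return' is an
// addition for this sketch; the original attribute has none.
const attributeText = "return getCodePoints('hello');";

// Compile the text as a function body taking an `event' argument and store
// it as the element's handler.
fakeBody.onload = new Function("event", attributeText);

// Calling it as fakeBody.onload(...) means `this' inside it is fakeBody.
console.log(fakeBody.onload({ type: "load" })); // [ 104, 101, 108, 108, 111 ]
```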
> Is it transcoded by the browser into javascript's native UTF-16 before
> giving it to the javascript function?
In a sense.
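The decoding step can be sketched as follows, assuming big-endian UCS-4 (which, for code points up to U+10FFFF, is identical to UTF-32BE). A browser does this in native code before any script runs; `decodeUcs4BE` is a hypothetical helper, not a browser API. Note also that current browsers follow the WHATWG Encoding Standard, which does not include UTF-32, so a `charset="UCS-4"` declaration is unlikely to be honoured today.

```javascript
// Decode big-endian UCS-4 bytes into a JavaScript (UTF-16) string.
// Assumes well-formed input: a multiple of 4 bytes, values <= 0x10FFFF.
function decodeUcs4BE(bytes) {
  if (bytes.length % 4 !== 0) {
    throw new RangeError("UCS-4 input must be a multiple of 4 bytes");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let result = "";
  for (let offset = 0; offset < bytes.length; offset += 4) {
    const codePoint = view.getUint32(offset, false); // false = big-endian
    // fromCodePoint emits a surrogate pair (two UTF-16 units) above 0xFFFF
    // and throws a RangeError above 0x10FFFF.
    result += String.fromCodePoint(codePoint);
  }
  return result;
}

// "hello" as it would sit in a UCS-4 file: 4 bytes per character.
const ucs4Bytes = new Uint8Array([
  0, 0, 0, 0x68, 0, 0, 0, 0x65, 0, 0, 0, 0x6c, 0, 0, 0, 0x6c, 0, 0, 0, 0x6f,
]);

const str = decodeUcs4BE(ucs4Bytes);
console.log(str);        // hello
console.log(str.length); // 5 (UTF-16 code units, not 20 bytes)
```

After this step the string is indistinguishable from one that came from a UTF-8 or ISO-8859-1 file, which is why the script cannot tell what the original encoding was.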
> Equivalently, say we have the following function in a javascript file
> which itself is encoded in UCS-4:
>
> function insertString()
> {
>   window.document.getElementById('bar').innerHTML = '¼Ø';
>                                                       ^^
> }
What character should we observe there?
> And the HTML page
There are no "HTML pages" other than in a page-wise output medium (such as a
presentation or a printout). You are referring to HTML *documents*.
> which is the target for insertion is itself encoded in UTF-8:
>
> <script charset="UCS-4" src="theSource.js"></script> <!-- telling the
> browser that the JS is encoded in UCS-4 -->
Iff supported. You should provide a proper Content-Type header for the
external resource to make sure it is parsed correctly.
> <body>
> <div id='bar'>
> </div>
> </body>
>
> How does the string get inserted into the HTML with UTF-8 encoding?
It does not. The DOM API uses the DOMString type, which must be UTF-16
encoded (a sequence of 16-bit units, so byte order does not come into it).
Although `innerHTML' is a proprietary property, I would presume that it has
a setter that accepts a DOMString value and performs recoding according to
the detected character encoding of the document if required.

<http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-2-HTML/>
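To illustrate that the source encoding disappears once the text is decoded, the following sketch decodes the same two characters, ¼ (U+00BC) and φ (U+03C6), from UTF-8 bytes and from UTF-16LE bytes and compares the results. `TextDecoder` is a modern API (not available in 2008 browsers) standing in for the browser's internal decoder; it runs in Node.js and current browsers.

```javascript
// The same two characters, ¼ (U+00BC) and φ (U+03C6), as raw bytes in two
// different encodings.
const utf8Bytes = new Uint8Array([0xc2, 0xbc, 0xcf, 0x86]);
const utf16leBytes = new Uint8Array([0xbc, 0x00, 0xc6, 0x03]);

const fromUtf8 = new TextDecoder("utf-8").decode(utf8Bytes);
const fromUtf16 = new TextDecoder("utf-16le").decode(utf16leBytes);

// Once decoded, the origin no longer matters: both are the same string of
// two 16-bit units, which is what a DOMString holds.
console.log(fromUtf8 === fromUtf16); // true
console.log(fromUtf8.length);        // 2
console.log([...fromUtf8].map((c) => c.charCodeAt(0).toString(16))); // [ 'bc', '3c6' ]
```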
> Is it again something the browser does
The DOM implementation is a part of the layout engine which is a part of the
browser.
> (ie, it does something along the lines of convert the UCS-4 string into
> javascript's native UTF-16, and then when writing it into the DOM
> converts it into UTF-8)?
Probably, yes. If the browser is Open Source, you can UTSL to confirm that.
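For the final step in the question, the sketch below shows what converting a JavaScript string into UTF-8 bytes involves, using `TextEncoder`, which always encodes to UTF-8. It is a stand-in for whatever the browser does internally; when, or whether, a given browser performs such a conversion for `innerHTML` is exactly what the thread leaves open.

```javascript
// A JavaScript string (UTF-16 internally) holding ¼ (U+00BC) and φ (U+03C6).
const s = "\u00BC\u03C6";

// TextEncoder always produces UTF-8; this is the step that turns the
// string back into bytes.
const bytes = new TextEncoder().encode(s);

console.log(s.length);     // 2 (UTF-16 code units)
console.log(bytes.length); // 4 (UTF-8 bytes)
console.log(Array.from(bytes, (b) => b.toString(16))); // [ 'c2', 'bc', 'cf', '86' ]
```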
HTH

PointedEars
Oct 17 '08 #2

Hi Thomas,

Sounds like I was on the right track

On Oct 18, 12:04 am, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
> Taras_96 wrote:
> > .....
> > Equivalently, say we have the following function in a javascript file
> > which itself is encoded in UCS-4:
> >
> > function insertString()
> > {
> >   window.document.getElementById('bar').innerHTML = '¼Ø';
> >                                                       ^^
> > }
>
> What character should we observe there?
It doesn't really matter. The characters were supposed to be a 1/4
character followed by a phi; I just wanted some non-ASCII characters.
>
> > And the HTML page
>
> There are no "HTML pages" other than in a page-wise output medium (such as a
> presentation or a printout). You are referring to HTML *documents*.
Fair enough :)
>
> Iff supported. You should provide a proper Content-Type header for the
> external resource to make sure it is parsed correctly.
True

> > converts it into UTF-8)?
>
> Probably, yes. If the browser is Open Source, you can UTSL to confirm that.
>
> HTH
Helped a lot... thanks!

Taras
Oct 24 '08 #3
