Taras_96 wrote:
I have a couple of questions regarding encodings of javascript.
Say I have a function getCodePoints in foo.js:
-----------------------------------------------------------
function getCodePoints(str)
{
  for (var i = 0; i < str.length; i++)
  {
    alert(str.charCodeAt(i));
  }
}
-----------------------------------------------------------
In my html, which is saved in *UCS-4 (4 bytes per character)*, I write:
<body onload="getCodePoints('hello');">
How does javascript know that the string it is given is encoded using
UCS-4?
It does not. It would appear that the user agent's parser first decodes the
document and then passes the content of the `onload' attribute to the script
engine as the body of a function. When the event occurs, that function is
called as a method of the object that caused the creation of the event (with
some exceptions).
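A minimal sketch of that mechanism, assuming the attribute value has already
been decoded into an ordinary string. The names `makeHandler', `target' and
the handler body are hypothetical and only stand in for what a user agent
does internally:

```javascript
// Hypothetical stand-in for the user agent: the (already decoded) attribute
// text becomes the body of a function that takes the event as its argument.
function makeHandler(attributeValue) {
  return new Function('event', attributeValue);
}

// Hypothetical event target; a real one would be the element object.
var target = { name: 'body', seen: [] };

var handler = makeHandler("this.seen.push(event.type);");

// The handler is called as a method of the target, so `this' refers to it.
handler.call(target, { type: 'load' });
```

By the time the script engine sees the handler text it is a string like any
other; the byte encoding of the HTML document is no longer visible to it.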
Is it transcoded by the browser into javascript's native UTF-16 before
giving it to the javascript function?
In a sense.
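To illustrate: whatever the encoding of the document was, the script only
ever sees UTF-16 code units. The sketch below uses escape sequences so that
it does not depend on the encoding of its own source file; the helper names
are made up for this example, and codePointAt() requires an engine that
implements ECMAScript 2015 or later.

```javascript
// Returns the UTF-16 code units of a string, as charCodeAt() reports them.
function getCodeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

// Returns the Unicode code points, treating a surrogate pair as one character.
function getCodePointList(str) {
  var points = [];
  for (var i = 0; i < str.length; i++) {
    var cp = str.codePointAt(i);
    points.push(cp);
    if (cp > 0xFFFF) i++; // skip the low surrogate of the pair
  }
  return points;
}

// U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP and needs two code units.
var clef = '\uD834\uDD1E';

var helloUnits = getCodeUnits('hello');  // one code unit per character
var clefUnits = getCodeUnits(clef);      // two code units (a surrogate pair)
var clefPoints = getCodePointList(clef); // one code point
```

Note that charCodeAt() in the original getCodePoints() therefore returns code
units, which equal code points only for characters in the BMP.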
Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:
function insertString()
{
  window.document.getElementById('bar').innerHTML = '¼Ø';
                                                     ^^
}
What character should we observe there?
And the HTML page
There are no "HTML pages" other than in a page-wise output medium (such as a
presentation or a printout). You are referring to HTML *documents*.
which is the target for insertion, is itself encoded in UTF-8:
<script charset="UCS-4" src="theSource.js"></script> <!-- telling the
browser that the JS is encoded in UCS-4 -->
Iff supported. You should provide a proper Content-Type header for the
external resource to make sure it is parsed correctly.
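A rough sketch of what that could look like on the server side. The function
name `serveScript' is hypothetical, and `res' is only assumed to offer
setHeader() and end() the way Node's http.ServerResponse does; the charset
label must name the encoding the file is actually stored in.

```javascript
// Hypothetical handler: sends a script resource with an explicit charset
// parameter so that the user agent does not have to guess the encoding.
function serveScript(res, scriptBytes, charset) {
  res.setHeader('Content-Type', 'text/javascript; charset=' + charset);
  res.end(scriptBytes);
}

// Minimal stand-in for a response object, used here only to show the result.
var fakeResponse = {
  headers: {},
  body: null,
  setHeader: function (name, value) { this.headers[name] = value; },
  end: function (data) { this.body = data; }
};

serveScript(fakeResponse, 'function insertString() {}', 'UTF-8');
```

The header generally takes precedence over the `charset' attribute of the
`script' element, which is why it is the more reliable of the two.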
<body><div id='bar'></div></body>
How does the string get inserted into the HTML with UTF-8 encoding?
It does not. The DOM API uses the DOMString type, which must be UTF-16
encoded (the DOM specifications do not prescribe a byte order). Although
`innerHTML' is a proprietary property, I would presume that it has a setter
that accepts a DOMString value and performs recoding according to the
detected character encoding of the document if required.
<http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-2-HTML/>
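The difference between the two representations can be made visible in script
code. This sketch assumes that the two characters in the question really are
U+00BC and U+00D8 as displayed, and that TextEncoder (which always produces
UTF-8) is available, as it is in current browsers and in Node.js:

```javascript
// The two characters from the question, written as escapes so that the
// encoding of this source file does not matter.
var inserted = '\u00BC\u00D8';

// What script code and the DOM see: one UTF-16 code unit per character here.
var utf16Units = [inserted.charCodeAt(0), inserted.charCodeAt(1)];

// What a UTF-8 serialization of the same string would contain: two bytes
// per character, because both code points are above U+007F.
var utf8Bytes = Array.from(new TextEncoder().encode(inserted));
```

Assigning the string to `innerHTML' does not involve the UTF-8 bytes at all;
they matter only where the document is read from or written to a byte stream.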
Is it again something the browser does
The DOM implementation is a part of the layout engine which is a part of the
browser.
(ie, it does something along the lines of convert the UCS-4 string into
javascript's native UTF-16, and then when writing it into the DOM
converts it into UTF-8)?
Probably, yes. If the browser is Open Source, you can read its source code
(UTSL) to confirm that.
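The presumed pipeline can be imitated in script code. This is only a sketch
of the idea, not what any particular browser does: `decodeUcs4BE' is a
made-up helper, the input is assumed to be big-endian UCS-4 without a byte
order mark, and TextEncoder stands in for the UTF-8 serialization step.

```javascript
// Hypothetical helper: decodes big-endian UCS-4 bytes into a native
// (UTF-16) string, four bytes per character.
function decodeUcs4BE(bytes) {
  var view = new DataView(Uint8Array.from(bytes).buffer);
  var out = '';
  for (var offset = 0; offset + 4 <= view.byteLength; offset += 4) {
    // false = big-endian; fromCodePoint() emits a surrogate pair if needed
    out += String.fromCodePoint(view.getUint32(offset, false));
  }
  return out;
}

// "h" (U+0068) followed by U+1D11E MUSICAL SYMBOL G CLEF, as UCS-4 bytes.
var ucs4Bytes = [0x00, 0x00, 0x00, 0x68, 0x00, 0x01, 0xD1, 0x1E];

var decoded = decodeUcs4BE(ucs4Bytes);                         // UCS-4 -> UTF-16
var reencoded = Array.from(new TextEncoder().encode(decoded)); // UTF-16 -> UTF-8
```

Two characters occupy eight bytes in UCS-4, three code units in the native
string and five bytes in UTF-8, yet they are the same two characters at
every stage.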
HTH
PointedEars