Taras_96 wrote:
Quote:
I have a couple of questions regarding encodings of javascript.
>
Say I have a function getCodePoints in foo.js:
-----------------------------------------------------------
function getCodePoints(str)
{
for (var i = 0; i < str.length; i++)
{
alert(str.charCodeAt(i));
}
}
-----------------------------------------------------------
>
In my html, which is saved in *UCS-4 (4 bytes per character)*, I write:
>
<body onload="getCodePoints('hello');">
>
How does javascript know that the string it is given is encoded using
UCS-4?
It does not. It would appear that the user agent's parser passes the content
of the `onload' attribute, after recoding, to the script engine as the body
of a function. That function is then called as a method of the object that
caused the creation of the event (with some exceptions) when the event occurs.
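A minimal sketch of that idea, under stated assumptions: `collectCodeUnits' is a hypothetical variant of the function above that returns the values instead of alerting them, and `new Function' only roughly approximates what a user agent does with the attribute value. Whatever encoding the document was stored in, the script engine only ever sees UTF-16 code units.

```javascript
// Hypothetical variant of getCodePoints: returns the values instead of alerting them.
function collectCodeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    // charCodeAt always yields a UTF-16 code unit (0..65535),
    // regardless of how the document was encoded on disk.
    units.push(str.charCodeAt(i));
  }
  return units;
}

// Rough approximation of the attribute handling: the already decoded
// attribute text becomes the body of a function. The helper is passed in
// as a parameter so that the sketch does not depend on global scope.
var handler = new Function(
  'collectCodeUnits',
  "return collectCodeUnits('hello');"
);

var units = handler(collectCodeUnits);
```

Here `units' holds the code units of 'hello'; the byte layout of the original file plays no role at this point.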
Quote:
Is it transcoded by the browser into javascript's native UTF-16 before
giving it to the javascript function?
In a sense.
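What such a transcoding step amounts to can be sketched as follows. This is not how any particular browser implements it; `decodeUtf32BE' is a hypothetical helper, and big-endian byte order without a byte order mark is assumed.

```javascript
// Hypothetical helper: decode big-endian UCS-4/UTF-32 bytes into a
// JavaScript string (which consists of UTF-16 code units).
function decodeUtf32BE(bytes) {
  var view = new DataView(Uint8Array.from(bytes).buffer);
  var result = '';
  for (var offset = 0; offset + 4 <= view.byteLength; offset += 4) {
    // Each code point occupies exactly four bytes in UCS-4.
    // The second argument `false' selects big-endian byte order.
    result += String.fromCodePoint(view.getUint32(offset, false));
  }
  return result;
}

// 'h', 'i', and U+1D11E (MUSICAL SYMBOL G CLEF), four bytes each.
var input = [0, 0, 0, 0x68, 0, 0, 0, 0x69, 0, 0x01, 0xD1, 0x1E];
var text = decodeUtf32BE(input);

// text.length is 4, not 3: U+1D11E lies outside the Basic Multilingual
// Plane and is stored as the surrogate pair 0xD834 0xDD1E.
```

This also shows why charCodeAt() in the original function returns code units rather than code points for characters beyond U+FFFF.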
Quote:
Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:
>
function insertString()
{
window.document.getElementById('bar').innerHTML = '¼Ø';
^^
What character should we observe there?
There are no "HTML pages" other than in a page-wise output medium (such as a
presentation or a printout). You are referring to HTML *documents*.
Quote:
which is the target for insertion itself is encoded in UTF-8:
>
<script charset="UCS-4" src="theSource.js"></script> <!-- telling the
browser that the JS is encoded in UCS-4 -->
Iff supported. You should provide a proper Content-Type header for the
external resource to make sure it is parsed correctly.
Quote:
<body><div id='bar'></div></body>
>
How does the string get inserted into the HTML with UTF-8 encoding?
It does not. The DOM API uses the DOMString type, which is specified as a
sequence of 16-bit units in UTF-16 encoding (the specification does not
prescribe a byte order). Although `innerHTML' is a proprietary property, I
would presume that it has a setter that accepts a DOMString value and performs
recoding according to the detected character encoding of the document if
required.
<http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-2-HTML/>
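The difference between the in-memory string and a UTF-8 serialization can be illustrated as follows. This is a sketch that assumes an environment providing the TextEncoder interface; it says nothing about the internals of any particular innerHTML setter, and the variable names are hypothetical.

```javascript
// The string from the question, written with escapes so that this example
// does not depend on the encoding of its own source file.
var inserted = '\u00BC\u00D8'; // U+00BC VULGAR FRACTION ONE QUARTER, U+00D8 O WITH STROKE

// In memory: two UTF-16 code units, one per character.
var codeUnits = [inserted.charCodeAt(0), inserted.charCodeAt(1)];

// Serialized as UTF-8: each of these two characters takes two bytes.
var utf8Bytes = Array.from(new TextEncoder().encode(inserted));

// codeUnits is [0xBC, 0xD8]; utf8Bytes is [0xC2, 0xBC, 0xC3, 0x98].
```

The script only ever handles the code units; UTF-8 bytes come into play where the document is read from or written to a byte stream.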
Quote:
Is it again something the browser does
The DOM implementation is a part of the layout engine, which in turn is a
part of the browser.
Quote:
(ie, it does something along the lines of convert the UCS-4 string into
javascript's native UTF-16, and then when writing it into the DOM
converts it into UTF-8)?
Probably, yes. If the browser is Open Source, you can read its source code
(UTSL) to confirm that.
HTH
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee