Encodings of javascript

Hi everyone,

I have a couple of questions regarding encodings of javascript.

Say I have a function getCodePoints in foo.js:
-----------------------------------------------------------
function getCodePoints(str)
{
for(var i = 0; i < str.length; i++)
{
alert(str.charCodeAt(i));
}
}
-----------------------------------------------------------

In my html, which is saved in *UCS-4 (4 bytes per character)*, I
write:

<body onload="getCodePoints('hello');">

How does javascript know that the string it is given is encoded using
UCS-4? Is it transcoded by the browser into javascript's native UTF-16
before giving it to the javascript function?

Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:

function insertString()
{
window.document.getElementById('bar').innerHTML = '¼Ø';
}

And the HTML page which is the target for insertion itself is encoded
in UTF-8:

<script charset="UCS-4" src="theSource.js"></script> <!-- telling the
browser that the JS is encoded in UCS-4 -->
<body>
<div id='bar'>
</div>
</body>

How does the string get inserted into the HTML with UTF-8 encoding? Is
it again something the browser does (ie, it does something along the
lines of convert the UCS-4 string into javascript's native UTF-16, and
then when writing it into the DOM converts it into UTF-8)?

Thanks

Taras
Oct 17 '08 #1
Taras_96 wrote:
I have a couple of questions regarding encodings of javascript.

Say I have a function getCodePoints in foo.js:
-----------------------------------------------------------
function getCodePoints(str)
{
for (var i = 0; i < str.length; i++)
{
alert(str.charCodeAt(i));
}
}
-----------------------------------------------------------

In my html, which is saved in *UCS-4 (4 bytes per character)*, I write:

<body onload="getCodePoints('hello');">

How does javascript know that the string it is given is encoded using
UCS-4?
It does not. It would appear that the user agent's parser recodes the content
of the `onload' attribute and passes it to the script engine as the body of a
function. On the event, that function is called as a property (method) of the
object that caused the creation of the event (with some exceptions).
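The mechanism described here can be approximated in plain JavaScript. This is only a sketch: `makeHandler` and `fakeBody` are hypothetical names, and a real browser builds the handler internally with additional scope rules. The point is that by the time the script engine sees the attribute text, it is already an ordinary JavaScript string.

```javascript
// Sketch: treat an event-handler attribute value as the body of a function.
// `makeHandler` and `fakeBody` are hypothetical; a browser does this internally.
function makeHandler(attributeValue) {
  // The attribute text has already been decoded from the document's byte
  // encoding, so it arrives here as an ordinary JavaScript string.
  return new Function('event', attributeValue);
}

var fakeBody = { tagName: 'BODY', seen: [] };
var handler = makeHandler("this.seen.push('hello'.charCodeAt(0));");

// On the event, the function is called as a method of the target object.
handler.call(fakeBody, { type: 'load' });
console.log(fakeBody.seen); // [ 104 ]
```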
Is it transcoded by the browser into javascript's native UTF-16 before
giving it to the javascript function?
In a sense.
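To illustrate: whatever the encoding of the file on disk, script code only ever sees UTF-16 code units. The sketch below reuses the loop from the question but collects the values instead of calling alert; `getCodeUnits` is a name introduced here. Escape sequences are used so that the example does not depend on how this text itself is encoded.

```javascript
// Collect the UTF-16 code units of a string (same loop as getCodePoints,
// but returning the values instead of alerting them).
function getCodeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

console.log(getCodeUnits('hello'));        // [ 104, 101, 108, 108, 111 ]
console.log(getCodeUnits('\u00BC'));       // [ 188 ]
// U+1D11E lies outside the BMP, so it occupies two code units (a surrogate pair).
console.log(getCodeUnits('\uD834\uDD1E')); // [ 55348, 56606 ]
```

Note that `charCodeAt` reports code units, not code points, which is why the last string yields two values for one character.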
Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:

function insertString()
{
window.document.getElementById('bar').innerHTML = '¼Ø';
                                                   ^^
}
What character should we observe there?
And the HTML page
There are no "HTML pages" other than in a page-wise output medium (such as a
presentation or a printout). You are referring to HTML *documents*.
which is the target for insertion itself is encoded in UTF-8:

<script charset="UCS-4" src="theSource.js"></script> <!-- telling the
browser that the JS is encoded in UCS-4 -->
Iff supported. You should provide a proper Content-Type header for the
external resource to make sure it is parsed correctly.
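A minimal sketch of that advice for a Node.js server. The thread does not say which server is in use, so the `scriptContentType` helper, the script body, and the port are assumptions made for illustration. UTF-8 is used here as an assumed, widely supported choice rather than UCS-4.

```javascript
var http = require('http');

// Hypothetical helper: build the Content-Type value for a script resource.
function scriptContentType(charset) {
  return 'text/javascript; charset=' + charset;
}

// Hypothetical server: every response is the script, labelled with its encoding.
var server = http.createServer(function (req, res) {
  var source = "document.getElementById('bar').innerHTML = '\\u00BC';";
  res.writeHead(200, { 'Content-Type': scriptContentType('utf-8') });
  // The bytes sent must really be in the encoding named in the header.
  res.end(Buffer.from(source, 'utf8'));
});

// server.listen(8080); // uncomment to serve; the port is arbitrary
```

The header and the actual bytes have to agree; declaring one encoding while sending another is what produces garbled characters.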
<body><div id='bar'></div></body>

How does the string get inserted into the HTML with UTF-8 encoding?
It does not. The DOM API uses the DOMString type, which must be encoded as
UTF-16. Although `innerHTML' is a proprietary property, I would presume
that it has a setter that accepts a DOMString value and performs recoding
according to the detected character encoding of the document if required.

<http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-2-HTML/>
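The distinction can be seen outside a browser as well: a string in memory is a sequence of 16-bit units, and bytes in a particular encoding exist only once the string is serialized. The sketch below uses the standard `TextEncoder` and `TextDecoder` interfaces (available in current browsers and in Node.js). It assumes the intended characters were a 1/4 sign followed by a phi, and it does not show what any layout engine does internally.

```javascript
var s = '\u00BC\u03C6'; // 1/4 sign followed by phi, written as escapes

// In memory: two 16-bit code units.
console.log(s.length); // 2
console.log(s.charCodeAt(0).toString(16), s.charCodeAt(1).toString(16)); // bc 3c6

// Serialized as UTF-8: each of these two characters takes two bytes.
var utf8 = new TextEncoder().encode(s);
console.log(Array.from(utf8)); // [ 194, 188, 207, 134 ]

// Decoding the bytes gives back the same string.
var roundTrip = new TextDecoder('utf-8').decode(utf8);
console.log(roundTrip === s); // true
```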
Is it again something the browser does
The DOM implementation is a part of the layout engine which is a part of the
browser.
(ie, it does something along the lines of convert the UCS-4 string into
javascript's native UTF-16, and then when writing it into the DOM
converts it into UTF-8)?
Probably, yes. If the browser is Open Source, you can UTSL to confirm that.
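The transcoding step itself can be sketched by hand. The function below decodes big-endian UCS-4 bytes into a JavaScript string; `decodeUcs4BE` is a hypothetical name and the byte order is an assumption, since the thread does not state one. Whether a given browser accepts UCS-4 input at all is a separate question.

```javascript
// Sketch: decode big-endian UCS-4 (four bytes per character) into a
// JavaScript string, which is held as UTF-16 code units.
function decodeUcs4BE(bytes) {
  if (bytes.length % 4 !== 0) {
    throw new RangeError('UCS-4 input must be a multiple of 4 bytes');
  }
  var out = '';
  for (var i = 0; i < bytes.length; i += 4) {
    // Combine four bytes into one code point (>>> 0 keeps it unsigned).
    var cp = ((bytes[i] << 24) | (bytes[i + 1] << 16) |
              (bytes[i + 2] << 8) | bytes[i + 3]) >>> 0;
    // String.fromCodePoint emits one or two UTF-16 code units as needed.
    out += String.fromCodePoint(cp);
  }
  return out;
}

// 'h', 'i', then U+1D11E as UCS-4: 12 bytes in, 4 code units out.
var bytes = [0, 0, 0, 0x68, 0, 0, 0, 0x69, 0, 0x01, 0xD1, 0x1E];
var str = decodeUcs4BE(bytes);
console.log(str.length); // 4
```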
HTH

PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Oct 17 '08 #2
Hi Thomas,

Sounds like I was on the right track

On Oct 18, 12:04 am, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
Taras_96 wrote:
.....
>
Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:
function insertString()
{
  window.document.getElementById('bar').innerHTML = '¼Ø';
                                                     ^^
}

What character should we observe there?
It doesn't really matter. The characters were supposed to be a 1/4
character followed by a phi; I just wanted some non-ASCII characters.
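As an aside, and not something proposed in the thread: one way to sidestep the question for string literals is to write non-ASCII characters as `\uXXXX` escapes, so that the script file is pure ASCII and reads the same under any ASCII-compatible encoding such as UTF-8 or ISO-8859-1. `toAsciiLiteral` below is a hypothetical helper that produces such a literal.

```javascript
// Hypothetical helper: turn a string into a single-quoted, ASCII-only
// JavaScript literal by escaping everything outside printable ASCII.
function toAsciiLiteral(str) {
  var out = '';
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    // Keep printable ASCII, except the quote (0x27) and backslash (0x5c).
    if (unit >= 0x20 && unit < 0x7f && unit !== 0x27 && unit !== 0x5c) {
      out += str.charAt(i);
    } else {
      out += '\\u' + ('0000' + unit.toString(16).toUpperCase()).slice(-4);
    }
  }
  return "'" + out + "'";
}

console.log(toAsciiLiteral('\u00BC\u03C6')); // '\u00BC\u03C6'
```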
>
And the HTML page

There are no "HTML pages" other than in a page-wise output medium (such as a
presentation or a printout). You are referring to HTML *documents*.
Fair enough :)
>
Iff supported. You should provide a proper Content-Type header for the
external resource to make sure it is parsed correctly.
True

converts it into UTF-8)?

Probably, yes. If the browser is Open Source, you can UTSL to confirm that.

HTH
Helped a lot... thanks!

Taras
Oct 24 '08 #3
