Trouble with document.write and UTF-8

Can someone explain why I seem unable to use document.write to
produce a valid UTF-8 non-breaking space sequence (Hex: C2A0)?

I've tried every way I've been able to find to tell the browser I'm
trying to print UTF-8 and still no luck. I'd like the first 2 tries to
match the second two tries as far as output.

<HTML>
<meta http-equiv="Content-Type" content="application/x-script;
charset=UTF-8">
<SCRIPT language="javascript" charset="UTF-8">
var out = "UTF-8 nbsp:\xC2\xA0:Unicode:\uC2A0:Unicode:\u00A0:HTML nbsp:&nbsp;"
document.open("text/html; charset=UTF-8");
document.write(out);
var i =0;
while (i <out.length){
document.write("<br>"+i+" "+out.charAt(i)+" "+out.charCodeAt(i));
i++;
}
document.close();document.charset="UTF-8";
</SCRIPT>
</HTML>

The output looks like this:
UTF-8 nbsp:Â :Unicode:*:Unicode: :HTML nbsp:
0 U 85
1 T 84
2 F 70
3 - 45
4 8 56
5 32
6 n 110
7 b 98
8 s 115
9 p 112
10 : 58
11 Â 194
12 160
13 : 58
14 U 85
15 n 110
16 i 105
17 c 99
18 o 111
19 d 100
20 e 101
21 : 58
22 * 49824
23 : 58
24 U 85
25 n 110
26 i 105
27 c 99
28 o 111
29 d 100
30 e 101
31 : 58
32 160
33 : 58
34 H 72
35 T 84
36 M 77
37 L 76
38 32
39 n 110
40 b 98
41 s 115
42 p 112
43 : 58
44 & 38
45 n 110
46 b 98
47 s 115
48 p 112
49 ; 59

Thanks!

Nov 11 '05 #1
st************@yahoo.com wrote:
> Can someone explain why I seem unable to use document.write to
> produce a valid UTF-8 non-breaking space sequence (Hex: C2A0)?
The Unicode 4.1 character at code point 0xC2A0 is an (unnamed) Hangul
syllable, as can be seen at <http://www.unicode.org/charts/PDF/UAC00.pdf>
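To illustrate the point, here is a minimal sketch (not part of the original post) of what the three escapes in the question actually put into the string. JavaScript string escapes name UTF-16 code units, not bytes of any output encoding, so `\xC2\xA0` is two characters and `\uC2A0` is the single code point U+C2A0. The helper name `codeUnits` is hypothetical.

```javascript
// The three escapes from the question produce three different strings.
const bytesAsChars = "\xC2\xA0"; // two code units: U+00C2 and U+00A0
const hangul = "\uC2A0";         // one code unit: U+C2A0 (a Hangul syllable)
const nbsp = "\u00A0";           // one code unit: U+00A0 (no-break space)

// Hypothetical helper: list the numeric value of every code unit in a string.
function codeUnits(s) {
  const units = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i));
  }
  return units;
}

console.log(codeUnits(bytesAsChars)); // [ 194, 160 ]
console.log(codeUnits(hangul));       // [ 49824 ]
console.log(codeUnits(nbsp));         // [ 160 ]
```

These values match the charCodeAt column in the posted output (194 and 160, then 49824, then 160).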
> I've tried every way I've been able to find to tell the browser I'm
> trying to print UTF-8 and still no luck. I'd like the first 2 tries to
> match the second two tries as far as output.
>
> <HTML>
> <meta http-equiv="Content-Type" content="application/x-script;
> charset=UTF-8">
Pardon? This is supposed to be an HTML document, is it not? So the basic
Content-Type should be text/html. And if that HTML document were UTF-8
encoded, you would not have to escape Unicode characters anyway. So you
want to change the `charset' parameter to ISO-8859-1 or the like,
definitely not a UTF encoding.

And there is no known MIME media type named 'application/x-script'.
I wonder how you got the idea.

You probably meant

<meta http-equiv="Content-Script-Type"
content="application/javascript; charset=UTF-8">

as described in the Informational RFC "Scripting Media Types", which is,
however, not yet used by user agents.
> <SCRIPT language="javascript" charset="UTF-8">
The `language' attribute is deprecated in HTML4, the `type' attribute
is #REQUIRED. The `charset' attribute is for linked resources, i.e.
useful only in combination with the `src' attribute.

<script type="application/javascript">

See <http://www.w3.org/TR/html4/interact/scripts.html#edef-SCRIPT>
and <http://validator.w3.org/>.
> var out = "UTF-8 nbsp:\xC2\xA0:Unicode:\uC2A0:Unicode:\u00A0:HTML nbsp:&nbsp;"
You need to understand what UTF and Unicode are and how UTF works,
see <http://www.unicode.org/faq/>.
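As a rough illustration of how UTF-8 relates to Unicode code points (a sketch added for clarity, not from the original post): the bytes C2 A0 are the UTF-8 encoding of U+00A0, while U+C2A0 encodes to three different bytes. The helpers `utf8Bytes` and `hex` are hypothetical, and the encoder does not reject surrogate code points or values above U+10FFFF.

```javascript
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid scalar value; no validation is performed.
function utf8Bytes(cp) {
  if (cp < 0x80) {
    return [cp]; // 1 byte: 0xxxxxxx
  }
  if (cp < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
  }
  if (cp < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

// Hypothetical helper: format bytes as two-digit uppercase hex.
function hex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(hex(utf8Bytes(0x00A0))); // C2 A0
console.log(hex(utf8Bytes(0xC2A0))); // EC 8A A0
```

So "C2A0" in the question is a byte sequence, whereas `\uC2A0` in a string literal is a code point that happens to share the same hex digits.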
> document.open("text/html; charset=UTF-8");
There is no specified argument for the HTMLDocument::open() method.
Therefore, Mozilla/5.0 based user agents will ignore it if you provide
one.

<https://bugzilla.mozilla.org/show_bug.cgi?id=73409>
> document.charset="UTF-8";
There is no document.charset property, hence you are creating one here.
> The output looks like this:
> [...]

Works as designed.
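For readers who really do hold UTF-8 bytes in a string, one character per byte as in the first attempt, a possible way to get the intended character is to decode those bytes explicitly. This is only a sketch under assumptions: it relies on a global `TextDecoder` (available in modern browsers and recent Node.js, not in browsers of the thread's era), and `decodeByteString` is a hypothetical helper name.

```javascript
// Assumption: a global TextDecoder exists (modern browsers, recent Node.js).
// Hypothetical helper: treat each character of `s` as one byte and decode
// the resulting byte sequence as UTF-8. Characters above 0xFF would be
// truncated to their low 8 bits, so only pass genuine byte strings.
function decodeByteString(s) {
  const bytes = Uint8Array.from(s, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const fixed = decodeByteString("\xC2\xA0");
console.log(fixed.length, fixed.charCodeAt(0)); // 1 160
console.log(fixed === "\u00A0");                // true

// In a browser, document.write(fixed) would then write the same character
// as document.write("\u00A0").
```

In most cases it is simpler to write `"\u00A0"` directly and let the browser handle the document encoding.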

Summary: You should definitely drink more tea[tm] when coding.
PointedEars
Nov 11 '05 #2
