
Trouble with document.write and UTF-8

stevelooking41@yahoo.com
#1: Nov 11 '05
Can someone explain why I seem unable to use document.write to
produce a valid UTF-8 non-breaking space sequence (hex: C2A0)?

I've tried every way I've been able to find to tell the browser I'm
trying to print UTF-8 and still no luck. I'd like the first two tries to
match the second two tries as far as output.

<HTML>
<meta http-equiv="Content-Type" content="application/x-script; charset=UTF-8">
<SCRIPT language="javascript" charset="UTF-8">
var out = "UTF-8 nbsp:\xC2\xA0:Unicode:\uC2A0:Unicode:\u00A0:HTML nbsp:&nbsp;"
document.open("text/html; charset=UTF-8");
document.write(out);
var i =0;
while (i <out.length){
document.write("<br>"+i+" "+out.charAt(i)+" "+out.charCodeAt(i));
i++;
}
document.close();document.charset="UTF-8";
</SCRIPT>
</HTML>

The output looks like this:
UTF-8 nbsp: :Unicode:슠:Unicode: :HTML nbsp:
0 U 85
1 T 84
2 F 70
3 - 45
4 8 56
5 32
6 n 110
7 b 98
8 s 115
9 p 112
10 : 58
11 Â 194
12 160
13 : 58
14 U 85
15 n 110
16 i 105
17 c 99
18 o 111
19 d 100
20 e 101
21 : 58
22 슠 49824
23 : 58
24 U 85
25 n 110
26 i 105
27 c 99
28 o 111
29 d 100
30 e 101
31 : 58
32 160
33 : 58
34 H 72
35 T 84
36 M 77
37 L 76
38 32
39 n 110
40 b 98
41 s 115
42 p 112
43 : 58
44 & 38
45 n 110
46 b 98
47 s 115
48 p 112
49 ; 59

Thanks!


Thomas 'PointedEars' Lahn
#2: Nov 11 '05

re: Trouble with document.write and UTF-8


stevelooking41@yahoo.com wrote:
> Can someone explain why I seem unable to use document.write to
> produce a valid UTF-8 non-breaking space sequence (hex: C2A0)?

The Unicode 4.1 character at code point 0xC2A0 is an (unnamed) Hangul
syllable, as can be seen at <http://www.unicode.org/charts/PDF/UAC00.pdf>
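
To make the point concrete, here is a minimal sketch that can be run in any JavaScript engine that provides console.log. It is not part of the original post; the helper name `codes` is made up for illustration. It shows that \xHH and \uHHHH escapes in a string literal each produce one character (a UTF-16 code unit), not a byte of a UTF-8 sequence:

```javascript
// Each escape yields exactly one character; none of them is a "byte".
var bytesAsChars = "\xC2\xA0"; // two characters: U+00C2 and U+00A0
var hangul = "\uC2A0";         // one character: U+C2A0 (a Hangul syllable)
var nbsp = "\u00A0";           // one character: U+00A0 (no-break space)

// Hypothetical helper: list the character codes of a string.
function codes(s) {
  var result = [];
  for (var i = 0; i < s.length; i++) {
    result.push(s.charCodeAt(i));
  }
  return result;
}

console.log(codes(bytesAsChars).join(" ")); // 194 160
console.log(codes(hangul).join(" "));       // 49824
console.log(codes(nbsp).join(" "));         // 160
```

These are the same numbers that appear at positions 11, 12, 22 and 32 of the posted output, so only the \u00A0 variant is a single no-break space.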

> I've tried every way I've been able to find to tell the browser I'm
> trying to print UTF-8 and still no luck. I'd like the first two tries to
> match the second two tries as far as output.
>
> <HTML>
> <meta http-equiv="Content-Type" content="application/x-script;
> charset=UTF-8">

Pardon? This is supposed to be an HTML document, is it not? So the basic
Content-Type should be text/html. And if that HTML document were UTF-8
encoded, you would not have to escape Unicode anyway. So you want to
change the `charset' parameter to ISO-8859-1 or the like, definitely
not a UTF encoding.

And there is no known MIME media type named 'application/x-script'.
I wonder where you got the idea.

You probably meant

<meta http-equiv="Content-Script-Type"
content="application/javascript; charset=UTF-8">

as described in the Informal RFC "Scripting Media Types", which is,
however, not yet used by user agents.

> <SCRIPT language="javascript" charset="UTF-8">

The `language' attribute is deprecated in HTML 4; the `type' attribute
is #REQUIRED. The `charset' attribute is for linked resources, i.e. it is
useful only in combination with the `src' attribute.

<script type="application/javascript">

See <http://www.w3.org/TR/html4/interact/scripts.html#edef-SCRIPT>
and <http://validator.w3.org/>.

> var out = "UTF-8 nbsp:\xC2\xA0:Unicode:\uC2A0:Unicode:\u00A0:HTML
> nbsp:&nbsp;"

You need to understand what UTF and Unicode are and how UTF works,
see <http://www.unicode.org/faq/>.
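
As a rough illustration of how UTF-8 works (a sketch only, not from the original reply; `utf8Bytes` and `hex` are hypothetical helper names), the following encodes a code point into its UTF-8 byte sequence by hand and cross-checks the result against the built-in encodeURIComponent, which percent-encodes the UTF-8 bytes of a character. It does no validation, so surrogate code points and values above 0x10FFFF are not rejected:

```javascript
// Hypothetical helper: UTF-8 byte values for one Unicode code point.
function utf8Bytes(codePoint) {
  if (codePoint < 0x80) {
    return [codePoint];                       // 0xxxxxxx
  }
  if (codePoint < 0x800) {
    return [0xC0 | (codePoint >> 6),          // 110xxxxx
            0x80 | (codePoint & 0x3F)];       // 10xxxxxx
  }
  if (codePoint < 0x10000) {
    return [0xE0 | (codePoint >> 12),         // 1110xxxx
            0x80 | ((codePoint >> 6) & 0x3F),
            0x80 | (codePoint & 0x3F)];
  }
  return [0xF0 | (codePoint >> 18),           // 11110xxx
          0x80 | ((codePoint >> 12) & 0x3F),
          0x80 | ((codePoint >> 6) & 0x3F),
          0x80 | (codePoint & 0x3F)];
}

// Hypothetical helper: format byte values as two-digit uppercase hex.
function hex(bytes) {
  var parts = [];
  for (var i = 0; i < bytes.length; i++) {
    parts.push(("0" + bytes[i].toString(16)).slice(-2).toUpperCase());
  }
  return parts.join(" ");
}

console.log(hex(utf8Bytes(0x00A0)));       // C2 A0
console.log(hex(utf8Bytes(0xC2A0)));       // EC 8A A0
console.log(encodeURIComponent("\u00A0")); // %C2%A0
console.log(encodeURIComponent("\uC2A0")); // %EC%8A%A0
```

In other words, C2 A0 is how the single character U+00A0 looks once it has been serialized as UTF-8; it is not something to spell out character by character inside a script string.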

> document.open("text/html; charset=UTF-8");

There is no specified argument for the HTMLDocument::open() method.
Therefore, Mozilla/5.0 based user agents will ignore it if you provide
one.

<https://bugzilla.mozilla.org/show_bug.cgi?id=73409>

> document.charset="UTF-8";

There is no document.charset property, hence you are creating one here.

> The output looks like this:
> [...]

Works as designed.

Summary: You should definitely drink more tea[tm] when coding.


PointedEars