#1
July 24th, 2005, 12:49 AM
enrique
reading Big5 as ASCII: bad idea?

Our server-side software is reading in Big5-encoded data as ASCII when
the web pages are generated. It seems to work most of the time, since
the HTML meta tag declares Big5 as the charset. However, every now
and then certain ASCII characters, such as the quote ("), get
read in and create JavaScript errors when the browser renders the page.

I think this is a direct side effect of processing our Big5-encoded
files as ASCII. Can anyone confirm my suspicions on this?

I'm thinking perhaps the software should be reading these files as
binary Big5-encoded data instead of ASCII, so that the Chinese content
won't be converted to HTML-reserved characters like the troublesome
quote.

Another question: I see in the character chart for Big5 that Latin
letters and characters are supported, including the quote character.
Will a Big5-encoded quote character (not the ASCII quote) cause
Javascript issues as well? I'm hoping that so long as Chinese-only
content is contained in HTML tags using the "lang" attribute set
appropriately (for Chinese), the browser won't attempt to render the
Big5-encoded quote as a Javascript string delimiter.

Thank you.

epp

#2
July 24th, 2005, 12:49 AM
Harlan Messinger
Re: reading Big5 as ASCII: bad idea?

enrique wrote:
> Our server-side software is reading in Big5-encoded data as ASCII when
> the web pages are generated. It seems to work most of the time, since
> the HTML meta tag declares Big5 as the charset. However, every now
> and then certain ASCII characters, such as the quote ("), get
> read in and create JavaScript errors when the browser renders the page.
>
> I think this is a direct side effect of processing our Big5-encoded
> files as ASCII. Can anyone confirm my suspicions on this?

Big5 doesn't have anything to do with it. Even if the data were ASCII
and were being read as ASCII, if you're generating code like

var address = "58 "Q" Street";

because a data field reads

58 "Q" Street

then you will break the Javascript. On *any* platform this is an issue
with data that contains symbols that are also used within the code to
delimit that data. They need to be escaped. In the example above, you'd
have to have

var address = "58 \"Q\" Street";

Switching to single quotes as the string delimiter wouldn't do any good,
because then you'd have

var name = 'Florence O'Malley';

which you'd have to change to

var name = 'Florence O\'Malley';
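
Rather than escaping each field by hand, the escaping can be done in one place when the code is generated. What follows is only a minimal sketch of that idea, not a description of anyone's actual server software: toJsStringLiteral and declareVar are hypothetical names, and it assumes the generating code is itself JavaScript with the built-in JSON.stringify available. The extra replacements for "<" and for U+2028/U+2029 are an added precaution for literals placed inside an inline script block.

```javascript
// Hypothetical helper: turn arbitrary text into a JavaScript string literal.
// JSON.stringify escapes double quotes, backslashes and control characters.
function toJsStringLiteral(value) {
  return JSON.stringify(String(value))
    // Keep "</script>" in the data from closing an inline script block early.
    .replace(/</g, "\\u003c")
    // U+2028/U+2029 were treated as line terminators by older engines.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

// Hypothetical helper: build one line of generated code from a data field.
function declareVar(name, value) {
  return "var " + name + " = " + toJsStringLiteral(value) + ";";
}

// The two data fields from the examples above.
const addressLine = declareVar("address", '58 "Q" Street');
const nameLine = declareVar("name", "Florence O'Malley");

console.log(addressLine); // var address = "58 \"Q\" Street";
console.log(nameLine);    // var name = "Florence O'Malley";
```

Because the helper always emits a double-quoted literal, the apostrophe in O'Malley needs no escaping at all, while the embedded double quotes in the address are escaped automatically.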

[follow-ups to comp.lang.javascript]
 
