Bytes IT Community

double-byte character

is it possible to detect any double-byte character in the text? thanks.

tony
Oct 2 '05 #1
3 Replies


tony wong wrote:
is it possible to detect any double-byte character in the text? thanks.


Since JavaScript 1.3 (in Netscape 4.06) and JScript 4 (in IE 4),
strings in JavaScript are sequences of Unicode characters. You can
access any character in a string with
string.charAt(index)
and the Unicode character code of any character in a string with
string.charCodeAt(index)
There is no byte type in JavaScript 1.x, and there is no access to the
internal byte representation of a Unicode character or of a complete string.
The internal string representation chosen is usually UTF-16, so in that
sense all characters are double-byte characters. But as said, as a
scripter you deal with sequences of Unicode characters, and the internal
encoding in bytes does not matter for scripting.
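
As a minimal sketch of the two accessors just described (the sample string and the variable names are made up for illustration only):

```javascript
// Hypothetical sample string: "A", e-acute (U+00E9), and a CJK character (U+4E2D).
var sample = "A\u00E9\u4E2D";

// charAt returns the character at the given index as a one-character string.
var first = sample.charAt(0); // "A"

// charCodeAt returns the numeric character code at the given index.
var codes = [];
for (var i = 0; i < sample.length; i++) {
  codes.push(sample.charCodeAt(i));
}
// codes now holds 65, 233 and 20013 (0x41, 0xE9 and 0x4E2D)
```

Neither call exposes bytes; both work purely on character positions within the string.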

--

Martin Honnen
http://JavaScript.FAQTs.com/
Oct 2 '05 #2

tony wong <x3*@netvigator.com> wrote in message news:43******@127.0.0.1...
is it possible to detect any double-byte character in the text? thanks.


If you mean to detect the presence of any character whose high byte is non-zero (that is, any character code above 0xFF):

if( /[\u0100-\uffff]/.test( text ) )
...
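
A possible way to wrap that test in a reusable function, as a sketch only. The name hasHighByteChar and the sample strings are hypothetical; the regular expression is the one shown above.

```javascript
// Hypothetical helper name; the pattern is taken unchanged from the post.
function hasHighByteChar(text) {
  // Matches any character code in the range U+0100..U+FFFF,
  // i.e. any code whose upper 8 bits are not all zero.
  return /[\u0100-\uffff]/.test(text);
}

var plain = hasHighByteChar("hello");        // false: ASCII only
var latin1 = hasHighByteChar("caf\u00E9");   // false: U+00E9 is below U+0100
var cjk = hasHighByteChar("abc\u4E2Ddef");   // true: U+4E2D has a non-zero high byte
```

Note that accented Latin-1 characters such as U+00E9 are not flagged by this test, which may or may not match what "double-byte" means for a given legacy encoding.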

--
S.C.
Oct 2 '05 #3

Martin Honnen wrote:
The internal string representation chosen is usually UTF-16 so
in that sense all characters are double byte characters.


No, they are not. I used to think something similar (about UTF-8),
but this is not how the UTF encodings work. Additional code units
are used when a character needs them: every Unicode character
beyond code point 0xFFFF is represented in UTF-16 by a surrogate
pair, i.e. two 16-bit words or four bytes. (UCS-2, by contrast, has
no surrogate pairs and cannot represent those characters.)
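
To illustrate, here is a rough sketch of how a script can observe surrogate pairs through charCodeAt. The helper names are hypothetical, and the example character, U+1D11E (musical symbol G clef), is chosen arbitrarily.

```javascript
// U+1D11E written as its UTF-16 surrogate pair (high 0xD834, low 0xDD1E).
var clef = "\uD834\uDD1E";
var clefLength = clef.length; // 2: one character, two 16-bit code units

// Hypothetical helpers for the two surrogate ranges.
function isHighSurrogate(code) { return code >= 0xD800 && code <= 0xDBFF; }
function isLowSurrogate(code) { return code >= 0xDC00 && code <= 0xDFFF; }

// True if the text contains at least one well-formed surrogate pair,
// i.e. at least one character beyond code point 0xFFFF.
function hasSurrogatePair(text) {
  for (var i = 0; i + 1 < text.length; i++) {
    if (isHighSurrogate(text.charCodeAt(i)) &&
        isLowSurrogate(text.charCodeAt(i + 1))) {
      return true;
    }
  }
  return false;
}

// Combine a high and a low surrogate into the code point they encode.
function codePointFromPair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

var clefCodePoint = codePointFromPair(clef.charCodeAt(0), clef.charCodeAt(1)); // 0x1D11E
```

So string.length and charCodeAt count 16-bit units, and a character beyond 0xFFFF occupies two of them.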

<http://www.unicode.org/faq/basic_q.html#19>
<http://en.wikipedia.org/wiki/UTF-16/UCS-2>
PointedEars
Oct 16 '05 #4
