Thomas 'PointedEars' Lahn wrote:
> "€".length === 1
Should be, since '€' (U+20AC) is represented as a single UTF-16 code
unit, but it is not, e.g., in SpiderMonkey, whose shell apparently reads
the input as raw UTF-8 octets:
js> e = "€"
€
js> e.length
3
js> for (i = 0; i < e.length; i++) {print(e.charCodeAt(i).toString(16))}
e2
82
ac
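For comparison, here is a minimal sketch that reproduces the three values above in a modern engine. It assumes the `TextEncoder` global (current browsers and Node.js have it; the shell above does not). In a conforming engine '€' is one UTF-16 code unit, but its UTF-8 encoding is the three octets E2 82 AC. A shell that stores each octet as a separate character would therefore report a length of 3. The helper `readAsOctets` is hypothetical, written only for this illustration.

```javascript
// '€' (U+20AC) is a single UTF-16 code unit in a conforming engine.
const euro = "\u20AC";
console.log(euro.length); // 1

// Its UTF-8 encoding is three octets (requires the TextEncoder global).
const octets = new TextEncoder().encode(euro);
console.log(Array.from(octets, (b) => b.toString(16)));

// Hypothetical helper: build a string with one character per octet,
// mimicking a shell that does not decode its UTF-8 input.
function readAsOctets(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b);
  }
  return s;
}

const misread = readAsOctets(octets);
console.log(misread.length); // 3
for (let i = 0; i < misread.length; i++) {
  console.log(misread.charCodeAt(i).toString(16)); // e2, then 82, then ac
}
```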
But then, the OP mentions UTF-8 in the subject line.
>> How to count specials like 1 char?
> The same way. ECMAScript 3 implementations use UTF-16 encoded strings.
> RTFM.
Hmmm. Is there *any* implementation that actually respects the requirement
of UTF-16?
Besides, even assuming UTF-16, some "language specific" characters (whatever
that means...) take up more than one code unit, since anything outside the
Basic Multilingual Plane needs a surrogate pair. Some characters may even be
one or several code points depending on whether decomposition is used,
e.g., 'é' is either U+00E9 or U+0065 U+0301.
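Both effects can be shown in a short sketch. It assumes an ECMAScript 2015 or later engine, because `String.prototype.normalize`, the `\u{...}` escape and string iteration were all added after ECMAScript 3. `length` counts UTF-16 code units. The decomposed 'é' therefore counts as 2, and so does U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.

```javascript
// 'é' precomposed (U+00E9) vs. decomposed (U+0065 U+0301).
const composed = "\u00E9";
const decomposed = "e\u0301";
console.log(composed.length, decomposed.length); // 1 2

// Normalizing to NFC recombines the sequence into U+00E9.
console.log(decomposed.normalize("NFC") === composed); // true
console.log(decomposed.normalize("NFC").length); // 1

// U+1D11E lies outside the BMP and needs a surrogate pair in UTF-16.
const clef = "\u{1D11E}";
console.log(clef.length); // 2 (code units)
console.log([...clef].length); // 1 (string iteration yields code points)
```

Normalization only helps where a precomposed form exists. Counting user-perceived characters in general requires grapheme segmentation, for example `Intl.Segmenter` in recent engines.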
Short of testing each successive octet (if the implementation uses UTF-8)
or code unit (if the implementation conforms to the specification) to see
what kind of character it is, I have so far been unable to answer the OP's
question.
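The unit-by-unit test described above can be sketched as follows. Only `charCodeAt` is used, so the functions stay within ECMAScript 3; `console.log` is a host function. The first function assumes the string holds one octet per character, as in the SpiderMonkey session. The second assumes conforming UTF-16. Both function names are hypothetical. Both count code points, not user-perceived characters, so the decomposed 'é' still counts as 2.

```javascript
// Count code points in a string whose characters are raw UTF-8 octets
// (one octet per character): every octet except a continuation octet
// (10xxxxxx, i.e. 0x80-0xBF) starts a new code point.
function countUtf8CodePoints(octetString) {
  var count = 0;
  for (var i = 0; i < octetString.length; i++) {
    var b = octetString.charCodeAt(i);
    if ((b & 0xC0) !== 0x80) {
      count++;
    }
  }
  return count;
}

// Count code points in a UTF-16 string: a low (trailing) surrogate
// (0xDC00-0xDFFF) that follows a high surrogate (0xD800-0xDBFF) belongs
// to the same code point and is not counted again. A lone surrogate
// is counted as one unit.
function countUtf16CodePoints(s) {
  var count = 0;
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.length) {
      var next = s.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the trailing surrogate of the pair
      }
    }
    count++;
  }
  return count;
}

console.log(countUtf8CodePoints("\xE2\x82\xAC")); // 1 ('€' as octets)
console.log(countUtf16CodePoints("\u20AC")); // 1
console.log(countUtf16CodePoints("\uD834\uDD1E")); // 1 (U+1D11E)
console.log(countUtf16CodePoints("e\u0301")); // 2
```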
--
Johannes
"When one says 'he is a Johannes', that means as much as what is nowadays
called a pedant" (H. Estienne, in É. Littré, /Dictionnaire de la
langue française/, art. PÉDANT)