Counting utf-8 characters -special characters

I have a character counter for a textarea that counts the characters.
A special character takes the same space as two normal characters because of
16-bit encoding.
The counter changes by -2 when a special character, such as a
language-specific character, is added.

How can I count a special character as 1 character?
Thanks

Sep 19 '07 #1
Thomas 'PointedEars' Lahn wrote:
majna wrote:
>I have character counter for textarea which counts the characters.
>Special character needs same place as two normal characters because of
>16-bit encoding.

It doesn't.

>Counter is counting -2 when special character is added like some
>language specific char.

Should have been -1. But even if most implementations were not
UTF-16 safe, that would not have sufficed. UTF-16 does not mean that
the representation of a character in that encoding always requires
exactly 16 bits:

http://www.unicode.org/faq/utf_bom.html#6
"€".length === 1
Windows(-1252). Hmpf. Make that "€" any Unicode glyph (such as "€")
and it is still true.
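
To illustrate the point about 16 bits, here is a minimal sketch. It assumes a modern engine: the string iterator and Array.from are ES2015 features that postdate this thread, and codePointCount is a hypothetical helper name. It shows that length counts 16-bit code units, so a character outside the Basic Multilingual Plane reports a length of 2 although it is a single code point.

```javascript
var euro = "\u20AC";        // "€" (U+20AC), inside the Basic Multilingual Plane
var clef = "\uD834\uDD1E";  // U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

// length counts 16-bit code units: euro.length is 1, clef.length is 2.

// The ES2015 string iterator walks code points, so Array.from yields one
// array element per code point. codePointCount is a hypothetical helper.
function codePointCount(str) {
  return Array.from(str).length;
}
```

With these definitions, codePointCount(clef) gives 1 while clef.length gives 2.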
PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
Sep 19 '07 #2
Thomas 'PointedEars' Lahn:
>"€".length === 1
Should be, since '€' (U+20AC) is represented as a single UTF-16 code
unit, but it is not, e.g., in SpiderMonkey, which obviously uses UTF-8:

js> e = "€"
€
js> e.length
3
js> for (i = 0; i < e.length; i++) {print(e.charCodeAt(i).toString(16))}
e2
82
ac

But then, OP mentions UTF-8 in the subject line.
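
For comparison, the three values printed in that session are exactly the UTF-8 bytes of U+20AC. The following is a small sketch, assuming an environment that provides TextEncoder (a later browser and Node.js API, not part of ECMAScript 3); utf8Hex is a hypothetical helper name.

```javascript
// utf8Hex is a hypothetical helper: it encodes a string as UTF-8 and
// returns each byte as a hexadecimal string.
function utf8Hex(str) {
  var bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  var hex = [];
  for (var i = 0; i < bytes.length; i++) {
    hex.push(bytes[i].toString(16));
  }
  return hex;
}

var e = "\u20AC"; // "€"
// In a UTF-16 based engine e.length is 1, while utf8Hex(e) has three entries.
```

So a length of 3 is what one would expect from counting UTF-8 octets rather than UTF-16 code units.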
>>How to count specials like 1 char?
>The same way. ECMAScript 3 implementations use UTF-16 encoded strings.
>RTFM.
Hmmm. Is there *any* implementation that actually respects the requirement
of UTF-16?

Besides, even assuming UTF-16, some "language specific" characters (whatever
that means...) take up more than one code unit. Some characters may even
use one or more code points depending on whether one uses decomposition
or not, e.g., 'é' is either U+00E9 or U+0065 U+0301.
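
The decomposition case can be checked directly. This is a sketch, assuming String.prototype.normalize is available (it arrived with ES2015, so the engines discussed in this thread did not have it):

```javascript
var precomposed = "\u00E9";  // é as the single code point U+00E9
var decomposed = "e\u0301";  // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

// Both render as the same character, but precomposed.length is 1 and
// decomposed.length is 2, and the two strings do not compare equal.

// NFC composes the sequence into the precomposed form where one exists.
var composed = decomposed.normalize("NFC");
```

After normalization, composed equals precomposed and has length 1, so normalizing before counting removes this particular discrepancy. Sequences with no precomposed form still count as more than one.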

Short of testing each successive octet (if the implementation uses UTF-8)
or code unit (if the implementation is correct according to the specs)
to see what kind of character it is, I have so far been unable to answer
the OP's question.
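
One possible shape for such a unit-by-unit test, as a rough sketch rather than a complete answer. It assumes UTF-16 strings and uses only ES3-level features. countVisibleChars is a hypothetical name, and only the Combining Diacritical Marks block (U+0300 to U+036F) is treated as combining, which is a deliberate simplification.

```javascript
// countVisibleChars is a hypothetical helper. It walks the string one UTF-16
// code unit at a time and decides whether each unit starts a new character.
// Simplification: only U+0300-U+036F is treated as combining; other
// combining ranges are not handled.
function countVisibleChars(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    var prev = i > 0 ? str.charCodeAt(i - 1) : 0;
    var isLowSurrogate = unit >= 0xDC00 && unit <= 0xDFFF;
    var followsHighSurrogate = prev >= 0xD800 && prev <= 0xDBFF;
    if (isLowSurrogate && followsHighSurrogate) {
      continue; // second half of a surrogate pair, already counted
    }
    if (unit >= 0x0300 && unit <= 0x036F && count > 0) {
      continue; // combining mark attaches to the previous character
    }
    count++;
  }
  return count;
}
```

A textarea counter could then display max - countVisibleChars(field.value), where field and max are placeholders for the page's own element and limit.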

--
Johannes
"Quand on dit c'est un Johannes, cela vaut autant que ce que maintenant
on appelle un pédant" (H. Estienne, in É. Littré, /Dictionnaire de la
langue française/, art. PÉDANT)
Sep 19 '07 #3
Thomas 'PointedEars' Lahn:

[My version of SpiderMonkey uses UTF-8]
>Probably due to your SpiderMonkey build. It works just fine since
>Mozilla/4.0.
It does indeed in my version of Firefox. Serves me right for sticking with
obsolete command-line tools :-) It would appear that if I want a good
stand-alone ECMAScript interpreter, I have to compile it myself.
>http://www.unicode.org/faq/char_combmark.html#2
Excellent, thanks a lot.

--
Johannes
Sep 19 '07 #4
