Unicode troubles

I'm finishing a multiplatform, collaborative, real-time text editor (something
like SubEthaEdit, but multiplatform and open source) developed using
Python+Twisted as a plugin for Leo.

Of course, as the software runs on different platforms in different places,
text encoding compatibility is an issue.
So the obvious choice was Tkencoding for the client GUI, Unicode for the system
internals and UTF-8 for web output.
But I'm having serious trouble using Tk with Unicode internals.
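The layering described above (a client-side encoding at the edge, Unicode inside, UTF-8 for web output) can be sketched as follows. This is a minimal sketch in modern Python 3, where `str` is Unicode text and `bytes` is encoded data; the helper names `decode_incoming` and `encode_for_web` are hypothetical and not part of Leo, Twisted or Tkinter. The idea is that bytes are decoded once on the way in and encoded once on the way out, so every length and position inside the program is counted in characters.

```python
# Minimal sketch (modern Python 3). decode_incoming and encode_for_web are
# hypothetical helper names, not functions from Leo, Twisted or Tkinter.


def decode_incoming(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes arriving from the network or disk into text, once."""
    return raw.decode(encoding)


def encode_for_web(text: str) -> bytes:
    """Encode internal text as UTF-8 only when producing web output."""
    return text.encode("utf-8")


# UTF-8 bytes for "cafe" with an acute accent on the final e.
text = decode_incoming(b"caf\xc3\xa9")

# Lengths and positions are computed on the decoded text (4 characters),
# never on the encoded form (5 bytes).
print(len(text), len(encode_for_web(text)))  # 4 5
```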

The system, being a text editor, uses string lengths and positions in the text
widget as parameters for most of its critical algorithms.
Unfortunately, I recently discovered that some encodings do not provide
an equivalence between
num_of_chars/length_of_string/position_in_text_widget. As a result, each time
someone presses a non-ASCII key, the references are lost and the other clients
receive a soup of letters.
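To make the mismatch concrete, here is a small sketch in modern Python 3 (`insert_at` is a hypothetical edit operation, not a Leo or Tk API). The same point in the text is at offset 3 when counted in characters but at offset 4 when counted in UTF-8 bytes, because the n-tilde takes two bytes, so a client that mixes the two units inserts text in the wrong place.

```python
# Sketch: one "position", two different numbers depending on the unit.
# insert_at is a hypothetical edit operation used only for illustration.


def insert_at(text: str, pos: int, new: str) -> str:
    """Insert `new` into `text` at character index `pos`."""
    return text[:pos] + new + text[pos:]


doc = "ni\u00f1o"        # "nino" with a precomposed n-tilde (U+00F1)
prefix = doc[:3]         # the text up to and including the n-tilde

char_pos = len(prefix)                   # 3: offset counted in characters
byte_pos = len(prefix.encode("utf-8"))   # 4: n-tilde is two bytes in UTF-8

# Applying a byte offset where a character offset is expected
# puts the insertion after the "o" instead of before it.
good = insert_at(doc, char_pos, "-")
bad = insert_at(doc, byte_pos, "-")
print(char_pos, byte_pos, good == bad)  # 3 4 False
```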

I have read on the internet that Unicode was supposed to keep the relation
num_of_chars/string_length (and thus the relation
string_length/num_of_chars/position_in_text_widget). But this relation does
not hold on all my machines.

Sometimes I get len(u"el") = 3 (the correct result) and other times
len(u"el") = 4 (the wrong result). This seems to be independent of the OS.
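One explanation consistent with a 3-versus-4 result is that the same accented letter is stored as one code point on some machines and as two on others. The sketch below assumes modern Python 3 and uses the word "año" purely as an illustration; it is not necessarily the string from the post.

```python
import unicodedata

# Two strings that render identically as "ano" with a tilde on the n:
precomposed = "a\u00f1o"   # uses U+00F1 LATIN SMALL LETTER N WITH TILDE
decomposed = "an\u0303o"   # plain "n" followed by U+0303 COMBINING TILDE

print(len(precomposed), len(decomposed))  # 3 4
print(precomposed == decomposed)          # False

# List the code points actually stored in the decomposed version.
print([unicodedata.name(c) for c in decomposed])
```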

Could someone explain this issue to me? How am I supposed to manage this
problem? Do I have to compile Python with special parameters to get one length
unit per Unicode character?

Rodrigo Benenson.
Jul 18 '05 #1
Rodrigo Benenson wrote:
> Sometimes I get len(u"el") = 3 (the correct result) and other times
> len(u"el") = 4 (the wrong result). This seems to be independent of the OS.

There are different ways to express "special" characters.
For example, an accented "o" can be stored as a single precomposed character,
or as a plain "o" plus a separate combining accent.
What you want is the "canonical form".
Take a look at unicodedata.normalize (note that it is
new in Python 2.3).
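A minimal sketch of this advice in modern Python 3: normalize every string to one form before measuring it or sending positions to other clients. The helper name `canonical` is hypothetical, and the choice of NFC (composed form) is an assumption; NFD would work equally well as long as every client uses the same form.

```python
import unicodedata


def canonical(text: str) -> str:
    """Hypothetical helper: return `text` in Normalization Form C."""
    return unicodedata.normalize("NFC", text)


typed_on_one_machine = "a\u00f1o"   # precomposed n-tilde
typed_on_another = "an\u0303o"      # "n" + combining tilde

a = canonical(typed_on_one_machine)
b = canonical(typed_on_another)
print(a == b, len(a), len(b))  # True 3 3
```

Normalization only guarantees that all clients agree on the representation. It does not guarantee one code point per visible character: a letter and accent combination that has no precomposed form in Unicode stays as two code points even under NFC.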


Hope this helps,

Michael Radziej

Jul 18 '05 #2

