Hello all,
I am trying to internationalize my Tkinter program using gettext and
encountered various problems, so it looks like it's not a trivial
task.
After some "research" I made up a few rules for a concept that I hope
lets me avoid further encoding trouble, but I would feel more
confident if some of the experts here would have a look at the
thoughts I made so far and told me if I'm still going wrong somewhere
(BTW, the program is supposed to run on linux only). So here is what I
have so far:
1. use unicode instead of byte strings wherever possible. This can be
a little tricky, because in some situations I cannot know in advance
if a certain string is unicode or byte string; I wrote a helper module
for this which defines convenience methods for fail-safe
decoding/encoding of strings and a Tkinter.UnicodeVar class which I
use to convert user input to unicode on the fly (see the code below).
2. so I will have to call gettext.install() with unicode=1
3. make sure to NEVER mix unicode and byte strings within one
expression
4. in order to maintain code readability it's better to risk excess
decode/encode cycles than having one too few.
5. file operations seem to be delicate; at least I got an error when I
passed a filename that contains special characters as unicode to
os.access(), so I guess that whenever I do file operations
(os.remove(), shutil.copy() ...) the filename should be encoded back
into system encoding before; The filename manipulations by the os.path
methods seem to be simply string manipulations so encoding the
filenames doesn't seem to be necessary.
6. messages that are printed to stdout should be encoded first, too;
the same with strings I use to call external shell commands.
############ file UnicodeHandler.py ##################################
# -*- coding: iso-8859-1 -*-
import Tkinter
import sys
import locale
import codecs

def _find_codec(encoding):
    # return True if the requested codec is available, else return False
    try:
        codecs.lookup(encoding)
        return 1
    except LookupError:
        print 'Warning: codec %s not found' % encoding
        return 0

def _sysencoding():
    # try to guess the system default encoding
    try:
        enc = locale.getpreferredencoding().lower()
        if _find_codec(enc):
            print 'Setting locale to %s' % enc
            return enc
    except AttributeError:
        # our python is too old, try something else
        pass
    enc = locale.getdefaultlocale()[1].lower()
    if _find_codec(enc):
        print 'Setting locale to %s' % enc
        return enc
    # the last try
    enc = sys.stdin.encoding.lower()
    if _find_codec(enc):
        print 'Setting locale to %s' % enc
        return enc
    # aargh, nothing good found, fall back to latin1 and hope for the best
    print 'Warning: cannot find usable locale, using latin-1'
    return 'iso-8859-1'

sysencoding = _sysencoding()

def fsdecode(input, errors='strict'):
    '''Fail-safe decodes a string into unicode.'''
    if not isinstance(input, unicode):
        return unicode(input, sysencoding, errors)
    return input

def fsencode(input, errors='strict'):
    '''Fail-safe encodes a unicode string into system default encoding.'''
    if isinstance(input, unicode):
        return input.encode(sysencoding, errors)
    return input

class UnicodeVar(Tkinter.StringVar):
    def __init__(self, master=None, errors='strict'):
        Tkinter.StringVar.__init__(self, master)
        self.errors = errors
        self.trace('w', self._str2unicode)

    def _str2unicode(self, *args):
        old = self.get()
        if not isinstance(old, unicode):
            new = fsdecode(old, self.errors)
            self.set(new)
#######################################################################
So before I start to mess up all of my code, maybe someone can give me
a hint if I still forgot something I should keep in mind or if I am
completely wrong somewhere.
Thanks in advance
Michael
Michael: 5. file operations seem to be delicate; at least I got an error when I passed a filename that contains special characters as unicode to os.access(), so I guess that whenever I do file operations (os.remove(), shutil.copy() ...) the filename should be encoded back into system encoding before;
This can lead to failure on Windows when the true Unicode file name can
not be encoded in the current system encoding.
Neil
Neil Hodgson wrote: Michael:
5. file operations seem to be delicate; at least I got an error when I passed a filename that contains special characters as unicode to os.access(), so I guess that whenever I do file operations (os.remove(), shutil.copy() ...) the filename should be encoded back into system encoding before;
This can lead to failure on Windows when the true Unicode file name can not be encoded in the current system encoding.
Neil
Like I said, it's only supposed to run on linux; anyway, is it likely
that problems will arise when filenames I have to handle have
basically three sources:
1. already existing files
2. automatically generated filenames, which result from adding an
ascii-only suffix to an existing filename (like xy --> xy_bak2)
3. filenames created by user input
?
If yes, how to avoid these?
Any hints are appreciated
Michael
Michael: Like I said, it's only supposed to run on linux; anyway, is it likely that problems will arise when filenames I have to handle have basically three sources: ... 3. filenames created by user input
Have you worked out how you want to handle user input that is not
representable in the encoding? It is easy for users to input any characters
into a Unicode enabled UI either through invoking an input method or by
copying and pasting from another application or character chooser applet.
Neil
Neil Hodgson wrote: Michael:
Like I said, it's only supposed to run on linux; anyway, is it likely that problems will arise when filenames I have to handle have basically three sources: ... 3. filenames created by user input
Have you worked out how you want to handle user input that is not representable in the encoding? It is easy for users to input any characters into a Unicode enabled UI either through invoking an input method or by copying and pasting from another application or character chooser applet.
Neil
I must admit, no. I just couldn't imagine that anyone would really do this.
I guess I could add a test like (pseudo code):
try:
    test = fsdecode(input) # convert to unicode
    test.encode(sysencoding)
except UnicodeError:
    # show a message box with something like "Invalid file name"
    pass
Please tell me if you find any other possible gotchas.
Thanks so far
Michael
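The round-trip test proposed above can be sketched as a small helper. This is only an illustration, not code from the thread: the name `is_encodable` is hypothetical, and the encoding is passed in explicitly instead of coming from the `sysencoding` variable of the original module.

```python
def is_encodable(name, encoding):
    """Return True if the text `name` can be encoded in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeError:
        # Covers both encode and decode failures.
        return False
    return True


print(is_encodable(u'xy_bak2', 'iso-8859-15'))   # plain ASCII name
print(is_encodable(u'\u20ac', 'iso-8859-15'))    # euro sign, part of ISO-8859-15
print(is_encodable(u'\u4e2d', 'iso-8859-15'))    # CJK character, not representable
```

A caller would show the "Invalid file name" message box whenever the helper returns False, before any file operation is attempted.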
klappnase wrote: Hello all,
I am trying to internationalize my Tkinter program using gettext and encountered various problems, so it looks like it's not a trivial task.
Considering that you decided to support old Python versions, that's true.
Unicode support has gradually improved. If you choose to target an old
Python version, you're basically dealing with years-old Unicode
support.
After some "research" I made up a few rules for a concept that I hope lets me avoid further encoding trouble, but I would feel more confident if some of the experts here would have a look at the thoughts I made so far and told me if I'm still going wrong somewhere (BTW, the program is supposed to run on linux only). So here is what I have so far:
1. use unicode instead of byte strings wherever possible. This can be a little tricky, because in some situations I cannot know in advance if a certain string is unicode or byte string; I wrote a helper module for this which defines convenience methods for fail-safe decoding/encoding of strings and a Tkinter.UnicodeVar class which I use to convert user input to unicode on the fly (see the code below).
I've never used tkinter, but I heard good things about it. Are you
sure it's not you who made it to return byte string sometimes?
Anyway, your idea is right, make IO libraries always return unicode.
3. make sure to NEVER mix unicode and byte strings within one expression
As a rule of thumb you should convert byte strings into unicode
strings at input and back to byte strings at output. This way
the core of your program will have to deal only with unicode
strings.
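A minimal sketch of that rule of thumb, assuming ISO-8859-15 as the external encoding and using in-memory byte streams in place of real files. The helper names `read_text` and `write_text` are made up for the illustration.

```python
import io

ENCODING = 'iso-8859-15'   # assumed external encoding, for illustration only


def read_text(stream, encoding=ENCODING):
    """Decode bytes to text once, at the input boundary."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding=ENCODING):
    """Encode text back to bytes once, at the output boundary."""
    stream.write(text.encode(encoding))


source = io.BytesIO(b'caf\xe9 5 \xa4')   # bytes as they arrive from outside
text = read_text(source)                 # from here on, text only
result = text.upper()                    # the core logic never sees bytes

sink = io.BytesIO()
write_text(sink, result)                 # bytes again only when leaving the program
```

Keeping the conversions in two small functions means the rest of the program never has to ask whether a given value is text or bytes.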
4. in order to maintain code readability it's better to risk excess decode/encode cycles than having one too few.
I don't think so. Either you need decode/encode or you don't.
5. file operations seem to be delicate;
You should be ready to handle unicode errors at file operations as
well as, for example, the ENAMETOOLONG error. Any file system call with a path
argument can throw it; I don't think anything changed here with the
introduction of unicode. For example, access can return 11 error codes (on
my linux system); consider the unicode error to be the twelfth.
at least I got an error when I passed a filename that contains special characters as unicode to os.access(), so I guess that whenever I do file operations (os.remove(), shutil.copy() ...) the filename should be encoded back into system encoding before;
I think python 2.3 handles that for you. (I'm not sure about the
version)
If you have to support older versions, you have to do it yourself.
6. messages that are printed to stdout should be encoded first, too; the same with strings I use to call external shell commands.
If you use stdout as dump device just install the encoder in the
beginning of your program, something like
sys.stdout = codecs.getwriter(...) ...
sys.stderr = codecs.getwriter(...) ...
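How such a writer behaves can be sketched as follows. The arguments elided above are not known, so the encoding (ISO-8859-15) and the 'replace' error policy are assumptions, and an in-memory byte stream stands in for stdout so the result can be inspected.

```python
import codecs
import io

raw = io.BytesIO()   # stand-in for a byte-oriented stdout
# codecs.getwriter() returns a StreamWriter class; calling it wraps the stream.
out = codecs.getwriter('iso-8859-15')(raw, errors='replace')

out.write(u'Preis: 5 \u20ac\n')   # the euro sign is representable
out.write(u'\u4e2d\n')            # not representable, written as '?'

print(repr(raw.getvalue()))
```

With 'strict' in place of 'replace', the second write would raise UnicodeEncodeError, so the error policy decides whether odd characters degrade output or abort it.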
Serge.
Serge Orlov wrote: I've never used tkinter, but I heard good things about it. Are you sure it's not you who made it to return byte string sometimes?
Yes, I used a Tkinter.StringVar to keep track of the contents of an
Entry widget; as long as I entered only ascii characters get() returns
a byte string, as soon as a special character is entered it returns
unicode.
Anyway, my UnicodeVar() class seems to be a handy way to avoid
problems here.
4. in order to maintain code readability it's better to risk excess decode/encode cycles than having one too few.
I don't think so. Either you need decode/encode or you don't.
I use a bunch of modules that contain helper functions for frequently
repeated tasks. So it sometimes happens for example that I call one of
my module functions to convert user input into unicode and then call
the next module function to convert it back to byte string to start
some file operation; that's what I meant with "excess decode/encode
cycles". However, trying to avoid these ended in totally messing up
the code.
5. file operations seem to be delicate;
You should be ready to handle unicode errors at file operations as well as, for example, the ENAMETOOLONG error. Any file system call with a path argument can throw it; I don't think anything changed here with the introduction of unicode. For example, access can return 11 error codes (on my linux system); consider the unicode error to be the twelfth.
at least I got an error when I passed a filename that contains special characters as unicode to os.access(), so I guess that whenever I do file operations (os.remove(), shutil.copy() ...) the filename should be encoded back into system encoding before;
I think python 2.3 handles that for you. (I'm not sure about the version) If you have to support older versions, you have to do it yourself.
I am using python-2.3.4 and get unicode errors:
>>> f = os.path.join(u'/home/pingu/phonoripper', u'\xc3\u20ac')
>>> os.path.isfile(f)
True
>>> os.access(f, os.R_OK)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-25: ordinal not in range(128)
>>> f = f.encode('iso-8859-15')
>>> os.access(f, os.R_OK)
True
Thanks for the feedback
Michael
klappnase wrote: I am using python-2.3.4 and get unicode errors:
>>> f = os.path.join(u'/home/pingu/phonoripper', u'\xc3\u20ac')
>>> os.path.isfile(f)
True
>>> os.access(f, os.R_OK)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-25: ordinal not in range(128)
That's apparently a bug in os.access, which doesn't support Unicode file
names. As a workaround, do
import os, sys
def access(name, mode, orig=os.access):
    try:
        return orig(name, mode)
    except UnicodeError:
        # encode the name first, then pass the mode to the original function
        return orig(name.encode(sys.getfilesystemencoding()), mode)
os.access = access
Apparently, access is used so rarely that nobody has noticed yet (or
didn't bother to report). os.path.isfile() builds on os.stat(), which
does support Unicode file names.
Regards,
Martin
Martin v. Löwis wrote: That's apparently a bug in os.access, which doesn't support Unicode file names. As a workaround, do
def access(name, mode, orig=os.access):
    try:
        return orig(name, mode)
    except UnicodeError:
        return orig(name.encode(sys.getfilesystemencoding()), mode)
os.access = access
Apparently, access is used so rarely that nobody has noticed yet (or didn't bother to report). os.path.isfile() builds on os.stat(), which does support Unicode file names.
Regards, Martin
Ah, thanks!
Now another question arises: you use sys.getfilesystemencoding() to
encode the file name, which looks like it's the preferred method. However,
when I tried to find out how this works I got a little confused again
(from the library reference):
getfilesystemencoding()
Return the name of the encoding used to convert Unicode filenames into
system file names, or None if the system default encoding is used. The
result value depends on the operating system:
(...)
* On Unix, the encoding is the user's preference according to the
result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET)
failed.
On my box (mandrake-10.1) sys.getfilesystemencoding() returns
'ISO-8859-15', however:
>>> locale.nl_langinfo(locale.CODESET)
'ANSI_X3.4-1968'
Anyway, my app currently runs with python-2.2 and I would like to keep
it that way if possible, so I wonder which is the preferred
replacement for sys.getfilesystemencoding() on versions < 2.3, or in
particular, will the method I use to determine "sysencoding" I
described in my original post be safe or are there any traps I missed
(it's supposed to run on linux only)?
Thanks and best regards
Michael
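One possible shape for such a fallback chain, offered as a sketch and not as a confirmed answer to the question above: use sys.getfilesystemencoding() when the interpreter has it, fall back to the locale's preferred encoding, and only then to a fixed default. The function name `guess_fs_encoding` and the latin-1 default are assumptions taken over from the original post's approach.

```python
import codecs
import locale
import sys


def guess_fs_encoding(default='iso-8859-1'):
    """Return a usable codec name for file names, trying several sources."""
    candidates = []
    # sys.getfilesystemencoding() does not exist before Python 2.3.
    getter = getattr(sys, 'getfilesystemencoding', None)
    if getter is not None:
        candidates.append(getter())
    try:
        candidates.append(locale.getpreferredencoding())
    except (AttributeError, locale.Error):
        pass
    for name in candidates:
        if not name:   # None means no specific encoding is known
            continue
        try:
            codecs.lookup(name)
        except LookupError:
            continue
        return name.lower()
    return default


print(guess_fs_encoding())
```

Checking each candidate with codecs.lookup() mirrors the `_find_codec` test in the original module, so a name the interpreter cannot actually use is never returned.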
Michael:
On my box (WinXP SP2), sys.getfilesystemencoding() returns 'mbcs'.
If you post your revised solution to this unicode problem, I'd be
delighted to test it on Windows. I'm working on a Tkinter front-end
for Vivian deSmedt's rsync.py and would like to address the issue of
accented characters in folder names.
thanks
Stewart
stewart dot midwinter at gmail dot com