I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not work as I would expect. Here is my example:
a = 'äöü'
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü äöü
123 123
I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
print len(a), len(b)
6 3
I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte??? What is the right way
to print strings containing umlauts in a tabular way (same field width)?
Thanks!
--
Joerg Lehmann
Upgrading to 2.3 will probably solve this problem. I am using 2.3, and here
is what I get when I try it:
a = 'äöü'
len(a)
3
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü äöü
123 123
If you work with Unicode strings instead of byte strings in the utf-8
encoding, you'll get the desired results for characters in the German
character set:
b = '123'
a = u'\344\366\374'
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123 123
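For readers on Python 3, where str is already a Unicode string, the same idea might look roughly like the sketch below. The helper name format_row is made up for illustration, and the raw bytes are an assumption about what a UTF-8 terminal would deliver; the point is only that %-5s pads by code points once the bytes have been decoded.

```python
# Sketch (Python 3): decode first, then pad, so widths count characters, not bytes.
raw = b'\xc3\xa4\xc3\xb6\xc3\xbc'   # assumed input: the UTF-8 bytes for 'äöü'
a = raw.decode('utf-8')             # three code points
b = '123'

def format_row(left, right, width=5):
    """Left-justify both fields to `width` code points (hypothetical helper)."""
    return "%-*s %-*s" % (width, left, width, right)

table = "\n".join([format_row(a, a), format_row(b, b)])
print(table)
```

Both rows come out with the same number of code points, which is what lines them up on a terminal where each of these characters takes one column.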
However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:
u = u'\N{COMBINING DIAERESIS}'
a = 'a%so%su%s' % (u,u,u)
print a.encode("utf-8")
äöü
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123 123
You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example,
a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
uuu uuu
123 123
Jeff
Joerg Lehmann wrote:
> I am using Python 2.2.3 (Fedora Core 1).
> ...
> I have tried to set the encoding in site.py to 'latin-1', but it did not
> change my results. Is there no way to store umlauts in 1 byte???
There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
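The difference in storage can be checked directly. A small sketch in Python 3 syntax (the thread itself uses Python 2), comparing how many bytes the same three umlauts occupy in each encoding:

```python
# Sketch (Python 3): the same text takes a different number of bytes per encoding.
text = '\u00e4\u00f6\u00fc'              # 'äöü' as three code points
utf8_bytes = text.encode('utf-8')        # two bytes per umlaut
latin1_bytes = text.encode('latin-1')    # one byte per umlaut

print(len(text), len(utf8_bytes), len(latin1_bytes))   # prints: 3 6 3
```

A Python 2 byte string holds whatever bytes the terminal sent, so len() reports 6 on a UTF-8 system and 3 on a Latin-1 one.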
> What is the right way to print strings containing umlauts in a tabular
> way (same field width)?
As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).
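A rough approximation of the column count is possible with the standard unicodedata module. The sketch below (Python 3; display_width and ljust_cols are hypothetical helper names) counts combining characters as zero columns and East Asian Wide or Fullwidth characters as two. It treats Ambiguous characters as one column, which is an assumption that depends on the terminal, and it ignores control characters and other zero-width characters.

```python
import unicodedata

def display_width(text):
    """Estimate terminal columns for `text` (approximation only)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                      # combining marks take no column of their own
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            width += 2                    # Wide and Fullwidth characters take two columns
        else:
            width += 1                    # assumption: everything else takes one column
    return width

def ljust_cols(text, columns):
    """Pad `text` with spaces on the right up to `columns` terminal columns."""
    return text + ' ' * max(0, columns - display_width(text))

rows = [
    'a\u0308o\u0308u\u0308',   # a, o, u, each followed by COMBINING DIAERESIS
    '\uff55' * 3,               # FULLWIDTH LATIN SMALL LETTER U, three times
    '123',
]
for row in rows:
    print(ljust_cols(row, 8) + '|')
```

The field is 8 columns here rather than 5 because three fullwidth characters already need 6. Whether the bars actually line up still depends on the terminal and font agreeing with these width estimates.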
Regards,
Martin
I have found a fix myself. I'm not sure if this is "the right way",
but it solves my problem:
I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:
LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"
This fixed my problem; umlauts are stored in one byte now.
Thanks for your inspirations.
PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
--
Joerg Lehmann