
Print formatted Strings with Umlauts

I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not work as I would expect. Here is my example:
a = 'äöü'
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü äöü
123   123

I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
print len(a), len(b)

6 3

I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte??? What is the right way
to print strings containing umlauts in a tabular way (same field width)?

Thanks!
--
Joerg Lehmann
Jul 18 '05 #1
4 Replies


Upgrading to 2.3 will probably solve this problem. I am using 2.3, and here
is what I get when I try it:
a = 'äöü'
len(a)
3
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü   äöü
123   123

Jul 18 '05 #2

If you work with Unicode strings instead of byte strings in the utf-8
encoding, you'll get the desired results for characters in the German
character set:
b = '123'
a = u'\344\366\374'
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü   äöü
123   123

However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:
u = u'\N{COMBINING DIAERESIS}'
a = 'a%so%su%s' % (u,u,u)
print a.encode("utf-8")
äöü
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123   123

You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example:
a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
ｕｕｕ   ｕｕｕ
123   123

Jeff

Jul 18 '05 #3

Joerg Lehmann wrote:
> I am using Python 2.2.3 (Fedora Core 1). ...
> I have tried to set the encoding in site.py to 'latin-1', but it did not change
> my results. Is there no way to store umlauts in 1 byte???

There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
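
The difference in byte counts between the two encodings can be checked directly. The following is a minimal sketch in Python 3 (an assumption; the thread itself uses Python 2), and the variable names are illustrative only:

```python
# Sketch (Python 3 assumed): the same three umlauts need a different
# number of bytes depending on the encoding used to store them.
text = "\u00e4\u00f6\u00fc"   # the three umlauts as Unicode text

utf8_bytes = text.encode("utf-8")          # 2 bytes per umlaut
latin1_bytes = text.encode("iso-8859-1")   # 1 byte per umlaut

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Reading UTF-8 bytes as if they were ISO-8859-1 does not raise an error;
# it silently yields one (wrong) character per byte.
wrong = utf8_bytes.decode("iso-8859-1")
print(len(wrong))
```

A byte-string len() as in the original question corresponds to len(utf8_bytes) here, which is why it reports 6.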

> What is the right way
> to print strings containing umlauts in a tabular way (same field width)?


As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).
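
One way to approximate the column count described above is to consult the Unicode character database. This is only a rough sketch in Python 3 (an assumption; the thread uses Python 2). The helpers display_width and pad are hypothetical names, "Ambiguous" characters are counted as one column, and zero-width characters other than combining marks are not handled:

```python
import unicodedata

def display_width(text):
    """Estimate the terminal columns used by text (heuristic only)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and Fullwidth characters take two columns.
            width += 2
        else:
            # Everything else, including "Ambiguous", is assumed to take one.
            width += 1
    return width

def pad(text, columns):
    """Left-justify text to the given number of terminal columns."""
    return text + " " * max(0, columns - display_width(text))

decomposed = "a\u0308o\u0308u\u0308"   # a, o, u each followed by COMBINING DIAERESIS
wide = "\uff55" * 3                    # FULLWIDTH LATIN SMALL LETTER U, three times

for field in (decomposed, wide, "123"):
    print(pad(field, 8) + "|")
```

Padding by display_width instead of len() keeps the "|" in the same column for all three rows on a terminal that follows these width conventions; a terminal that renders Ambiguous characters as wide would still misalign.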

Regards,
Martin

Jul 18 '05 #4

I have found a fix myself. I'm not sure if this is "the right way",
but it solves my problem:

I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:

LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"

This fixed my problem: umlauts are now stored in one byte.
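
To see which encoding a Python process picks up from the locale after a change like this, something along these lines may help. It is a sketch for Python 3 (an assumption; the thread uses Python 2), and the reported values depend entirely on the environment it runs in:

```python
import locale
import sys

# Encoding the locale settings (e.g. LANG) suggest for text data.
# Passing False avoids changing the process locale as a side effect.
encoding = locale.getpreferredencoding(False)
print("locale encoding:", encoding)

# Encoding Python uses when writing text to standard output
# (may be None if stdout has been replaced by a non-text stream).
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

# How many bytes one umlaut needs in the locale's encoding.
# errors="replace" keeps this from failing in an ASCII-only locale.
umlaut_bytes = "\u00e4".encode(encoding, errors="replace")
print("bytes per umlaut:", len(umlaut_bytes))
```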

Thanks for your inspirations.

PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
--
Joerg Lehmann
Jul 18 '05 #5
