Print formatted Strings with Umlauts

I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not work as I would expect. Here is my example:
a = 'äöü'
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü äöü
123   123

I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
print len(a), len(b)

6 3

I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte??? What is the right way
to print strings containing umlauts in a tabular way (same field width)?

Thanks!
--
Joerg Lehmann
Jul 18 '05 #1
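
A minimal sketch of what the question describes, written for Python 3 rather than the Python 2.2 used in the thread (an assumption made so that it runs on a current interpreter). It shows that the length of 6 comes from counting UTF-8 bytes, and that padding a 6-byte string to a width of 5 adds nothing. The variable names are illustrative only.

```python
# Python 3 sketch; escapes avoid any dependence on the source file encoding.
text = "\u00e4\u00f6\u00fc"              # the three umlauts as three code points

utf8_bytes = text.encode("utf-8")        # two bytes per umlaut
latin1_bytes = text.encode("latin-1")    # one byte per umlaut

char_count = len(text)
utf8_len = len(utf8_bytes)
latin1_len = len(latin1_bytes)

# Padding counts whatever units the string is made of:
padded_text = text.ljust(5)              # counts code points, so 2 spaces are added
padded_utf8 = utf8_bytes.ljust(5)        # counts bytes, already 6, so nothing is added

print(char_count, utf8_len, latin1_len)
print(repr(padded_text), repr(padded_utf8))
```

In Python 2, a plain '...' literal behaves like the byte string here, which is why the %-5s field in the question was not padded.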
Upgrading to 2.3 will probably solve this problem. I am using 2.3 and here
is what I get when I try it.
a = 'äöü'
len (a)
3
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)

äöü   äöü
123   123


"Joerg Lehmann" <jo***********@mail.com> wrote in message
news:91**************************@posting.google.com...

Jul 18 '05 #2
If you work with Unicode strings instead of byte strings in the utf-8
encoding, you'll get the desired results for characters in the German
character set:
b = '123'
a = u'\344\366\374'
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü   äöü
123   123

However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:
u = u'\N{COMBINING DIAERESIS}'
a = 'a%so%su%s' % (u,u,u)
print a.encode("utf-8")
äöü
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123   123

You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example,
a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")

ｕｕｕ   ｕｕｕ
123   123

Jeff

Jul 18 '05 #3
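
For the combining-character case above, one partial mitigation is to normalize the text to NFC before padding, so that a base letter plus COMBINING DIAERESIS becomes a single precomposed code point where one exists. This is a Python 3 sketch and was not proposed in the thread; it does nothing for sequences without a precomposed form, nor for fullwidth characters.

```python
import unicodedata

# Base letters followed by U+0308 COMBINING DIAERESIS, as in the post above.
decomposed = "a\u0308o\u0308u\u0308"

# NFC composes each pair into a single code point where Unicode defines one.
composed = unicodedata.normalize("NFC", decomposed)

padded_decomposed = decomposed.ljust(5)   # already 6 code points: no padding
padded_composed = composed.ljust(5)       # 3 code points: padded to 5

# Fullwidth letters are unchanged by NFC and still occupy two columns each.
fullwidth = "\uff55" * 3                  # FULLWIDTH LATIN SMALL LETTER U
fullwidth_nfc = unicodedata.normalize("NFC", fullwidth)

print(len(decomposed), len(composed), len(fullwidth_nfc))
```

After normalization the umlaut string pads like any other three-character string, while the fullwidth string still has a length of 3 but a display width of 6.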
Joerg Lehmann wrote:
> I am using Python 2.2.3 (Fedora Core 1). ...
> I have tried to set the encoding in site.py to 'latin-1', but it did not change
> my results. Is there no way to store umlauts in 1 byte???

There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.

> What is the right way
> to print strings containing umlauts in a tabular way (same field width)?

As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).

Regards,
Martin

Jul 18 '05 #4
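
The width-0 and width-2 cases mentioned above can be approximated with the standard unicodedata module. The following Python 3 sketch is a heuristic, not a complete answer: display_width and pad are hypothetical helper names, characters with the "Ambiguous" East Asian Width property are assumed to take one column, and control characters are not handled.

```python
import unicodedata

def display_width(text):
    """Estimate how many monospace terminal columns text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                      # combining marks add no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                    # Wide and Fullwidth take two columns
        else:
            width += 1                    # assumption: everything else takes one
    return width

def pad(text, columns):
    """Left-justify text in a field of the given number of columns."""
    return text + " " * max(0, columns - display_width(text))

rows = [
    "\u00e4\u00f6\u00fc",                 # precomposed umlauts
    "a\u0308o\u0308u\u0308",              # letters + combining diaeresis
    "\uff55" * 3,                         # fullwidth letters (6 columns)
    "123",
]
for row in rows:
    print(pad(row, 7) + "|" + pad(row, 7) + "|")
```

Whether the columns actually line up still depends on the terminal and font agreeing with these width estimates.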
"Martin v. Löwis" <ma****@v.loewis.de> wrote in message news:<c0*************@news.t-online.com>...

I have found a fix myself. I'm not sure if this is "the right way",
but it solves my problem:

I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:

LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"

This fixed my problem; umlauts are now stored in one byte.

Thanks for your inspirations.

PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
--
Joerg Lehmann
Jul 18 '05 #5
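
To see what a locale change like the one above does from Python's point of view, the encoding the locale reports can be inspected with the standard locale module. This is a Python 3 sketch; encoded_length is a hypothetical helper, and the value printed for the locale encoding depends entirely on the system it runs on.

```python
import locale

sample = "\u00e4\u00f6\u00fc"             # the three umlauts from the thread

# Encoding implied by the current locale settings (for example LANG).
preferred = locale.getpreferredencoding(False)

def encoded_length(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except (UnicodeEncodeError, LookupError):
        return None

print("locale encoding:", preferred)
print("bytes in locale encoding:", encoded_length(sample, preferred))
print("bytes in ISO-8859-1:", encoded_length(sample, "iso-8859-1"))
print("bytes in UTF-8:", encoded_length(sample, "utf-8"))
```

Under an ISO-8859-1 locale the first two byte counts agree at 3, which matches the one-byte-per-umlaut behaviour reported above.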
