Print formatted Strings with Umlauts

I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not work as I would expect. Here is my example:
a = 'äöü'
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)

äöü äöü
123   123

I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
print len(a), len(b)

6 3

I have tried setting the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte? What is the right way
to print strings containing umlauts in a tabular way (same field width)?

Thanks!
--
Joerg Lehmann
Jul 18 '05 #1
Upgrading to 2.3 will probably solve this problem. I am using 2.3 and here
is what I get when I try it.
a = 'äöü'
len(a)

3

b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)

äöü   äöü
123   123

Jul 18 '05 #2
If you work with Unicode strings instead of byte strings in the UTF-8
encoding, you'll get the desired results for characters in the German
character set:
b = '123'
a = u'\344\366\374'
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")

äöü   äöü
123   123
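
For readers on Python 3, where every str is already a Unicode string, the same idea can be sketched as follows. This is only an illustration, assuming precomposed umlauts and a terminal that draws each of them in one column; the `|` marker is added here just to make the padding visible.

```python
# Python 3 sketch: str holds code points, so %-5s pads by characters,
# not by UTF-8 bytes.
a = "\u00e4\u00f6\u00fc"   # the three precomposed umlauts from the question
b = "123"

row_a = "%-5s|" % a
row_b = "%-5s|" % b

print(row_a)
print(row_b)
```

Both rows come out with the same number of characters because padding is computed on the text, and encoding to bytes happens only when the result is written out.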

However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:
u = u'\N{COMBINING DIAERESIS}'
a = 'a%so%su%s' % (u,u,u)
print a.encode("utf-8")

äöü

print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")

äöü äöü
123   123
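
One possible mitigation for this particular case, not mentioned in the thread, is to normalize the text to NFC before formatting, so that a base letter plus combining diaeresis becomes a single precomposed code point. A minimal Python 3 sketch, assuming the standard `unicodedata` module:

```python
import unicodedata

# Base letters followed by U+0308 COMBINING DIAERESIS: six code points.
decomposed = "a\u0308o\u0308u\u0308"

# NFC composes each pair into one precomposed character where one exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))

padded = "%-5s|" % composed
print(padded)
```

This only helps when a precomposed form exists, as it does for the German umlauts. Combining sequences without one keep their extra code points after normalization.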
You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example,
a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")

ｕｕｕ   ｕｕｕ
123   123
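
A rough way to pad by terminal columns rather than by code points is to estimate each character's width from the Unicode database. The sketch below is Python 3, and the helper names `display_width` and `pad` are made up for illustration. It assumes combining marks take 0 columns, "Wide" and "Fullwidth" characters take 2, and everything else (including "Ambiguous") takes 1, which will not match every terminal.

```python
import unicodedata

def display_width(text):
    """Estimate how many terminal columns text occupies (heuristic)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                      # combining marks add no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                    # Wide / Fullwidth characters
        else:
            width += 1                    # assumption: Ambiguous counts as 1
    return width

def pad(text, columns):
    """Left-justify text to the given number of columns."""
    return text + " " * max(0, columns - display_width(text))

for row in ("\uff55" * 3, "a\u0308o\u0308u\u0308", "123"):
    print(pad(row, 8) + "|")
```

The heuristic ignores other zero-width characters and terminal-specific behaviour, so it is a starting point rather than a complete answer.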

Jeff

Jul 18 '05 #3
Joerg Lehmann wrote:
I am using Python 2.2.3 (Fedora Core 1). ...
I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte???
There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
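
The difference between the two encodings can be checked directly. A small Python 3 sketch (the variable names are illustrative only):

```python
text = "\u00e4\u00f6\u00fc"                  # the three umlauts from the question

utf8_bytes = text.encode("utf-8")            # two bytes per umlaut
latin1_bytes = text.encode("iso-8859-1")     # one byte per umlaut

print(len(text), len(utf8_bytes), len(latin1_bytes))
```

In Python 2.2 a plain '...' literal holds whatever bytes the terminal sent, which is why len(a) reported 6 on a UTF-8 system.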
What is the right way
to print strings containing umlauts in a tabular way (same field width)?


As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).

Regards,
Martin

Jul 18 '05 #4
I have found a fix myself. I'm not sure if this is "the right way",
but it solves my problem:

I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:

LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"

This fixed my problem; umlauts are stored in one byte now.
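
To see which encoding Python derives from the locale after a change like this, something along the following lines may help. It is a Python 3 sketch; the reported name depends entirely on the system, so no particular output is assumed.

```python
import codecs
import locale

# Encoding derived from the current locale settings (LANG, LC_ALL, ...).
encoding = locale.getpreferredencoding(False)

# Normalize the name and measure how many bytes one umlaut needs in it.
codec_name = codecs.lookup(encoding).name
try:
    umlaut_size = len("\u00e4".encode(encoding))
except UnicodeEncodeError:
    umlaut_size = None      # e.g. plain ASCII cannot represent the character

print(codec_name, umlaut_size)
```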

Thanks for your suggestions.

PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
--
Joerg Lehmann
Jul 18 '05 #5
