Print formatted Strings with Umlauts

Joerg Lehmann

I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not behave as I would expect. Here is my example:

a = 'äöü'
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)

äöü äöü
123   123

I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
print len(a), len(b)

6 3

I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte??? What is the right way
to print strings containing umlauts in a tabular way (same field width)?

Thanks!
--
Joerg Lehmann

Jul 18 '05 #1


Amy G

Upgrading to 2.3 will probably solve this problem. I am using 2.3, and here
is what I get when I try it:

a = 'äöü'
len(a)
3
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)

äöü   äöü
123   123

Jul 18 '05 #2

Jeff Epler

If you work with Unicode strings instead of byte strings in the utf-8
encoding, you'll get the desired results for characters in the German
character set:

b = '123'
a = u'\344\366\374'
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü   äöü
123   123

However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:

u = u'\N{COMBINING DIAERESIS}'
a = 'a%so%su%s' % (u,u,u)
print a.encode("utf-8")
äöü
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123   123

You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example:

a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")

ｕｕｕ   ｕｕｕ
123   123

Jeff

Jul 18 '05 #3

Martin v. Löwis

Joerg Lehmann wrote:

> I am using Python 2.2.3 (Fedora Core 1). ...
> I have tried to set the encoding in site.py to 'latin-1', but it did not change
> my results. Is there no way to store umlauts in 1 byte???

There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
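
The byte counts can be checked directly. A minimal sketch, written for Python 3 (an assumption; the thread itself uses Python 2.2 and 2.3, where the character string would be a u'...' literal), showing that the same three umlauts occupy three bytes in Latin-1 and six in UTF-8:

```python
# Python 3 sketch: str holds Unicode characters, bytes holds the stored form.
text = "\u00e4\u00f6\u00fc"  # the three characters a-umlaut, o-umlaut, u-umlaut

latin1_bytes = text.encode("latin-1")  # one byte per umlaut
utf8_bytes = text.encode("utf-8")      # two bytes per umlaut

print(len(text), len(latin1_bytes), len(utf8_bytes))  # prints: 3 3 6

# %-5s pads by counting the units of the string it is given, so padding the
# character string (not the UTF-8 bytes) gives a field of 5 plus the bar.
padded = "%-5s|" % text
print(len(padded))  # prints: 6
```

Padding the character string and encoding only at output time is what keeps the field width at five in this simple case.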

> What is the right way
> to print strings containing umlauts in a tabular way (same field width)?

As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).
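
One way to approximate the column count on a mono-spaced terminal is sketched below. It assumes Python 3 and the standard unicodedata module; display_width and pad are hypothetical helper names, not part of any library. It also makes simplifying assumptions: combining marks count as zero columns, Wide and Fullwidth characters as two, and everything else (including "Ambiguous" characters and any other zero-width characters) as one, which will not match every terminal.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns text occupies (rough heuristic)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # Wide and Fullwidth characters take two columns
        else:
            width += 1  # assumption: everything else, even Ambiguous, is one
    return width


def pad(text, columns):
    """Left-justify text in a field measured in columns, not characters."""
    return text + " " * max(0, columns - display_width(text))


words = ["a\u0308o\u0308u\u0308", "\uff55\uff55", "123"]
print([display_width(w) for w in words])  # prints: [3, 4, 3]
```

With these helpers, pad(word, 5) replaces "%-5s" % word; a string wider than the field is returned unpadded, just as %-5s would do.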

Regards,
Martin

Jul 18 '05 #4

Joerg Lehmann

I have found a fix myself. I'm not sure if this is "the right way",
but it solves my problem:

I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:

LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"

This fixed my problem; umlauts are stored in one byte now.
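
Which encoding a Python process picks up from the locale can be checked from within Python. A small sketch, assuming Python 3 and the standard locale module; to_text is a hypothetical helper, and its Latin-1 default is only an assumption that matches the configuration above:

```python
import locale

# Encoding reported for the current locale settings; the value depends on
# LANG, so no particular output is assumed here.
print(locale.getpreferredencoding())


def to_text(raw, encoding="latin-1"):
    """Decode raw bytes so that len() and %-5s count characters."""
    return raw.decode(encoding)


umlauts = to_text(b"\xe4\xf6\xfc")          # three Latin-1 bytes
print(len(umlauts), len("%-5s" % umlauts))  # prints: 3 5
```

Decoding input with the encoding the system actually uses keeps the formatting independent of whether the locale is UTF-8 or ISO-8859-1.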

Thanks for your inspirations.

PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
--
Joerg Lehmann

Jul 18 '05 #5
