I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not work as I would expect. Here is my example:
a = 'äöü'
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü äöü
123   123
I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
print len(a), len(b)
6 3
I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte??? What is the right way
to print strings containing umlauts in a tabular way (same field width)?
Thanks!
--
Joerg Lehmann
Upgrading to 2.3 will probably solve this problem. I am using 2.3 and here
is what I get when I try it:
a = 'äöü'
len(a)
3
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü   äöü
123   123
If you work with Unicode strings instead of byte strings in the utf-8
encoding, you'll get the desired results for characters in the German
character set:
b = '123'
a = u'\344\366\374'
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü   äöü
123   123
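For readers on Python 3, where str is already a Unicode string, the same idea can be sketched roughly as below. This is only an illustration, not code from the thread: it assumes the text arrives as UTF-8 bytes, and the helper name `format_row` is made up for the example.

```python
def format_row(left, right, width=5):
    """Left-justify two text values in fields of `width` code points."""
    return "%-*s %-*s" % (width, left, width, right)

raw = b"\xc3\xa4\xc3\xb6\xc3\xbc"   # the umlauts a, o, u as UTF-8 bytes (6 bytes)
a = raw.decode("utf-8")             # 3 code points once decoded
b = "123"

row_a = format_row(a, a)
row_b = format_row(b, b)
print(row_a)
print(row_b)
```

Padding is computed on code points here, so both rows have the same length. That matches the display width only for simple characters such as precomposed umlauts.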
However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:
u = u'\N{COMBINING DIAERESIS}'
a = 'a%so%su%s' % (u,u,u)
print a.encode("utf-8")
äöü
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123   123
You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example:
a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
ｕｕｕ   ｕｕｕ
123   123
Jeff
Joerg Lehmann wrote:
> I am using Python 2.2.3 (Fedora Core 1).
> ...
> I have tried to set the encoding in site.py to 'latin-1', but it did
> not change my results. Is there no way to store umlauts in 1 byte???
There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
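The difference between the two encodings can be checked directly. The following is a small sketch in Python 3 syntax (the thread itself uses Python 2); the variable names are chosen for this example only.

```python
text = "\u00e4\u00f6\u00fc"             # the three umlauts a, o, u

utf8_bytes = text.encode("utf-8")       # each umlaut takes two bytes
latin1_bytes = text.encode("latin-1")   # each umlaut takes one byte

print(len(text))          # number of characters
print(len(utf8_bytes))    # number of bytes in UTF-8
print(len(latin1_bytes))  # number of bytes in Latin-1 (ISO-8859-1)
```

A Python 2 byte string holds whatever bytes the terminal sent, so len() counts 6 under a UTF-8 locale and 3 under a Latin-1 locale.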
> What is the right way to print strings containing umlauts in a tabular
> way (same field width)?
As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).
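One way to approximate the column count on a mono-spaced terminal is to consult the Unicode character database. The sketch below uses Python 3 and the standard unicodedata module; the function names `display_width` and `pad` are hypothetical, and the rules are a simplification: combining marks count as 0 columns, "Wide" and "Fullwidth" characters as 2, and everything else (including "Ambiguous" and control characters) as 1.

```python
import unicodedata

def display_width(text):
    """Estimate how many terminal columns `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                      # combining marks add no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                    # Wide and Fullwidth take two columns
        else:
            width += 1                    # simplification: all others take one
    return width

def pad(text, columns):
    """Left-justify `text` to `columns` estimated display columns."""
    return text + " " * max(0, columns - display_width(text))

combining = "a\u0308o\u0308u\u0308"   # a, o, u each followed by COMBINING DIAERESIS
fullwidth = "\uff55" * 3               # FULLWIDTH LATIN SMALL LETTER U, three times

for value in (combining, fullwidth, "123"):
    print(pad(value, 7) + "|")
```

Whether the closing bars actually line up still depends on the terminal and font, so this should be read as an estimate, not a guarantee.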
Regards,
Martin
I have found a fix myself. I'm not sure if this is "the right way",
but it solves my problem:
I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:
LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"
This fixed my problem: umlauts are stored in one byte now.
Thanks for your inspiration.
PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
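To see which encoding Python picks up from the locale settings, something like the following can be run before and after such a change. It is a sketch in Python 3 syntax using the standard locale module, not something taken from this thread, and the reported names depend entirely on the system.

```python
import locale
import sys

try:
    # Adopt the locale configured in the environment (LANG, LC_ALL, ...).
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass  # unsupported locale name; the default "C" locale stays active

locale_encoding = locale.getpreferredencoding(False)
stdout_encoding = sys.stdout.encoding

print("locale encoding:", locale_encoding)
print("stdout encoding:", stdout_encoding)
```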
--
Joerg Lehmann