I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not work as I would expect. Here is my example:
a = 'äöü'
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü äöü
123   123
I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
print len(a), len(b)
6 3
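For readers on a current Python, here is a minimal Python 3 sketch that reproduces the effect. It assumes the string is stored as UTF-8 bytes, as on Fedora Core 1. In Python 2.2 a plain '...' literal already holds those bytes, while Python 3 needs an explicit .encode(). Byte-string %-formatting (available on bytes since Python 3.5) pads by byte count, not by visible characters:

```python
# Python 3 sketch: reproduce the Python 2 behaviour with explicit bytes.
text = "äöü"
a = text.encode("utf-8")  # each umlaut becomes two bytes in UTF-8
b = b"123"

print(len(text), len(a), len(b))  # 3 6 3: characters vs. bytes

# bytes %-formatting pads to 5 *bytes*, like a Python 2 str did,
# so the 6-byte string gets no padding at all.
row1 = b"%-5s %-5s" % (a, a)
row2 = b"%-5s %-5s" % (b, b)
print(row1.decode("utf-8"))
print(row2.decode("utf-8"))
```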
I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte??? What is the right way
to print strings containing umlauts in a tabular way (same field width)?
Thanks!
--
Joerg Lehmann
Upgrading to 2.3 will probably solve this problem. I am using 2.3, and here
is what I get when I try it:
a = 'äöü'
len(a)
3
b = '123'
print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü   äöü
123   123
"Joerg Lehmann" <jo***********@mail.com> wrote: I am using Python 2.2.3 (Fedora Core 1). ... What is the right way to print strings containing umlauts in a tabular way (same field width)?
If you work with Unicode strings instead of byte strings in the UTF-8
encoding, you'll get the desired results for characters in the German
character set:
b = '123'
a = u'\344\366\374'
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü   äöü
123   123
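In Python 3 every str is a Unicode string, so this advice is the default behaviour. A minimal Python 3 sketch of the same example, formatting text first and encoding only when bytes are actually needed:

```python
# Python 3 sketch: format text (str), encode only at the output boundary.
a = "\u00e4\u00f6\u00fc"  # 'äöü' as three code points, same as u'\344\366\374'
b = "123"

table = "%-5s %-5s\n%-5s %-5s" % (a, a, b, b)

# Padding is now counted in characters, so both rows have the same length.
rows = table.split("\n")
print(table)

# Encode last, e.g. before writing to a binary file or socket.
encoded = table.encode("utf-8")
```

Here print() encodes by itself, using sys.stdout's encoding; the explicit .encode("utf-8") only matters when writing to a binary stream.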
However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:
u = u'\N{COMBINING DIAERESIS}'
a = 'a%so%su%s' % (u,u,u)
print a.encode("utf-8")
äöü
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123   123
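One way around the combining-character case, not mentioned in the thread, is Unicode normalization: NFC composes a base letter plus a combining mark into a single code point when such a code point exists. A Python 3 sketch, with unicodedata.combining() as a fallback for counting marks that have no precomposed form:

```python
import unicodedata

b = "123"
u = "\N{COMBINING DIAERESIS}"
a = "a%so%su%s" % (u, u, u)  # six code points, but three visible glyphs

print(len(a))  # 6: each diaeresis is a separate code point

# NFC composes base letter + combining mark into one precomposed code point
# where one exists (here U+00E4, U+00F6, U+00FC).
composed = unicodedata.normalize("NFC", a)
print(len(composed))  # 3
print("%-5s %-5s\n%-5s %-5s" % (composed, composed, b, b))

# Not every combination has a precomposed form, so skipping combining
# marks is a more general way to estimate the visible length.
visible = sum(1 for ch in a if not unicodedata.combining(ch))
print(visible)  # 3
```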
You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example:
a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
ｕｕｕ   ｕｕｕ
123   123
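Python's unicodedata module exposes this property directly. A small Python 3 sketch printing it for a few characters; the fullwidth letter from the example above reports 'F' (Fullwidth), which terminals typically draw two columns wide, just like 'W':

```python
import unicodedata

samples = ["u", "1", "\N{FULLWIDTH LATIN SMALL LETTER U}"]
for ch in samples:
    # east_asian_width returns one of 'F', 'H', 'W', 'Na', 'A', 'N'
    print("U+%04X %s" % (ord(ch), unicodedata.east_asian_width(ch)))
```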
Jeff
Joerg Lehmann wrote: I am using Python 2.2.3 (Fedora Core 1). ... I have tried to set the encoding in site.py to 'latin-1', but it did not change my results. Is there no way to store umlauts in 1 byte???
There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
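This can be illustrated in Python 3: the byte count depends on which encoding produced the bytes, and telling Python a different encoding does not change the bytes themselves. The helper char_count below is hypothetical, and defaulting to the locale's preferred encoding is an assumption about where the bytes came from:

```python
import locale

def char_count(raw, encoding=None):
    """Count characters in a byte string by decoding it first.

    Hypothetical helper. encoding defaults to the locale's preferred
    encoding, assuming the bytes came from the terminal or a file
    written under that locale.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return len(raw.decode(encoding))

utf8_bytes = "äöü".encode("utf-8")
latin1_bytes = "äöü".encode("latin-1")

print(len(utf8_bytes), char_count(utf8_bytes, "utf-8"))        # 6 3
print(len(latin1_bytes), char_count(latin1_bytes, "latin-1"))  # 3 3

# Decoding UTF-8 bytes as latin-1 "works" but gives the wrong text:
print(char_count(utf8_bytes, "latin-1"))  # 6
```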
What is the right way to print strings containing umlauts in a tabular way (same field width)?
As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).
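Putting these rules together gives a rough Python 3 sketch of a column-aware left-justify for a mono-spaced terminal. The helpers display_width and ljust_columns are hypothetical; the sketch assumes combining marks take 0 columns, 'W' and 'F' characters take 2, and everything else (including 'Ambiguous') takes 1, and it ignores control characters and other zero-width characters:

```python
import unicodedata

def display_width(s):
    """Approximate terminal columns used by s (hypothetical helper)."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous glyph
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def ljust_columns(s, columns):
    """Pad s with spaces until it fills the given number of columns."""
    return s + " " * max(0, columns - display_width(s))

rows = [
    ("\N{FULLWIDTH LATIN SMALL LETTER U}" * 3, "123"),
    ("a\u0308o\u0308u\u0308", "123"),
    ("123", "123"),
]
for left, right in rows:
    print(ljust_columns(left, 7) + " " + right)
```

On POSIX systems the C library's wcwidth() function applies the same idea using the locale's own tables; with a proportional font, padding with spaces still cannot be made reliable.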
Regards,
Martin
"Martin v. Löwis" <ma****@v.loewis.de> wrote: ... There is, but Fedora Core 1 does not use it. Instead, it uses an encoding where an umlaut character needs two bytes (namely, UTF-8). Changing site.py does not change the way your system represents these characters. ...
I have found a fix myself. I'm not sure if this is "the right way",
but it solves my problem:
I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:
LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"
This fixed my problem: umlauts are stored in one byte now.
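The effect of switching to an ISO-8859-1 locale can be sketched in Python 3: the same three umlauts take one byte each, so byte-based padding lines up again:

```python
# Python 3 sketch: in ISO-8859-1 (latin-1) each umlaut is a single byte.
text = "äöü"
latin1 = text.encode("iso-8859-1")
utf8 = text.encode("utf-8")

print(len(latin1), len(utf8))  # 3 6

# With one byte per character, padding by byte count matches the display.
print((b"%-5s|" % latin1).decode("iso-8859-1"))
print((b"%-5s|" % b"123").decode("iso-8859-1"))
```

The trade-off is coverage: ISO-8859-1 has no code points outside Western European scripts (for example, "€".encode("iso-8859-1") raises UnicodeEncodeError), whereas UTF-8 can encode any character.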
Thanks for your inspirations.
PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
--
Joerg Lehmann