Bytes | Software Development & Data Engineering Community

Umlauts, encodings, sitecustomize.py

I'm on WinXP, Python 2.3.

I don't have problems with umlauts (ä, ö, ü and their uppercase forms)
in my wxPython GUIs when they are displayed as static text. But when I fill
controls with text containing umlauts, or in the Python console, or when
writing to files, the umlauts are escaped:

Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> "ä"
'\x84'


I have defined a sitecustomize.py with these lines in it

import sys
sys.setdefaultencoding("iso-8859-1")

What else do I have to adjust?

Kind regards
Franz GEIGER


Jul 18 '05 #1
F. GEIGER wrote:
> I have defined a sitecustomize.py with these lines in it
>
> import sys
> sys.setdefaultencoding("iso-8859-1")
>
> What else do I have to adjust?

Try the line
# _*_ coding: latin1 _*_

at the very beginning of the file (or on the line right after a #! line on Unix).
This works under Linux, at least.
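The rule Helmut describes is PEP 263: Python honors an encoding declaration only in a comment on the first or second line, and only the "coding:" (or "coding=") part matters, so the Emacs-style -*- markers are conventional but his _*_ works as well. A minimal sketch, assuming Python 3, whose standard library exposes these rules as tokenize.detect_encoding (not available in Python 2.3); the helper declared_encoding and the three sample sources are hypothetical, for illustration only:

```python
import codecs
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the source encoding Python 3 would use for `source`.

    Hypothetical helper: it only wraps tokenize.detect_encoding, which
    applies the PEP 263 rules (cookie on line 1 or 2, else UTF-8).
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


helmut = b"# _*_ coding: latin1 _*_\nprint('x')\n"
franz = b"#!/usr/bin/env python\n# -*- coding: iso-8859-1 -*-\nprint('x')\n"
too_late = b"#!/usr/bin/env python\n# a comment\n# -*- coding: iso-8859-1 -*-\n"

for name, src in [("helmut", helmut), ("franz", franz), ("too_late", too_late)]:
    enc = declared_encoding(src)
    # codecs.lookup() maps aliases such as "latin1" and "iso-8859-1"
    # to one canonical codec name.
    print(name, enc, codecs.lookup(enc).name)
```

A declaration on the third line is silently ignored, which is why the last sample falls back to the default.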

--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
Jul 18 '05 #2

No matter what you do, you won't change this behavior:
>>> chr(0x84)
'\x84'

str.__repr__ always escapes characters in the range 0..31 and 127..255,
no matter what the locale is.
print chr(0x84)

will behave differently: it writes that byte to standard output,
followed by a newline.
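Jeff's rule about str.__repr__ carries over to Python 3's bytes type, which is what a Python 2 str became. A minimal sketch, assuming Python 3, that checks which single byte values repr() shows as \xNN escapes; the helper name shown_as_hex_escape is hypothetical:

```python
def shown_as_hex_escape(value: int) -> bool:
    """True if repr() of the one-byte bytes object shows it as \\xNN."""
    return "\\x" in repr(bytes([value]))


hex_escaped = {v for v in range(256) if shown_as_hex_escape(v)}

# Bytes 0..31 are escaped; tab, newline and carriage return get the short
# forms \t, \n and \r, and every other one in 0..31, plus 127..255, gets \xNN.
expected = (set(range(32)) - {0x09, 0x0A, 0x0D}) | set(range(127, 256))
print(hex_escaped == expected)             # True
print(repr(b"\x84"))                       # b'\x84'
# A Python 3 str (the old unicode type) shows a printable a-umlaut as-is.
print(repr("\u00e4") == "'\u00e4'")        # True
```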

You should note that chr(0x84) is *not* a-umlaut in iso-8859-1. That's
chr(0xe4). You may be using one of these Windows-specific encodings:
cp437.py: 0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
cp775.py: 0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
cp850.py: 0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
cp852.py: 0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
cp857.py: 0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
cp861.py: 0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
cp865.py: 0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
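The mapping lines above come from Python's own codec tables, so they can be checked directly. A minimal sketch, assuming Python 3 (byte strings are bytes, text is str): 0x84 decodes to ä in each DOS code page listed, while in ISO-8859-1 it is an unprintable control character and ä is 0xE4.

```python
CODE_PAGES = ["cp437", "cp775", "cp850", "cp852", "cp857", "cp861", "cp865"]
A_UMLAUT = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

for cp in CODE_PAGES:
    # 0x84 is the byte Franz's console produced for a typed a-umlaut.
    print(cp, b"\x84".decode(cp) == A_UMLAUT)

# In ISO-8859-1 the same byte is U+0084, a non-printable control character...
print(b"\x84".decode("iso-8859-1").isprintable())  # False
# ...and a-umlaut is 0xE4 instead.
print(A_UMLAUT.encode("iso-8859-1"))               # b'\xe4'
```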

Jeff


Jul 18 '05 #3
"Jeff Epler" <je****@unpythonic.net> wrote in message
news:ma**************************************@python.org...
> You should note that chr(0x84) is *not* a-umlaut in iso-8859-1. That's
> chr(0xe4). You may be using one of these Windows-specific encodings:
> [list of cp437, cp775, cp850, cp852, cp857, cp861 and cp865 entries snipped]


I'm not sure what you mean by this. Do you mean I am using one of these
accidentally? Or should I switch to one of these in my sitecustomize.py?

I'm a bit confused. When I let Python print an ä (umlaut a) by simply
entering the 1-char string "ä", it prints '\x84'. When I let a tiny script
print the umlauts, I get:

sys:1: DeprecationWarning: Non-ASCII character '\xe4' in file
D:\Project\SchAG\Programme.Python\test.py on line 1, but no encoding
declared;
see http://www.python.org/peps/pep-0263.html for details
These are Umlauts: õ÷³ and ?Í?.
These are Umlauts: ?Í? and õ÷³.
Press any key to exit...

There's the '\xe4' you are missing.
Thanks and kind regards
Franz GEIGER

P.S.: Do you know a site where this whole matter is explained?

P.P.S.: The script:

print "These are Umlauts: äöü and ÄÖÜ. "
s = "These are Umlauts: ÄÖÜ and äöü. "
print s
raw_input("Press any key to exit...")

Jul 18 '05 #4

"Helmut Jarausch" <ja******@skynet.be> wrote in message
news:41**************@skynet.be...
> Try the line
> # _*_ coding: latin1 _*_
>
> at the very beginning (or at least after a #! line on Unix)
> This works under Linux, at least.


Thank you, Helmut, I had already added
# -*- coding: iso-8859-1 -*-
to the scripts in question.

Kind regards
Franz GEIGER
Jul 18 '05 #5
On Tue, Nov 09, 2004 at 07:52:58PM +0100, F. GEIGER wrote:
> I'm not sure what you mean by this. Do you mean I am using one of these
> accidentally? Or should I switch to one of these in my sitecustomize.py?
>
> I'm a bit confused. When I let Python print an ä (umlaut a) by simply
> entering the 1-char string "ä", it prints '\x84'.


In the encoding iso-8859-1, the character chr(0xe4) is LATIN SMALL
LETTER A WITH DIAERESIS. chr(0x84) is not a printable character.

In the encodings I named above, chr(0x84) is LATIN SMALL LETTER A WITH
DIAERESIS.

Now, consider this program that creates a program:
def maker(filename, encoding, ch):
    # Write a two-line script: a coding declaration and a print statement.
    f = open(filename, "w")
    f.write("# -*- coding: %s -*-\n" % encoding)
    f.write("print '%s'\n" % ch)
    f.close()
If you call
maker("coded.py", "iso-8859-1", "\xe4")
the created script will contain a byte string literal with the byte
'\xe4' in it. When you run the script, it will print that byte followed
by the byte '\n'. *In fact, this behavior (sequence of bytes written to
sys.stdout) doesn't depend on encoding, as long as
'\xe4'.decode(encoding).encode(encoding) == '\xe4'
which should hold true in almost all single-byte encodings.*
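Jeff's round-trip condition can be tested directly. A minimal sketch, assuming Python 3, where the byte string is a bytes object; the helper round_trips and the list of encodings are illustrative choices, not from the thread. The last check shows one exception to "almost all single-byte encodings": a byte the code page leaves undefined, such as 0x81 in cp1252.

```python
def round_trips(value: int, encoding: str) -> bool:
    """Jeff's condition '\\xe4'.decode(enc).encode(enc) == '\\xe4', in Python 3.

    `value` is a single byte; Python 3 spells byte strings as bytes objects.
    """
    raw = bytes([value])
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeError:
        # The byte is undefined in this code page, or (as in UTF-8) it is
        # not a complete character on its own.
        return False


for enc in ["iso-8859-1", "cp1252", "cp437", "cp850", "utf-8"]:
    print(enc, round_trips(0xE4, enc))

print("cp1252 0x81", round_trips(0x81, "cp1252"))  # False: undefined byte
```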

What you *see* when you run the script depends on the meaning your
terminal window ("DOS box") assigns to the byte sequence '\xe4\n'. On
mine, which expects output in UTF-8, I get a mark which indicates an
incomplete multi-byte character and then a newline. On yours, you
apparently get some other character, possibly LATIN SMALL LETTER O WITH
TILDE if your terminal uses cp770, cp850, or cp857.
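This also accounts for the output in Franz's post #4: the script's bytes were ISO-8859-1, but the console interpreted them in its own code page. A minimal sketch, assuming Python 3 and assuming the console used cp850 (the thread does not confirm the code page; cp850 is one that reproduces the õ÷³ and the Í he saw); the helper as_console_shows is hypothetical:

```python
SOURCE_ENCODING = "iso-8859-1"  # what the script's bytes are
CONSOLE_ENCODING = "cp850"      # assumption: the DOS box's code page


def as_console_shows(text: str) -> str:
    """Encode text as the script's bytes, then decode them as the console would."""
    return text.encode(SOURCE_ENCODING).decode(CONSOLE_ENCODING)


print(as_console_shows("äöü") == "õ÷³")                 # True
print(as_console_shows("ÄÖÜ") == "\u2500\u00cd\u2584")  # True: box line, Í, half block
# The bytes a cp850 console needs in order to display äöü:
print("äöü".encode(CONSOLE_ENCODING))                   # b'\x84\x94\x81'
```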

Now, consider this program with a u''-string literal:
def umaker(filename, encoding, ch):
    # Same as maker(), but the generated script prints a u''-string literal.
    f = open(filename, "w")
    f.write("# -*- coding: %s -*-\n" % encoding)
    f.write("print u'%s'\n" % ch)
    f.close()
If you call
umaker("ucoded.py", "iso-8859-1", "\xe4")
the created script will again contain the literal byte "\xe4". When you
run the script, you may get an error that says
UnicodeError: ASCII encoding error: ordinal not in range(128)
This is because the string to be printed is a unicode string containing
the letter LATIN SMALL LETTER A WITH DIAERESIS, but Python believes the
terminal can only accept ASCII-encoded strings for display. In my
Python 2.3 on Unix, sys.stdout.encoding is "UTF-8", and running
ucoded.py outputs the 3-byte sequence "\303\244\n", which in UTF-8 is
LATIN SMALL LETTER A WITH DIAERESIS followed by a newline.
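Both outcomes come down to one step: before a unicode string reaches the terminal, it has to be encoded with the encoding Python believes the terminal uses. A minimal sketch, assuming Python 3 and a hypothetical helper bytes_for_terminal (a real print() takes the encoding from sys.stdout.encoding):

```python
def bytes_for_terminal(text: str, terminal_encoding: str) -> bytes:
    """Encode text the way print() must before the bytes reach the terminal."""
    return text.encode(terminal_encoding)


line = "\u00e4\n"  # LATIN SMALL LETTER A WITH DIAERESIS, then a newline

# A UTF-8 terminal: three bytes, written in octal as \303\244\n.
print(bytes_for_terminal(line, "utf-8") == b"\303\244\n")  # True

# A terminal Python believes is ASCII-only: the encode step fails.
try:
    bytes_for_terminal(line, "ascii")
except UnicodeEncodeError as exc:
    print(exc.reason)  # ordinal not in range(128)
```

Octal \303\244 is the same pair of bytes as hex \xc3\xa4.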

I suspect that wxPython is like Tkinter: it is designed so that
u''-strings (unicode strings) can be passed anywhere byte strings
can, and internally it takes the necessary steps to find the
proper glyphs in the font to display them. For byte strings, on the
other hand, there may be a particular encoding assumed, which has
no relationship to the -*- coding -*- declaration of your scripts.

Jeff


Jul 18 '05 #6
