Strange problems with encoding

Sebastian Meyer

Hi newsgroup,

i am trying to replace german special characters in strings like
str = re.sub('ö', 'oe', str)

When i work with this, i always get the message
UniCode Error: ASCII decoding error : ordinal not in range(128)

Yes i have googled, i searched the faq, manual and python library and
searched all known sources of information. I played with the python
builtin method encode to enforce the right encoding, but the error
stays the same. I've read a lot about Unicode and the internal conversion
of strings done by python, but somehow i've missed the clue.
Nope, python says Huuups... ordinal not in range(128), ;-(

Does anyone of you have any idea? Seems like i am too stupid to read
the documentation carefully; perhaps i misunderstand something...

thanks for your help in advance

Sebastian

Jul 18 '05 #1

Rudy Schockaert

Sebastian Meyer wrote:

Hi newsgroup,

i am trying to replace german special characters in strings like
str = re.sub('ö', 'oe', str)

When i work with this, i always get the message
UniCode Error: ASCII decoding error : ordinal not in range(128)

I'm experiencing something similar at the moment. I try to
base64-encode Unicode strings and I get the exact same error message.

>>> s = u'ö'
>>> s
u'\xf6'
>>> s.encode('base64')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python23\lib\encodings\base64_codec.py", line 24, in base64_encode
    output = base64.encodestring(input)
  File "C:\Python23\lib\base64.py", line 39, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

When I don't specify it's unicode it works:

>>> s = 'ö'
>>> s
'\xf6'
>>> s.encode('base64')
'9g==\n'

The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.

Jul 18 '05 #2

Michael Hudson

"Sebastian Meyer" <s.*****@technology-network.de> writes:

Hi newsgroup,

i am trying to replace german special characters in strings like
str = re.sub('ö', 'oe', str)

1) str is the name of a builtin -- often a bad idea to use that as a
variable name.

2) I presume `str' is a unicode string? Try writing the literal as
u'ö' instead (and adding the appropriate coding cookie to your
source file if using Python 2.3). Or I guess you could write it

u'\N{LATIN SMALL LETTER O WITH DIAERESIS}'
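
As a rough modern illustration of this suggestion: the sketch below is written for Python 3 (an assumption; the thread itself is about Python 2.2/2.3), where every string literal is already Unicode, so neither the u prefix nor a coding cookie is needed. The sample word and the variable names are made up for the example.

```python
import re

# The named escape and the numeric escape denote the same one-character string.
o_umlaut = "\N{LATIN SMALL LETTER O WITH DIAERESIS}"
same_char = (o_umlaut == "\u00f6")

text = "D\u00f6spaddel"  # hypothetical sample input
replaced = re.sub(o_umlaut, "oe", text)
print(replaced)
```

Because pattern, replacement and input are all the same string type here, no implicit ASCII conversion takes place.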

Cheers,
mwh

--
Usenet is like a herd of performing elephants with diarrhea --
massive, difficult to redirect, awe-inspiring, entertaining, and
a source of mind-boggling amounts of excrement when you least
expect it. -- spaf (1992)

Jul 18 '05 #3

Sebastian Meyer

On Thu, 06 Nov 2003 13:39:25 +0000, Michael Hudson wrote:

"Sebastian Meyer" <s.*****@technology-network.de> writes:

Hi newsgroup,

i am trying to replace german special characters in strings like
str = re.sub('ö', 'oe', str)

1) str is the name of a builtin -- often a bad idea to use that as a
variable name.

it was only the example name for the variable; rest assured that i don't
use any builtins as variable names.
maybe not a good example ... thanks for the hint

2) I presume `str' is a unicode string? Try writing the literal as
u'ö' instead (and adding the appropriate coding cookie to your
source file if using Python 2.3). Or I guess you could write it

u'\N{LATIN SMALL LETTER O WITH DIAERESIS}'

i'll try and report back...

Cheers,
mwh

Jul 18 '05 #4

Joe Fromm

"Sebastian Meyer" <s.*****@technology-network.de> wrote in message
news:pa***************************@technology-network.de...

Hi newsgroup,

i am trying to replace german special characters in strings like
str = re.sub('ö', 'oe', str)

When i work with this, i always get the message
UniCode Error: ASCII decoding error : ordinal not in range(128)

Try adding

sys.setdefaultencoding( 'latin-1' )

to your site.py module, or rewrite your fragment as

old = u'ö'    # "from" is a reserved word, so it cannot be used as a name
new = u'oe'
s = re.sub( old.encode('latin-1'), new.encode('latin-1'), s )

If you are running on Windows you might want to change 'latin-1' to 'mbcs',
as that seems to be the most forgiving codec, but it is Windows only.

Joe

Jul 18 '05 #5

Michael Hudson

Rudy Schockaert <ru*************@pandoraSTOPSPAM.be> writes:

I'm experiencing something similar at the moment. I try to
base64-encode Unicode strings and I get the exact same error message.

"base64-encoding Unicode strings" is not a particularly well defined
operation. "base64-encoding" is a way of turning *binary data* into a
particularly "safe" sequence of ascii characters.

Unicode (in some sense) is a family of ways of representing strings of
characters as binary data.

So to base-64 encode a Unicode string, you need to choose *which*
member of this family you're going to use, which is to say the
encoding. UTF-8 would seem a good bet.

But...

>>> s = u'ö'
>>> s
u'\xf6'
>>> s.encode('base64')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python23\lib\encodings\base64_codec.py", line 24, in base64_encode
    output = base64.encodestring(input)
  File "C:\Python23\lib\base64.py", line 39, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

>>> u'ö'.encode('utf-8').encode('base64')
'w7Y=\n'

When I don't specify it's unicode it works:
>>> s = 'ö'
>>> s
'\xf6'
>>> s.encode('base64')
'9g==\n'

Well, this works because your terminal seems to be latin-1:

>>> u'ö'.encode('latin-1').encode('base64')
'9g==\n'

What would you like to do with a character that isn't in latin-1?

The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.

! Why can't you just encode them as utf-8 strings? (Or, thinking
about it, why doesn't SQLObject support unicode?)
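
The point made in this post (pick an encoding first, then base64 the resulting bytes) can be sketched as follows. This is a minimal illustration for Python 3, not the Python 2 string codecs used in the thread; the helper names b64_encode_text and b64_decode_text are hypothetical.

```python
import base64

def b64_encode_text(text, encoding="utf-8"):
    """Encode text to bytes with an explicit encoding, then base64 those bytes."""
    raw = text.encode(encoding)
    return base64.b64encode(raw).decode("ascii")

def b64_decode_text(stored, encoding="utf-8"):
    """Reverse the two steps in the opposite order."""
    return base64.b64decode(stored).decode(encoding)

as_utf8 = b64_encode_text("\u00f6")               # bytes C3 B6
as_latin1 = b64_encode_text("\u00f6", "latin-1")  # byte F6
print(as_utf8, as_latin1)
```

The two results differ because the same character becomes different bytes under different encodings, which is why the encoding has to be chosen explicitly and used again when decoding.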

Cheers,
mwh

--
I think if we have the choice, I'd rather we didn't explicitly put
flaws in the reST syntax for the sole purpose of not insulting the
almighty. -- /will on the doc-sig

Jul 18 '05 #6

Peter Otten

Sebastian Meyer wrote:

Hi newsgroup,

i am trying to replace german special characters in strings like
str = re.sub('ö', 'oe', str)

When i work with this, i always get the message
UniCode Error: ASCII decoding error : ordinal not in range(128)

Works here, even with my older snake:

Python 2.2.1 (#1, Sep 10 2002, 17:49:17)
[GCC 3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import re
>>> re.sub("ö", "oe", "Döspaddel")
'Doespaddel'
>>> re.sub("ö", "oe", u"Döspaddel")
u'Doespaddel'
>>> re.sub("ö", u"oe", u"Döspaddel")
u'Doespaddel'
>>> re.sub(u"ö", u"oe", u"Döspaddel")
u'Doespaddel'

To provoke a UnicodeError, I have to convert a unicode string with umlauts
to str without providing the encoding:

>>> str(u"Döspaddel")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

I suspect that you have something similar hidden in your code (i.e.
characters >= 128 that are not converted). The remedy is to explicitly
encode with the appropriate encoding:

>>> u"Döspaddel".encode("latin-1")
'D\xf6spaddel'

Try to build a minimal script that shows the reported behaviour and fix it
or post it for more detailed advice. By the way, don't use str as a
variable name, it's the type of "ordinary" strings.
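
For readers on a current interpreter, the same failure and remedy can be reproduced roughly like this. It is a sketch for Python 3 (an assumption; the session above is Python 2.2), where the conversion is never implicit but encoding to ASCII still fails in the same way.

```python
text = "D\u00f6spaddel"

# Encoding to ASCII fails for the reason reported in the thread:
# the character is outside range(128).
ascii_failed = False
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii failed:", exc.reason)

# Naming a suitable encoding explicitly succeeds.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
print(latin1_bytes, utf8_bytes)
```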

Peter

Jul 18 '05 #7

Rudy Schockaert

Joe Fromm wrote:

Try adding

sys.setdefaultencoding( 'latin-1' )

to your site.py module, or rewrite your fragment as

At the end of site.py you can enable a piece of code that sets your
default encoding to the current locale of your computer:

if 1:
    # Enable to support locale aware default string encodings.
    import locale
    loc = locale.getdefaultlocale()
    if loc[1]:
        encoding = loc[1]

This works great for me.

Thanks for pointing me to site.py

P.S. I really need some weeks off so I can read all the available
documentation ;-)

Jul 18 '05 #8

Rudy Schockaert

>>> u'ö'.encode('utf-8').encode('base64')
'w7Y=\n'

This works indeed. And thanks to Joe Fromm's hint (site.py) I don't have
to worry about it anymore.

What would you like to do with a character that isn't in latin-1?

Actually, I don't care as long as the encode and decode on the same
machine give me back the original value.

The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.

! Why can't you just encode them as utf-8 strings? (Or, thinking
about it, why doesn't SQLObject support unicode?)

The actual input strings don't really contain unicode text values, but
rather binary values I get as a result of calling win32.NetUserEnum.

The manual of SQLObject (great product btw) explains how you can easily
store binary data in a SQL table by encoding it when setting and
decoding it when getting the value. That is just what I was trying to do.

Jul 18 '05 #9

Michael Hudson

Rudy Schockaert <ru*************@pandoraSTOPSPAM.be> writes:

>>> u'ö'.encode('utf-8').encode('base64')
'w7Y=\n'

This works indeed. And thanks to Joe Fromm's hint (site.py) I don't
have to worry about it anymore.

Well, I'm from the setdefaultencoding-is-evil camp, but it sounds like
you're in a pretty icky situation.

What would you like to do with a character that isn't in latin-1?

Actually, I don't care as long as the encode and decode on the same
machine give me back the original value.

Huh?

The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.

! Why can't you just encode them as utf-8 strings? (Or, thinking
about it, why doesn't SQLObject support unicode?)

The actual input strings don't really contain unicode text values, but
rather binary values i get as result from calling win32.NetUserEnum.

Oh, so they're not really unicode strings at all? Blech. That's
really really nasty. Binary data should really be represented as
(narrow) strings in Python. Perhaps the utf-16-le codec would be the
most appropriate...
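
One way to read the utf-16-le idea is sketched below, under several assumptions: it is written for Python 3, it treats each character of the input as one 16-bit value, and the sample string merely stands in for whatever the win32 call returns. The function names are hypothetical, and the surrogatepass error handler is used only so that lone surrogate values would also survive the round trip.

```python
import base64

def pack_code_units(text):
    """Turn a str that really holds 16-bit values into base64 ASCII text."""
    raw = text.encode("utf-16-le", "surrogatepass")
    return base64.b64encode(raw).decode("ascii")

def unpack_code_units(stored):
    """Reverse pack_code_units and return the original str."""
    raw = base64.b64decode(stored)
    return raw.decode("utf-16-le", "surrogatepass")

sample = "\x01\x00\u00f6\uffff"  # hypothetical stand-in for the binary-ish input
stored = pack_code_units(sample)
restored = unpack_code_units(stored)
print(stored, restored == sample)
```

Unlike a locale-dependent codec such as cp1252, utf-16-le can represent every 16-bit value, so the round trip does not depend on which characters happen to occur in the data.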

Cheers,
mwh

--
Q: What are 1000 lawyers at the bottom of the ocean?
A: A good start.
(A lawyer told me this joke.)
-- Michael Ströder, comp.lang.python

Jul 18 '05 #10

Sebastian Meyer

On Thu, 06 Nov 2003 15:10:49 +0100, Sebastian Meyer wrote:

On Thu, 06 Nov 2003 13:39:25 +0000, Michael Hudson wrote:
2) I presume `str' is a unicode string? Try writing the literal as
u'ö' instead (and adding the appropriate coding cookie to your
source file if using Python 2.3). Or I guess you could write it

u'\N{LATIN SMALL LETTER O WITH DIAERESIS}'

i ll try and report back...

okay, i've solved my problem... it seems that the method which tries
to insert the data i process into the database raises the error. The
data comes from XML files; my derived xml.sax.handler.ContentHandler
returns Unicode strings. The database routine tries to
encode the values as ASCII and --**BOOOM**-- ... Exception.

I now replace the special characters using their Unicode names,
e.g. u'\N{LATIN SMALL LETTER O WITH DIAERESIS}' (thanks for the hint
michael), and now everything works fine... ;-))

thanks for the great help NG

Sebastian
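
A minimal sketch of the kind of replacement described above, written for Python 3 (an assumption; the post is about Python 2.3). The replacement table and the function name are hypothetical; the thread itself only mentions replacing ö with oe.

```python
# Hypothetical table of German special characters and ASCII stand-ins.
UMLAUT_MAP = {
    "\N{LATIN SMALL LETTER A WITH DIAERESIS}": "ae",
    "\N{LATIN SMALL LETTER O WITH DIAERESIS}": "oe",
    "\N{LATIN SMALL LETTER U WITH DIAERESIS}": "ue",
    "\N{LATIN SMALL LETTER SHARP S}": "ss",
}
TABLE = str.maketrans(UMLAUT_MAP)

def to_ascii_german(text):
    """Replace each listed character with its ASCII stand-in."""
    return text.translate(TABLE)

value = to_ascii_german("D\u00f6spaddel")
ascii_bytes = value.encode("ascii")  # no longer raises for this input
print(value)
```

Characters that are not in the table are left alone, so a later ASCII encode can still fail for them; storing the text in an encoding such as UTF-8 avoids the replacement step altogether.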

Jul 18 '05 #11

Rudy Schockaert

Michael Hudson wrote:

Well, I'm from the setdefaultencoding-is-evil camp, but it sounds like
you're in a pretty icky situation.

I wasn't even aware there are two camps. What would be the reasons not
to use setdefaultencoding? As I configured it now it uses the system's
locale to set the encoding. I'm using the same machine to retrieve data,
manipulate it and store it in a database (on the same machine).
I would like to understand what could be wrong in this case.

Actually, I don't care as long as the encode and decode on the same
machine give me back the original value.

Huh?

What I mean is that I encode the data when I store it in the DB and
decode it when I retrieve the data from the DB. I do this because
SQLObject doesn't support the binary data. As long as the result that
comes back out is exactly the same as it was when it went in, I don't care.

The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.

! Why can't you just encode them as utf-8 strings? (Or, thinking
about it, why doesn't SQLObject support unicode?)

The actual input strings don't really contain unicode text values, but
rather binary values i get as result from calling win32.NetUserEnum.

Oh, so they're not really unicode strings at all? Blech. That's
really really nasty. Binary data should really be represented as
(narrow) strings in Python.

I'm just doing it the easy way, I guess. I get the data from the win32
call as Unicode data, even when it contains binary data. Perhaps I
will transform this data into a more useful format in a later phase, but
that'll depend on the need.

Perhaps the utf-16-le codec would be the most appropriate...

This is really not my thing. I noticed that on my system the encoding is
now set to cp1252. What would be the difference if I switched to utf-16-le?

Thanks for your explanation.

Rudy

Jul 18 '05 #12

Fredrik Lundh

Rudy Schockaert wrote:

At the end of site.py you can enable a piece of code that sets your
default encoding to the current locale of your computer:

if 1:
    # Enable to support locale aware default string encodings.
    import locale
    loc = locale.getdefaultlocale()
    if loc[1]:
        encoding = loc[1]

This works great for me.

instead of hacking your Python installation, I suggest using
explicit calls to the "encode" method wherever you need to
convert from Unicode to binary data on the way out.

P.S. I really need some weeks off so I can read all the available
documentation ;-)

it shouldn't take you more than 15-20 minutes to learn enough
about Unicode to be able to write Python code that processes
non-ASCII text in a reliable and portable way:

short version:
http://effbot.org/zone/unicode-objects.htm

long version:
http://www.joelonsoftware.com/articles/Unicode.html
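
The advice above (keep text as Unicode inside the program and call encode explicitly on the way out) might look roughly like this in Python 3. The constant OUTPUT_ENCODING, the helper write_line and the in-memory stream are assumptions made for the illustration.

```python
import io

OUTPUT_ENCODING = "utf-8"  # assumption: chosen by the application, not by the locale

def write_line(stream, text):
    """Encode explicitly at the boundary; everything before this stays Unicode."""
    stream.write(text.encode(OUTPUT_ENCODING) + b"\n")

sink = io.BytesIO()  # stands in for a file or socket opened in binary mode
write_line(sink, "D\u00f6spaddel")
written = sink.getvalue()
print(written)
```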

</F>

Jul 18 '05 #13

Rudy Schockaert

P.S. I really need some weeks off so I can read all the available
documentation ;-)

it shouldn't take you more than 15-20 minutes to learn enough
about Unicode to be able to write Python code that processes
non-ASCII text in a reliable and portable way:

short version:
http://effbot.org/zone/unicode-objects.htm

long version:
http://www.joelonsoftware.com/articles/Unicode.html

</F>

I wasn't referring to Unicode ;-) but to the existence of site.py.
There is still so much I have to learn about Python that I will need
those weeks badly. I only got halfway through Alex's Python in a Nutshell
(splendid book btw), which I have had since EuroPython :-(

Jul 18 '05 #14

Martin v. Löwis

Rudy Schockaert <ru*************@pandoraSTOPSPAM.be> writes:

I wasn't even aware there are two camps. What would be the reasons not
to use setdefaultencoding?

You lose portability (more correctly: you get a false sense of
portability). If you write an application that requires the
default encoding to be FOO-1, the application may work fine on system
A, and fail on system B. Telling the operator of system B to change
her default encoding may cause breakage of a different application on
system B, as B has BAR-2 as the default encoding; changing it to FOO-1
would break applications that require it to be BAR-2.

IOW, if you require conversions between Unicode and byte strings,
explicitly do them in your code. Explicit is better than implicit.

As I configured it now it uses the systems locale to set the
encoding. I'm using the same machine to retrieve data, manipulate it
and store in a database (on the same machine). I would like to
understand what could be wrong in this case.

If the next user logs in on the same system, and has a different
locale set, that user will misinterpret the data you have created.

What I mean is that I encode the data when I store it in the DB and
decode it when I retrieve the data from the DB. I do this because
SQLObject doesn't support the binary data. As long as the result that
comes back out is exactly the same as it was when it went in, I don't
care.

Then you should *define* an encoding that your application uses,
e.g. UTF-8, and use that encoding throughout wherever required,
instead of asking the administrator to change a system setting.

Regards,
Martin
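
A small sketch of what defining one application encoding can look like, together with the misreading that occurs when writer and reader disagree. It assumes Python 3; APP_ENCODING, store and load are hypothetical names, and the Latin-1 reader is just one example of a mismatched setting.

```python
APP_ENCODING = "utf-8"  # hypothetical application-wide constant

def store(text):
    """Convert text to bytes with the application's own encoding."""
    return text.encode(APP_ENCODING)

def load(raw):
    """Convert stored bytes back to text with the same encoding."""
    return raw.decode(APP_ENCODING)

original = "D\u00f6spaddel"
round_trip = load(store(original))

# A reader that assumes a different encoding sees different text.
misread = store(original).decode("latin-1")
print(round_trip, misread)
```

Since both directions go through the same constant, the result does not change when another user logs in with a different locale.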

Jul 18 '05 #15
