Strange problems with encoding

Hi newsgroup,

I am trying to replace German special characters in strings like
str = re.sub('ö', 'oe', str)

When I work with this, I always get the message
UnicodeError: ASCII decoding error: ordinal not in range(128)

Yes, I have googled; I searched the FAQ, the manual, the Python library
reference and all known sources of information. I played with the Python
builtin method encode to enforce the right encoding, but the error
stays the same. I've read a lot about Unicode and the internal conversion
of strings done by Python, but somehow I've missed the clue.
Nope, Python says Huuups... ordinal not in range(128), ;-(

Does anyone have any idea? It seems I am too stupid to read the
documentation carefully; perhaps I misunderstand something...

Thanks in advance for your help,

Sebastian
Jul 18 '05 #1
Sebastian Meyer wrote:
Hi newsgroup,

I am trying to replace German special characters in strings like
str = re.sub('ö', 'oe', str)

When I work with this, I always get the message
UnicodeError: ASCII decoding error: ordinal not in range(128)

Yes, I have googled; I searched the FAQ, the manual, the Python library
reference and all known sources of information. I played with the Python
builtin method encode to enforce the right encoding, but the error
stays the same. I've read a lot about Unicode and the internal conversion
of strings done by Python, but somehow I've missed the clue.
Nope, Python says Huuups... ordinal not in range(128), ;-(

Does anyone have any idea? It seems I am too stupid to read the
documentation carefully; perhaps I misunderstand something...

Thanks in advance for your help,

Sebastian


I'm experiencing something similar at the moment. I try to
base64-encode Unicode strings and I get the exact same error message.
>>> s = u'ö'
>>> s
u'\xf6'
>>> s.encode('base64')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python23\lib\encodings\base64_codec.py", line 24, in base64_encode
    output = base64.encodestring(input)
  File "C:\Python23\lib\base64.py", line 39, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

When I don't specify it's unicode it works:
>>> s = 'ö'
>>> s
'\xf6'
>>> s.encode('base64')
'9g==\n'

The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.
Jul 18 '05 #2
"Sebastian Meyer" <s.*****@technology-network.de> writes:
Hi newsgroup,

I am trying to replace German special characters in strings like
str = re.sub('ö', 'oe', str)


1) str is the name of a builtin -- often a bad idea to use that as a
variable name.

2) I presume `str' is a unicode string? Try writing the literal as
u'ö' instead (and adding the appropriate coding cookie to your
source file if using Python 2.3). Or I guess you could write it

u'\N{LATIN SMALL LETTER O WITH DIAERESIS}'
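
For readers on Python 3, where every str is already Unicode, the
suggestion above reduces to a plain call. A minimal sketch, assuming
Python 3 (which this thread predates); the name `text` and the sample
word are illustrative and not from the original post:

```python
import re

# In Python 3 all str literals are Unicode, so neither the u'' prefix nor
# a coding cookie is required for this to work.
text = "D\N{LATIN SMALL LETTER O WITH DIAERESIS}spaddel"  # i.e. "Döspaddel"

# Replace the umlaut with its ASCII transliteration.
result = re.sub("\N{LATIN SMALL LETTER O WITH DIAERESIS}", "oe", text)
print(result)  # Doespaddel
```

Pattern and subject are both text here; mixing bytes and str in one
re.sub call is rejected by Python 3 instead of being converted silently.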

Cheers,
mwh

--
Usenet is like a herd of performing elephants with diarrhea --
massive, difficult to redirect, awe-inspiring, entertaining, and
a source of mind-boggling amounts of excrement when you least
expect it. -- spaf (1992)
Jul 18 '05 #3
On Thu, 06 Nov 2003 13:39:25 +0000, Michael Hudson wrote:
"Sebastian Meyer" <s.*****@technology-network.de> writes:
Hi newsgroup,

I am trying to replace German special characters in strings like
str = re.sub('ö', 'oe', str)
1) str is the name of a builtin -- often a bad idea to use that as a
variable name.


It was only an example name for the variable; rest assured that I don't
use any builtins as variable names.
Maybe not a good example... thanks for the hint.

2) I presume `str' is a unicode string? Try writing the literal as
u'ö' instead (and adding the appropriate coding cookie to your
source file if using Python 2.3). Or I guess you could write it

u'\N{LATIN SMALL LETTER O WITH DIAERESIS}'
I'll try that and report back...

Cheers,
mwh


Jul 18 '05 #4
"Sebastian Meyer" <s.*****@technology-network.de> wrote in message
news:pa***************************@technology-network.de...
Hi newsgroup,

I am trying to replace German special characters in strings like
str = re.sub('ö', 'oe', str)

When I work with this, I always get the message
UnicodeError: ASCII decoding error: ordinal not in range(128)


Try adding

sys.setdefaultencoding('latin-1')

to your site.py module, or rewrite your fragment as

# "from" is a reserved word in Python, so other names are used here
old = u'ö'
new = u'oe'
s = re.sub(old.encode('latin-1'), new.encode('latin-1'), s)

If you are running on Windows you might want to change 'latin-1' to 'mbcs',
as that seems to be the most forgiving codec, but it is Windows only.

Joe
Jul 18 '05 #5
Rudy Schockaert <ru*************@pandoraSTOPSPAM.be> writes:
Sebastian Meyer wrote:
Hi newsgroup,
I am trying to replace German special characters in strings like
str = re.sub('ö', 'oe', str)
When I work with this, I always get the message
UnicodeError: ASCII decoding error: ordinal not in range(128)
Yes, I have googled; I searched the FAQ, the manual, the Python library
reference and all known sources of information. I played with the Python
builtin method encode to enforce the right encoding, but the error
stays the same. I've read a lot about Unicode and the internal conversion
of strings done by Python, but somehow I've missed the clue.
Nope, Python says Huuups... ordinal not in range(128), ;-(
Does anyone have any idea? It seems I am too stupid to read the
documentation carefully; perhaps I misunderstand something...
Thanks in advance for your help,
Sebastian
I'm experiencing something similar at the moment. I try to
base64-encode Unicode strings and I get the exact same error message.


"base64-encoding Unicode strings" is not a particularly well defined
operation. "base64-encoding" is a way of turning *binary data* into a
particularly "safe" sequence of ascii characters.

Unicode (in some sense) is a family of ways of representing strings of
characters as binary data.

So to base-64 encode a Unicode string, you need to choose *which*
member of this family you're going to use, which is to say the
encoding. UTF-8 would seem a good bet.

But...
>>> s = u'ö'
>>> s
u'\xf6'
>>> s.encode('base64')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python23\lib\encodings\base64_codec.py", line 24, in base64_encode
    output = base64.encodestring(input)
  File "C:\Python23\lib\base64.py", line 39, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

>>> u'ö'.encode('utf-8').encode('base64')
'w7Y=\n'
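
The same two-step idea (pick a text encoding, then base64 the resulting
bytes) can be sketched in present-day Python 3. This assumes Python 3
and the standard base64 module; the variable names are illustrative:

```python
import base64

text = "\xf6"  # the character "ö"

# base64 works on bytes, so the text has to be encoded first.
utf8_b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
latin1_b64 = base64.b64encode(text.encode("latin-1")).decode("ascii")

print(utf8_b64)    # w7Y=
print(latin1_b64)  # 9g==

# Decoding reverses both steps and must use the same text encoding.
round_trip = base64.b64decode(utf8_b64).decode("utf-8")
```

Unlike the Python 2 'base64' codec shown in the thread, b64encode does
not append a trailing newline.
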
When I don't specify it's unicode it works:
>>> s = 'ö'
>>> s
'\xf6'
>>> s.encode('base64')
'9g==\n'


Well, this works because your terminal seems to be latin-1:
>>> u'ö'.encode('latin-1').encode('base64')
'9g==\n'

What would you like to do with a character that isn't in latin-1?
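
To make that question concrete, here is a small sketch of what happens
to a character outside Latin-1. It assumes Python 3; the euro sign is
just an example chosen here, not something from the thread:

```python
text = "5 \N{EURO SIGN}"  # the euro sign has no Latin-1 code point

# Strict encoding (the default) refuses the character outright.
try:
    text.encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors="replace" substitutes "?" for it, which loses information.
lossy = text.encode("latin-1", errors="replace")

# UTF-8 can represent every Unicode character, so it round-trips.
utf8 = text.encode("utf-8")
restored = utf8.decode("utf-8")
```
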
The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.


! Why can't you just encode them as utf-8 strings? (Or, thinking
about it, why doesn't SQLObject support unicode?)

Cheers,
mwh

--
I think if we have the choice, I'd rather we didn't explicitly put
flaws in the reST syntax for the sole purpose of not insulting the
almighty. -- /will on the doc-sig
Jul 18 '05 #6
Sebastian Meyer wrote:
Hi newsgroup,

I am trying to replace German special characters in strings like
str = re.sub('ö', 'oe', str)

When I work with this, I always get the message
UnicodeError: ASCII decoding error: ordinal not in range(128)

Yes, I have googled; I searched the FAQ, the manual, the Python library
reference and all known sources of information. I played with the Python
builtin method encode to enforce the right encoding, but the error
stays the same. I've read a lot about Unicode and the internal conversion
of strings done by Python, but somehow I've missed the clue.
Nope, Python says Huuups... ordinal not in range(128), ;-(

Does anyone have any idea? It seems I am too stupid to read the
documentation carefully; perhaps I misunderstand something...

Thanks in advance for your help,

Sebastian


Works here, even with my older snake:

Python 2.2.1 (#1, Sep 10 2002, 17:49:17)
[GCC 3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.sub("ö", "oe", "Döspaddel")
'Doespaddel'
>>> re.sub("ö", "oe", u"Döspaddel")
u'Doespaddel'
>>> re.sub("ö", u"oe", u"Döspaddel")
u'Doespaddel'
>>> re.sub(u"ö", u"oe", u"Döspaddel")
u'Doespaddel'

To provoke a UnicodeError, I have to convert a unicode string with umlauts
to str without providing the encoding:
>>> str(u"Döspaddel")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

I suspect that you have something similar hidden in your code (i.e.
characters >= 128 that get converted implicitly). The remedy is to
explicitly encode with the appropriate encoding:
>>> u"Döspaddel".encode("latin-1")
'D\xf6spaddel'
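
The same contrast between an explicit codec and the ASCII codec can be
reproduced in Python 3. This is a sketch under that assumption; Python 3
no longer converts implicitly, so the ASCII codec is named by hand here
to trigger the error:

```python
word = "D\xf6spaddel"  # "Döspaddel"

# Explicit encoding with a codec that covers the character works.
encoded = word.encode("latin-1")

# The ASCII codec cannot represent "ö" and raises the familiar error.
try:
    word.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# Decoding needs the same codec that produced the bytes.
decoded = encoded.decode("latin-1")
```
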


Try to build a minimal script that shows the reported behaviour and fix it
or post it for more detailed advice. By the way, don't use str as a
variable name, it's the type of "ordinary" strings.

Peter

Jul 18 '05 #7
Joe Fromm wrote:

Try adding

sys.setdefaultencoding('latin-1')

to your site.py module, or rewrite your fragment as

At the end of site.py you can enable a piece of code that sets your
default encoding to the current locale of your computer:

if 1:
    # Enable to support locale aware default string encodings.
    import locale
    loc = locale.getdefaultlocale()
    if loc[1]:
        encoding = loc[1]

This works great for me.
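
Python 3 removed sys.setdefaultencoding, so the site.py switch has no
direct equivalent there. A hedged sketch of the closest alternative,
assuming Python 3: look up the locale's preferred encoding and pass it
explicitly where it is needed (the sample word is illustrative):

```python
import locale

# Encoding the platform prefers for text data; passing False avoids
# changing the process-wide locale as a side effect.
preferred = locale.getpreferredencoding(False)

# Use it explicitly at the conversion point instead of changing a global
# default; errors="replace" keeps this from raising in an ASCII-only locale.
data = "D\xf6spaddel".encode(preferred, errors="replace")
```

Keeping the encoding visible at each call makes the behaviour the same
on machines with different locales, which a global default does not.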

Thanks for pointing me to site.py

P.S. I really need some weeks off so I can read all the available
documentation ;-)
Jul 18 '05 #8
>>> u'ö'.encode('utf-8').encode('base64')
'w7Y=\n'


This works indeed. And thanks to Joe Fromm's hint (site.py) I don't have
to worry about it anymore.
What would you like to do with a character that isn't in latin-1?
Actually, I don't care as long as the encode and decode on the same
machine give me back the original value.
The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.

! Why can't you just encode them as utf-8 strings? (Or, thinking
about it, why doesn't SQLObject support unicode?)


The actual input strings don't really contain Unicode text values, but
rather binary values I get as the result of calling win32.NetUserEnum.

The manual of SQLObject (great product, btw) explains how you can easily
store binary data in a SQL table by encoding it when setting and
decoding it when getting the value. That is just what I was trying to do.
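
The encode-on-set, decode-on-get pattern can be sketched without any
database. This is not SQLObject's API: `UserRecord`, `payload` and
`_stored` are hypothetical names, and Python 3 is assumed:

```python
import base64


class UserRecord:
    """Hypothetical record; `_stored` stands in for a text database column."""

    def __init__(self):
        self._stored = ""

    @property
    def payload(self):
        # Decode on read: ASCII text back to the original bytes.
        return base64.b64decode(self._stored)

    @payload.setter
    def payload(self, raw):
        # Encode on write: arbitrary bytes become ASCII-safe text.
        self._stored = base64.b64encode(raw).decode("ascii")


record = UserRecord()
record.payload = b"\x00\xf6\xff binary"
print(record.payload == b"\x00\xf6\xff binary")  # True
```

Because the value is bytes from start to finish, no text encoding is
involved and the ASCII codec never gets a chance to complain.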
Jul 18 '05 #9
Rudy Schockaert <ru*************@pandoraSTOPSPAM.be> writes:
>>> u'ö'.encode('utf-8').encode('base64')

'w7Y=\n'


This works indeed. And thanks to Joe Fromm's hint (site.py) I don't
have to worry about it anymore.


Well, I'm from the setdefaultencoding-is-evil camp, but it sounds like
you're in a pretty icky situation.
What would you like to do with a character that isn't in latin-1?

Actually, I don't care as long as the encode and decode on the same
machine give me back the original value.


Huh?
The reason I want to base64-encode these unicode strings is because I
get those as input and want to store them in a MySQL database using
SQLObject.

! Why can't you just encode them as utf-8 strings? (Or, thinking
about it, why doesn't SQLObject support unicode?)


The actual input strings don't really contain Unicode text values, but
rather binary values I get as the result of calling win32.NetUserEnum.


Oh, so they're not really unicode strings at all? Blech. That's
really really nasty. Binary data should really be represented as
(narrow) strings in Python. Perhaps the utf-16-le codec would be the
most appropriate...
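
A small sketch of what the utf-16-le idea amounts to, assuming Python 3
and made-up sample bytes: binary data that was wrapped in a Unicode
string as 16-bit little-endian units can be turned back into the same
bytes with that codec.

```python
raw = b"\xf6\x00A\x00"  # sample bytes: two 16-bit little-endian units

# Read as UTF-16-LE text, the four bytes become a two-character string...
as_text = raw.decode("utf-16-le")

# ...and encoding with the same codec recovers the bytes exactly.
recovered = as_text.encode("utf-16-le")
```

This only round-trips when the bytes happen to be valid UTF-16; an odd
length or an unpaired surrogate makes the decode step fail.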

Cheers,
mwh

--
Q: What are 1000 lawyers at the bottom of the ocean?
A: A good start.
(A lawyer told me this joke.)
-- Michael Ströder, comp.lang.python
Jul 18 '05 #10
