Unicode conversion problem (codec can't decode)

I'm having a problem (Python 2.4) converting strings with random 8-bit
characters into an escape form which is 7-bit clean for storage in a database.
Here's an example:

body = meta['mini_body'].encode('unicode-escape')

When given an 8-bit string (in meta['mini_body']), the code fragment above
yields the error below.

'ascii' codec can't decode byte 0xe1 in position 13: ordinal not in range(128)

The string that generates that error is:

<br>Reduce Whát You Owe by 50%. Get out of debt today!<br>Reduuce Interest &
|V|onthlyy Paymeñts Easy, we will show you how..<br>Freee Quote in 10
Min.<br>http://www.freefromdebtin.net.cn

I've read a lot of stuff about Unicode and Python and I'm pretty comfortable
with how you can convert between different encoding types. What I don't
understand is how to go from a byte string with 8-bit characters to an encoded
string where 8-bit characters are turned into two character hexadecimal sequences.

I really don't care about the character set used. I'm looking for a matched set
of operations that converts the string to a 7-bit form and back to its
original form. Since I need the ability to match a substring of the original
text while the string is in its encoded state, something like Unicode-escape
encoding would work well for me. Unfortunately, I am missing some knowledge
about encoding and decoding. I wish I knew what cjson was doing, because it does
the right things for my project: it takes strings or Unicode, stores everything
as Unicode, and then returns everything as Unicode. Quite frankly, I'd love to
have my entire system run using Unicode strings, but again, I'm missing some
knowledge on how to force all of my modules to be Unicode by default.

Any enlightenment would be most appreciated.

---eric
--
Speech-recognition in use. It makes mistakes, I correct some.
Apr 4 '08 #1
On 2008-04-04 08:18, Jason Scheirer wrote:
> On Apr 3, 9:35 pm, "Eric S. Johansson" <e...@harvee.org> wrote:
>> I'm having a problem (Python 2.4) converting strings with random 8-bit
>> characters into an escape form which is 7-bit clean for storage in a database.

If you don't want to process the 7-bit form in any way, there
are a couple of encodings which you could use:

>> Here's an example:
>>
>> body = meta['mini_body'].encode('unicode-escape')
>>
>> when given an 8-bit string, (in meta['mini_body']), the code fragment above
>> yields the error below.
>>
>> 'ascii' codec can't decode byte 0xe1 in position 13: ordinal not in range(128)

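A note on the likely cause, offered as a sketch and not as a definitive diagnosis: in Python 2, calling .encode() with a text codec on a byte string first decodes that string implicitly with the default 'ascii' codec, and that hidden step is what fails on the byte 0xe1. The snippet below uses Python 3 syntax (an assumption, since the thread is about Python 2.4), where the decode has to be written out explicitly. The name `raw` is a shortened, hypothetical stand-in for meta['mini_body'].

```python
# Shortened stand-in for meta['mini_body']; 0xe1 is "á" in Latin-1.
raw = b"<br>Reduce Wh\xe1t You Owe by 50%."

try:
    # Python 2 performs this decode implicitly before .encode() runs.
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The exception carries the codec name and the offset of the offending byte.
print(error)
```

The byte at offset 13 is the first one outside the ASCII range, which matches the position reported in the error message.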
Try this:

body = meta['mini_body'].decode('latin-1').encode('unicode-escape')
mini_body = body.decode('unicode-escape').encode('latin-1')

or this:

body = meta['mini_body'].decode('latin-1').encode('utf-7')
mini_body = body.decode('utf-7').encode('latin-1')
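
Both round trips above can be sketched as follows. This is written in Python 3 syntax, which is an assumption since the thread concerns Python 2.4, and the helper names `to_escaped` and `from_escaped` are hypothetical. Latin-1 is used only because it maps every byte value to exactly one character, so no byte can fail to decode; it says nothing about the real character set of the data.

```python
def to_escaped(raw):
    """Bytes -> 7-bit clean bytes, treating each byte as one Latin-1 character."""
    return raw.decode("latin-1").encode("unicode-escape")


def from_escaped(body):
    """Inverse of to_escaped: escaped 7-bit bytes -> original bytes."""
    return body.decode("unicode-escape").encode("latin-1")


# Shortened stand-in for the message body from the question.
raw = b"<br>Reduce Wh\xe1t You Owe by 50%. Easy Payme\xf1ts"

body = to_escaped(raw)
is_7bit = all(byte < 128 for byte in body)
restored = from_escaped(body)

# unicode-escape works character by character, so an escaped substring
# can be searched for inside the escaped text.
found = to_escaped(b"Payme\xf1ts") in body

# The UTF-7 variant round-trips the same way.
utf7_body = raw.decode("latin-1").encode("utf-7")
utf7_restored = utf7_body.decode("utf-7").encode("latin-1")
```

For substring matching in the encoded state, the unicode-escape form is probably the safer choice of the two: UTF-7 packs runs of non-ASCII characters into base64 groups, so the encoding of a character can depend on its neighbours.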

If all you need is the 7-bit form, you're probably better off
with a base64 encoding:

body = meta['mini_body'].encode('base64')
mini_body = body.decode('base64')
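
A minimal sketch of the base64 round trip, again in Python 3 syntax as an assumption. It uses the standard base64 module in place of the 'base64' codec name shown above, and `raw` is a shortened, hypothetical stand-in for the message body. It also illustrates the trade-off: base64 output is 7-bit clean, but in general a substring of the original text cannot be found in the encoded text.

```python
import base64

# Shortened stand-in for meta['mini_body'].
raw = b"Reduce Wh\xe1t You Owe"

body = base64.b64encode(raw)       # 7-bit clean representation
restored = base64.b64decode(body)  # back to the original bytes

is_7bit = all(byte < 128 for byte in body)

# base64 encodes groups of three bytes, so an encoded substring only
# lines up with the encoded text when its offset happens to fit.
needle = base64.b64encode(b"Owe")
found_in_encoded = needle in body
```

Here `found_in_encoded` ends up false even though b"Owe" is part of `raw`, which is why this option only fits if the stored form never has to be searched.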

>> the string that generates that error is:
>>
>> <br>Reduce Whát You Owe by 50%. Get out of debt today!<br>Reduuce Interest &
>> |V|onthlyy Paymeñts Easy, we will show you how..<br>Freee Quote in 10
>> Min.<br>http://www.freefromdebtin.net.cn

Looks like spam :-)

--
Marc-Andre Lemburg
eGenix.com

Apr 4 '08 #2
