Unicode conversion problem (codec can't decode)

Eric S. Johansson

I'm having a problem (Python 2.4) converting strings with random 8-bit
characters into an escape form which is 7-bit clean for storage in a database.
Here's an example:

body = meta['mini_body'].encode('unicode-escape')

When given an 8-bit string (in meta['mini_body']), the code fragment above
yields the error below:

'ascii' codec can't decode byte 0xe1 in position 13: ordinal not in range(128)

the string that generates that error is:

 Reduce Whát You Owe by 50%. Get out of debt today! Reduuce Interest &
|V|onthlyy Paymeñts Easy, we will show you how.. Freee Quote in 10
Min. http://www.freefromdebtin.net.cn

I've read a lot of stuff about Unicode and Python and I'm pretty comfortable
with how you can convert between different encoding types. What I don't
understand is how to go from a byte string with 8-bit characters to an encoded
string where 8-bit characters are turned into two-character hexadecimal sequences.

I really don't care about the character set used. I'm looking for a matched set
of operations that converts the string to a seven-bit form and back to its
original form. Since I need the ability to match a substring of the original
text while the string is in its encoded state, something like Unicode-escape
encoding would work well for me. Unfortunately, I am missing some knowledge
about encoding and decoding. I wish I knew what cjson was doing because it does
the right things for my project. It takes strings or Unicode, stores everything
as Unicode and then returns everything as Unicode. Quite frankly, I'd love to
have my entire system run using Unicode strings but again, I'm missing some
knowledge on how to force all of my modules to be Unicode by default.

any enlightenment would be most appreciated.

---eric
--
Speech-recognition in use. It makes mistakes, I correct some.

Apr 4 '08 #1


M.-A. Lemburg

On 2008-04-04 08:18, Jason Scheirer wrote:

On Apr 3, 9:35 pm, "Eric S. Johansson" <e...@harvee.org> wrote:
> I'm having a problem (Python 2.4) converting strings with random 8-bit
> characters into an escape form which is 7-bit clean for storage in a database.

If you don't want to process the 7-bit form in any way, there
are a couple of encodings which you could use:

> Here's an example:
>
> body = meta['mini_body'].encode('unicode-escape')
>
> When given an 8-bit string (in meta['mini_body']), the code fragment above
> yields the error below:
>
> 'ascii' codec can't decode byte 0xe1 in position 13: ordinal not in range(128)

Try this:

body = meta['mini_body'].decode('latin-1').encode('unicode-escape')
mini_body = body.decode('unicode-escape').encode('latin-1')

or this:

body = meta['mini_body'].decode('latin-1').encode('utf-7')
mini_body = body.decode('utf-7').encode('latin-1')
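
A note on why the original call fails, as far as I can tell: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default 'ascii' codec, and that hidden step is what raises the error on byte 0xe1. Decoding explicitly as Latin-1 avoids this because every byte value maps to exactly one code point. Below is a minimal sketch of the round trip and of substring matching against the stored form. The helper names to_7bit and from_7bit and the shortened sample bytes are made up for illustration, and the b"..." literals assume Python 2.6 or later (or 3.x) rather than 2.4.

```python
# Sketch only: helper names and sample bytes are illustrative assumptions.
# Uses b"..." literals, so it needs Python 2.6+ or 3.x (not 2.4).

def to_7bit(raw):
    """Map each byte to one code point via Latin-1, then escape non-ASCII."""
    return raw.decode('latin-1').encode('unicode-escape')

def from_7bit(escaped):
    """Inverse of to_7bit: unescape, then map code points back to bytes."""
    return escaped.decode('unicode-escape').encode('latin-1')

raw = b"Reduce Wh\xe1t You Owe by 50%. Easy Payme\xf1ts"
escaped = to_7bit(raw)

# The stored form is pure ASCII and round-trips to the original bytes.
assert max(bytearray(escaped)) < 128
assert from_7bit(escaped) == raw

# To match a substring against the stored form, escape the needle the same way.
needle = to_7bit(b"Payme\xf1ts")
found = needle in escaped
```

Plain ASCII needles can be searched for directly, since unicode-escape leaves printable ASCII other than the backslash unchanged. A needle that itself contains backslashes or non-ASCII bytes has to go through the same escaping first.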

If all you need is the 7-bit form, you're probably better off
with a base64 encoding:

body = meta['mini_body'].encode('base64')
mini_body = body.decode('base64')
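
The 'base64' string codec used above is specific to Python 2 byte strings. A roughly equivalent sketch with the standard base64 module, which should behave the same on Python 2 and 3, is shown here. The sample bytes are an assumption for illustration, and unlike the codec, b64encode adds no line breaks to the output.

```python
import base64

# Illustrative sample only; b"..." literals assume Python 2.6+ or 3.x.
raw = b"Reduce Wh\xe1t You Owe"

# Encode to a 7-bit-clean form for storage, then recover the original bytes.
body = base64.b64encode(raw)
mini_body = base64.b64decode(body)

assert mini_body == raw
assert max(bytearray(body)) < 128  # every stored byte is plain ASCII
```

One trade-off to keep in mind: base64 regroups the input three bytes at a time, so a substring of the original text generally does not appear as a recognisable substring of the encoded form. If matching inside the stored value matters, the unicode-escape variant is likely the better fit.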

> The string that generates that error is:
>
> Reduce Whát You Owe by 50%. Get out of debt today! Reduuce Interest &
> |V|onthlyy Paymeñts Easy, we will show you how.. Freee Quote in 10
> Min. http://www.freefromdebtin.net.cn

Looks like spam :-)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Apr 04 2008)

>>Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

__________________________________________________ ______________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611

Apr 4 '08 #2
