473,851 Members | 2,037 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Unicode conversion problem (codec can't decode)

I'm having a problem (Python 2.4) converting strings with random 8-bit
characters into an escape form which is 7-bit clean for storage in a database.
Here's an example:

body = meta['mini_body'].encode('unicod e-escape')

when given an 8-bit string, (in meta['mini_body']), the code fragment above
yields the error below.

'ascii' codec can't decode byte 0xe1 in position 13: ordinal not in range(128)

the string that generates that error is:

<br>Reduce Whßt You Owe by 50%. Get out of debt today!<br>Reduu ce Interest &
|V|onthlyy Payme˝ts Easy, we will show you how..<br>Freee Quote in 10
Min.<br>http://www.freefromdebtin.net.cn

I've read a lot of stuff about Unicode and Python and I'm pretty comfortable
with how you can convert between different encoding types. What I don't
understand is how to go from a byte string with 8-bit characters to an encoded
string where 8-bit characters are turned into two character hexadecimal sequences.

I really don't care about the character set used. I'm looking for a matched set
of operations that converts the string to a seven bits a form and back to its
original form. Since I need the ability to match a substring of the original
text while the string is in it's encoded state, something like Unicode-escaped
encoding would work well for me. unfortunately, I am missing some knowledge
about encoding and decoding. I wish I knew what cjson was doing because it does
the right things for my project. It takes strings or Unicode, stores everything
as Unicode and then returns everything as Unicode. Quite frankly, I love to
have my entire system run using Unicode strings but again, I missing some
knowledge on how to force all of my modules to be Unicode by default

any enlightenment would be most appreciated.

---eric
--
Speech-recognition in use. It makes mistakes, I correct some.
Apr 4 '08 #1
1 3581
On 2008-04-04 08:18, Jason Scheirer wrote:
On Apr 3, 9:35 pm, "Eric S. Johansson" <e...@harvee.or gwrote:
>I'm having a problem (Python 2.4) converting strings with random 8-bit
characters into an escape form which is 7-bit clean for storage in a database.
If you don't want to process the 7-bit form in any way, there
are a couple of encodings which you could use:
>Here's an example:

body = meta['mini_body'].encode('unicod e-escape')

when given an 8-bit string, (in meta['mini_body']), the code fragment above
yields the error below.

'ascii' codec can't decode byte 0xe1 in position 13: ordinal not in range(128)
Try this:

body = meta['mini_body'].decode('latin-1').encode('uni code-escape')
mini_body = body.decode('un icode-escape').encode ('latin-1')

or this:

body = meta['mini_body'].decode('latin-1').encode('utf-7')
mini_body = body.decode('ut f-7').encode('lat in-1')

If all you need is the 7-bit form, you're probably better of
with a base64 encoding:

body = meta['mini_body'].encode('base64 ')
mini_body = body.decode('ba se64')
>the string that generates that error is:

<br>Reduce Whßt You Owe by 50%. Get out of debt today!<br>Reduu ce Interest &
|V|onthlyy Payme˝ts Easy, we will show you how..<br>Freee Quote in 10
Min.<br>http://www.freefromdebtin.net.cn
Looks like spam :-)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Apr 04 2008)
>>Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.D atabase.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
_______________ _______________ _______________ _______________ ____________

:::: Try mxODBC.Zope.DA for Windows,Linux,S olaris,MacOSX for free ! ::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
Apr 4 '08 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
5285
by: Bill Eldridge | last post by:
I'm trying to grab a document off the Web and toss it into a MySQL database, but I keep running into the various encoding problems with Unicode (that aren't a problem for me with GB2312, BIG 5, etc.) What I'd like is something as simple as: CREATE TABLE junk (junklet VARCHAR(2500) CHARACTER SET UTF8)); import MySQLdb, re,urllib
8
2562
by: Ivan Voras | last post by:
When concatenating strings (actually, a constant and a string...) i get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1: ordinal not in range(128) Now I don't think either string is unicode, but I'm working with win32api so it might be... :) The point is: I know all values will fit in a particular code page (iso-8859-2), so how do I change the 'ascii' codec in the above error into something...
14
2867
by: wolfgang haefelinger | last post by:
Hi, I wonder whether someone could explain me a bit what's going on here: import sys # I'm running Mandrake 1o and Windows XP. print sys.version ## 2.3.3 (#2, Feb 17 2004, 11:45:40)
4
2040
by: fowlertrainer | last post by:
Hi ! I want to get the WMI infos from Windows machines. I use Py from HU (iso-8859-2) charset. Then I wrote some utility for it, because I want to write it to an XML file. def ToHU(s,NoneStr='-'): if s==None: s=NoneStr if not (type(s) in ):
1
3048
by: olsongt | last post by:
I was going to submit to sourceforge, but my unicode skills are weak. I was trying to strip characters from a string that contained values outside of ASCII. I though I could just encode as 'ascii' in 'replace' mode but it threw an error. Strangely enough, if I decode via the ascii codec and then encode via the ascii codec, I get what I want. That being said, this may be operating correctly. >>> print 'aaa\xae' aaa« >>>...
2
10332
by: aurora | last post by:
I have some unicode string with some characters encode using python notation like '\n' for LF. I need to convert that to the actual LF character. There is a 'unicode_escape' codec that seems to suit my purpose. >>> encoded = u'A\\nA' >>> decoded = encoded.decode('unicode_escape') >>> print len(decoded) 3 Note that both encoded and decoded are unicode string. I'm trying to use
19
3347
by: Thomas W | last post by:
I'm getting really annoyed with python in regards to unicode/ascii-encoding problems. The string below is the encoding of the norwegian word "f°dselsdag". I stored the string as "f°dselsdag" but somewhere in my code it got translated into the mess above and I cannot get the original string back. It cannot be printed in the console or written a plain text-file. I've tried to convert it using
0
1715
by: John Machin | last post by:
On Apr 25, 9:15 pm, "andreas.prof...@googlemail.com" <andreas.prof...@googlemail.comwrote: Guessing is no substitute for reading the manual. print has nothing to do with your problem; the problem is unicode(media) -- as you specified no encoding, it uses the default encoding, which is ascii . As the 2nd byte is 0x9c, ascii is going nowhere.
1
3867
by: Mudcat | last post by:
In short what I'm trying to do is read a document using an xml parser and then upload that data back into a database. I've got the code more or less completed using xml.etree.ElementTree for the parser and dbi/ odbc for my db connection. To fix problems with unicode I built a work-around by mapping unicode characters to equivalent ascii characters and then encoding everything to ascii. That allowed me to build the application and debug...
0
9747
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11019
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10670
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10728
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10356
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7906
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph DuprÚ who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5933
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4549
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
3179
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.