'ascii' codec can't encode character u'\u2013'

Hi

Using Python 2.3.4 + Feedparser 3.3 (a library to parse XML documents)

I'm trying to parse a UTF-8 document with special characters like
acute-accent vowels:
--------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
....
-------

But I get this error message:
-------
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
position 122: ordinal not in range(128)
-------

when trying to execute a MySQL query:
----
query = ("UPDATE blogs_news SET text = '" + text_extrated +
         "'WHERE id='" + id + "'")
cursor.execute(query) #<--- error line
----

I tried with:
-------
text_extrated = text_extrated.encode('iso-8859-1') #<--- error line
query = ("UPDATE blogs_news SET text = '" + text_extrated +
         "'WHERE id='" + id + "'")
cursor.execute(query)
-------

But I get this error:
------
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013'
in position 92: ordinal not in range(256)
-----

I also tried with:
----
text_extrated = re.sub(u'\u2013', '-', text_extrated)
query = ("UPDATE blogs_news SET text = '" + text_extrated +
         "'WHERE id='" + id + "'")
cursor.execute(query)
-----

It works, but I don't want to substitute each special character by hand,
because there will always be ones I forget, and those can crash the program.

Any suggestion to fix it? Thank you very much.
Sep 30 '05 #1
thomas Armstrong wrote:
(...)
when trying to execute a MySQL query:
----
query = ("UPDATE blogs_news SET text = '" + text_extrated +
         "'WHERE id='" + id + "'")
cursor.execute(query) #<--- error line
----


well, to start with, that's not the best way to do an update.
try this instead:

query = "UPDATE blogs_news SET text = %s WHERE id=%s"
cursor.execute(query, (text_extrated, id))

so MySQLdb will take care of quoting text_extrated automatically. this
may not be your problem, but it's considered "good style" when dealing
with dbs.
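
To illustrate the parameterized form above, here is a minimal sketch in Python 3 syntax. It uses the standard-library sqlite3 module as a stand-in for MySQLdb, an assumption made only so the example runs without a MySQL server. Note that sqlite3 uses `?` placeholders where MySQLdb uses `%s`; the table layout, the `news_id` name, and the sample values are hypothetical.

```python
import sqlite3

# In-memory database standing in for the MySQL server (assumption for illustration).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
cursor.execute("INSERT INTO blogs_news (id, text) VALUES (?, ?)", ("42", "old text"))

# Hypothetical extracted text containing an en dash (U+2013) and a single quote.
text_extrated = "2004\u20132005: it's back"
news_id = "42"

# The driver binds the values itself, so no manual quoting or escaping is needed.
# sqlite3 uses '?' placeholders; MySQLdb would use '%s' as in the post above.
query = "UPDATE blogs_news SET text = ? WHERE id = ?"
cursor.execute(query, (text_extrated, news_id))
conn.commit()

# Read the row back to confirm the text was stored unchanged.
cursor.execute("SELECT text FROM blogs_news WHERE id = ?", (news_id,))
stored_text = cursor.fetchone()[0]
print(stored_text == text_extrated)
```

The single quote inside the text would have broken the concatenated query, while the bound parameter is stored as-is.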

apart from this, IIRC feedparser returns text as unicode strings, and
you correctly tried to encode those as latin-1 str objects before
passing them to mysql, but not all characters in the original utf-8 feed
can be encoded as latin-1. latin-1 covers only 256 characters, while
utf-8 can encode all of Unicode.
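
A small sketch of the point above, written in Python 3 syntax (where every `str` is a unicode string, unlike the Python 2.3 of the original question). The sample text is hypothetical; U+2013 is the EN DASH from the error message.

```python
# Hypothetical sample: accented vowels plus an en dash (U+2013).
text = "caf\u00e9 \u2013 men\u00fa"

# Accented vowels such as U+00E9 fit in latin-1 (code points 0-255)...
latin1_ok = "caf\u00e9".encode("latin-1")

# ...but U+2013 (code point 8211) is outside that range, so encoding fails.
try:
    text.encode("latin-1")
    latin1_failed = False
except UnicodeEncodeError as exc:
    latin1_failed = True
    print(exc)

# UTF-8 can encode every Unicode code point, so the same text round-trips.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```

This is why the accented vowels alone never triggered the error, while the en dash did.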

you have to decide:

* switch your mysql db to utf-8 and encode the text as UTF-8 before
insertion

* lose those characters that cannot be mapped into latin-1,
using:

text_extrated.encode('latin-1', 'replace')

so characters that cannot be encoded will be replaced by ?
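
For the second option, a short sketch (Python 3 syntax, hypothetical sample text) comparing three of the standard error handlers that `encode` accepts as its second argument:

```python
# Hypothetical sample text with an en dash (U+2013) between two years.
text = "2004\u20132005"

# 'replace' substitutes a question mark for each unencodable character.
replaced = text.encode("latin-1", "replace")            # b'2004?2005'

# 'ignore' silently drops unencodable characters.
ignored = text.encode("latin-1", "ignore")              # b'20042005'

# 'xmlcharrefreplace' keeps the information as a numeric character reference,
# which can be useful when the text ends up in HTML or XML.
xmlref = text.encode("latin-1", "xmlcharrefreplace")    # b'2004&#8211;2005'
```

All three lose the original character in the stored bytes, so they only make sense if the database really has to stay latin-1.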

also, MySQLdb has some support for handling unicode objects directly, but
things changed a bit during recent releases so i cannot be precise in
this regard.

HTH.

--
deelan, #1 fan of adriana lima!
<http://www.deelan.com/>


Sep 30 '05 #2
Hi.

Thank you both for your answers.

Finally I changed my MySQL table to UTF-8 and changed the structure
of the query (with '%s').

It works. Thank you very much.

Sep 30 '05 #3
deelan <gg*@zzz.it> writes:
[...]
query = "UPDATE blogs_news SET text = %s WHERE id=%s"
cursor.execute(query, (text_extrated, id))

so MySQLdb will take care of quoting text_extrated automatically. this
may not be your problem, but it's considered "good style" when dealing
with dbs.

[...]

More than just good style: it prevents SQL injection attacks that
could otherwise allow people to do bad things to your databases.
John
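
A minimal sketch of the injection risk mentioned above, in Python 3 syntax. It again uses the standard-library sqlite3 module in place of MySQLdb (an assumption so it runs without a server; sqlite3 uses `?` where MySQLdb uses `%s`). The table contents and the `malicious_id` value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
cursor.executemany(
    "INSERT INTO blogs_news (id, text) VALUES (?, ?)",
    [("1", "original"), ("2", "original")],
)

# Hypothetical attacker-controlled value arriving where an id was expected.
malicious_id = "1' OR '1'='1"

# Unsafe: concatenation lets the value rewrite the WHERE clause,
# so the UPDATE matches every row instead of one.
unsafe_query = "UPDATE blogs_news SET text = 'hacked' WHERE id='" + malicious_id + "'"
cursor.execute(unsafe_query)
rows_after_unsafe = cursor.execute(
    "SELECT text FROM blogs_news ORDER BY id"
).fetchall()

# Reset the table, then repeat the update with bound parameters.
cursor.execute("UPDATE blogs_news SET text = 'original'")
cursor.execute(
    "UPDATE blogs_news SET text = ? WHERE id = ?", ("hacked", malicious_id)
)
safe_rowcount = cursor.rowcount  # the odd id is treated as plain data, matching nothing
rows_after_safe = cursor.execute(
    "SELECT text FROM blogs_news ORDER BY id"
).fetchall()
```

With concatenation both rows are overwritten; with bound parameters the strange id is compared literally and no row changes.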

Sep 30 '05 #4
