'ascii' codec can't encode character u'\u2013'

Hi

Using Python 2.3.4 + Feedparser 3.3 (a library to parse XML documents)

I'm trying to parse a UTF-8 document with special characters like
acute-accent vowels:
--------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
....
-------

But I get this error message:
-------
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
position 122: ordinal not in range(128)
-------

when trying to execute a MySQL query:
----
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query) #<--- error line
----

I tried with:
-------
text_extrated = text_extrated.encode('iso-8859-1') #<--- error line
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query)
-------

But I get this error:
------
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013'
in position 92: ordinal not in range(256)
-----

I also tried with:
----
text_extrated = re.sub(u'\u2013', '-' , text_extrated)
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query)
-----

It works, but I don't want to substitute each special character by hand,
because there will always be some I forget, and those can crash the program.

Any suggestion to fix it? Thank you very much.
Sep 30 '05 #1
thomas Armstrong wrote:
(...)
when trying to execute a MySQL query:
----
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query) #<--- error line
----


well, to start it's not the best way to do an update,
try this instead:

query = "UPDATE blogs_news SET text = %s WHERE id=%s"
cursor.execute(query, (text_extrated, id))

so mysqldb will take care of quoting text_extrated automatically. this
may not be your problem, but it's considered "good style" when dealing
with dbs.
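
As a rough illustration of the placeholder style above, here is a minimal sketch. It is written for Python 3 and uses the standard-library sqlite3 module as a stand-in for MySQLdb, so the placeholder is "?" rather than "%s". The in-memory table and the name news_id are assumptions made for the example, not part of the original code.

```python
import sqlite3

# In-memory database standing in for the MySQL server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
cursor.execute("INSERT INTO blogs_news (id, text) VALUES (?, ?)", ("1", ""))

# Text containing both an apostrophe and U+2013 EN DASH.
text_extrated = "It's 1990\u20132005"
news_id = "1"  # hypothetical name, used instead of shadowing the builtin id

# The driver receives the values separately and handles quoting itself.
# sqlite3 uses "?" placeholders; MySQLdb uses "%s" as in the post above.
query = "UPDATE blogs_news SET text = ? WHERE id = ?"
cursor.execute(query, (text_extrated, news_id))
conn.commit()

cursor.execute("SELECT text FROM blogs_news WHERE id = ?", (news_id,))
stored = cursor.fetchone()[0]
print(stored == text_extrated)
```

Note that the apostrophe in the text needs no manual escaping here, which is the point of passing the values as a tuple.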

apart from this, IIRC feedparser returns text as unicode strings, and
you correctly tried to encode those as latin-1 str objects before
passing them to mysql, but not all characters in the original utf-8 feed
can be translated to latin-1. the character set of latin-1 (256 code
points) is very small compared to what utf-8 can encode.
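
To make the limitation concrete, here is a small sketch, written for Python 3 (where str plays the role of Python 2's unicode type). The helper can_encode is a hypothetical name introduced only for this example.

```python
# U+2013 EN DASH is the character named in the traceback.
text = "1990\u20132005"

def can_encode(s, codec):
    """Return True if every character in s can be encoded with codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Try the three codecs discussed in this thread.
results = {codec: can_encode(text, codec) for codec in ("ascii", "latin-1", "utf-8")}
print(results)
```

Only utf-8 should come back True for this string, while an accented vowel such as é would also pass for latin-1.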

you have to decide:

* switch your mysql db to utf-8 and encode the text to UTF-8
before insertion

* lose those characters that cannot be mapped into latin-1,
using the:

text_extrated.encode('latin-1', 'replace')

so unrecognized chars will be replaced by ?
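
The trade-off between the two options can be sketched as follows. This is written for Python 3 and the sample string is made up for the example.

```python
# é exists in latin-1; U+2013 EN DASH does not.
text = "caf\u00e9 1990\u20132005"

# Option 1: UTF-8 can represent every character, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
roundtrip = utf8_bytes.decode("utf-8")

# Option 2: latin-1 with the "replace" error handler substitutes "?" for
# anything it cannot represent, so the en dash is lost for good.
latin1_bytes = text.encode("latin-1", "replace")
lossy = latin1_bytes.decode("latin-1")

print(roundtrip == text)
print(lossy)
```

The second option keeps the accented vowel but turns the dash into a question mark, which cannot be undone after the text is stored.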

also, mysqldb has some support to manage unicode objects directly, but
things changed a bit during recent releases so i cannot be precise in
this regard.

HTH.

--
deelan, #1 fan of adriana lima!
<http://www.deelan.com/>


Sep 30 '05 #2
Hi.

Thank you both for your answers.

Finally I changed my MySQL table to UTF-8 and changed the structure
of the query (with '%s').

It works. Thank you very much.

Sep 30 '05 #3
deelan <gg*@zzz.it> writes:
[...]
query = "UPDATE blogs_news SET text = %s WHERE id=%s"
cursor.execute(query, (text_extrated, id))

so mysqldb will take care of quoting text_extrated automatically. this
may not be your problem, but it's considered "good style" when dealing
with dbs.

[...]

More than just good style: it prevents SQL injection attacks that
could otherwise allow people to do bad things to your databases.
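
A small sketch of what such an attack might look like, written for Python 3 with the standard-library sqlite3 module standing in for MySQLdb (so placeholders are "?" instead of "%s"). The helper make_db, the sample rows and the malicious value are all made up for illustration.

```python
import sqlite3

def make_db():
    """Build a throwaway in-memory table with two rows (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
    conn.executemany(
        "INSERT INTO blogs_news VALUES (?, ?)",
        [("1", "first"), ("2", "second")],
    )
    return conn

# An id value crafted so the WHERE clause matches every row.
malicious_id = "1' OR '1'='1"

# String concatenation: the value becomes part of the SQL statement itself.
unsafe = make_db()
unsafe.execute(
    "UPDATE blogs_news SET text = '" + "hacked" + "' WHERE id='" + malicious_id + "'"
)
unsafe_rows = unsafe.execute("SELECT text FROM blogs_news ORDER BY id").fetchall()

# Placeholders: the value is treated purely as data and matches no row.
safe = make_db()
safe.execute("UPDATE blogs_news SET text = ? WHERE id = ?", ("hacked", malicious_id))
safe_rows = safe.execute("SELECT text FROM blogs_news ORDER BY id").fetchall()

print(unsafe_rows)
print(safe_rows)
```

With concatenation both rows are overwritten; with placeholders neither row is touched, because no row has that literal id.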
John

Sep 30 '05 #4
