'ascii' codec can't encode character u'\u2013'

thomas Armstrong

Hi

Using Python 2.3.4 + Feedparser 3.3 (a library to parse XML documents)

I'm trying to parse a UTF-8 document with special characters like
acute-accent vowels:
--------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
....
-------

But I get this error message:
-------
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
position 122: ordinal not in range(128)
-------

when trying to execute a MySQL query:
----
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query) #<--- error line
----

I tried with:
-------
text_extrated = text_extrated.encode('iso-8859-1') #<--- error line
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query)
-------

But I get this error:
------
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013'
in position 92: ordinal not in range(256)
-----

I also tried with:
----
text_extrated = re.sub(u'\u2013', '-' , text_extrated)
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query)
-----

It works, but I don't want to substitute each special character by hand,
because there will always be forgotten ones that can break the program.

Any suggestion to fix it? Thank you very much.

Sep 30 '05 #1

deelan

thomas Armstrong wrote:
(...)

when trying to execute a MySQL query:
----
query = "UPDATE blogs_news SET text = '" + text_extrated + "' WHERE id='" + id + "'"
cursor.execute (query) #<--- error line
----

well, to start it's not the best way to do an update,
try this instead:

query = "UPDATE blogs_news SET text = %s WHERE id=%s"
cursor.execute(query, (text_extrated, id))

so mysqldb will take care of quoting text_extrated automatically. this
may not be your problem, but it's considered "good style" when dealing
with dbs.
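
A minimal sketch of the parameterized form, written for Python 3 and using the standard-library sqlite3 module as a stand-in for MySQLdb so it runs without a server. That substitution is an assumption: sqlite3 uses `?` placeholders where MySQLdb uses `%s`. The table layout, the sample text and the name `news_id` are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
cursor.execute("INSERT INTO blogs_news (id, text) VALUES (?, ?)", ("42", "old"))

# Hypothetical feed text containing an en dash (U+2013) and a single quote.
text_extrated = "2004\u20132005: it's fine"
news_id = "42"

# The driver binds the values itself; no manual quoting or concatenation.
query = "UPDATE blogs_news SET text = ? WHERE id = ?"
cursor.execute(query, (text_extrated, news_id))
conn.commit()

# Read the row back to check that the text was stored unchanged.
cursor.execute("SELECT text FROM blogs_news WHERE id = ?", (news_id,))
stored_text = cursor.fetchone()[0]
```

Note that the single quote inside the text needs no escaping here, which is exactly what breaks the concatenated version.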

apart from this, IIRC feedparser returns text as unicode strings, and
you correctly tried to encode those as latin-1 str objects before
passing them to mysql, but not all characters in the original utf-8 feed
can be translated to latin-1. the character set of latin-1 is very small
(256 code points) compared to what utf-8 can encode.
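
A small sketch of why the en dash fails while accented vowels do not, written for Python 3 (an assumption; the thread uses Python 2.3, where the same codecs behave the same way on unicode objects). The helper `can_encode` and the sample dictionary are hypothetical names for illustration.

```python
samples = {
    "e-acute": "\u00e9",  # inside Latin-1 (code point below 256)
    "en dash": "\u2013",  # outside Latin-1 (code point 8211)
}

def can_encode(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# For each sample character, record which codecs can represent it.
results = {
    name: {codec: can_encode(ch, codec) for codec in ("ascii", "latin-1", "utf-8")}
    for name, ch in samples.items()
}
```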

you have to choose one of:

* switch your mysql db to utf-8 and encode the text to UTF-8
before insertion

* lose the characters that cannot be mapped into latin-1,
by using:

text_extrated.encode('latin-1', 'replace')

so unmappable chars will be replaced by ?
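
The two options can be compared side by side. This is a sketch for Python 3 with a hypothetical sample string; it only shows what each encoding does to the bytes, not how the MySQL table or connection has to be configured.

```python
text_extrated = "caf\u00e9 2004\u20132005"  # hypothetical sample text

# Option 1: UTF-8 keeps every character; the en dash becomes three bytes.
as_utf8 = text_extrated.encode("utf-8")

# Option 2: Latin-1 with the 'replace' error handler;
# characters that Latin-1 cannot represent become '?'.
as_latin1 = text_extrated.encode("latin-1", "replace")

# Decoding shows what survives each route.
round_trip = as_utf8.decode("utf-8")   # identical to the input
lossy = as_latin1.decode("latin-1")    # en dash has been lost
```

The e-acute survives both routes; only the en dash is replaced in the latin-1 version.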

also, mysqldb has some support to manage unicode objects directly, but
things changed a bit during recent releases so i cannot be precise in
this regard.

HTH.

--
deelan, #1 fan of adriana lima!
<http://www.deelan.com/>

Sep 30 '05 #2

thomas Armstrong

Hi.

Thank you both for your answers.

Finally I changed my MySQL table to UTF-8 and changed the structure
of the query (with '%s').

It works. Thank you very much.

Sep 30 '05 #3

John J. Lee

deelan <gg*@zzz.it> writes:
[...]

query = "UPDATE blogs_news SET text = %s WHERE id=%s"
cursor.execute(query, (text_extrated, id))

so mysqldb will take care of quoting text_extrated automatically. this
may not be your problem, but it's considered "good style" when dealing
with dbs.

[...]

More than just good style: it prevents SQL injection attacks that
could otherwise allow people to do bad things to your databases.
John
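
A minimal sketch of the injection risk, for Python 3, using the standard-library sqlite3 module in place of MySQLdb (an assumption, so the example runs without a server; sqlite3 uses `?` where MySQLdb uses `%s`). The table contents, the helper names `fresh_db` and `rows`, and the malicious value are hypothetical.

```python
import sqlite3

def fresh_db():
    """Create an in-memory table with two rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
    conn.executemany(
        "INSERT INTO blogs_news (id, text) VALUES (?, ?)",
        [("1", "first"), ("2", "second")],
    )
    return conn

def rows(conn):
    """Return all rows ordered by id."""
    return conn.execute("SELECT id, text FROM blogs_news ORDER BY id").fetchall()

# Attacker-controlled value posing as an id.
malicious_id = "1' OR '1'='1"

# Concatenation: the value becomes part of the SQL and matches every row.
unsafe = fresh_db()
unsafe.execute(
    "UPDATE blogs_news SET text = 'hacked' WHERE id='" + malicious_id + "'"
)
unsafe_rows = rows(unsafe)

# Placeholders: the value is bound as data and matches no row.
safe = fresh_db()
safe.execute("UPDATE blogs_news SET text = ? WHERE id = ?", ("hacked", malicious_id))
safe_rows = rows(safe)
```

With concatenation both rows are overwritten; with bound parameters the table is left untouched, because no row has that literal id.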

Sep 30 '05 #4
