Hi,
I am parsing an XML file that includes Chinese characters, like
評評啖啖才是眞.細氺長锍才是愛 or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position....
I would like to ignore these characters and parse everything else.
Could anyone help me? I suppose I could catch the exception and
ignore it, or use some function that detects these characters
so that I can skip them.
Thanks!!
Fabian
On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote:
I am parsing an XML file that includes Chinese characters, like
評評啖啖才是眞.細氺長锍才是愛 or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in
position..
You say you are *parsing* the file but this is an *encode* error. Parsing
means *decoding*.
You have to show some code and the actual traceback to get help. Crystal
balls are not that reliable. ;-)
Ciao,
Marc 'BlackJack' Rintsch
Fabian López wrote:
Thanks Mark, the code is like this. The attrib name is the problem:
from lxml import etree
context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],
The problem is the print statement. Looks like your terminal encoding (that
Python needs to encode the unicode string to) can't handle these unicode
characters.
Stefan
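One way to get the "ignore these characters" behaviour asked for in the original post is to encode the text for output with an error handler. The sketch below is for Python 3; the helper name `printable` and the default `cp1252` encoding are assumptions for illustration, not something taken from the thread.

```python
def printable(text, encoding="cp1252", errors="replace"):
    """Return text restricted to what the target encoding can represent.

    errors="replace" substitutes '?' for each unencodable character,
    errors="ignore" drops such characters entirely.
    """
    return text.encode(encoding, errors=errors).decode(encoding)


name = "ヘアアイロン blog"                  # sample attribute value
print(printable(name))                    # unencodable characters replaced
print(printable(name, errors="ignore"))   # unencodable characters dropped
```

This only changes what is printed; the parsed attribute values themselves stay intact, so nothing is lost if they are later written to a UTF-8 file.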
On 10/23/07, Stefan Behnel <st******************@web.de> wrote:
Fabian López wrote:
Thanks Mark, the code is like this. The attrib name is the problem:
from lxml import etree
context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],
The problem is the print statement. Looks like your terminal encoding (that
Python needs to encode the unicode string to) can't handle these unicode
characters.
I agree. For Japanese, you need to know the exact encoding name and
encode the text with it, like this:
print text.encode('encoding')
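As a rough illustration of encoding with an explicitly named codec, here is a Python 3 sketch. The two encoding names are assumptions chosen because both can represent the Japanese sample; the right one depends on what the terminal or output file expects.

```python
text = "ヘアアイロン"  # sample value from the original post

# Encoding names here are assumptions; pick the one your output expects.
for encoding in ("utf-8", "shift_jis"):
    data = text.encode(encoding)           # text -> bytes in the chosen encoding
    assert data.decode(encoding) == text   # the round trip loses nothing
    print(encoding, len(data), "bytes")
```

An encoding that covers the characters avoids the error without discarding anything, unlike the ignore or replace approach.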
--
I like python!
UliPad <<The Python Editor>>: http://code.google.com/p/ulipad/
meide <<wxPython UI module>>: http://code.google.com/p/meide/
My Blog: http://www.donews.net/limodou