
Ignoring Chinese characters when parsing an XML file

Hi,
I am parsing an XML file that includes Chinese characters, like
評評啖啖才是眞.細氺長锍才是愛 or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position....
I would like to ignore these characters and parse everything else.
Could anyone help me? I suppose that I can catch the exception and
ignore it, or maybe use a function that detects these Chinese
characters so that I can skip them.

Thanks!!
Fabian
Oct 22 '07 #1
On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote:
I am parsing an XML file that includes Chinese characters, like
評評啖啖才是眞.細氺長锍才是愛 or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in
position..
You say you are *parsing* the file but this is an *encode* error. Parsing
means *decoding*.
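
To make the distinction concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The sample string and the choice of cp1252 as the limited target encoding are assumptions for illustration; the original report does not say which codec was involved.

```python
# Decoding turns bytes (as stored in a file) into text.
stored = "細氺長".encode("utf-8")   # stand-in for bytes read from the XML file
text = stored.decode("utf-8")       # this is the step a parser performs

# Encoding turns text back into bytes, for example for a terminal.
# cp1252 (assumed here) has no mapping for these characters, so it fails.
caught = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    caught = exc

print(type(caught).__name__)
print(caught)
```

The decode step succeeds and only the encode step fails, which suggests looking at wherever the text is written out rather than at the parser.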

You have to show some code and the actual traceback to get help. Crystal
balls are not that reliable. ;-)

Ciao,
Marc 'BlackJack' Rintsch
Oct 22 '07 #2
Fabian López wrote:
Thanks Mark, the code is like this. The attrib name is the problem:

from lxml import etree

context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],
The problem is the print statement. Looks like your terminal encoding (that
Python needs to encode the unicode string to) can't handle these unicode
characters.
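
One possible way around this is to write through a stream that substitutes characters it cannot encode instead of raising. The sketch below uses Python 3 syntax and simulates a limited terminal with an in-memory buffer; the helper name `make_lossy_writer`, the cp1252 encoding, and the sample values are all assumptions for illustration.

```python
import io


def make_lossy_writer(raw, encoding, errors="replace"):
    """Wrap a binary stream so that unencodable characters do not raise.

    Hypothetical helper: with errors="replace" each unencodable
    character is written as "?" instead of triggering an exception.
    """
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")


# Simulate a terminal that only understands cp1252 (assumed encoding).
buffer = io.BytesIO()
out = make_lossy_writer(buffer, "cp1252")

# Sample values standing in for elem.tag and the two attributes.
print("weblog", "ヘア blog", "http://example.com/", file=out)
out.flush()

written = buffer.getvalue()
print(written)
```

On a real terminal with Python 3.7 or later, `sys.stdout.reconfigure(errors="replace")` should have a similar effect without wrapping anything by hand.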

Stefan
Oct 23 '07 #3
On 10/23/07, Stefan Behnel <st******************@web.de> wrote:
Fabian López wrote:
Thanks Mark, the code is like this. The attrib name is the problem:

from lxml import etree

context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],

The problem is the print statement. Looks like your terminal encoding (that
Python needs to encode the unicode string to) can't handle these unicode
characters.
I agree. For Japanese, you need to know the exact encoding name and
encode the text with it, like this:

print text.encode('encoding')
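
Since the original question was how to skip characters that cannot be displayed, the `errors` argument of `encode` may be the simplest route. This is a small sketch in Python 3 syntax; the sample string and the ASCII target are assumptions chosen to make the effect visible.

```python
name = "ヘアアイロン blog"  # sample attribute value, assumed for illustration

# "ignore" silently drops every character the codec cannot represent.
dropped = name.encode("ascii", errors="ignore").decode("ascii")

# "replace" keeps a "?" placeholder for each such character.
marked = name.encode("ascii", errors="replace").decode("ascii")

# "backslashreplace" keeps the information as \uXXXX escapes.
escaped = name.encode("ascii", errors="backslashreplace").decode("ascii")

print(repr(dropped))
print(repr(marked))
print(repr(escaped))
```

Dropping characters loses data, so "replace" or "backslashreplace" is usually the safer choice when the output is only meant for debugging.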

--
I like python!
UliPad <<The Python Editor>>: http://code.google.com/p/ulipad/
meide <<wxPython UI module>>: http://code.google.com/p/meide/
My Blog: http://www.donews.net/limodou
Oct 23 '07 #4
