Ignoring Chinese characters when parsing an XML file

Fabian López

Hi,
I am parsing an XML file that includes Chinese characters, like ^
ÔuÔuà¢à¢²ÅÊÇ±w.¼ššìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position....
I would like to ignore these characters and parse everything else.
Could anyone help me? I suppose I could catch the exception and skip
them, or use a function that detects these Chinese characters so that
I can ignore them.

Thanks!!
Fabian

Oct 22 '07 #1


Marc 'BlackJack' Rintsch

On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote:

I am parsing an XML file that includes Chinese characters, like ^
uuå•–å•–æ‰æ˜¯w.æ‰‰Lé”æ‰æ˜¯ or ãƒ˜ã‚¢ã‚¢ã‚¤ãƒ*ãƒ³... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position..

You say you are *parsing* the file but this is an *encode* error. Parsing
means *decoding*.
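The difference between the two directions can be shown with a minimal sketch. It is written for Python 3 (the thread itself uses Python 2), and the sample string and the cp1252 codec are assumptions chosen for illustration, not details taken from the original poster's file or terminal:

```python
# Decoding turns bytes into text; encoding turns text back into bytes.
raw = "\u4e2d\u6587".encode("utf-8")   # bytes as they might sit in a UTF-8 XML file
text = raw.decode("utf-8")             # the "parsing" direction: bytes -> str

# Encoding to a codec that lacks these characters is the step that fails.
try:
    text.encode("cp1252")              # assumed example of a Windows charmap codec
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    reported_codec = exc.encoding      # codec name as it appears in the error message
```

Decoding the file succeeds here; only the later attempt to encode the resulting text raises the error.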

You have to show some code and the actual traceback to get help. Crystal
balls are not that reliable. ;-)

Ciao,
Marc 'BlackJack' Rintsch

Oct 22 '07 #2

Stefan Behnel

Fabian López wrote:

Thanks Marc, the code is like this. The attrib name is the problem:

from lxml import etree

context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],

The problem is the print statement. It looks like your terminal encoding
(which Python has to encode the Unicode string to) can't represent these
Unicode characters.
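One way to keep the print from failing is to encode the text for the terminal yourself and tell the codec what to do with characters it cannot represent. The following is only a sketch for Python 3: `printable` is a hypothetical helper (it is not part of lxml or of the original code), and the cp1252 encoding and the sample string are assumptions for illustration:

```python
import sys

def printable(text, encoding=None, errors="ignore"):
    """Return text without the characters the target encoding cannot represent.

    Hypothetical helper: `encoding` defaults to the terminal's encoding and
    falls back to ASCII when that is unknown.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the target codec. errors="ignore" drops unencodable
    # characters; errors="replace" substitutes "?" for each of them.
    return text.encode(encoding, errors=errors).decode(encoding)

name = "blog \u4e2d\u6587 test"                  # assumed attribute value
print(printable(name, encoding="cp1252"))        # the two CJK characters are dropped
```

In the loop above, the attribute values would be passed through such a helper before printing. Dropping characters loses information, so `errors="replace"` may be preferable when it matters that something was there.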

Stefan

Oct 23 '07 #3

limodou

On 10/23/07, Stefan Behnel <st******************@web.de> wrote:

Fabian López wrote:
Thanks Marc, the code is like this. The attrib name is the problem:

from lxml import etree

context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],

The problem is the print statement. Looks like your terminal encoding (that
Python needs to encode the unicode string to) can't handle these unicode
characters.

I agree. For Japanese, you need to know the exact encoding name and
encode the text to it before printing, like this:

print text.encode('encoding')
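As a concrete illustration of encoding to a named codec, here is a small Python 3 sketch. The choice of EUC-JP and the katakana sample are assumptions; the thread does not say which encoding the poster's terminal actually expects:

```python
# Sample katakana text, assumed for illustration only.
text = "\u30d8\u30a2\u30a2\u30a4\u30ed\u30f3"

# Encode with an explicitly named codec (assumption: the target expects EUC-JP).
encoded = text.encode("euc_jp")

# Decoding with the same codec name recovers the original text.
round_trip = encoded.decode("euc_jp")
```

If the codec name does not match what the terminal expects, the bytes will be displayed as unrelated characters rather than raising an error.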

--
I like python!
UliPad <<The Python Editor>>: http://code.google.com/p/ulipad/
meide <<wxPython UI module>>: http://code.google.com/p/meide/
My Blog: http://www.donews.net/limodou

Oct 23 '07 #4
