Bytes | Software Development & Data Engineering Community

xml sax and unicode utf8 - parser error: mismatched tag

Hi,

I'm trying to parse an XML file with Python's SAX module, xml.sax.
If I parse the file with an empty handler, it works perfectly, with no mismatched-tag error.
But if I try to print the characters inside the tags, I get this error:
xml.sax._exceptions.SAXParseException: theFile.xml:2426:6: mismatched tag
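The "mismatched tag" message comes from the expat parser behind xml.sax, not from the handler code, and the exception records where the parser stopped (here line 2426, column 6). A minimal sketch, assuming Python 3, that pulls out that position together with the raw bytes of the offending line so they can be inspected; `locate_parse_error` is a hypothetical helper name, not part of xml.sax:

```python
import xml.sax


def locate_parse_error(data: bytes):
    """Parse `data` with a do-nothing handler.

    Returns None if parsing succeeds, otherwise a tuple
    (line, column, message, raw bytes of the offending line).
    """
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        line = exc.getLineNumber()    # 1-based line number
        col = exc.getColumnNumber()   # column as reported by expat
        text = data.splitlines()[line - 1]
        return line, col, exc.getMessage(), text
    return None


# A tiny document with a deliberately mismatched end tag on line 2.
print(locate_parse_error(b"<a>\n<b></a>\n"))
```

For the real file, read it in binary mode (`open("theFile.xml", "rb").read()`) and pass the bytes in. Printing the returned line with `repr()` shows non-ASCII bytes as escapes, so what you see does not depend on the terminal's encoding.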

I'm working with a file that mixes Latin-1 and UTF-8 encodings. If I convert the original file from Latin-1 to UTF-8, I get characters like <AE> and the parser gets confused.
So now I'm working with the original file, but I think there are some characters that Python is interpreting as tags.
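If the file really mixes UTF-8 with stray Latin-1 bytes, one option is to normalise it to pure UTF-8 before parsing. A sketch, assuming Python 3 and assuming that every byte sequence which is not valid UTF-8 was meant as Latin-1; the error-handler name `latin1_fallback` and the function `decode_mixed` are made up for this example. `codecs.register_error` lets a decode call hand each undecodable span to a custom function:

```python
import codecs


def _latin1_fallback(err):
    # Called for each span that is not valid UTF-8: reinterpret just that
    # span as Latin-1 and resume decoding right after it.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


# "latin1_fallback" is a name chosen for this sketch, not a built-in handler.
codecs.register_error("latin1_fallback", _latin1_fallback)


def decode_mixed(data: bytes) -> str:
    """Decode bytes that are mostly UTF-8 with some Latin-1 mixed in."""
    return data.decode("utf-8", errors="latin1_fallback")


mixed = b"<name>caf\xc3\xa9 caf\xe9</name>"  # UTF-8 e-acute, then Latin-1 e-acute
clean = decode_mixed(mixed)                  # clean == "<name>caf\u00e9 caf\u00e9</name>"
utf8_bytes = clean.encode("utf-8")           # uniform UTF-8, ready for xml.sax
```

The assumption can fail: a Latin-1 byte pair that happens to form valid UTF-8 is kept as UTF-8, so it is worth spot-checking the converted file before trusting it.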
I did add the line

    # -*- coding: latin-1 -*-

to the Python parser code, which I don't use with the empty handler.
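The `# -*- coding: latin-1 -*-` comment is a PEP 263 declaration: it only tells Python how to read the .py source file itself and has no effect on how xml.sax decodes the XML data. For XML, the encoding comes from the XML declaration at the top of the document; without one, the parser assumes UTF-8 (or UTF-16 when a byte-order mark is present). A small Python 3 sketch showing the difference; `TextCollector` and `parse_text` are illustrative names:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Accumulate all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def parse_text(data: bytes) -> str:
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.parts)


latin1_doc = b"<name>caf\xe9</name>"  # 0xE9 is e-acute in Latin-1

# No XML declaration: expat assumes UTF-8 and rejects the lone 0xE9 byte.
try:
    parse_text(latin1_doc)
except xml.sax.SAXParseException as exc:
    print("without declaration:", exc.getMessage())

# With the real encoding declared, the same bytes decode cleanly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_doc
text = parse_text(declared)  # text == "caf\u00e9"
```

A declaration only helps when the whole file uses that one encoding; a file that genuinely mixes Latin-1 and UTF-8 has no single correct declaration.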

It's a character-representation problem, but I can't seem to find the solution.
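Separately from the encoding question, when printing text from a SAX handler it helps to know that `characters()` may be called several times for a single run of text, and that in Python 3 it receives already-decoded `str` objects. A minimal sketch that buffers the chunks per element and emits each element's text once, at its end tag; `ElementTextPrinter` is a hypothetical name, and the `out` callable stands in for `print`:

```python
import xml.sax


class ElementTextPrinter(xml.sax.ContentHandler):
    """Emit "name: text" for each element that has non-blank direct text."""

    def __init__(self, out=print):
        super().__init__()
        self.out = out    # callable that receives one str per element
        self.stack = []   # text chunks for each currently open element

    def startElement(self, name, attrs):
        self.stack.append([])

    def characters(self, content):
        # One text node may arrive in several calls, so buffer the pieces
        # instead of printing each call on its own.
        if self.stack:
            self.stack[-1].append(content)

    def endElement(self, name):
        text = "".join(self.stack.pop()).strip()
        if text:
            self.out(f"{name}: {text}")


# Usage: collect the output in a list instead of printing it.
lines = []
doc = b"<doc>\n  <name>Bob</name>\n  <city>Paris</city>\n</doc>"
xml.sax.parseString(doc, ElementTextPrinter(lines.append))
# lines == ["name: Bob", "city: Paris"]
```

Collecting into a list also keeps terminal encoding out of the picture while debugging; writing the text out is the only point where an output encoding matters again.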
I'm working in Linux.

Thanks a lot for your time!
Beatriz.
May 2 '07 #1
bartonc
Have you tried adding the line

    # -*- coding: latin-1 -*-

to the file being parsed? That's the only thing I can think of.
May 2 '07 #2
brevello
Yes, I already tried that.
Thanks anyway.
May 2 '07 #3
bartonc
By the way, UTF-8 is not the same thing as Unicode (UTF-8 is one way of encoding Unicode text as bytes). If there are Unicode characters in the file, this may be your trouble.
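To make the distinction concrete in Python 3 terms: a `str` holds Unicode characters, while UTF-8 and Latin-1 are two different byte encodings of them. The same e-acute is two bytes in UTF-8 and one byte in Latin-1, which is why a file mixing the two cannot be decoded with a single codec. (In the Python 2 of this thread, the text type was `unicode` and `str` held bytes.) A short sketch:

```python
text = "caf\u00e9"                # 4 characters: c, a, f, e-acute

utf8 = text.encode("utf-8")       # b'caf\xc3\xa9': e-acute becomes 2 bytes
latin1 = text.encode("latin-1")   # b'caf\xe9': e-acute becomes 1 byte

# Decoding with the wrong codec may "succeed" and give the wrong characters:
# here 5 characters c, a, f, U+00C3, U+00A9.
wrong = utf8.decode("latin-1")

# ...or it may fail outright, as with Latin-1 bytes read as UTF-8.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```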
May 2 '07 #4


Similar topics

by: Todd Jenista
I have a parser I am building with python and, unfortunately, people have decided to put unicode characters in the files I am parsing. The parser seems to have a fit when I search for one \uXXXX symbol, and there is another unicode symbol in the file. In this case, a search and replace for © with a µ in the file causes the infamous ordinal error. My quick-fix, because they have good context, is to change them both to "UTF8", and then...

by: Kevin Dangoor
This is a followup to a blog post I wrote the other day http://www.blueskyonmars.com/archives/2005/01/31/using_unicode_with_elementtidy.html I started out working in the context of elementtidy, but now I am running into trouble in general Python-XML areas, so I thought I'd toss the question out here. The code below is fairly self-explanatory. I have a small HTML snippet that is UTF-8 encoded and is not 7-bit ASCII compatible. I use Tidy...

by: Jürgen Kahrs
Hello, do you think that this file is a proper Unicode file? http://belnet.dl.sourceforge.net/sourceforge/ganttproject/ganttproject-example3.xml <?xml version="1.0" encoding="UTF-8"?> ... <resource id="1" name="Andreas Plüschke" function="10" contacts=""/>

by: jrs_14618
Hello All, This post is essentially a reply a previous post/thread here on this mailing.database.myodbc group titled: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode I was wondering if anybody has experienced the same issues

by: aine_canby
Hi, Im totally new to Python so please bare with me. Data is entered into my program using the folling code - str = raw_input(command) words = str.split() for word in words:

by: George Sakkis
The following snippet results in different outcome for (at least) the last three major releases: # Python 2.3.4 u'%94' # Python 2.4.2 UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0: ordinal not in range(128)

by: thijs.braem
Hi everyone, I'm having quite some troubles trying to convert Unicode to String (for use in psycopg, which apparently doesn't know how to cope with unicode strings). The error I keep having is something like this: ERREUR: Séquence d'octets invalide pour le codage «UTF8» : 0xe02063 (sorry, locale is french, it means "byte sequence invalid for encoding

by: Simon Willison
Hello, I'm using ElementTree to parse an XML file which includes some data encoded as cp1252, for example: <name>Bob\x92s Breakfast</name> If this was a regular bytestring, I would convert it to utf8 using the following:

by: Mudcat
In short what I'm trying to do is read a document using an xml parser and then upload that data back into a database. I've got the code more or less completed using xml.etree.ElementTree for the parser and dbi/ odbc for my db connection. To fix problems with unicode I built a work-around by mapping unicode characters to equivalent ascii characters and then encoding everything to ascii. That allowed me to build the application and debug...
