Bytes IT Community
xml sax and unicode utf8 - parser error: mismatched tag

P: 2
Hi,

I'm trying to parse an XML file with Python's SAX module, xml.sax.
If I parse the file with an empty handler it works perfectly, with no "mismatched tag" error.
But if I try to print the character data inside the tags, I get this error:
xml.sax._exceptions.SAXParseException: theFile.xml:2426:6: mismatched tag

The file I'm working with has Latin-1 encoding mixed with UTF-8. If I convert the original file from Latin-1 to UTF-8, I get characters like <AE> and the parser gets mixed up.
So now I'm working with the original file, but I think there are some characters that Python is interpreting as tags.
I did add the line:
[PYTHON] # -*- coding: latin-1 -*- [/PYTHON]
to the Python parser code; the version with the empty handler does not have it.

It's a character representation problem, but I can't seem to find the solution.
I'm working on Linux.

Thanks a lot for your time!
Beatriz.
May 2 '07 #1
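
One way to check the suspicion that the file mixes Latin-1 and UTF-8 is to look for lines that do not decode as UTF-8. This is only a rough sketch: the helper `find_non_utf8_lines` and the sample bytes are hypothetical and not taken from the original file.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            # This line contains bytes that cannot be UTF-8 (possibly Latin-1).
            bad.append((number, line))
    return bad


# Hypothetical sample: line 2 is UTF-8, line 3 holds a raw Latin-1 byte (0xE9).
sample = b"<root>\n<a>caf\xc3\xa9</a>\n<b>caf\xe9</b>\n</root>"

for number, line in find_non_utf8_lines(sample):
    print(number, line)
```

For a real file, the bytes could be read with `open(path, "rb").read()`. Lines reported here are candidates for being Latin-1; lines that pass may still be plain ASCII, which is valid in both encodings.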
3 Replies


bartonc
Expert 5K+
P: 6,596
Have you tried adding the
[PYTHON] # -*- coding: latin-1 -*- [/PYTHON]
to the file being parsed? That's the only thing that I can think of.
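
Note that a `# -*- coding: ... -*-` comment only declares the encoding of a Python source file. An XML file states its own encoding in the XML declaration on its first line. A minimal sketch, assuming the data really is Latin-1 throughout (the handler `TextCollector`, the helper `parse_text`, and the sample document are made up for illustration):

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects character data; a stand-in for a handler that prints it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver the text of one element in several pieces.
        self.chunks.append(content)


def parse_text(xml_bytes: bytes) -> str:
    """Parse raw XML bytes and return all character data as one string."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.chunks)


# Hypothetical Latin-1 document; the declaration tells the parser how to decode it.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<name>Beatriz P\u00e9rez</name>"
).encode("latin-1")

print(parse_text(latin1_doc))
```

Without an encoding declaration, an XML parser assumes UTF-8 by default, so Latin-1 bytes above 0x7F would not be read as intended. A declaration does not help if the file genuinely mixes two encodings; in that case the file would need to be converted to a single encoding first.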
May 2 '07 #2

P: 2
Yes, I already tried that.
Thanks anyway.
May 2 '07 #3

bartonc
Expert 5K+
P: 6,596
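By the way, UTF-8 is not the same thing as Unicode: it is one way of encoding Unicode characters as bytes. If the file contains non-ASCII characters stored in a different encoding than the parser expects, this may be your trouble.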
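
The difference between a Unicode character and its encoded bytes can be seen in a few lines of Python 3. This is just an illustration with an arbitrary character (é), not data from the file in question:

```python
text = "\u00e9"  # one Unicode character: é (U+00E9)

as_utf8 = text.encode("utf-8")      # two bytes: 0xC3 0xA9
as_latin1 = text.encode("latin-1")  # one byte: 0xE9

# Reading UTF-8 bytes as if they were Latin-1 gives two wrong characters.
misread = as_utf8.decode("latin-1")

print(as_utf8, as_latin1, misread)
```

The same character has different byte representations depending on the encoding, so a file that mixes both forms cannot be decoded correctly with a single encoding setting.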
May 2 '07 #4
