By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,626 Members | 1,146 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,626 IT Pros & Developers. It's quick & easy.

BZip2 decompression and parsing XML

P: n/a
Hi.

I'm trying to disassemble bzipped file. If I use minidom.parseString,
I'm getting this error:

Traceback (most recent call last):
File "./replications.py", line 342, in ?

File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1925, in
parseString
return expatbuilder.parseString(string)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 940, in
parseString
return builder.parseString(string)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 223, in
parseString
parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line
538676, column 17

If I use minidom.parse, I'm getting this error:

Traceback (most recent call last):
File "./replications.py", line 341, in ?
files.xml = minidom.parse(bz2.decompress(dump))
File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1915, in parse
return expatbuilder.parse(file)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 922, in
parse
fp = open(file, 'rb')
IOError

But XML parsed normally.

Code:

try:
handler = open(args[0], "r")
dump = handler.read()
handler.close()
except IOError, error:
print("Can't open dump: %s" % error)
sys.exit(1)

files.xml = minidom.parse(bz2.decompress(dump))

Jun 27 '08 #1
Share this Question
Share on Google+
1 Reply


P: n/a
phasma wrote:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line
538676, column 17
Looks like your XML file is broken in line 538676.

try:
handler = open(args[0], "r")
This should read

handler = open(args[0], "rb")

Maybe that's your problem.

BTW, since you seem to parse a pretty big chunk of XML there, you should
consider using lxml. It's faster, more memory friendly, more feature-rich and
easier to use than minidom. It can also parse directly from a gzip-ed file or
a file-like object as provided by the bz2 module.

http://codespeak.net/lxml/

Stefan
Jun 27 '08 #2

This discussion thread is closed

Replies have been disabled for this discussion.