473,287 Members | 1,582 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,287 software developers and data experts.

BZip2 decompression and parsing XML

Hi.

I'm trying to disassemble bzipped file. If I use minidom.parseString,
I'm getting this error:

Traceback (most recent call last):
File "./replications.py", line 342, in ?

File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1925, in
parseString
return expatbuilder.parseString(string)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 940, in
parseString
return builder.parseString(string)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 223, in
parseString
parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line
538676, column 17

If I use minidom.parse, I'm getting this error:

Traceback (most recent call last):
File "./replications.py", line 341, in ?
files.xml = minidom.parse(bz2.decompress(dump))
File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1915, in parse
return expatbuilder.parse(file)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 922, in
parse
fp = open(file, 'rb')
IOError

But XML parsed normally.

Code:

try:
handler = open(args[0], "r")
dump = handler.read()
handler.close()
except IOError, error:
print("Can't open dump: %s" % error)
sys.exit(1)

files.xml = minidom.parse(bz2.decompress(dump))

Jun 27 '08 #1
1 2667
phasma wrote:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line
538676, column 17
Looks like your XML file is broken in line 538676.

try:
handler = open(args[0], "r")
This should read

handler = open(args[0], "rb")

Maybe that's your problem.

BTW, since you seem to parse a pretty big chunk of XML there, you should
consider using lxml. It's faster, more memory friendly, more feature-rich and
easier to use than minidom. It can also parse directly from a gzip-ed file or
a file-like object as provided by the bz2 module.

http://codespeak.net/lxml/

Stefan
Jun 27 '08 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: itzel | last post by:
Hello!! In using tarfile to group thousands of small files from a directory and then compress it. I already compress a group of files in my pc, but I need do it in a server and I'm testing the...
3
by: Frederik Vanderhaegen | last post by:
Hey, I've written a http sniffer that monitors the traffic on port 80. This application works like it should be but there is one small problem. Suppose I run the application and I surf to...
7
by: giuseppe.cannella | last post by:
i need to read a row on a bzip2 file like the fgets function does on a text file can you help me to find this function thank
232
by: robert maas, see http://tinyurl.com/uh3t | last post by:
I'm working on examples of programming in several languages, all (except PHP) running under CGI so that I can show both the source files and the actually running of the examples online. The first...
3
by: Sayudh27 | last post by:
Hi all, I need to decompress data packets that are being received over the network.The packets contain live data and I need to use the LZO algorithm for decompression.However,I am not sure...
4
by: =?Utf-8?B?Sm9uIEphY29icw==?= | last post by:
The compression part of this code works fine for me, but the decompression part of this code does not. The output file from that is the same as the compressed file. What is wrong with my code?...
0
by: Geky | last post by:
I'm trying to use BZip2 library but I receive always an access violation error. I use two different codes: bz_stream bzstream; ZeroMemory(&bzstream, sizeof(bzstream)); ...
13
by: Chris Carlen | last post by:
Hi: Having completed enough serial driver code for a TMS320F2812 microcontroller to talk to a terminal, I am now trying different approaches to command interpretation. I have a very simple...
6
by: Nilesh | last post by:
I have a query about Bzip2 in PHP. Let's say that i have a paragraph like: abcdbc nnotp iidhdmmc; uuidpssa -------End------- Now, i compress it using bzcompress() and store it in a mysql...
3
by: Stephany Young | last post by:
Does anyone have the source code for a C# implementation of the AR2000 Compression/Decompression algorithm that they might like to share?
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: Aftab Ahmad | last post by:
So, I have written a code for a cmd called "Send WhatsApp Message" to open and send WhatsApp messaage. The code is given below. Dim IE As Object Set IE =...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: marcoviolo | last post by:
Dear all, I would like to implement on my worksheet an vlookup dynamic , that consider a change of pivot excel via win32com, from an external excel (without open it) and save the new file into a...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.