473,398 Members | 2,212 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,398 software developers and data experts.

BZip2 decompression and parsing XML

Hi.

I'm trying to disassemble bzipped file. If I use minidom.parseString,
I'm getting this error:

Traceback (most recent call last):
File "./replications.py", line 342, in ?

File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1925, in
parseString
return expatbuilder.parseString(string)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 940, in
parseString
return builder.parseString(string)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 223, in
parseString
parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line
538676, column 17

If I use minidom.parse, I'm getting this error:

Traceback (most recent call last):
File "./replications.py", line 341, in ?
files.xml = minidom.parse(bz2.decompress(dump))
File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1915, in parse
return expatbuilder.parse(file)
File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 922, in
parse
fp = open(file, 'rb')
IOError

But XML parsed normally.

Code:

try:
handler = open(args[0], "r")
dump = handler.read()
handler.close()
except IOError, error:
print("Can't open dump: %s" % error)
sys.exit(1)

files.xml = minidom.parse(bz2.decompress(dump))

Jun 27 '08 #1
1 2678
phasma wrote:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line
538676, column 17
Looks like your XML file is broken in line 538676.

try:
handler = open(args[0], "r")
This should read

handler = open(args[0], "rb")

Maybe that's your problem.

BTW, since you seem to parse a pretty big chunk of XML there, you should
consider using lxml. It's faster, more memory friendly, more feature-rich and
easier to use than minidom. It can also parse directly from a gzip-ed file or
a file-like object as provided by the bz2 module.

http://codespeak.net/lxml/

Stefan
Jun 27 '08 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: itzel | last post by:
Hello!! In using tarfile to group thousands of small files from a directory and then compress it. I already compress a group of files in my pc, but I need do it in a server and I'm testing the...
3
by: Frederik Vanderhaegen | last post by:
Hey, I've written a http sniffer that monitors the traffic on port 80. This application works like it should be but there is one small problem. Suppose I run the application and I surf to...
7
by: giuseppe.cannella | last post by:
i need to read a row on a bzip2 file like the fgets function does on a text file can you help me to find this function thank
232
by: robert maas, see http://tinyurl.com/uh3t | last post by:
I'm working on examples of programming in several languages, all (except PHP) running under CGI so that I can show both the source files and the actually running of the examples online. The first...
3
by: Sayudh27 | last post by:
Hi all, I need to decompress data packets that are being received over the network.The packets contain live data and I need to use the LZO algorithm for decompression.However,I am not sure...
4
by: =?Utf-8?B?Sm9uIEphY29icw==?= | last post by:
The compression part of this code works fine for me, but the decompression part of this code does not. The output file from that is the same as the compressed file. What is wrong with my code?...
0
by: Geky | last post by:
I'm trying to use BZip2 library but I receive always an access violation error. I use two different codes: bz_stream bzstream; ZeroMemory(&bzstream, sizeof(bzstream)); ...
13
by: Chris Carlen | last post by:
Hi: Having completed enough serial driver code for a TMS320F2812 microcontroller to talk to a terminal, I am now trying different approaches to command interpretation. I have a very simple...
6
by: Nilesh | last post by:
I have a query about Bzip2 in PHP. Let's say that i have a paragraph like: abcdbc nnotp iidhdmmc; uuidpssa -------End------- Now, i compress it using bzcompress() and store it in a mysql...
3
by: Stephany Young | last post by:
Does anyone have the source code for a C# implementation of the AR2000 Compression/Decompression algorithm that they might like to share?
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.