
Problem reading with bz2.BZ2File(). Bug?

When comparing two files that should be equal, the last line is
different.

The first file is a bzip2-compressed file and is read with
bz2.BZ2File().
The second file is the same file, uncompressed, and is read with open().

The first file, named file.txt.bz2, is uncompressed with:

$ bunzip2 -k file.txt.bz2

To compare I use this script:
###############################
import bz2

f1 = bz2.BZ2File(r'file.txt.bz2', 'r')
f2 = open(r'file.txt', 'r')
lines = 0
while True:
    line1 = f1.readline()
    line2 = f2.readline()
    if line1 == '':
        break
    lines += 1
    if line1 != line2:
        print 'line number:', lines
        print repr(line1)
        print repr(line2)
f1.close()
f2.close()
##############################

The offending file is 5.5 MB. Sorry, I could not reproduce this problem
with a smaller file.
http://fahstats.com/img/file.txt.bz2
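
Because the problem could not be reproduced with a smaller file, a self-contained check may be handy for trying other inputs. What follows is a minimal sketch for Python 3 (the script in this post is Python 2), not the poster's method. The helper name `first_mismatch` and the generated sample data are hypothetical. It reads both files in binary mode and also reports the case where one file ends before the other.

```python
import bz2
import os
import tempfile
from itertools import zip_longest


def first_mismatch(bz2_path, plain_path):
    """Return (line_number, bz2_line, plain_line) for the first differing
    line, or None if both files yield identical lines.

    A missing line (one file ended early) is reported as b''.
    """
    with bz2.open(bz2_path, 'rb') as f1, open(plain_path, 'rb') as f2:
        pairs = zip_longest(f1, f2, fillvalue=b'')
        for number, (line1, line2) in enumerate(pairs, start=1):
            if line1 != line2:
                return number, line1, line2
    return None


if __name__ == '__main__':
    # Hypothetical sample data; the real file.txt.bz2 is not needed here.
    data = b''.join(b'line %d\n' % i for i in range(1000))
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, 'file.txt')
        packed = plain + '.bz2'
        with open(plain, 'wb') as f:
            f.write(data)
        with open(packed, 'wb') as f:
            f.write(bz2.compress(data))
        print(first_mismatch(packed, plain))
```

Reading in binary mode keeps newline translation out of the comparison, so any reported difference comes from the bytes themselves.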

Regards, Clodoaldo Pinto Neto

Nov 15 '06 #1
4 Replies


Clodoaldo Pinto Neto wrote:
The offending file is 5.5 MB. Sorry, I could not reproduce this problem
with a smaller file.

But surely you can post the repr() of the last two lines?

</F>

Nov 15 '06 #2


Fredrik Lundh wrote:
Clodoaldo Pinto Neto wrote:
The offending file is 5.5 MB. Sorry, I could not reproduce this problem
with a smaller file.

But surely you can post the repr() of the last two lines?

This is the output:

$ python bzp.py
line number: 588317
'\x07'
''

Clodoaldo

Nov 15 '06 #3

Clodoaldo Pinto Neto wrote:
Fredrik Lundh wrote:
Clodoaldo Pinto Neto wrote:
The offending file is 5.5 MB. Sorry, I could not reproduce this problem
with a smaller file.

But surely you can post the repr() of the last two lines?

This is the output:

$ python bzp.py
line number: 588317
'\x07'
''

Confirmed on Windows with Python 2.4 and 2.5:

C:\p>\Python24\python.exe bzp.py
line number: 588317
'\x1e'
''

C:\p>\Python25\python.exe bzp.py
line number: 588317
'\x1e'
''

It looks like one byte of garbage is appended at the end of the file.
Please file a bug report. As a workaround, "rU" mode seems to work fine
for this file.
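
To check the "one extra byte" observation independently of `readline()`, one could compare the complete decompressed content with the plain file. This is a sketch for Python 3, assuming the file fits in memory (the one in this thread is 5.5 MB). The helper `trailing_difference` is hypothetical, and nothing in this thread confirms that reading the file this way avoids the problem.

```python
import bz2


def trailing_difference(bz2_path, plain_path):
    """Compare the fully decompressed bz2 content with the plain file.

    Returns (length_difference, extra_bytes): length_difference is
    len(decompressed) - len(plain), and extra_bytes is whatever the longer
    side has beyond the shared length. Returns None when the shared part
    itself differs, since the difference is then not just a tail.
    """
    with open(bz2_path, 'rb') as f:
        decompressed = bz2.decompress(f.read())
    with open(plain_path, 'rb') as f:
        plain = f.read()

    shared = min(len(decompressed), len(plain))
    if decompressed[:shared] != plain[:shared]:
        return None
    longer = decompressed if len(decompressed) >= len(plain) else plain
    return len(decompressed) - len(plain), longer[shared:]
```

A result of `(0, b'')` means the two contents are byte-for-byte identical, which would suggest the stray byte comes from the line-reading path rather than from the compressed data.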

-- Leo

Nov 15 '06 #4

Leo Kislov wrote:
Confirmed on Windows with Python 2.4 and 2.5:

C:\p>\Python24\python.exe bzp.py
line number: 588317
'\x1e'
''

C:\p>\Python25\python.exe bzp.py
line number: 588317
'\x1e'
''

Looks like one byte of garbage is appended at the end of file. Please
file a bug report.

Filed as bug number 1597011.

Clodoaldo

Nov 15 '06 #5
