memory error with zipfile module

Hari Sekhon

I do

import zipfile
zip=zipfile.ZipFile('d:\somepath\cdimage.zip')
zip.namelist()
['someimage.iso']

then either of the two:

A) file('someimage.iso','w').write(zip.read('someimage.iso'))
or
B) content=zip.read('someimage.iso')

but both result in the same error:

Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "D:\u\Python24\lib\zipfile.py", line 357, in read
bytes = dc.decompress(bytes)
MemoryError

I thought python was supposed to handle memory for you?

The python zipfile module is obviously broken...
Any advice?

May 19 '06 #1

bruno at modulix

Hari Sekhon wrote:

> I do
>
> import zipfile
> zip=zipfile.ZipFile('d:\somepath\cdimage.zip')
> zip.namelist()
> ['someimage.iso']
>
> then either of the two:
>
> A) file('someimage.iso','w').write(zip.read('someimage.iso'))
> or
> B) content=zip.read('someimage.iso')
>
> but both result in the same error:
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "D:\u\Python24\lib\zipfile.py", line 357, in read
> bytes = dc.decompress(bytes)
> MemoryError
<ot>Is that the *full* traceback ?</ot>
> I thought python was supposed to handle memory for you?
Err... This doesn't mean that it bypasses the system's memory management.

http://pyref.infogami.com/MemoryError

http://mail.zope.org/pipermail/zope/...er/153882.html
"""
MemoryError is raised by Python when an underlying (OS-level) allocation
fails.
(...)
Normally this would mean that you were out of even virtual memory
(swap), but it could also be a symptom of a libc bug, a bad RAM chip, etc.
"""

What do you think will happen if you try to allocate a huge block when
you've already eaten all available memory? Do you really hope that
Python will give you extra RAM for free ?-)

Please try this code:

import zipfile
zip=zipfile.ZipFile('d:\somepath\cdimage.zip')
info = zip.getinfo('someimage.iso')
csize = info.compress_size
fsize = info.file_size
print "someimage compressed size is : %s" % csize
print "someimage real file size is : %s" % fsize
print """
So, knowing how zipfile.read() is actually implemented,
total needed ram is : %s
""" % (csize + fsize)

print "Well... Do I have that much memory available ???"

> The python zipfile module is obviously broken...
s/is obviously broken/could be improved to handle huge files/

Making such statements may not be the best way to make friends...
> Any advice?

Yes : Python is free software ('free' as in 'free speech' *and* as in
'free beer'), mostly written by benevolent contributors. So try and
improve the zipfile module by yourself, and submit your enhancements.
Then we all will be very grateful, and your name will be forever in the
Python Hall of Fame.

Or choose to behave as a whiny-whiny clueless luser making dumb
statements, and your name will be buried for a long long time in a lot
of killfiles.

It's up to you !-)

NB : If you go the first route, this may help:
http://www.python.org/doc/2.4.2/lib/module-zlib.html
with particular attention to the decompressobj.
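A minimal sketch of the decompressobj idea, written for modern Python 3 rather than the 2.4 used in this thread: raw deflate data (what ZIP_DEFLATED members contain) is fed to zlib.decompressobj(-zlib.MAX_WBITS) one chunk at a time, so only a chunk of input and its output are in memory at once. The function name inflate_stream and the 8192-byte chunk size are illustrative choices, not part of zipfile.

```python
import io
import zlib

def inflate_stream(src, dst, chunk_size=8192):
    """Decompress a raw deflate stream from src into dst, chunk by chunk."""
    # Negative wbits: raw deflate with no zlib header/trailer, as in ZIP members.
    decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decompress(chunk))
    # Write out anything still buffered inside the decompressor.
    dst.write(decoder.flush())

# Build some raw deflate data to try it on.
payload = b"someimage " * 100000
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(payload) + compressor.flush()

out = io.BytesIO()
inflate_stream(io.BytesIO(raw), out)
assert out.getvalue() == payload
```

A single small chunk of very compressible input (an ISO full of zeros, say) can still expand a lot; in Python 3, decompress() also takes a max_length argument that caps each call's output and leaves the unprocessed input in unconsumed_tail.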

HTH
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"

May 19 '06 #2

jordan.taylor2

Take a look at the pywin32 extension, which I believe has some lower-level
memory allocation and file capabilities that might help you in
this situation. If I'm completely wrong, someone please tell me XD.
Of course, you could also make the read() a stepwise process: read,
let's say, 8192 bytes at a time (could be bigger if you want), write them
to the new file, and then read the next portion. This will be slower
(not sure how much) than if you had some AMD X2 64 with 3 gigs of RAM
and could just read the file all at once, but it should work.
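From Python 2.6 onwards this stepwise approach is built in: ZipFile.open() returns a file-like object that decompresses as you read, so a member can be copied in fixed-size pieces. A minimal Python 3 sketch follows; the function name extract_member and the chunk size are illustrative. The output is opened in binary mode ('wb'), since the 'w' in the original post would translate line endings on Windows and corrupt an ISO image.

```python
import zipfile

def extract_member(zip_path, name, out_path, chunk_size=8192):
    """Copy one archive member to out_path without loading it whole."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(name) as src, open(out_path, "wb") as dst:
            while True:
                chunk = src.read(chunk_size)  # already-decompressed bytes
                if not chunk:
                    break
                dst.write(chunk)
```

For example: extract_member(r"d:\somepath\cdimage.zip", "someimage.iso", "someimage.iso"). The explicit loop is the same thing shutil.copyfileobj(src, dst) does.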

May 19 '06 #3

bruno at modulix

jo************@gmail.com wrote:

> Take a look at the pywin32 extension, which I believe has some lower
> level memory allocation and file capabilities that might help you in
> this situation.

But then the solution would not be portable, which would be a shame
since the zlib module (on which ZipFile relies for compression /
decompression) already has everything needed to handle streams.

May 19 '06 #4

Sion Arrowsmith

Hari Sekhon <se*********@googlemail.com> wrote:

> import zipfile
> zip=zipfile.ZipFile('d:\somepath\cdimage.zip')
> zip.namelist()
> ['someimage.iso']
> [ ... ]
> B) content=zip.read('someimage.iso')
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "D:\u\Python24\lib\zipfile.py", line 357, in read
> bytes = dc.decompress(bytes)
> MemoryError
>
> I thought python was supposed to handle memory for you?
Yes, but it can't handle more memory than the operating system
is prepared to give it. How big is cdimage.zip? How big is the
uncompressed someimage.iso? How much memory do you have?
> The python zipfile module is obviously broken...

This isn't at all obvious to me.

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump

May 19 '06 #5

Sion Arrowsmith

bruno at modulix <on***@xiludom.gro> wrote:

> http://mail.zope.org/pipermail/zope/...er/153882.html
> """
> MemoryError is raised by Python when an underlying (OS-level) allocation
> fails.
> (...)
> Normally this would mean that you were out of even virtual memory
> (swap), but it could also be a symptom of a libc bug, a bad RAM chip, etc.
> """

There's another possibility, which I ran into recently: physical + virtual
memory exceeding the space addressable by a process. So I've got
2G physical + 4G swap and I'm getting just such a memory error -- I'm
sure the compressed + uncompressed data isn't going to eat all of that.
But on a 32-bit OS, it doesn't need to, of course. 2G is quite enough to
cause problems....
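Combining this with bruno's estimate earlier in the thread, a rough pre-flight check might look like the Python 3 sketch below. It assumes, as bruno says, that a whole-member read() needs about compress_size + file_size bytes at once, and it uses sys.maxsize to tell a 32-bit interpreter from a 64-bit one. The 2 GiB ceiling is the default per-process user address space on 32-bit Windows and is only an approximation; the function name is hypothetical, and the check says nothing about how much RAM or swap is actually free.

```python
import sys
import zipfile

# Approximate ceiling for a 32-bit process: 2 GiB is the default
# user-mode address space on 32-bit Windows (not a hard rule elsewhere).
ADDRESS_SPACE_32BIT = 2 * 1024 ** 3

def read_would_fit(zip_path, name):
    """Return (needed_bytes, plausible) for a whole-member zf.read(name)."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(name)
    # Compressed input plus decompressed output held in memory together.
    needed = info.compress_size + info.file_size
    is_64bit = sys.maxsize > 2 ** 32
    # Only checks address space, not available RAM or swap.
    plausible = is_64bit or needed < ADDRESS_SPACE_32BIT
    return needed, plausible
```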

May 19 '06 #6

bruno at modulix

Sion Arrowsmith wrote:

> Hari Sekhon <se*********@googlemail.com> wrote:
(snip)
>> The python zipfile module is obviously broken...
>
> This isn't at all obvious to me.

zipfile.read() does not seem to take full advantage of zlib's
decompressobj's features. This could perhaps be improved (left as an
exercise to the OP, who is obviously very good at detecting broken
memory management <g>).

Also, there's a known bug with file headers beginning past 2+GB - which
is not a very common case...
http://sourceforge.net/tracker/index...70&atid=105470

So yes, there is actually something broken - but this has nothing to do
with the OP's problem - *and* there are actually some limitations (FWIW, the
main goal of zipfile was mostly to implement support for zipped Python
packages, not to replace WinZip). But anyway, the OP is going to fix this,
isn't he ?-)

May 19 '06 #7

Roger Miller

The basic problem is that the zipfile interface only reads and writes
whole files, so it may perform poorly or fail on huge files. At one
time I implemented a patch to allow reading files in chunks. However I
believe that the current interface has too many problems to solve by
incremental patching, and that a zipfile2 module is probably warranted.
(Is anyone working on this?)

In the meantime I think the best solution is often to just run an
external zip/unzip utility to do the heavy lifting.

May 19 '06 #8

Bruno Desthuilliers

Roger Miller wrote:

> The basic problem is that the zipfile interface only reads and writes
> whole files, so it may perform poorly or fail on huge files. At one
> time I implemented a patch to allow reading files in chunks. However I
> believe that the current interface has too many problems to solve by
> incremental patching,
Yep, that was the general tone of a thread on python-dev. And from
what I saw of the source code, it may indeed not be the cleanest
part of the stdlib. But anyway, it does what it was originally written for:
provide working support for zipped packages.
> and that a zipfile2 module is probably warranted.
> (Is anyone working on this?)
Seems like Bob Ippolito was in the running, but I guess you'll get better
answers on python-dev.
> In the meantime I think the best solution is often to just run an
> external zip/unzip utility to do the heavy lifting.

Indeed !-)

But while having zip/unzip installed out of the box on a unix-like system
is close to guaranteed, it may not be the case on Windows.
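A hedged Python 3 sketch of that advice: use an external unzip if one is found on the PATH (via shutil.which and subprocess.run), otherwise fall back to ZipFile.extract(), which in Python 3 streams the member to disk rather than reading it whole, so it also works on Windows. The flags used (-o overwrite without prompting, -q quiet, -d destination directory) are Info-ZIP unzip's; the function name and its return values are illustrative.

```python
import shutil
import subprocess
import zipfile

def unzip_member(zip_path, name, dest_dir):
    """Extract one member into dest_dir, preferring an external unzip."""
    unzip = shutil.which("unzip")
    if unzip:
        # Info-ZIP unzip: -o overwrite, -q quiet, -d destination directory.
        subprocess.run([unzip, "-o", "-q", zip_path, name, "-d", dest_dir],
                       check=True)
        return "external"
    # Fallback: extract in-process; keeps the member's path like unzip does.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extract(name, dest_dir)
    return "python"
```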

May 19 '06 #9

Fredrik Lundh

Hari Sekhon wrote:

> Is it me or is having to use os.system() all the time symptomatic of a
> deficiency/things which are missing from python as a language?

it's you.

</F>

Jun 21 '06 #10

Fredrik Lundh

Hari Sekhon wrote:

> I take it that it's still a work in progress to be able to pythonify
> everything, and until then we're just gonna have to rely on shell and
> those great C coded coreutils and stuff like that. Ok, I'm rather fond
> of Bash+coreutils, highest ratio of code lines to work I've ever
> seen.... it's the real strength of Linux. Shame about Windows...

you make very little sense.

</F>

Jun 21 '06 #11

Fredrik Lundh

Hari Sekhon wrote:

> I've seen people using everything from zip to touch, either out of
> laziness or out of the fact it wouldn't work very well in python, this
> zip case is a good example.

so based on a limitation in one library, and some random code you've
seen on the internet, you're making generalizations about the language ?

the zip case is a pretty lousy example, btw; after all, using the
existing API, it's not that hard to implement an *incremental* read
function if the provided read-into-string version isn't sufficient:

import zipfile, zlib

##
# Given a 'zip' instance, copy the data of member 'name' to the
# 'out' stream.

def explode(out, zip, name):

    zinfo = zip.getinfo(name)

    if zinfo.compress_type == zipfile.ZIP_STORED:
        decoder = None
    elif zinfo.compress_type == zipfile.ZIP_DEFLATED:
        # raw deflate stream, no zlib header
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    else:
        raise zipfile.BadZipfile("unsupported compression method")

    # file_offset: start of the member's data (2.4-era ZipInfo attribute)
    zip.fp.seek(zinfo.file_offset)

    size = zinfo.compress_size

    while 1:
        data = zip.fp.read(min(size, 8192))
        if not data:
            break
        size -= len(data)
        if decoder:
            data = decoder.decompress(data)
        out.write(data)

    if decoder:
        # feed an unused pad byte, as 2.4's zipfile.read() does,
        # so zlib won't choke, then flush what is left
        out.write(decoder.decompress('Z'))
        out.write(decoder.flush())

</F>
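Fredrik's explode() relies on ZipInfo.file_offset and ZipFile's internal fp, which belong to the 2.4-era zipfile. A hedged Python 3 sketch of the same technique follows. It takes an archive path instead of a ZipFile instance (an assumption of this sketch), reads the 30-byte local file header at ZipInfo.header_offset to find where the member's data starts (the local name and extra lengths can differ from the central directory's), streams the data through zlib in 8192-byte chunks, and checks the CRC at the end. Encrypted members and methods other than stored/deflated are rejected.

```python
import struct
import zipfile
import zlib

# Fixed part of a ZIP local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, both sizes, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # 30 bytes

def explode(out, zip_path, name, chunk_size=8192):
    """Stream member `name` of the archive at zip_path into `out`."""
    with zipfile.ZipFile(zip_path) as zf:
        zinfo = zf.getinfo(name)  # metadata from the central directory
    if zinfo.flag_bits & 0x1:
        raise zipfile.BadZipFile("encrypted members are not supported")
    if zinfo.compress_type == zipfile.ZIP_STORED:
        decoder = None
    elif zinfo.compress_type == zipfile.ZIP_DEFLATED:
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate
    else:
        raise zipfile.BadZipFile("unsupported compression method")

    crc = 0
    with open(zip_path, "rb") as fp:
        fp.seek(zinfo.header_offset)
        fields = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
        if fields[0] != b"PK\x03\x04":
            raise zipfile.BadZipFile("bad local file header")
        name_len, extra_len = fields[-2], fields[-1]
        fp.seek(name_len + extra_len, 1)  # skip name/extra to the data

        remaining = zinfo.compress_size
        while remaining > 0:
            data = fp.read(min(remaining, chunk_size))
            if not data:
                raise zipfile.BadZipFile("truncated member data")
            remaining -= len(data)
            if decoder:
                data = decoder.decompress(data)
            crc = zlib.crc32(data, crc)
            out.write(data)

    if decoder:
        tail = decoder.flush()  # no pad byte needed in Python 3
        crc = zlib.crc32(tail, crc)
        out.write(tail)
    if crc != zinfo.CRC:
        raise zipfile.BadZipFile("CRC mismatch for %r" % name)
```

For example: with open("someimage.iso", "wb") as out: explode(out, r"d:\somepath\cdimage.zip", "someimage.iso").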

Jun 21 '06 #12
