Memory errors with large zip files

Is there a limitation in Python's zipfile module that limits the
size of a file that can be extracted? I'm currently trying to extract
125 MB zip files whose contents uncompress to more than 1 GB, and I am
receiving memory errors. Indeed, my RAM gets maxed out during extraction
and then the script quits. Is there a way to spool to disk on the fly, or
is it necessary for Python to read the entire file before writing? The code
below iterates through a directory of zip files and extracts them
(thanks John!); for testing, however, I've just been using one file:

    import glob
    import zipfile
    from os.path import isfile

    zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
    for zipname in zipnames:
        zf = zipfile.ZipFile(zipname, 'r')
        for zfilename in zf.namelist():
            newFile = open(zfilename, "wb")
            # zf.read() returns the whole uncompressed member as one string
            newFile.write(zf.read(zfilename))
            newFile.close()
        zf.close()

Any suggestions or comments on how I might be able to work with zip
files of this size would be very helpful.

Best regards,
Lorn

Jul 19 '05 #1
5 Replies
Ok, I'm not sure if this helps any, but in debugging it a bit I see the
script stalls on:

newFile.write (zf.read (zfilename))

The memory error generated references line 357 of the zipfile.py
program at the point of decompression:

    elif zinfo.compress_type == ZIP_DEFLATED:
        if not zlib:
            raise RuntimeError, \
                  "De-compression requires the (missing) zlib module"
        # zlib compress/decompress code by Jeremy Hylton of CNRI
        dc = zlib.decompressobj(-15)
        bytes = dc.decompress(bytes)    ### <------ right here

Is there any way to modify how my code is approaching this, or perhaps
how the zipfile code is handling it, or do I need to just invest in more
RAM? I currently have 512 MB and thought that would be plenty....
perhaps I was wrong :-(. If anyone has any ideas it would truly be very
helpful.

Lorn

Jul 19 '05 #2
Hi
I ran this test:

- created 12 text files of 100 MB each (exactly 102 400 000 bytes)
- created the file "tst.zip" containing these 12 files (the resulting
file is only 1 095 965 bytes in size...)
- deleted the 12 text files
- tried your code

And... it works for me.

But: the compressed file is only 1 MB in size; I have 1 GB of RAM; I use
Windows XP.

Sorry, because:
1) my English is bad
2) I did not find your problem

Michel Claveau


Jul 19 '05 #3
On 20 May 2005 18:04:22 -0700, "Lorn" <ef*******@yahoo.com> wrote:
> Ok, I'm not sure if this helps any, but in debugging it a bit I see the
> script stalls on:
>
>     newFile.write (zf.read (zfilename))
>
> The memory error generated references line 357 of the zipfile.py
> program at the point of decompression:
>
>     elif zinfo.compress_type == ZIP_DEFLATED:
>         if not zlib:
>             raise RuntimeError, \
>                   "De-compression requires the (missing) zlib module"
>         # zlib compress/decompress code by Jeremy Hylton of CNRI
>         dc = zlib.decompressobj(-15)
>         bytes = dc.decompress(bytes)    ### <------ right here

The basic problem is that the zipfile module is asking the "dc" object
to decompress the whole file at once -- so you would need (at least)
enough memory to hold both the compressed file (C) and the
uncompressed file (U). There is also a possibility that this could
rise to 2U instead of U+C -- read a few lines further on:

    bytes = bytes + ex

> Is there any way to modify how my code is approaching this

You're doing the best you can, as far as I can tell.

> or perhaps
> how the zipfile code is handling it

Read this:
http://docs.python.org/lib/module-zlib.html

If you think you can work out how to modify zipfile.py to feed the
decompression object (dc) a chunk of data at a time, properly handling
dc.unconsumed_tail, and keeping memory usage to a minimum, then go for
it :-)
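
As a rough illustration of the chunk-at-a-time idea (not a patch to
zipfile.py), here is a minimal sketch in Python 3 syntax. It assumes the
source file object is already positioned at the start of a raw deflate
stream; finding that offset inside a zip archive is not shown.
inflate_in_chunks and its parameters are hypothetical names.

```python
import zlib


def inflate_in_chunks(src, dst, in_chunk=64 * 1024, out_chunk=64 * 1024):
    """Copy a raw deflate stream from file object src to file object dst.

    Hypothetical helper: src must be positioned at the first byte of the
    compressed data. At most out_chunk bytes of output are produced per
    decompress() call, so memory use stays bounded.
    """
    dc = zlib.decompressobj(-15)  # -15: raw deflate, as in zipfile.py
    while not dc.eof:
        data = src.read(in_chunk)
        if not data:
            break  # input exhausted (possibly a truncated stream)
        while data:
            # max_length caps the output; input that was not processed
            # yet is kept in dc.unconsumed_tail and fed back in.
            dst.write(dc.decompress(data, out_chunk))
            data = dc.unconsumed_tail
    # Emit whatever the decompressor is still holding back.
    dst.write(dc.flush())
```

The inner loop is where unconsumed_tail matters: skipping it would
silently drop data whenever one input chunk expands to more than
out_chunk bytes.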

Reading the source of the Python zlib module, plus this page from the
zlib website could be helpful, perhaps even necessary:
http://www.gzip.org/zlib/zlib_how.html

See also the following post to this newsgroup:
From: John Goerzen <jgoer...@complete.org>
Newsgroups: comp.lang.python
Subject: Fixes to zipfile.py [PATCH]
Date: Fri, 07 Mar 2003 16:39:25 -0600

.... his patch obviously wasn't accepted :-(
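
A note for readers on later Python versions: the zipfile module gained
ZipFile.open() in Python 2.6, which returns a file-like object that
decompresses as it is read. Assuming such a version (the sketch below uses
Python 3 syntax), the extraction loop from the original question could be
restructured roughly like this. extract_streaming, dest_dir and chunk_size
are names made up for illustration, and the path check is only one simple
way to reject suspicious member names.

```python
import os
import shutil
import zipfile


def extract_streaming(zipname, dest_dir=".", chunk_size=1024 * 1024):
    """Extract all members of zipname into dest_dir, chunk_size bytes at a time."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zipname, "r") as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories are created on demand below
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            # Refuse member names that would land outside dest_dir ("../x").
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe member name: %r" % info.filename)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # zf.open() decompresses lazily; copyfileobj moves one chunk
            # at a time, so the whole member is never held in memory.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk_size)
```

Usage would be along the lines of extract_streaming("big.zip", "out"),
called once per archive found by glob.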

> or do I need to just invest in more
> RAM? I currently have 512 MB and thought that would be plenty....
> perhaps I was wrong :-(.


Before you do anything rash (hacking zipfile.py or buying more
memory), take a step back for a moment:

Is this a one-off exercise or a regular exercise? Does it *really*
need to be done programmatically? There will be at least one
command-line unzipper program for your platform.

One-off requirement: do it manually.

Regular: Try using the unzipper manually; if all the available
unzippers on your platform die with a memory allocation problem then
you really have a problem. If it works, then instead of using the
zipfile module, use the unzipper program from your Python code via a
subprocess.
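
A minimal sketch of that last suggestion, in Python 3 syntax. It assumes
Info-ZIP's unzip is installed and on the PATH; a different unzipper would
need a different argument list. unzip_command and extract_with_unzip are
hypothetical helper names.

```python
import subprocess


def unzip_command(zipname, dest_dir="."):
    """Build the argument list for Info-ZIP's unzip (assumed to be on PATH)."""
    # -o: overwrite without prompting, -q: quiet, -d: extraction directory
    return ["unzip", "-o", "-q", zipname, "-d", dest_dir]


def extract_with_unzip(zipname, dest_dir="."):
    """Run the external unzipper; raises CalledProcessError on a non-zero exit."""
    # Passing a list (no shell) avoids quoting problems with odd file names.
    subprocess.run(unzip_command(zipname, dest_dir), check=True)
```

Because the decompression happens in the child process, the Python
script's own memory use stays small regardless of the archive size.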

HTH,
John
Jul 19 '05 #4
Thanks for the detailed reply John! I guess it turned out to be a bit
tougher than I originally thought :-)....

Reading over your links, I think I'd better not attempt rewriting the
zipfile.py program... a little over my head :-). The best solution,
from everything I read, seems to be calling an unzipper program from a
subprocess. I assume you mean using execfile()? I can't think of
another way.

Anyway, thank you very much for your help, it's been very educational.

Best regards,
Lorn

Jul 19 '05 #5
On 23 May 2005 09:28:15 -0700, "Marcus Lowland" <mc*******@walla.com>
wrote:
> Thanks for the detailed reply John! I guess it turned out to be a bit
> tougher than I originally thought :-)....
>
> Reading over your links, I think I'd better not attempt rewriting the
> zipfile.py program... a little over my head :-). The best solution,
> from everything I read, seems to be calling an unzipper program from a
> subprocess. I assume you mean using execfile()? I can't think of
> another way.


Errrmmmm ... no, execfile runs a Python source file.

Check out the subprocess module:

"""
6.8 subprocess -- Subprocess management

New in version 2.4.

The subprocess module allows you to spawn new processes, connect to
their input/output/error pipes, and obtain their return codes. This
module intends to replace several other, older modules and functions,
such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*
"""
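
To make that description concrete, here is a small sketch (Python 3
syntax) that spawns a child process, reads its output pipes, and collects
the return code. The child is just the current Python interpreter printing
a word, chosen so the example runs anywhere; run_and_capture is a
hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(args):
    """Spawn args as a child process; return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes to the end and waits for the child.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in command: replace with the unzipper's argument list in practice.
code, out, err = run_and_capture([sys.executable, "-c", "print('hello')"])
```

Unlike execfile(), which runs Python source inside the current
interpreter, this starts a separate program with its own memory.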
Jul 19 '05 #6
