
Memory errors with large zip files

Is there a limitation in Python's zipfile module that limits the
size of a file that can be extracted? I'm currently trying to extract
125 MB zip files containing files that uncompress to more than 1 GB,
and I am receiving memory errors. Indeed, my RAM gets maxed out during
extraction and then the script quits. Is there a way to spool to disk
on the fly, or must Python read the entire file into memory before
writing? The code below iterates through a directory of zip files and
extracts them (thanks John!), though for testing I've just been using
one file:

import glob
import zipfile
from os.path import isfile

zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
for zipname in zipnames:
    zf = zipfile.ZipFile(zipname, 'r')
    for zfilename in zf.namelist():
        # zf.read() returns the entire decompressed member as one string
        newFile = open(zfilename, "wb")
        newFile.write(zf.read(zfilename))
        newFile.close()
    zf.close()
Any suggestions or comments on how I might be able to work with zip
files of this size would be very helpful.

Best regards,
Lorn

Jul 19 '05 #1
Ok, I'm not sure if this helps any, but in debugging it a bit I see the
script stalls on:

newFile.write (zf.read (zfilename))

The memory error generated references line 357 of the zipfile.py
program at the point of decompression:

    elif zinfo.compress_type == ZIP_DEFLATED:
        if not zlib:
            raise RuntimeError, \
                  "De-compression requires the (missing) zlib module"
        # zlib compress/decompress code by Jeremy Hylton of CNRI
        dc = zlib.decompressobj(-15)
        bytes = dc.decompress(bytes)  ### <------ right here

Is there any way to modify how my code is approaching this, or perhaps
how the zipfile code is handling it, or do I need to just invest in more
RAM? I currently have 512 MB and thought that would be plenty....
perhaps I was wrong :-(. If anyone has any ideas it would truly be very
helpful.

Lorn

Jul 19 '05 #2
Hi
I tried the following test:

- created 12 text files of 100 MB each (exactly 102,400,000 bytes)
- created the file "tst.zip" containing these 12 files (the resulting
archive is only 1,095,965 bytes in size...)
- deleted the 12 text files
- ran your code

And... it works OK for me.

But: the compressed file is only 1 MB in size; I have 1 GB of RAM; I use
Windows XP.
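
Here is a sketch of an equivalent setup, using writestr() so the 12
intermediate files are never written to disk (the archive name, member
names, and the 'x' filler byte are just my choices):

import zipfile

zf = zipfile.ZipFile('tst.zip', 'w', zipfile.ZIP_DEFLATED)
for i in range(12):
    # exactly 102,400,000 bytes of highly repetitive data per member
    zf.writestr('test%02d.txt' % i, 'x' * 102400000)
zf.close()

Note that each member is only 100 MB uncompressed, so zf.read() never has
to hold much more than that at once -- which may be why this test does not
reproduce the problem.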

Sorry, because:
1) my English is bad
2) I could not reproduce your problem
Michel Claveau


Jul 19 '05 #3
On 20 May 2005 18:04:22 -0700, "Lorn" <ef*******@yahoo.com> wrote:
> Ok, I'm not sure if this helps any, but in debugging it a bit I see the
> script stalls on:
>
> newFile.write (zf.read (zfilename))
>
> The memory error generated references line 357 of the zipfile.py
> program at the point of decompression:
>
>     elif zinfo.compress_type == ZIP_DEFLATED:
>         if not zlib:
>             raise RuntimeError, \
>                   "De-compression requires the (missing) zlib module"
>         # zlib compress/decompress code by Jeremy Hylton of CNRI
>         dc = zlib.decompressobj(-15)
>         bytes = dc.decompress(bytes)  ### <------ right here

The basic problem is that the zipfile module is asking the "dc" object
to decompress the whole file at once -- so you would need (at least)
enough memory to hold both the compressed file (C) and the
uncompressed file (U). There is also a possibility that this could
rise to 2U instead of U+C -- read a few lines further on:

    bytes = bytes + ex

With your numbers, that is roughly 125 MB + 1 GB, and possibly nearer
2 GB in the 2U case -- either way well beyond 512 MB of RAM.

> Is there any way to modify how my code is approaching this
You're doing the best you can, as far as I can tell.
> or perhaps
> how the zipfile code is handling it
Read this:
http://docs.python.org/lib/module-zlib.html

If you think you can work out how to modify zipfile.py to feed
dc.decompress() a chunk of data at a time, properly manipulating
dc.unconsumed_tail, and keeping memory usage to a minimum, then go for
it :-)
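
Here is roughly the shape it might take -- an untested sketch, where the
helper name, the 64 KB chunk size, and the assumption that `raw` is
already positioned at the start of the member's raw deflate data are all
mine, not zipfile's:

import zlib

CHUNK = 64 * 1024  # hypothetical chunk size; tune to taste

def inflate_to_disk(raw, out, compressed_size):
    # raw: file object positioned at the member's deflate data
    # out: writable file object receiving the decompressed bytes
    # compressed_size: zinfo.compress_size for that member
    dc = zlib.decompressobj(-15)   # -15 = raw deflate, no zlib header
    remaining = compressed_size
    while remaining > 0:
        data = raw.read(min(CHUNK, remaining))
        if not data:
            break                  # truncated archive; give up early
        remaining -= len(data)
        out.write(dc.decompress(data))
    out.write(dc.flush())          # write whatever zlib was still buffering

Since decompress() is never given a max_length argument here,
dc.unconsumed_tail stays empty and needs no special handling; peak memory
is then bounded by one compressed chunk plus whatever it inflates to,
rather than by C + U.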

Reading the source of the Python zlib module, plus this page from the
zlib website could be helpful, perhaps even necessary:
http://www.gzip.org/zlib/zlib_how.html

See also the following post to this newsgroup:
From: John Goerzen <jgoer...@complete.org>
Newsgroups: comp.lang.python
Subject: Fixes to zipfile.py [PATCH]
Date: Fri, 07 Mar 2003 16:39:25 -0600

.... his patch obviously wasn't accepted :-(

> or do I need to just invest in more
> RAM? I currently have 512 MB and thought that would be plenty....
> perhaps I was wrong :-(.


Before you do anything rash (hacking zipfile.py or buying more
memory), take a step back for a moment:

Is this a one-off exercise or a regular one? Does it *really*
need to be done programmatically? There will be at least one
command-line unzipper program for your platform. One-off requirement:
do it manually.
Regular: try using the unzipper manually first; if all the available
unzippers on your platform die with a memory allocation problem, then
you really have a problem. If it works, then instead of using the
zipfile module, run the unzipper program from your Python code via a
subprocess.

HTH,
John
Jul 19 '05 #4
Thanks for the detailed reply, John! I guess it turned out to be a bit
tougher than I originally thought :-)....

Reading over your links, I think I'd better not attempt rewriting the
zipfile.py program... a little over my head :-). The best solution,
from everything I read, seems to be calling an unzipper program from a
subprocess. I assume you mean using execfile()? I can't think of
another way.

Anyway, thank you very much for your help, it's been very educational.

Best regards,
Lorn

Jul 19 '05 #5
On 23 May 2005 09:28:15 -0700, "Marcus Lowland" <mc*******@walla.com>
wrote:
> Thanks for the detailed reply, John! I guess it turned out to be a bit
> tougher than I originally thought :-)....
>
> Reading over your links, I think I'd better not attempt rewriting the
> zipfile.py program... a little over my head :-). The best solution,
> from everything I read, seems to be calling an unzipper program from a
> subprocess. I assume you mean using execfile()? I can't think of
> another way.


Errrmmmm ... no, execfile runs a Python source file.

Check out the subprocess module:

"""
6.8 subprocess -- Subprocess management

New in version 2.4.

The subprocess module allows you to spawn new processes, connect to
their input/output/error pipes, and obtain their return codes. This
module intends to replace several other, older modules and functions,
such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*
"""
Jul 19 '05 #6
