Is there a limitation in Python's zipfile module that limits the
size of a file that can be extracted? I'm currently trying to extract
125 MB zip files whose contents uncompress to > 1 GB, and I'm
receiving memory errors. Indeed, my RAM gets maxed out during extraction
and then the script quits. Is there a way to spool to disk on the fly, or
is it necessary for Python to open the entire file before writing? The code
below iterates through a directory of zip files and extracts them
(thanks John!), though for testing I've just been using one file:
zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
for zipname in zipnames:
    zf = zipfile.ZipFile(zipname, 'r')
    for zfilename in zf.namelist():
        newFile = open(zfilename, "wb")
        newFile.write(zf.read(zfilename))
        newFile.close()
    zf.close()
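For reference, a streaming variant of the loop above is possible on later Pythons. This is a sketch only, assuming Python 2.6+ where `ZipFile.open()` exists (it did not in 2.4, the version this thread is about):

```python
import glob
import shutil
import zipfile
from os.path import isfile

def extract_streaming(pattern='*.zip', chunk=64 * 1024):
    # Sketch: ZipFile.open() (added in Python 2.6) returns a
    # file-like object, so copyfileobj can spool each member to disk
    # in fixed-size chunks instead of reading the whole member into
    # memory the way zf.read() does.
    for zipname in (x for x in glob.glob(pattern) if isfile(x)):
        zf = zipfile.ZipFile(zipname, 'r')
        for zfilename in zf.namelist():
            src = zf.open(zfilename)
            dst = open(zfilename, 'wb')
            shutil.copyfileobj(src, dst, chunk)
            dst.close()
            src.close()
        zf.close()
```

With this approach memory use stays near the chunk size regardless of how large the member file is.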
Any suggestions or comments on how I might be able to work with zip
files of this size would be very helpful.
Best regards,
Lorn
Ok, I'm not sure if this helps any, but in debugging it a bit I see the
script stalls on:
newFile.write (zf.read (zfilename))
The memory error generated references line 357 of the zipfile.py
program at the point of decompression:
    elif zinfo.compress_type == ZIP_DEFLATED:
        if not zlib:
            raise RuntimeError, \
                  "De-compression requires the (missing) zlib module"
        # zlib compress/decompress code by Jeremy Hylton of CNRI
        dc = zlib.decompressobj(-15)
        bytes = dc.decompress(bytes)  ### <------ right here
Is there any way to modify how my code is approaching this, or perhaps
how the zipfile code is handling it, or do I need to just invest in more
RAM? I currently have 512 MB and thought that would be plenty....
perhaps I was wrong :-(. If anyone has any ideas it would truly be very
helpful.
Lorn
Hi
I ran this test:
- created 12 text files of 100 MB each (exactly 102 400 000 bytes)
- created a file "tst.zip" containing those 12 files (the resulting file
is only 1 095 965 bytes in size...)
- deleted the 12 text files
- tried your code
And... it works OK for me.
But: the compressed file is only about 1 MB in size; I have 1 GB of RAM; I use
Windows XP.
Sorry, because:
1) my English is bad
2) I could not reproduce your problem
Michel Claveau
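Michel's steps can be sketched roughly like this (the default sizes below match his description and write about 1.2 GB of zeros, so shrink them to experiment; the highly repetitive data is also why his 1.2 GB zipped down to about 1 MB):

```python
import os
import zipfile

def make_test_zip(zipname='tst.zip', nfiles=12, size=102400000):
    # Michel's repro: create nfiles files of `size` bytes each,
    # zip them with deflate, then delete the originals.
    for i in range(nfiles):
        with open('file%02d.txt' % i, 'wb') as f:
            f.write(b'0' * size)
    zf = zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED)
    for i in range(nfiles):
        name = 'file%02d.txt' % i
        zf.write(name)
        os.remove(name)
    zf.close()
```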
On 20 May 2005 18:04:22 -0700, "Lorn" <ef*******@yahoo.com> wrote:
> Ok, I'm not sure if this helps any, but in debugging it a bit I see the script stalls on:
> newFile.write (zf.read (zfilename))
> The memory error generated references line 357 of the zipfile.py program at the point of decompression:
>     elif zinfo.compress_type == ZIP_DEFLATED:
>         if not zlib:
>             raise RuntimeError, \
>                   "De-compression requires the (missing) zlib module"
>         # zlib compress/decompress code by Jeremy Hylton of CNRI
>         dc = zlib.decompressobj(-15)
>         bytes = dc.decompress(bytes)  ### <------ right here
The basic problem is that the zipfile module is asking the "dc" object
to decompress the whole file at once -- so you would need (at least)
enough memory to hold both the compressed file (C) and the
uncompressed file (U). There is also a possibility that this could
rise to 2U instead of U+C -- read a few lines further on:
bytes = bytes + ex
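For contrast, a rough sketch of the alternative (not what zipfile.py actually does): feed the decompress object the compressed stream a chunk at a time, so memory holds only one chunk of C plus its decompressed burst at any moment, rather than all of C and all of U.

```python
import zlib

def inflate_stream(src, dst, chunk=64 * 1024):
    # Feed raw-deflate data (wbits=-15, the same setting zipfile
    # uses) to the decompress object piecewise; only one chunk of
    # compressed input and its decompressed output are in memory
    # at a time.
    dc = zlib.decompressobj(-15)
    while True:
        data = src.read(chunk)
        if not data:
            break
        dst.write(dc.decompress(data))
    dst.write(dc.flush())
```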
> Is there any way to modify how my code is approaching this
You're doing the best you can, as far as I can tell.
> or perhaps how the zipfile code is handling it
Read this: http://docs.python.org/lib/module-zlib.html
If you think you can work out how to modify zipfile.py to feed
the decompress object a chunk of data at a time (via dc.decompress),
properly manipulating dc.unconsumed_tail, and keeping memory usage
to a minimum, then go for it :-)
Reading the source of the Python zlib module, plus this page from the
zlib website could be helpful, perhaps even necessary: http://www.gzip.org/zlib/zlib_how.html
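To also cap the output side, `decompress()` on a decompress object accepts a `max_length` argument; whatever input it couldn't consume shows up in `unconsumed_tail` and must be fed back in on the next call. A minimal sketch of that manipulation:

```python
import zlib

def inflate_bounded(data, dst, dc, max_out=64 * 1024):
    # Decompress one chunk of raw-deflate input, never emitting more
    # than max_out bytes per call; input that decompress() did not
    # consume appears in dc.unconsumed_tail and is fed back in.
    while data:
        dst.write(dc.decompress(data, max_out))
        data = dc.unconsumed_tail
```

The caller would loop this over successive compressed chunks and finish with `dst.write(dc.flush())`.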
See also the following post to this newsgroup:
From: John Goerzen <jgoer...@complete.org>
Newsgroups: comp.lang.python
Subject: Fixes to zipfile.py [PATCH]
Date: Fri, 07 Mar 2003 16:39:25 -0600
.... his patch obviously wasn't accepted :-(
> or do I need to just invest in more RAM? I currently have 512 MB and thought that would be plenty.... perhaps I was wrong :-(.
Before you do anything rash (hacking zipfile.py or buying more
memory), take a step back for a moment:
Is this a one-off exercise or a regular one? Does it *really*
need to be done programmatically? There will be at least one
command-line unzipper program for your platform. If it's a one-off
requirement, do it manually.
If it's regular: try using the unzipper manually; if all the available
unzippers on your platform die with a memory allocation problem, then
you really have a problem. If one works, then instead of using the
zipfile module, drive the unzipper program from your Python code via a
subprocess.
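That last suggestion might look like the following sketch; the example command line assumes an Info-ZIP style `unzip` is installed and on the PATH, so substitute whatever unzipper your platform has:

```python
import subprocess

def unzip_external(cmd):
    # Run a command-line unzipper as a child process rather than
    # decompressing inside Python.  The caller supplies the full
    # command line, e.g. Info-ZIP style:
    #     unzip_external(['unzip', '-o', 'big.zip', '-d', 'outdir'])
    rc = subprocess.call(cmd)
    if rc != 0:
        raise RuntimeError('unzipper failed with exit code %d' % rc)
```

The child process does its own memory management, so Python's footprint stays small no matter how big the archive is.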
HTH,
John
Thanks for the detailed reply John! I guess it turned out to be a bit
tougher than I originally thought :-)....
Reading over your links, I think I'd better not attempt rewriting the
zipfile.py module... a little over my head :-). The best solution,
from everything I've read, seems to be calling an unzipper program from a
subprocess. I assume you mean using execfile()? I can't think of
another way.
Anyway, thank you very much for your help, it's been very educational.
Best regards,
Lorn
On 23 May 2005 09:28:15 -0700, "Marcus Lowland" <mc*******@walla.com> wrote:
> Thanks for the detailed reply John! I guess it turned out to be a bit tougher than I originally thought :-)....
> Reading over your links, I think I'd better not attempt rewriting the zipfile.py module... a little over my head :-). The best solution, from everything I've read, seems to be calling an unzipper program from a subprocess. I assume you mean using execfile()? I can't think of another way.
Errrmmmm ... no, execfile runs a Python source file.
Check out the subprocess module:
"""
6.8 subprocess -- Subprocess management
New in version 2.4.
The subprocess module allows you to spawn new processes, connect to
their input/output/error pipes, and obtain their return codes. This
module intends to replace several other, older modules and functions,
such as:
os.system
os.spawn*
os.popen*
popen2.*
commands.*
"""