Hi,
is there a way to use tarfile to create a gzip- or bzip2-compressed
archive in memory, or at least a reason why I cannot?
You might want to answer "use StringIO", but this isn't as easy
as it seems. ;) I am using Python 2.5.2, by the way. I think
this is a bug, at least in this version of Python, but maybe
StringIO just isn't file-like enough for this quirky tarfile module.
But that would conflict with its documentation.
"For special purposes, there is a second format for mode: 'filemode|
[compression]'. open() will return a TarFile object that processes its
data as a stream of blocks. No random seeking will be done on the
file. If given, fileobj may be any object that has a read() or write()
method (depending on the mode)."
Sounds good, but doesn't work. ;P StringIO provides read() and
write() methods, among others. But tarfile has problems with the
StringIO object, especially in this mode.
I extracted the code from my project into a standalone Python script
to demonstrate the issue at the lowest level. You can run the script
below as follows: ./StringIO-tarfile.py file1 [file2] [...]
#
# File: StringIO-tarfile.py
#
#!/usr/bin/env python
from StringIO import StringIO
import tarfile
import sys

def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
    tar_file = tarfile.open(mode=mode, fileobj=fileobj)
    for f in filenames:
        tar_file.add(f)
    result = result_cb(fileobj)
    tar_file.close()
    return result

if __name__ == '__main__':
    files = sys.argv[1:]
    modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]
    string_io_cb = lambda f: f.getvalue()
    for mode in modes:
        ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
        # StringIO test.
        content = create_tar_file(files, StringIO(), mode, string_io_cb)
        fd = open('StringIO%s' % ext, 'w')
        fd.write(content)
        fd.close()
        # file object test.
        fd = open('file%s' % ext, 'w')
        create_tar_file(files, fd, mode)

As test input, I used a directory with a single text file. As you
can see below, all tests using plain file objects were successful. But
when using StringIO, I can only create uncompressed tar files. Even
though I don't get any errors when creating them, most of the files are
just empty or truncated.
$ for f in `ls *.tar{,.gz,.bz2}`; do echo -n $f; du -h $f | awk
'{print " ("$1"B)"}'; tar -tf $f; echo; done
file-pipe.tar (84KB)
foo/
foo/ksp-fosdem2008.txt
file-pipe.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt
file-pipe.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt
file.tar (84KB)
foo/
foo/ksp-fosdem2008.txt
file.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt
file.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt
StringIO-pipe.tar (76KB)
foo/
foo/ksp-fosdem2008.txt
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
StringIO-pipe.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors
StringIO-pipe.tar.gz (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors
StringIO.tar (76KB)
foo/
foo/ksp-fosdem2008.txt
StringIO.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors
StringIO.tar.gz (4.0KB)
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error exit delayed from previous errors
Can somebody reproduce this problem? Did I misunderstand the API? What
would be the best workaround, if I am right? I am thinking about
using the gzip and bz2 modules directly.
Regards
Sebastian Noack
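
The fallback mentioned at the end of this post (using the gzip and bz2 modules directly) could look roughly like the following. This is only a minimal sketch, assuming a current Python 3 interpreter, where io.BytesIO takes the place of StringIO; the helper build_tar_bytes and the sample member name are hypothetical and not part of the original script.

```python
import bz2
import gzip
import io
import tarfile


def build_tar_bytes(members):
    """Return an uncompressed tar archive built in memory.

    `members` maps archive names to bytes payloads (a hypothetical input
    shape chosen for this sketch).
    """
    buf = io.BytesIO()
    tar = tarfile.open(mode="w", fileobj=buf)
    for name, payload in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    # Close first so the end-of-archive blocks are written, then read.
    tar.close()
    return buf.getvalue()


raw_tar = build_tar_bytes({"foo/hello.txt": b"hello world\n"})
gz_data = gzip.compress(raw_tar)    # compression happens outside tarfile
bz2_data = bz2.compress(raw_tar)

# Both results can be read back as ordinary compressed tar archives.
with tarfile.open(fileobj=io.BytesIO(gz_data), mode="r:gz") as tar:
    gz_names = tar.getnames()
with tarfile.open(fileobj=io.BytesIO(bz2_data), mode="r:bz2") as tar:
    bz2_names = tar.getnames()
```

The trade-off is that the whole uncompressed archive is held in memory before it is compressed, which is fine for small inputs but not for large ones.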
On Mon, 26 May 2008 17:44:28 -0300, se*************@googlemail.com
<se*************@googlemail.com> wrote:
> is there a way to use tarfile to create a gzip- or bzip2-compressed
> archive in memory, or at least a reason why I cannot?
> You might want to answer "use StringIO", but this isn't as easy
> as it seems. ;) I am using Python 2.5.2, by the way. I think
> this is a bug, at least in this version of Python, but maybe
> StringIO just isn't file-like enough for this quirky tarfile module.
> But that would conflict with its documentation.
>
> def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
>     tar_file = tarfile.open(mode=mode, fileobj=fileobj)
>     for f in filenames:
>         tar_file.add(f)
>     result = result_cb(fileobj)
>     tar_file.close()
>     return result
It's not a bug, you must extract the StringIO contents *after* closing
tar_file, else you won't get the last blocks pending to be written.
--
Gabriel Genellina
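
The ordering described in this reply can be illustrated with a small self-contained sketch. It assumes a current Python 3 interpreter (io.BytesIO instead of StringIO) and a hypothetical member name; it is not the original poster's script.

```python
import io
import tarfile

payload = b"example data\n" * 100

buf = io.BytesIO()
tar = tarfile.open(mode="w:gz", fileobj=buf)
info = tarfile.TarInfo(name="foo/example.txt")  # hypothetical member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))

too_early = buf.getvalue()   # snapshot taken before close(): incomplete
tar.close()                  # writes the pending blocks and the gzip trailer
complete = buf.getvalue()    # snapshot taken after close(): full archive

# Only the snapshot taken after close() is read back here.
with tarfile.open(fileobj=io.BytesIO(complete), mode="r:gz") as check:
    names = check.getnames()
```

The snapshot taken before close() is shorter than the final one, which matches the truncated and empty files in the original test output.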
On May 27, 2:17 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> It's not a bug, you must extract the StringIO contents *after* closing
> tar_file, else you won't get the last blocks pending to be written.
I looked at tarfile's source code last night after I wrote this
message and figured it out. But the problem is that TarFile's close
method closes the underlying file object after the last block is
written, and once StringIO is closed you cannot get its content
anymore. Wtf does it close the underlying file? There is absolutely no
reason for doing this. Are you still sure this isn't a bug?
Regards
Sebastian Noack
I have written a FileWrapper class as a workaround, which works for me
(see the code below). The FileWrapper object holds an internal
file-like object and maps its attributes, but prevents the user (in this
case tarfile) from closing the internal file, so I can still access
StringIO's content after closing the TarFile object.
But this should not be required to create in-memory tar files. It is
definitely a bug that TarFile closes external file objects passed to
tarfile.open when the TarFile object is closed. The code that opens a
file is also responsible for closing it.
Regards
Sebastian Noack
#
# File: StringIO-tarfile.py
#
#!/usr/bin/env python
from StringIO import StringIO
import tarfile
import sys

class FileWrapper(object):
    def __init__(self, fileobj):
        self.file = fileobj
        self.closed = fileobj.closed

    def __getattr__(self, name):
        # Raise AttributeError, if it isn't a file attribute.
        if name not in dir(file):
            raise AttributeError(name)
        # Get the attribute of the internal file object.
        value = getattr(self.file, name)
        # Raise a ValueError, if the attribute is callable (e.g. an instance
        # method) and the FileWrapper is closed.
        if callable(value) and self.closed:
            raise ValueError('I/O operation on closed file')
        return value

    def close(self):
        self.closed = True

def create_tar_file(filenames, fileobj, mode):
    tar_file = tarfile.open(mode=mode, fileobj=fileobj)
    for f in filenames:
        tar_file.add(f)
    tar_file.close()

if __name__ == '__main__':
    files = sys.argv[1:]
    modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]
    for mode in modes:
        ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
        # StringIO test.
        stream = FileWrapper(StringIO())
        create_tar_file(files, stream, mode)
        fd = open('StringIO%s' % ext, 'w')
        fd.write(stream.file.getvalue())
        stream.file.close()
        fd.close()
        # file object test.
        fd = open('file%s' % ext, 'w')
        create_tar_file(files, fd, mode)

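
For comparison, the same "ignore close() from the consumer" idea can be sketched for Python 3, where there is no built-in `file` type to inspect. This is an assumption-laden illustration rather than a port of the script above: NonClosingBytesIO and its release() method are hypothetical names, and a current tarfile may not need such a wrapper at all.

```python
import io
import tarfile


class NonClosingBytesIO(io.BytesIO):
    """BytesIO whose close() is a no-op; release() really closes it."""

    def close(self):
        # Ignore close requests from code that does not own the buffer.
        pass

    def release(self):
        # Explicitly free the buffer once its contents have been read.
        super().close()


stream = NonClosingBytesIO()
tar = tarfile.open(mode="w|bz2", fileobj=stream)
payload = b"hello\n"
info = tarfile.TarInfo(name="foo/hello.txt")  # hypothetical member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# The buffer is still readable, whatever tarfile did on close().
data = stream.getvalue()
stream.release()

with tarfile.open(fileobj=io.BytesIO(data), mode="r:bz2") as check:
    names = check.getnames()
```

Subclassing keeps every other file method intact, so no attribute forwarding through __getattr__ is needed in this variant.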
On Tue, May 27, 2008 at 01:51:47AM -0700, se*************@googlemail.com wrote:
> I have written a FileWrapper class as workaround, which works for me
> (see the code below). The FileWrapper object holds an internal
> file-like object and maps its attributes, but prevents the user (in this
> case tarfile) from closing the internal file, so I can still access
> StringIO's content after closing the TarFile object.
> But this should not be required to create in memory tar files. It is
> definitely a bug, that TarFile closes external file objects passed to
> tarfile.open, when closing the TarFile object. The code which opens a
> file is also responsible for closing it.
You're right, _BZ2Proxy.close() calls the wrapped file object's close() method,
and that is definitely not the desired behaviour. So, if you can do without the
'bz2' modes for now, your problem is gone; all other modes work fine.
I fixed it (r63744), so the next beta release will work as expected. Your test
script helped a lot, thanks.
Regards,
--
Lars Gustäbel la**@gustaebel.de
A casual stroll through a lunatic asylum shows that
faith does not prove anything.
(Friedrich Nietzsche)
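
A quick way to check the behaviour discussed in this reply on a given interpreter is to write a tiny archive in every mode and see whether the caller's buffer survives close(). The following is a minimal sketch, assuming a current Python 3 interpreter (io.BytesIO instead of StringIO) and a hypothetical member name; it is not the test that accompanied r63744.

```python
import io
import tarfile

payload = b"hello\n"
results = {}

for mode in ("w:", "w:gz", "w:bz2", "w|", "w|gz", "w|bz2"):
    buf = io.BytesIO()
    tar = tarfile.open(mode=mode, fileobj=buf)
    info = tarfile.TarInfo(name="foo/hello.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
    tar.close()

    # Record whether the caller's buffer is still open after close().
    # On an affected version, getvalue() below would raise ValueError.
    still_open = not buf.closed
    archive = buf.getvalue()

    # "r:*" lets tarfile detect the compression when reading back.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as check:
        results[mode] = (still_open, check.getnames())
```

Each entry in results pairs the "buffer still open" flag with the member names read back, so a mode that closes the buffer or produces a broken archive stands out immediately.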
On Tue, 27 May 2008 02:43:53 -0300, se*************@googlemail.com
<se*************@googlemail.com> wrote:
> On May 27, 2:17 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
> > It's not a bug, you must extract the StringIO contents *after* closing tar_file, else you won't get the last blocks pending to be written.
> I looked at tarfile's source code last night after I wrote this
> message and figured it out. But the problem is that TarFile's close
> method closes the underlying file object after the last block is
> written and when you close StringIO you can not get its content
> anymore. Wtf does it close the underlying file? There is absolute no
> reason for doing this. Are you still sure this isn't a bug?
Ouch, sorry, I only tried with gzip (and it worked fine), not bz2 (which is
buggy).
--
Gabriel Genellina
That is right, only bz2 is affected. I am happy that I could help. ;)
Regards
Sebastian Noack