tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

Hi,

Is there a way to use tarfile to create a gzip- or bzip2-compressed
archive in memory, or at least a reason why I cannot?

You might want to answer "use StringIO", but this isn't as easy as it
seems. ;) I am using Python 2.5.2, by the way. I think this is a bug,
at least in this version of Python, but maybe StringIO just isn't
file-like enough for this quirky tarfile module. But that would
conflict with its documentation.

"For special purposes, there is a second format for mode:
'filemode|[compression]'. open() will return a TarFile object that
processes its data as a stream of blocks. No random seeking will be
done on the file. If given, fileobj may be any object that has a
read() or write() method (depending on the mode)."

Sounds good, but doesn't work. ;P StringIO provides read() and
write() methods, among others. But tarfile has problems with the
StringIO object, especially in this mode.

I extracted the code from my project into a standalone Python script
to demonstrate the issue in isolation. You can run the script below
as follows: ./StringIO-tarfile.py file1 [file2] [...]
#!/usr/bin/env python
#
# File: StringIO-tarfile.py
#

from StringIO import StringIO
import tarfile
import sys

def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
    tar_file = tarfile.open(mode=mode, fileobj=fileobj)
    for f in filenames:
        tar_file.add(f)
    result = result_cb(fileobj)
    tar_file.close()
    return result

if __name__ == '__main__':
    files = sys.argv[1:]
    modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]

    string_io_cb = lambda f: f.getvalue()

    for mode in modes:
        ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
        # StringIO test.
        content = create_tar_file(files, StringIO(), mode, string_io_cb)
        fd = open('StringIO%s' % ext, 'w')
        fd.write(content)
        fd.close()

        # file object test.
        fd = open('file%s' % ext, 'w')
        create_tar_file(files, fd, mode)

As test input, I used a directory containing a single text file. As
you can see below, all tests using plain file objects succeeded. But
when using StringIO, I can only create uncompressed tar files. Even
though I get no errors while creating them, most of the files are
empty or truncated.

$ for f in `ls *.tar{,.gz,.bz2}`; do echo -n $f; du -h $f | awk '{print " ("$1"B)"}'; tar -tf $f; echo; done

file-pipe.tar (84KB)
foo/
foo/ksp-fosdem2008.txt

file-pipe.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt

file-pipe.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt

file.tar (84KB)
foo/
foo/ksp-fosdem2008.txt

file.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt

file.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt

StringIO-pipe.tar (76KB)
foo/
foo/ksp-fosdem2008.txt
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

StringIO-pipe.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors

StringIO-pipe.tar.gz (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors

StringIO.tar (76KB)
foo/
foo/ksp-fosdem2008.txt

StringIO.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors

StringIO.tar.gz (4.0KB)

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error exit delayed from previous errors

Can somebody reproduce this problem? Did I misunderstand the API? If I
am right, what would be the best workaround? I am thinking about
using the gzip and bz2 modules directly.

Regards
Sebastian Noack
Jun 27 '08 #1
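
The fallback mentioned at the end of the post above (using the gzip and bz2 modules directly) could look roughly like the sketch below. It assumes a current Python 3 interpreter, where io.BytesIO takes the place of StringIO. The function name make_compressed_tar and the dict-based input are hypothetical and chosen only for illustration; they do not come from the thread.

```python
import bz2
import gzip
import io
import tarfile


def make_compressed_tar(members, compression="gz"):
    """Build an uncompressed tar in memory, then compress the resulting bytes.

    members is a dict mapping archive names to bytes payloads
    (a hypothetical input format used only for this sketch).
    """
    buf = io.BytesIO()
    tar = tarfile.open(mode="w", fileobj=buf)  # plain, uncompressed tar
    for name, payload in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    tar.close()  # writes the trailing blocks into buf
    raw = buf.getvalue()

    # Compress the finished archive in a separate step.
    if compression == "gz":
        return gzip.compress(raw)
    if compression == "bz2":
        return bz2.compress(raw)
    return raw
```

Because tarfile only ever sees the uncompressed mode here, the compression wrappers inside tarfile are not involved at all. The cost is that the whole uncompressed archive is held in memory before it is compressed.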
On Mon, 26 May 2008 17:44:28 -0300, se*************@googlemail.com
<se*************@googlemail.com> wrote:
> Is there a way to use tarfile to create a gzip- or bzip2-compressed
> archive in memory, or at least a reason why I cannot?
>
> You might want to answer "use StringIO", but this isn't as easy as it
> seems. ;) I am using Python 2.5.2, by the way. I think this is a bug,
> at least in this version of Python, but maybe StringIO just isn't
> file-like enough for this quirky tarfile module. But that would
> conflict with its documentation.
>
> def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
>     tar_file = tarfile.open(mode=mode, fileobj=fileobj)
>     for f in filenames:
>         tar_file.add(f)
>     result = result_cb(fileobj)
>     tar_file.close()
>     return result

It's not a bug; you must extract the StringIO contents *after* closing
tar_file, or else you won't get the last blocks still pending to be written.

--
Gabriel Genellina

Jun 27 '08 #2
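
The ordering issue described in the reply above can be illustrated with a small sketch. It assumes a current Python 3 interpreter (io.BytesIO instead of StringIO) and a hypothetical helper named build_archive; it is not the code from the thread, only a way to compare the buffer contents before and after close().

```python
import io
import tarfile


def build_archive(mode):
    """Return (bytes_before_close, bytes_after_close) for an in-memory archive."""
    buf = io.BytesIO()
    tar = tarfile.open(mode=mode, fileobj=buf)

    payload = b"hello tarfile\n" * 1000
    info = tarfile.TarInfo(name="foo/hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

    before = buf.getvalue()  # too early: trailing blocks are not written yet
    tar.close()              # flushes the compressor and writes the end-of-archive blocks
    after = buf.getvalue()   # complete archive
    return before, after


if __name__ == "__main__":
    for mode in ("w:", "w:gz", "w:bz2", "w|gz"):
        before, after = build_archive(mode)
        print(mode, len(before), len(after))
```

In each mode the snapshot taken before close() is shorter than the one taken afterwards, which matches the empty and truncated files in the original test output.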
On May 27, 2:17 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> It's not a bug; you must extract the StringIO contents *after* closing
> tar_file, or else you won't get the last blocks still pending to be written.

I looked at tarfile's source code last night after I wrote this
message and figured it out. But the problem is that TarFile's close
method closes the underlying file object after the last block is
written, and once a StringIO is closed you cannot get its content
anymore. Why does it close the underlying file? There is absolutely no
reason for doing this. Are you still sure this isn't a bug?

Regards
Sebastian Noack
Jun 27 '08 #3
I have written a FileWrapper class as a workaround, which works for me
(see the code below). The FileWrapper object holds an internal
file-like object and maps its attributes, but prevents the user (in
this case tarfile) from closing the internal file, so I can still
access StringIO's content after closing the TarFile object.

But this should not be required to create in-memory tar files. It is
definitely a bug that TarFile closes external file objects passed to
tarfile.open when the TarFile object is closed. The code that opens a
file is also responsible for closing it.

Regards
Sebastian Noack

#!/usr/bin/env python
#
# File: StringIO-tarfile.py
#

from StringIO import StringIO
import tarfile
import sys

class FileWrapper(object):
    def __init__(self, fileobj):
        self.file = fileobj
        self.closed = fileobj.closed

    def __getattr__(self, name):
        # Raise AttributeError, if it isn't a file attribute.
        if name not in dir(file):
            raise AttributeError(name)

        # Get the attribute of the internal file object.
        value = getattr(self.file, name)

        # Raise a ValueError, if the attribute is callable (e.g. an instance
        # method) and the FileWrapper is closed.
        if callable(value) and self.closed:
            raise ValueError('I/O operation on closed file')
        return value

    def close(self):
        self.closed = True

def create_tar_file(filenames, fileobj, mode):
    tar_file = tarfile.open(mode=mode, fileobj=fileobj)
    for f in filenames:
        tar_file.add(f)
    tar_file.close()

if __name__ == '__main__':
    files = sys.argv[1:]
    modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]

    for mode in modes:
        ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
        # StringIO test.
        stream = FileWrapper(StringIO())
        create_tar_file(files, stream, mode)
        fd = open('StringIO%s' % ext, 'w')
        fd.write(stream.file.getvalue())
        stream.file.close()
        fd.close()

        # file object test.
        fd = open('file%s' % ext, 'w')
        create_tar_file(files, fd, mode)

Jun 27 '08 #4
On Tue, May 27, 2008 at 01:51:47AM -0700, se*************@googlemail.com wrote:
> I have written a FileWrapper class as a workaround, which works for me
> (see the code below). The FileWrapper object holds an internal
> file-like object and maps its attributes, but prevents the user (in
> this case tarfile) from closing the internal file, so I can still
> access StringIO's content after closing the TarFile object.
>
> But this should not be required to create in-memory tar files. It is
> definitely a bug that TarFile closes external file objects passed to
> tarfile.open when the TarFile object is closed. The code that opens a
> file is also responsible for closing it.

You're right, _BZ2Proxy.close() calls the wrapped file object's close() method,
and that is definitely not the desired behaviour. So, if you can do without the 'bz2'
modes for now, your problem is gone; all other modes work fine.

I fixed it (r63744), so the next beta release will work as expected. Your test
script helped a lot, thanks.

Regards,

--
Lars Gustäbel
la**@gustaebel.de

A casual stroll through a lunatic asylum shows that
faith does not prove anything.
(Friedrich Nietzsche)
Jun 27 '08 #5
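
Whether a given interpreter leaves the caller's buffer open can be checked with a short sketch like the one below. It assumes a current Python 3 interpreter, with io.BytesIO standing in for StringIO, and a hypothetical helper named buffer_survives_close; neither appears in the thread.

```python
import io
import tarfile


def buffer_survives_close(mode):
    """Return True if the caller's buffer is still open after TarFile.close()."""
    buf = io.BytesIO()
    tar = tarfile.open(mode=mode, fileobj=buf)

    payload = b"x" * 2048
    info = tarfile.TarInfo(name="foo/data.bin")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

    tar.close()
    # The buffer was opened by the caller, so tarfile should leave it open.
    return not buf.closed


if __name__ == "__main__":
    for mode in ("w:", "w:gz", "w:bz2", "w|", "w|gz", "w|bz2"):
        print(mode, buffer_survives_close(mode))
```

On an interpreter that still has the behaviour described above, the bz2 modes would presumably be the ones reporting False.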
On Tue, 27 May 2008 02:43:53 -0300, se*************@googlemail.com
<se*************@googlemail.com> wrote:
> On May 27, 2:17 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>> It's not a bug; you must extract the StringIO contents *after* closing
>> tar_file, or else you won't get the last blocks still pending to be written.
>
> I looked at tarfile's source code last night after I wrote this
> message and figured it out. But the problem is that TarFile's close
> method closes the underlying file object after the last block is
> written, and once a StringIO is closed you cannot get its content
> anymore. Why does it close the underlying file? There is absolutely no
> reason for doing this. Are you still sure this isn't a bug?

Ouch, sorry, I only tried with gzip (and it worked fine), not bz2 (which is
buggy).

--
Gabriel Genellina

Jun 27 '08 #6
That is right, only bz2 is affected. I am happy that I could help. ;)

Regards
Sebastian Noack
Jun 27 '08 #7
