
In need of a virtual filesystem / archive

I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.

The easy solution would be to use a zip file or a tar file. Python has
good standard modules for accessing those types. However, I would tend
to think that modifying or deleting files in the archive would require
rewriting the entire archive.
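
To make the concern concrete, here is a minimal sketch of what deleting one member looks like with the standard zipfile module. As far as I know the module has no call that removes a member in place, so the helper below (remove_member is a name made up for this illustration) copies every surviving member into a new archive and swaps it in:

```python
import os
import tempfile
import zipfile


def remove_member(archive_path, member_name):
    """Drop one member by copying every other member into a new archive."""
    directory = os.path.dirname(os.path.abspath(archive_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    os.close(fd)
    with zipfile.ZipFile(archive_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename != member_name:
                # Every surviving member is read and written again,
                # so the cost grows with the size of the whole archive.
                dst.writestr(info, src.read(info.filename))
    # Swap the rebuilt archive into place once both files are closed.
    os.replace(tmp_path, archive_path)
```

For a 2GB archive this means roughly 2GB of I/O to drop one small text file, which is exactly what I want to avoid.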

Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.

Does anything like this exist? If nothing exists for Python, is there
something written in C maybe that I could wrap (preferably you won't
suggest wrapping the ext2 filesystem driver.. ;) ?

Feb 21 '06 #1
7 Replies


Maybe store them in SQLite?

On Linux, FUSE can also be an interesting option; gmailfs is written in
Python.
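
A rough sketch of the SQLite idea, using the sqlite3 module from the standard library. The table name, column names and helper functions are all invented for this example; each file is stored as one BLOB row keyed by its name, so adding, replacing and deleting a file touch only that row:

```python
import sqlite3


def open_archive(path):
    """Open (or create) a single-file archive backed by SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put_file(conn, name, data):
    """Add a file, or overwrite it if the name already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
            (name, data),
        )


def get_file(conn, name):
    """Return the file contents as bytes, or None if it is missing."""
    row = conn.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])


def delete_file(conn, name):
    """Remove one file without touching the others."""
    with conn:
        conn.execute("DELETE FROM files WHERE name = ?", (name,))
```

Usage would be something like put_file(conn, "notes/todo.txt", b"buy milk\n") followed by get_file(conn, "notes/todo.txt"). Note that get_file returns the whole file in one piece.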

Enigma Curry wrote:
I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.

The easy solution would be to use a zip file or a tar file. Python has
good standard modules for accessing those types. However, I would tend
to think that modifying or deleting files in the archive would require
rewriting the entire archive.

Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.

Does anything like this exist? If nothing exists for Python, is there
something written in C maybe that I could wrap (preferably you won't
suggest wrapping the ext2 filesystem driver.. ;) ?


Feb 21 '06 #2

"Enigma Curry" <wo*****@gmail.com> writes:
Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.

Does anything like this exist?


Yes, what you want is called a database. Try the bsddb module or
something with MySQL depending on your requirements.
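
As an illustration of the key/value approach: bsddb is no longer part of the standard library in Python 3, so this sketch swaps in the standard dbm module, which offers a similar dictionary-like interface. The file names and the demo function are placeholders, not anything from the original question:

```python
import dbm


def demo(path):
    """Store, modify and delete 'files' in a dbm key/value database."""
    with dbm.open(path, "c") as db:  # "c" creates the database if needed
        db[b"notes/todo.txt"] = b"buy milk\n"
        db[b"notes/old.txt"] = b"obsolete\n"
        # Modify one entry; other entries are not rewritten by this call.
        db[b"notes/todo.txt"] = b"buy milk\nbuy eggs\n"
        # Delete one entry.
        del db[b"notes/old.txt"]
    # Reopen read-only and return what is left.
    with dbm.open(path, "r") as db:
        return {key: db[key] for key in db.keys()}
```

Whether the database ends up as one file or several on disk depends on which dbm backend your Python picks.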
Feb 21 '06 #3

Enigma Curry wrote:
I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.

The easy solution would be to use a zip file or a tar file. Python has
good standard modules for accessing those types. However, I would tend
to think that modifying or deleting files in the archive would require
rewriting the entire archive.

Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.


Yes. I believe your common or garden variety file
manager can handle this task, by storing files in an
archive called "a directory". For example, many mail
systems use the "maildir" archive for storing email
while still being able to access it quickly and robustly.

Do you really need to store your files in a single
meta-file? Do you need compression? How much overhead
for the archive structure are you prepared to carry? Do
you expect the archive to shrink when you delete a file
from the middle?

I suspect you can pick any two of the following three:

1. single file
2. space used for deleted files is reclaimed
3. fast performance

Using a proper database will give you 2 and 3, but at
the cost of a lot of overhead, and typically a
relational database is not a single file.

--
Steven.

Feb 21 '06 #4


Steven D'Aprano wrote:
I suspect you can pick any two of the following three:

1. single file
2. space used for deleted files is reclaimed
3. fast performance

Using a proper database will give you 2 and 3, but at
the cost of a lot of overhead, and typically a
relational database is not a single file.

SQLite can give you all three. It does have overhead, but whether that is
worth it is a judgement call based on features, usage pattern, etc.
I think monotone uses it.
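
On point 2, a small sketch with the standard sqlite3 module. It assumes a hypothetical table files(name, data) holding one row per stored file. After a DELETE, SQLite keeps the freed pages inside the file and reuses them for later inserts; the file itself only shrinks when you run VACUUM, which rebuilds the database and so is not free on a large archive:

```python
import os
import sqlite3


def delete_and_reclaim(db_path, name):
    """Delete one stored file, then shrink the database file on disk.

    Assumes a table files(name, data) already exists (hypothetical schema).
    Returns the size of the database file after VACUUM.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # the DELETE is committed when this block exits
            conn.execute("DELETE FROM files WHERE name = ?", (name,))
        # VACUUM must run outside a transaction; it rewrites the database.
        conn.execute("VACUUM")
    finally:
        conn.close()
    return os.path.getsize(db_path)
```

So you can skip the VACUUM for speed and let the free pages be reused, or run it occasionally when the file has become too bloated.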

Feb 21 '06 #5

Enigma Curry:
I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.


Use the file system. That's what it's for.

--
René Pijlman
Feb 21 '06 #6

Enigma Curry wrote:
I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.
[...]
Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.
[...]


Although it is not its main usage, PyTables_ can be used to store
ordinary files in a single HDF5_ file. HDF5 files have a hierarchical
structure of nodes and groups which maps quite well to files and
directories. You can create, read, modify, copy, move and remove nodes
at will, freed space is reclaimed, and HDF5 is very efficient no matter
how large the data is.

For working with the files, PyTables includes a FileNode_ module which
offers Python file semantics for nodes in an HDF5 file. You can also
keep nodes transparently compressed, or you may repack the whole HDF5
file to defragment it or (de)compress its nodes, which may make it a
reasonable alternative to a compressed archive.

I will be pleased to give more information. Hope that helps.

.. _PyTables: http://www.pytables.org/
.. _HDF5: http://hdf.ncsa.uiuc.edu/HDF5/
.. _FileNode: http://pytables.sourceforge.net/html...ersguide6.html

import disclaimer

::

Ivan Vilata i Balaguer    http://www.carabos.com/
Cárabos Coop. V.          Enjoy Data

Feb 21 '06 #7

Thanks for all the suggestions!

I realized a few minutes after I posted that a database would work.. I
just wasn't in that "mode" of thinking when I posted.

PyTables also looks very interesting, especially because apparently I
can read a file in the archive like a normal Python file, i.e. one line
at a time.

Could I do the same using SQL? I'm assuming I would get the whole file
back when I did my SELECT statement. I guess I could chunk the file out
and store it in multiple rows, but that sounds complicated.
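
For what it's worth, a rough sketch of the multiple-rows idea with the standard sqlite3 module. The chunks table, the chunk size and the function names are all made up for this example. Each file is split into fixed-size blocks stored one per row, and a generator walks the rows in order and yields one line at a time, so only about one block is held in memory at once:

```python
import sqlite3

CHUNK_SIZE = 64 * 1024  # arbitrary block size chosen for this sketch


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "name TEXT NOT NULL, seq INTEGER NOT NULL, data BLOB NOT NULL, "
        "PRIMARY KEY (name, seq))"
    )


def store_chunked(conn, name, fileobj, chunk_size=CHUNK_SIZE):
    """Store a binary file object as numbered blocks, replacing old ones."""
    with conn:
        conn.execute("DELETE FROM chunks WHERE name = ?", (name,))
        seq = 0
        while True:
            block = fileobj.read(chunk_size)
            if not block:
                break
            conn.execute(
                "INSERT INTO chunks (name, seq, data) VALUES (?, ?, ?)",
                (name, seq, block),
            )
            seq += 1


def iter_lines(conn, name):
    """Yield the stored file line by line (as bytes), block by block."""
    pending = b""  # tail of the previous block that has no newline yet
    cursor = conn.execute(
        "SELECT data FROM chunks WHERE name = ? ORDER BY seq", (name,)
    )
    for (block,) in cursor:
        pending += block
        lines = pending.split(b"\n")
        pending = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line + b"\n"
    if pending:
        yield pending  # final line without a trailing newline
```

Then `for line in iter_lines(conn, "big.log"): ...` would behave much like iterating over a file opened in binary mode.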

Feb 21 '06 #8
