Python 3000: Standard API for archives?

samwyse

I'm a relative newbie to Python, so please bear with me. There are
currently two standard modules used to access archived data: zipfile
and tarfile. The interfaces are completely different. In particular,
scripts wanting to analyze different types of archives must duplicate
substantial pieces of logic. The problem is not limited to method
names; it includes how stat-like information is accessed.
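
To illustrate the duplication described above, here is a minimal sketch (Python 3, standard library only) that collects member names and sizes from each format. The helper names zip_sizes and tar_sizes are made up for this example; the zipfile and tarfile calls are the modules' real ones.

```python
import io
import tarfile
import zipfile


def zip_sizes(fileobj):
    """Map member name -> uncompressed size for a zip archive."""
    # zipfile: infolist() returns ZipInfo objects with .filename/.file_size
    with zipfile.ZipFile(fileobj) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def tar_sizes(fileobj):
    """Map member name -> size for the regular files in a tar archive."""
    # tarfile: getmembers() returns TarInfo objects with .name/.size
    with tarfile.open(fileobj=fileobj) as tf:
        return {m.name: m.size for m in tf.getmembers() if m.isfile()}
```

The two functions do the same job, yet nothing but the dict comprehension is shared: the opener, the listing method, and the attribute names all differ.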

I think it would be a good thing if a standardized interface existed,
similar to PEP 247. This would make it easier for one script to access
multiple types of archives, such as RAR, 7-Zip, ISO, etc. In
particular, a single factory class could produce PEP 302 import hooks
for future as well as current archive formats.

I think that an archive module adhering to the standard should adopt a
least-common-denominator approach, initially supporting read-only access
without seek, i.e. tar files on actual tape. For applications that
require a seek method (such as importers) a standard wrapper class could
transparently cache archive members in temp files; this would fit in
well with Python 3000's rewrite of the I/O interface to support
stackable interfaces. To this end, we'd need is_seekable and
is_writable attributes for both the module and instances (the module-level
attribute would declare whether something is possible, not whether it is
always true).
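
A rough sketch of the caching wrapper idea, assuming Python 3's io module, where streams expose a seekable() method rather than the is_seekable attribute proposed above. NonSeekableReader is only a stand-in for a tape-like source, and make_seekable is a hypothetical name, not an existing API.

```python
import io
import shutil
import tempfile


class NonSeekableReader(io.RawIOBase):
    """Stand-in for a forward-only stream such as a tape device."""

    def __init__(self, data):
        self._src = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        # RawIOBase builds read() on top of readinto().
        return self._src.readinto(b)

    # seekable() is inherited from RawIOBase and returns False.


def make_seekable(stream):
    """Return `stream` itself if it can seek, else a temp-file copy of it."""
    if stream.seekable():
        return stream
    cache = tempfile.TemporaryFile()
    shutil.copyfileobj(stream, cache)  # drain the stream into the cache
    cache.seek(0)
    return cache
```

A caller such as an importer would wrap whatever it is handed with make_seekable() and then seek freely; the cost is one full copy of the member when the source cannot seek.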

Most importantly, all archive modules should provide a standard API for
accessing their individual files via a single archive_content class that
provides a standard 'read' method. Less importantly but nice to have
would be a way for archives to be auto-magically scanned during walks of
directories.
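
One possible shape for such an API, sketched on top of the existing zipfile and tarfile modules. The names ArchiveContent and iter_members are assumptions made for illustration only; the sketch is read-only and expects a seekable file object.

```python
import io
import tarfile
import zipfile


class ArchiveContent:
    """One archive member with a uniform name, size, and read()."""

    def __init__(self, name, size, reader):
        self.name = name
        self.size = size
        self._reader = reader  # zero-argument callable returning bytes

    def read(self):
        return self._reader()


def iter_members(fileobj):
    """Yield an ArchiveContent for each regular file in a zip or tar."""
    if zipfile.is_zipfile(fileobj):
        fileobj.seek(0)
        zf = zipfile.ZipFile(fileobj)
        for info in zf.infolist():
            if not info.is_dir():
                yield ArchiveContent(info.filename, info.file_size,
                                     lambda i=info: zf.read(i))
    else:
        fileobj.seek(0)
        tf = tarfile.open(fileobj=fileobj)
        for member in tf:
            if member.isfile():
                yield ArchiveContent(member.name, member.size,
                                     lambda m=member: tf.extractfile(m).read())
```

A script can then loop with "for item in iter_members(f): data = item.read()" without caring which format it was given. Support for further formats would mean adding branches or, better, a registry of per-format adapters.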

Feedback?

Jun 4 '07 #1

Chuck Rhode

samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
below.

> I think it would be a good thing if a standardized interface
> existed, similar to PEP 247. This would make it easier for one
> script to access multiple types of archives, such as RAR, 7-Zip,
> ISO, etc.

Gee, it would be great to be able to open an archive member for update
I/O. This is kind of hard to do now. If it were possible, though, it
would obscure the difference between file directories and archives,
which would be kind of neat. Furthermore, you could navigate archives
of archives (zips of tars and other abominations).

--
... Chuck Rhode, Sheboygan, WI, USA
... Weather: http://LacusVeris.com/WX
... 62° - Wind N 7 mph - Sky overcast. Mist.

Jun 4 '07 #2

Tim Golden

Chuck Rhode wrote:

> samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
> below.

>> I think it would be a good thing if a standardized interface
>> existed, similar to PEP 247. This would make it easier for one
>> script to access multiple types of archives, such as RAR, 7-Zip,
>> ISO, etc.

> Gee, it would be great to be able to open an archive member for update
> I/O. This is kind of hard to do now. If it were possible, though, it
> would obscure the difference between file directories and archives,
> which would be kind of neat. Furthermore, you could navigate archives
> of archives (zips of tars and other abominations).

FWIW, there's no need to get hung up on Python-3000 or
any other release. Just put together a module
called "archive" or whatever, which exposes the kind of
API you're thinking of, offering support across zip, bz2
and whatever else you want. Put it up on the Cheeseshop,
announce it on c.l.py.ann and anywhere else which seems
apt. See if it gains traction. Take it from there.

NB This has the advantage that you can start small, say
with zip and bz2 support and maybe see if you get
contributions for less common formats, even via 3rd
party libs. If you were to try to get it into the stdlib
it would need to be much more fully specified up front,
I suspect.

TJG

Jun 4 '07 #3

Chuck Rhode

Tim Golden wrote this on Mon, 04 Jun 2007 15:55:30 +0100. My reply is
below.

> Chuck Rhode wrote:

>> samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
>> below.

>>> I think it would be a good thing if a standardized interface
>>> existed, similar to PEP 247. This would make it easier for one
>>> script to access multiple types of archives, such as RAR, 7-Zip,
>>> ISO, etc.

>> Gee, it would be great to be able to open an archive member for
>> update I/O. This is kind of hard to do now. If it were possible,
>> though, it would obscure the difference between file directories
>> and archives, which would be kind of neat. Furthermore, you could
>> navigate archives of archives (zips of tars and other
>> abominations).

> Just put together a module called "archive" or whatever,
> which exposes the kind of API you're thinking of, offering support
> across zip, bz2 and whatever else you want. Put it up on the
> Cheeseshop, announce it on c.l.py.ann and anywhere else which seems
> apt. See if it gains traction. Take it from there.

> NB This has the advantage that you can start small, say with zip and
> bz2 support and maybe see if you get contributions for less common
> formats, even via 3rd party libs. If you were to try to get it into
> the stdlib it would need to be much more fully specified up front, I
> suspect.

Yeah, this is in the daydreaming stages. I'd like to maintain
not-just-read-only libraries of geographic shapefiles, which are
available free from governmental agencies and which are riddled with
obvious errors. Typically these are published in compressed archives
within which every subdirectory is likewise compressed (apparently for
no other purpose than a rather vain attempt at flattening the
directory structure, which must be reconstituted on the user's end
anyway). Building a comprehensive index to what member name(s) the
different map layers (roads, political boundaries, watercourses) have
in various political districts of varying geographic resolutions is
much more than merely frustrating. I've given it up. However, I
believe that once I've located something usable, the thing to do is
save a grand unified reference locator (GURL) for it. The GURL would
specify a directory path to the highest-level archive followed by the
archive member name (or a cascade of member names, for enclosed
archives) of the data file(s) to be operated on. Unpacking and repacking would be
behind the scenes. Updates (via FTP) of non-local resources would be
transparent, too. I think, though, that notes about the publication
date, publisher, resolution, area covered, and format of the map or
map layer ought to be kept out of the GURL.
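
As a very rough sketch of the read side of such a locator, assuming every level is a zip archive and the GURL has already been split into an outer archive plus a list of member names. The function name read_nested and that list format are invented for this example; repacking and FTP updates are not attempted.

```python
import io
import zipfile


def read_nested(outer, member_path):
    """Follow member_path down through nested zips; return the final bytes.

    outer:        path or seekable file object for the top-level zip.
    member_path:  list of member names; all but the last must be zips.
    """
    if not member_path:
        raise ValueError("member_path must name at least one member")
    current = outer
    for name in member_path:
        with zipfile.ZipFile(current) as zf:
            data = zf.read(name)
        # Treat the bytes just read as the next archive in the cascade.
        current = io.BytesIO(data)
    return data
```

For example, read_nested("state.zip", ["county.zip", "roads.shp"]) would return the bytes of roads.shp from the county archive stored inside state.zip (both file names are hypothetical). Each level is unpacked fully into memory, so this only suits modest archive sizes.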

My whole appetite for this sort of thing would vanish if access to the
shapefiles were more tractable to begin with.

--
... Chuck Rhode, Sheboygan, WI, USA
... 1979 Honda Goldwing GL1000 (Geraldine)
... Weather: http://LacusVeris.com/WX
... 52° - Wind N 9 mph - Sky overcast.

Jun 5 '07 #4
