PEP on path module for standard library

Michael Hoffman

Many of you are familiar with Jason Orendorff's path module
<http://www.jorendorff.com/articles/python/path/>, which is frequently
recommended here on c.l.p. I submitted an RFE to add it to the Python
standard library, and Reinhold Birkenfeld started a discussion on it in
python-dev
<http://mail.python.org/pipermail/python-dev/2005-June/054438.html>.

The upshot of the discussion was that many python-dev'ers wanted path
added to the stdlib, but Guido was not convinced and said it must have a
PEP. So Reinhold and I are going to work on one. Reinhold has already
made some changes to the module to fit the python-dev discussion and put
it in CPython CVS at nondist/sandbox/path.

For the PEP, do any of you have arguments for or against including path?
Code samples that are much easier or more difficult with this class
would also be most helpful.

I use path in more of my modules and scripts than any other third-party
module, and I know it will be very helpful when I no longer have to
worry about deploying it.

Thanks in advance,
--
Michael Hoffman

Jul 21 '05


Michael Hoffman

Peter Hansen wrote:

When files are opened through a "path" object -- e.g.
path('name').open() -- then file.name returns the path object that was
used to open it.

Also works if you use file(path('name')) or open(path('name')).
--
Michael Hoffman

Jul 22 '05 #51

Reinhold Birkenfeld

Andrew Dalke wrote:

Duncan Booth wrote:
Personally I think the concept of a specific path type is a good one, but
subclassing string just cries out to me as the wrong thing to do.
I disagree. I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input. There is no
automatic coercion.

Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.

So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.

>>> class Spam:
...     def __getattr__(self, name):
...         print "Want", repr(name)
...         raise AttributeError, name
...
>>> open(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
>>> import os
>>> os.mkdir(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found

The solutions to this are:
1) make the path object be derived from str or unicode. Doing
this does not conflict with any OO design practice (eg, Liskov
substitution).

2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like
if isinstance(input, basestring):
    input = open(input, "rU")
for line in input:
    print line

I showed several places in the stdlib and in 3rd party packages
where this is used.
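
To make the isinstance() point concrete, here is a minimal sketch in present-day Python 3 syntax, where str plays the role of basestring. StrPath, WrappedPath and describe are hypothetical names invented for this illustration; none of them comes from the path module.

```python
import io

class StrPath(str):
    """Hypothetical path type that subclasses str."""

class WrappedPath:
    """Hypothetical path type that only wraps a string."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

def describe(source):
    # Mirrors the "filename or file object?" check quoted above.
    if isinstance(source, str):
        return "filename"
    return "file-like object"

kinds = [describe(StrPath("a.txt")),
         describe(WrappedPath("a.txt")),
         describe(io.StringIO("spam"))]
```

The str subclass is treated as a filename by the unmodified check, while the wrapper falls through to the file-object branch unless the caller converts it first.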

That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.
You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.

You are broadening the definition of a file path to include URIs?
That's making life more complicated. Eg, the rules for joining
file paths may be different than the rules for joining URIs.
Consider if I have a file named "mail:da***@example.com" and I
join that with "file://home/dalke/badfiles/".

Additionally, the actions done on URIs are different than on file
paths. What should os.listdir("http://www.python.org/") do?

I agree. Path is only for local filesystem paths (well, in UNIX they could
as well be remote, but that's thanks to the abstraction the filesystem
provides, not Python).
As I mentioned, I tried some classes which emulated file
paths. One was something like

class TempDir:
    """removes the directory when the refcount goes to 0"""
    def __init__(self):
        self.filename = ...  # use a function from the tempfile module
    def __del__(self):
        if os.path.exists(self.filename):
            shutil.rmtree(self.filename)
    def __str__(self):
        return self.filename

I could do

dirname = TempDir()

but then instead of

os.mkdir(dirname)
tmpfile = os.path.join(dirname, "blah.txt")

I needed to write it as

os.mkdir(str(dirname))
tmpfile = os.path.join(str(dirname), "blah.txt")

or have two variables, one which could delete the
directory and the other for the name. I didn't think
that was good design.
I can't follow. That's clearly not a Path but a custom object of yours.
However, I would have done it differently: provide a "name" property
for the object, and don't call the variable "dirname", which is confusing.
If I had derived from str/unicode then things would
have been cleaner.
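
A rough sketch of what deriving from str might look like, in Python 3 syntax. It is an illustration under stated assumptions, not Andrew's code: tempfile.mkdtemp() stands in for the elided step, and an explicit cleanup() method (a hypothetical name) replaces __del__.

```python
import os
import shutil
import tempfile

class TempDir(str):
    """Hypothetical str-derived temporary directory name."""

    def __new__(cls):
        # Assumption: tempfile.mkdtemp() stands in for the elided step.
        return super().__new__(cls, tempfile.mkdtemp())

    def cleanup(self):
        # Explicit removal instead of relying on __del__.
        if os.path.exists(self):
            shutil.rmtree(self)

dirname = TempDir()
tmpfile = os.path.join(dirname, "blah.txt")   # no str() call needed
with open(tmpfile, "w") as handle:
    handle.write("spam")
created = os.path.isfile(tmpfile)
dirname.cleanup()
removed = not os.path.exists(dirname)
```

Because the object is itself a string, one variable serves both as the name and as the handle used to delete the directory.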

Please note, btw, that some filesystems are unicode
based and others are not. As I recall, one nice thing
about the path module is that it chooses the appropriate
base class at import time. My "str()" example above
does not and would fail on a Unicode filesystem aware
Python build.

There's no difference. The only points where the type of a Path
object's underlying string is decided are Path.cwd() and the
Path constructor.

Reinhold

Jul 22 '05 #52

Michael Hoffman

Reinhold Birkenfeld wrote:

Andrew Dalke wrote:
I disagree. I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input. There is no
automatic coercion.
Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.

Except when you pass the path to a function written by someone else.

So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.

Where uncommon uses include passing the path object to any code you
don't control? The stdlib can be fixed, this other stuff can't.

The solutions to this are:
1) make the path object be derived from str or unicode. Doing
this does not conflict with any OO design practice (eg, Liskov
substitution).

2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like
if isinstance(input, basestring):
    input = open(input, "rU")
for line in input:
    print line

I showed several places in the stdlib and in 3rd party packages
where this is used.

That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.

But many functions were written expecting lists as arguments but also
work for strings, and do not require an explicit list(mystring) before
calling the function.
--
Michael Hoffman

Jul 22 '05 #53

Michael Hoffman

Peter Hansen wrote:

Most prominent change is that it doesn't inherit from str/unicode
anymore.
I found this distinction important, because as a str subclass the Path
object
has many methods that don't make sense for it.

On this topic, has anyone asked the original author (Jason Orendorff)
whether he has some background on this decision that might benefit the
discussion?

My impression is that he doesn't have a lot of spare cycles for this. He
didn't have anything to add to the python-dev discussion when I informed
him of it. I'd love to hear what he had to say about the design.
--
Michael Hoffman

Jul 22 '05 #54

John Machin

Daniel Dittmar wrote:

Duncan Booth wrote:
I would have expected a path object to be a sequence of path elements
rather than a sequence of characters.

Maybe it's nitpicking, but I don't think that a path object should be a
'sequence of path elements' in an iterator context.

This means that

for element in pathobject:

has no intuitive meaning for me, so it shouldn't be allowed.

Try this:

A file-system is a maze of twisty little passages, all alike. Junction
== directory. Cul-de-sac == file. Fortunately it is signposted. You are
dropped off at one of the entrance points ("current directory", say).
You are given a route (a "path") to your destination. The route consists
of a list of intermediate destinations.

for element in pathobject:
    follow_sign_post_to(element)

Exception-handling strategy: Don't forget to pack a big ball of string.
Anecdotal evidence is that breadcrumbs are unreliable.

Cheers,
John

Jul 22 '05 #55

John Machin

Michael Hoffman wrote:

John Roth wrote:
However, a path as a sequence of characters has even less
meaning - I can't think of a use, while I have an application
where traversing a path as a sequence of path elements makes
perfect sense: I need to descend the directory structure, directory
by directory, looking for specific files and types.

I *have* used a path as a sequence of characters before. I had to deal
with a bunch of filenames that were formatted like "file_02832.a.txt"

Ya, ya , ya, only two days ago I had to lash up a quick script to find
all files matching r"\d{8,8}[A-Za-z]{0,3}$" and check that the expected
number were present in each sub-directory (don't ask!).

BUT you are NOT using "a path as a sequence of characters". Your
filename is a path consisting of one element. The *element* is an
instance of basestring, to which you can apply all the string methods
and the re module.

I can see the case for a path as a sequence of elements, although in
practice, drive letters, extensions, and alternate streams complicate
things.
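
For illustration only: the pathlib module, which was added to the standard library in Python 3.4, long after this thread, exposes both views. A minimal sketch using its pure Windows flavour (no filesystem access); the example path is made up.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"c:\windows\system32\myfile.txt.zip")

elements = p.parts        # the path as a sequence of elements
last = p.name             # a single element, an ordinary str
drive = p.drive           # the drive letter is exposed separately
extensions = p.suffixes   # every extension of the last element
```

The last element is a plain string, so string methods and the re module apply to it directly, while the drive and the extensions get their own accessors rather than complicating the element sequence.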

Jul 22 '05 #56

Peter Hansen

Michael Hoffman wrote:

Peter Hansen wrote:
When files are opened through a "path" object -- e.g.
path('name').open() -- then file.name returns the path object that was
used to open it.

Also works if you use file(path('name')) or open(path('name')).

Since that's exactly what the path module does, it's not surprising.
Practically everything that path does, with a few useful exceptions, is
a thin wrapper around the existing calls. path.open, for example is
merely this:

def open(self, mode='r'):
    return file(self, mode)

-Peter

Jul 22 '05 #57

George Sakkis

"Andrew Dalke" <da***@dalkescientific.com> wrote:

George Sakkis wrote:
You're right, conceptually a path
HAS_A string description, not IS_A string, so from a pure OO point of
view, it should not inherit string.
How did you decide it's "has-a" vs. "is-a"?

All C calls use a "char *" for filenames and paths,
meaning the C model file for the filesystem says
paths are strings.

Bringing up how C models files (or anything else other than primitive types for that matter) is not
a particularly strong argument in a discussion on OO design ;-)
Paths as strings fit the Liskov substitution principle
in that any path object can be used any time a
string is used (eg, "loading from " + filename)
Liskov substitution principle imposes a rather weak constraint on when inheritance should not be
used, i.e. it is a necessary condition, but not sufficient. Take for example the case where a
PhoneNumber class is subclass of int. According to LSP, it is perfectly ok to add phone numbers
together, subtract them, etc, but the result, even if it's a valid phone number, just doesn't make
sense.
Good information hiding suggests that a better API
is one that requires less knowledge. I haven't
seen an example of how deriving from (unicode)
string makes things more complicated than not doing so.
I wouldn't say more complicated, but perhaps less intuitive in a few cases, e.g.:
path(r'C:\Documents and Settings\Guest\Local Settings').split()

['C:\\Documents', 'and', 'Settings\\Guest\\Local', 'Settings']
instead of
['C:', 'Documents and Settings', 'Guest', 'Local Settings']
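
Both results can be reproduced with a plain string, which behaves here exactly as a str-derived path would. A minimal sketch, with the separator hard-coded as a backslash (an assumption; a real path class would consult the platform):

```python
raw = r'C:\Documents and Settings\Guest\Local Settings'

# What an inherited str.split() does: split on whitespace.
by_whitespace = raw.split()

# What a reader might expect of a path: split on the separator.
by_separator = raw.split('\\')
```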

I just noted that conceptually a path is a composite object consisting of many properties (dirname,
extension, etc.) and its string representation is just one of them. Still, I'm not suggesting that a
'pure' solution is better than a more practical one that covers the most usual cases.

George

Jul 22 '05 #58

Michael Hoffman

Peter Hansen wrote:

Practically everything that path does, with a few useful exceptions, is
a thin wrapper around the existing calls.

If the implementation is easy to explain, it may be a good idea.

OT: I just realized you can now type in "python -m this" at the command
line, which is convenient, but strange.
--
Michael Hoffman

Jul 23 '05 #59

Neil Hodgson

Scott David Daniels:

Isn't it even worse than this?
On Win2K & XP, don't the file systems have something to do with the
encoding? So D: (a FAT drive) might naturally be str, while C:
(an NTFS drive) might naturally be unicode.
This is generally safe as Windows is using unicode internally and
provides full-fidelity access to the FAT drive using unicode strings.
You can produce failures if you try to create files with names that can
not be represented but you would see a similar failure with byte string
access.
Even worse would be a
path that switches in the middle (which it might do if we get to a
ZIP file or use the newer dir-in-file file systems).

If you are promoting from byte strings with a known encoding to
unicode path objects then this should always work.

Neil

Jul 23 '05 #60

Andrew Dalke

George Sakkis wrote:

Bringing up how C models files (or anything else other than primitive types
for that matter) is not a particularly strong argument in a discussion on
OO design ;-)
While I have worked with C libraries which had a well-developed
OO-like interface, I take your point.

Still, I think that the C model of a file system should be a
good fit since after all C and Unix were developed hand-in-hand. If
there wasn't a good match then some of the C path APIs should be
confusing or complicated. Since I don't see that, it suggests that
the "path is-a string" model is at least reasonable.
Liskov substitution principle imposes a rather weak constraint
Agreed. I used that as an example of the direction I wanted to
go. What principles guide your intuition of what is a "is-a"
vs a "has-a"?
Take for example the case where a PhoneNumber class is subclass
of int. According to LSP, it is perfectly ok to add phone numbers
together, subtract them, etc, but the result, even if it's a valid
phone number, just doesn't make sense.
Mmm, I don't think an integer is a good model of a phone number.
For example, in the US
00148762040828
will ring a mobile number in Sweden while
148762040828
will give a "this isn't a valid phone number" message.

Yet both have the same base-10 representation. (I'm not using
a syntax where leading '0' indicates an octal number. :)
I wouldn't say more complicated, but perhaps less intuitive in a few cases, e.g.:
path(r'C:\Documents and Settings\Guest\Local Settings').split()
['C:\\Documents', 'and', 'Settings\\Guest\\Local', 'Settings']
instead of
['C:', 'Documents and Settings', 'Guest', 'Local Settings']

That is why the path module uses a different method to split
on pathsep vs. whitespace. I get what you are saying, I just think
it's roughly equivalent to appealing to LSP in terms of weight.

Mmm, then there's a question of the usefulness of ".lower()" and
".expandtabs()" and similar methods. Hmmm....
I just noted that conceptually a path is a composite object consisting of
many properties (dirname, extension, etc.) and its string representation
is just one of them. Still, I'm not suggesting that a 'pure' solution is
better than a more practical one that covers the most usual cases.

For some reason I think that

path.dirname()

is better than

path.dirname

Python has properties now so the implementation of the latter is
trivial - put a @property on the line before the "def dirname(self):".
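
A minimal sketch of the two spellings side by side, in Python 3 syntax. The Path class below is hypothetical and assumes POSIX separators; it is not the path module's implementation.

```python
import posixpath

class Path(str):
    """Hypothetical str-derived path, POSIX separators assumed."""

    @property
    def dirname(self):
        # Accessed as an attribute: p.dirname
        return Path(posixpath.dirname(self))

    def basename(self):
        # Accessed as a method call: p.basename()
        return Path(posixpath.basename(self))

p = Path("/usr/local/bin/python")
parent = p.dirname
leaf = p.basename()
```

The only difference in the implementation is the decorator; the choice is about how the value should read at the call site.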

I think that the string representation of a path is so important that
it *is* the path. The other things you call properties aren't quite
properties in my model of a path and are more like computable values.

I trust my intuition on this, I just don't know how to justify it, or
correct it if I'm wrong.

Andrew
da***@dalkescientific.com

Jul 23 '05 #61

George Sakkis

"Andrew Dalke" <da***@dalkescientific.com> wrote:

[snipped]
Take for example the case where a PhoneNumber class is subclass
of int. According to LSP, it is perfectly ok to add phone numbers
together, subtract them, etc, but the result, even if it's a valid
phone number, just doesn't make sense.
Mmm, I don't think an integer is a good model of a phone number.
For example, in the US
00148762040828
will ring a mobile number in Sweden while
148762040828
will give a "this isn't a valid phone number" message.

That's why phone numbers would be a subset of integers, i.e. not every integer would correspond to a
valid number, but with the exception of numbers starting with zeros, all valid numbers would be
integers. Regardless, this was not my point; the point was that adding two phone numbers or
subtracting them never makes sense semantically.
[snipped]
I just noted that conceptually a path is a composite object consisting of
many properties (dirname, extension, etc.) and its string representation
is just one of them. Still, I'm not suggesting that a 'pure' solution is
better than a more practical one that covers the most usual cases.
For some reason I think that

path.dirname()

is better than

path.dirname

Python has properties now so the implementation of the latter is
trivial - put a @property on the line before the "def dirname(self):".

Sorry, I used the term 'property' in the broad sense, as the whole exposed API, not the specific
python feature; I've no strong preference between path.dirname and path.dirname().
I think that the string representation of a path is so important that
it *is* the path.
There are (at least) two frequently used path string representations, the absolute and the relative
to the working directory. Which one *is* the path? Depending on the application, one of them would
be a more natural choice than the other.
I trust my intuition on this, I just don't know how to justify it, or
correct it if I'm wrong.

My intuition also happens to support subclassing string, but for practical reasons rather than
conceptual.

George

Jul 23 '05 #62

Andrew Dalke

George Sakkis wrote:

That's why phone numbers would be a subset of integers, i.e. not every
integer would correspond to a valid number, but with the exception of
numbers starting with zeros, all valid numbers would be integers.
But it's that exception which violates the LSP.

With numbers, if x==y then (x,y) = (y,x) makes no difference.
If phone numbers are integers then 001... == 01... but swapping
those two numbers makes a difference. Hence they cannot be modeled
as integers.
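
A minimal sketch of both objections, in Python 3 syntax. PhoneNumber is a hypothetical class written only for this illustration; the two numbers are the ones quoted earlier in the thread.

```python
class PhoneNumber(int):
    """Hypothetical int subclass, used only to illustrate the objection."""

international = PhoneNumber("00148762040828")
invalid = PhoneNumber("148762040828")

# The leading zeros are lost, so two different dial strings compare equal.
collapsed = international == invalid

# Arithmetic is inherited, and the result is a plain int, not a PhoneNumber.
total = international + invalid
result_type = type(total)
```
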
Regardless, this was not my point; the point was that adding
two phone numbers or subtracting them never makes sense semantically.
I agree. But modeling them as integers doesn't make sense either.
Your example of adding phone numbers depends on them being represented
as integers. Since that representation doesn't work, it makes sense
that addition of phone numbers is suspect.
There are (at least) two frequently used path string representations,
the absolute and the relative to the working directory. Which one *is*
the path ? Depending on the application, one of them would be more
natural choice than the other.

Both. I don't know why one is more natural than the other.

I trust my intuition on this, I just don't know how to justify it, or
correct it if I'm wrong.

My intuition also happens to support subclassing string, but for
practical reasons rather than conceptual.

As you may have read elsewhere in this thread, I give some examples
of why subclassing from string fits best with existing code.

Even if there was no code base, I think deriving from string is the
right approach. I have a hard time figuring out why though. I think
if the lowest level Python/C interface used a "get the filename"
interface then perhaps it wouldn't make a difference. Which means
I'm also more guided by practical reasons than conceptual.
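
Later Python versions did grow such a "get the filename" hook: since Python 3.6, open() and the os functions accept any object with an __fspath__() method (the os.PathLike protocol). A minimal sketch, in which FsPath and demo are hypothetical names:

```python
import os
import tempfile

class FsPath:
    """Hypothetical path type that is not a str subclass."""

    def __init__(self, name):
        self._name = name

    def __fspath__(self):
        # The hook consulted by open(), os.path.join(), os.fspath(), ...
        return self._name

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = FsPath(os.path.join(tmp, "blah.txt"))
        with open(target, "w") as handle:      # no str() conversion
            handle.write("spam")
        joined = os.path.join(FsPath(tmp), "blah.txt")
        return os.path.isfile(target), joined == os.fspath(target)

results = demo()
```

With a hook like this at the lowest level, code that hands the object straight to open() or os functions works without the object being a string; code that tests isinstance(x, str) still does not.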

Andrew
da***@dalkescientific.com

Jul 23 '05 #63

paul

Michael Hoffman wrote:

Reinhold Birkenfeld wrote:
Probably as Terry said: a path is both a list and a string.

[...]
One way to divide this is solely based on path separators:

['c:', 'windows', 'system32:altstream', 'test.dir',
'myfile.txt.zip:altstream']
I would argue that any proposed solution has to work with VMS
pathnames. ;-)
The current stdlib solution,
os.path.splitext(os.path.splitext(filename)[0])[0], is extremely clunky,
and I have long desired something better.
(OK, using filename.split(os.extsep) works a little better, but you get
the idea.)
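
The two spellings can be compared directly. A small sketch in which the filename is a made-up example:

```python
import os

filename = "myfile.txt.zip"   # made-up example name

# Strip two extensions with nested splitext() calls.
nested = os.path.splitext(os.path.splitext(filename)[0])[0]

# Split on the extension separator instead.
pieces = filename.split(os.extsep)
```

The split() version returns every piece at once, but it also splits directory names that contain the separator, so it is only safe on a bare filename.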

And also with unusual (eg. RISC OS) filename extensions.

To do any justice to the existing solutions, any PEP should review at
least the following projects:

* The path module (of course):
http://www.jorendorff.com/articles/python/path/

* The py.path module (or at least the ideas for it):
http://codespeak.net/py/current/doc/future.html

* itools.uri
http://www.ikaaro.org/itools

* Results from the "Object-Oriented File System Virtualisation"
project in the "Summer of Code" programme:
http://wiki.python.org/moin/SummerOfCode

And I hope that the latter project is reviewing some of the other work,
if only to avoid the "framework proliferation" that people keep
complaining about.

Paul

Jul 23 '05 #64

Peter Hansen

George Sakkis wrote:

"Andrew Dalke" <da***@dalkescientific.com> wrote:
I think that the string representation of a path is so important that
it *is* the path.

There are (at least) two frequently used path string representations,
the absolute and the relative to the working directory. Which one
*is* the path ? Depending on the application, one of them would
be more natural choice than the other.

Sorry, George, but that's not how it works.

Whether using the regular string-based Python paths or the new path
module, a path *is* either absolute or relative, but cannot be both at
the same time.

This is therefore not an issue of "representation" but one of state.

-Peter

Jul 23 '05 #65

Duncan Booth

Peter Hansen wrote:

Just a note, though you probably know, that this is intended to be
written this way with path:
>>> p / q
path(u'a/b/c/d')

I know, but it really doesn't look right to me.

I think that my fundamental problem with all of this is that by making path
a subclass of str/unicode it inherits inappropriate definitions of some
common operations, most obviously addition, iteration and subscripting.

These operations have obvious meaning for paths which is not the same as
the meaning for string. Therefore (in my opinion) the two ought to be
distinct.
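
A minimal sketch of the inherited behaviour, in Python 3 syntax. StrPath is a hypothetical class that assumes POSIX separators; it only overrides "/" and inherits everything else from str.

```python
import posixpath

class StrPath(str):
    """Hypothetical str-derived path, POSIX separators assumed."""

    def __truediv__(self, other):
        # The "/" operator joins with a separator (Python 3 spelling).
        return StrPath(posixpath.join(self, other))

p = StrPath("a/b")

added = p + "c"       # inherited: plain concatenation, no separator
joined = p / "c"      # overridden: path join
first = p[0]          # inherited: a character, not a path element
items = list(p)       # inherited: iterates over characters
```

Addition, subscripting and iteration all keep their string meaning, which is the mismatch described above.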

Jul 23 '05 #66

Bengt Richter

On Sat, 23 Jul 2005 07:05:05 +1000, John Machin <sj******@lexicon.net> wrote:

Daniel Dittmar wrote:
Duncan Booth wrote:
I would have expected a path object to be a sequence of path elements
rather than a sequence of characters.

Maybe it's nitpicking, but I don't think that a path object should be a
'sequence of path elements' in an iterator context.

This means that

for element in pathobject:

has no intuitive meaning for me, so it shouldn't be allowed.

Try this:

A file-system is a maze of twisty little passages, all alike. Junction
== directory. Cul-de-sac == file. Fortunately it is signposted. You are
dropped off at one of the entrance points ("current directory", say).
You are given a route (a "path") to your destination. The route consists
of a list of intermediate destinations.

for element in pathobject:
    follow_sign_post_to(element)

Exception-handling strategy: Don't forget to pack a big ball of string.
Anecdotal evidence is that breadcrumbs are unreliable.

<indulging what="my penchant for seeking the general behind the specific ;-)" >

ISTM a path is essentially a representation of a script whose interpretation
by an orderly choice of interpreters finally leads to access to some entity,
typically a serial data representation, through an object, perhaps a local proxy,
that has standard methods for accessing the ultimate object's desired info.

IOW, a path sequence is like a script text that has been .splitlines()'d
and the whole sequence fed to a local interpreter, which might chew through multiple
lines on its own, or might invoke interpreters on another network to deal with the
rest of the script, or might use local interpreters for various different kinds of
access (e.g., after seeing 'c:' vs 'http://' vs '/c' vs '//c' etc. on the platform
defining the interpretation of the head element).

Turning a single path string into a complete sequence of elements is not generally possible
unless you have local knowledge of the syntax of the entire tail beyond the prefix
you have to deal with. Therefore, a local platform-dependent Pathobject class should, I think,
only recognize prefixes that it knows how to process or delegate processing for, leaving
the interpretation of the tail to the next Pathobject instance, however selected and/or
located.

So say (this is just a sketch, mind ;-)

po = Pathobject(<string representation of whole path>)

results in a po that splits out (perhaps by regex) a prefix, a first separator/delimiter,
and the remaining tail. E.g., in class Pathobject,
def __init__(self, pathstring=None):
    if pathstring is None:
        pass  # do useful default??
    self.pathstring = pathstring
    self.prefix, self.sep, self.tail = self.splitter(pathstring)
    if self.prefix in self.registered_prefixes:
        self.child = self.registered_prefixes[self.prefix](self.tail)
    else:
        self.child = []
    self.opened_obj = None

Then the loop inside a local pathobject's open method po.open()
might go something like

def open(self, *mode, **kw):
    if self.child:
        # the child was already constructed from self.tail
        self.opened_obj = self.child.open(*mode, **kw)
    else:
        self.opened_obj = file(self.pathstring, *mode)
    return self

And closing would just go to the immediately apparent opened object, and
if that had complex closing to do, it would be its responsibility to deal
with itself and its child-derived objects.

def close(self):
    self.opened_obj.close()

The point is that a given pathobject could produce a new or modified pathobject child
which might be parsing urls instead of windows file system path strings or could
yield an access object producing something entirely synthetic.

A synthetic capability could easily be introduced if the local element pathobject
instance looked for e.g., 'synthetic://' as a possible first element (prefix) string representation,
and then passed the tail to a subclass defining synthetic:// path interpretation.
E.g., 'synthetic://temp_free_diskspace' could be a platform-independent way to get such info as that.
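
A minimal runnable sketch of the prefix-registry idea, in Python 3 syntax. It is a loose interpretation of the description above, not a definitive design: the '://' splitting rule, the Synthetic handler and the text it returns are all assumptions made for the example.

```python
import io

class Pathobject:
    # Maps a prefix such as 'synthetic' to the class that handles the tail.
    registered_prefixes = {}

    def __init__(self, pathstring):
        self.pathstring = pathstring
        # Assumption: the prefix is whatever precedes the first '://'.
        self.prefix, self.sep, self.tail = pathstring.partition("://")
        handler = self.registered_prefixes.get(self.prefix) if self.sep else None
        self.child = handler(self.tail) if handler else None

    def open(self, *mode, **kw):
        if self.child is not None:
            # Delegate the rest of the path to the registered handler.
            return self.child.open(*mode, **kw)
        # No handler: treat the whole string as an ordinary file name.
        return open(self.pathstring, *mode, **kw)

class Synthetic:
    """Hypothetical handler for 'synthetic://' paths."""

    def __init__(self, tail):
        self.tail = tail

    def open(self, *mode, **kw):
        # Returns made-up text; a real handler would compute something.
        return io.StringIO("synthetic data for " + self.tail)

Pathobject.registered_prefixes["synthetic"] = Synthetic

with Pathobject("synthetic://temp_free_diskspace").open() as source:
    text = source.read()
```

A string without a registered prefix falls through to the built-in open(), so ordinary file names keep working unchanged.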

Opening 'testdata:// ...' might be an interesting way to feed test suites, if pathobject subclasses
could be registered locally and found via the head element's string representation.

One point from this is that a path string represents an ordered sequence of elements, but is heterogeneous,
and therefore has potentially heterogeneous syntax reflected in string tails with syntax that should be
interpreted differently from the prefix syntax.

Each successive element of a path string effectively requires an interpreter for that stage of access
pursuit, and the chain of processing may result in different path entities/objects/representations
on different systems, with different interpretations going on, sharing only that they are part of the
process of getting access to something and providing access services, if it's not a one-shot access.

This would also be a potential way to create access to a foreign file system in pure python if desired,
so long as there was a way of accessing the raw data to build on, e.g. a raw stuffit floppy, or a raw
hard disk if there's the required privileges. Also 'zip://' or 'bzip2://' could be defined
and registered by a particular script or in an automatic startup script. 'encrypted://' might be interesting.
Or if polluting the top namespace was a problem, a general serialized data access header element
might work, e.g., 'py_sda://encrypted/...'

This is very H[ot]OTTOMH (though it reflects some thoughts I've had before, so be kind ;-)

For compatibility with the current way of doing things, you might want to do an automatic open
in the Pathobject constructor, but I don't really like that. It's easy enough to tack on ".open()"

po = Pathobject('py_sda://encrypted/...')
po.open() # plain read_only text default file open, apparently, but encrypted does binary behind the scenes
print po.read()
po.close()

Say, how about

if Pathobject('gui://message_box/yn/continue processing?').open().read().lower()!='y':
    raise SystemExit, "Ok, really not continuing ;-)"

An appropriate registered subclass for the given platform, returned when the
Pathobject base class instantiates and looks at the first element on open() and delegates
would make that possible, and spelled platform-independently as in the code above.

</indulging>

Regards,
Bengt Richter

Jul 23 '05 #67

Daniel Dittmar

John Roth wrote:

However, a path as a sequence of characters has even less
meaning - I can't think of a use, while I have an application
That's true. But the arguments for path objects as strings go more in
the direction of using existing functions that expect strings.
where traversing a path as a sequence of path elements makes
perfect sense: I need to descend the directory structure, directory
by directory, looking for specific files and types.

But then your loop doesn't need the individual path elements, but rather
sub-path objects

for p in pathobj.stepdown ('/usr/local/bin'):
    if p.join (searchedFile):
        whatever

I'm not saying that there isn't any use for having a list of path
elements. But it isn't that common, so it should get a method name to
make it more explicit.
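
A minimal sketch of what such a stepdown could yield, written as a free function in Python 3 syntax. The name comes from the loop above; the behaviour (POSIX separators, one sub-path per level) is an assumption made for the example.

```python
import posixpath

def stepdown(path):
    """Yield each successively deeper sub-path of a POSIX-style path."""
    current = "/" if path.startswith("/") else ""
    for element in path.split("/"):
        if not element:
            continue  # skip the empty pieces around slashes
        current = posixpath.join(current, element)
        yield current

absolute = list(stepdown("/usr/local/bin"))
relative = list(stepdown("a/b"))
```

Each item is itself a usable path, so the loop body can test or join against it without reassembling elements by hand.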

Daniel

Jul 25 '05 #68

Daniel Dittmar

Terry Reedy wrote:

for dir in pathobject:
    if isdir(dir): cd(dir)

*is*, in essence, what the OS mainly does with paths (after splitting the
string representation into pieces).
That's why there is rarely a need to do it in Python code.
Directory walks also work with paths as sequences (stacks, in particular).

I'd say it works with stacks of paths, not with stacks of path elements.

I'm not saying that there isn't any use for having a list of path
elements. But it isn't that common, so it should get a method name to
make it more explicit.

Daniel

Jul 25 '05 #69

Ron Adam

Bengt Richter wrote:

<indulging what="my penchant for seeking the general behind the specific ;-)" >
<fold>

Say, how about

if Pathobject('gui://message_box/yn/continue processing?').open().read().lower()!='y':
    raise SystemExit, "Ok, really not continuing ;-)"

An appropriate registered subclass for the given platform, returned when the
Pathobject base class instantiates and looks at the first element on open() and delegates
would make that possible, and spelled platform-independently as in the code above.
I like it. ;-)

No reason why a path can't be associated to any tree based object.
</indulging>

Regards,
Bengt Richter

<more indulging>

I wasn't sure what to comment on, but it does point out some interesting
possibilities I think.

A path could be associated to any file-like object in an external (or
internal) tree structure. I don't see any reason why not.

In the case of an internal file-like object, it could be a series of
keys in nested dictionaries. Why not use a path as a dictionary
interface?
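
A minimal sketch of a path used as a key sequence into nested dictionaries. The lookup function, the '/' separator and the sample tree are all hypothetical, chosen only to show the idea.

```python
def lookup(tree, path, sep="/"):
    """Follow a separator-delimited path through nested dictionaries."""
    node = tree
    for key in path.strip(sep).split(sep):
        node = node[key]   # raises KeyError if an element is missing
    return node

tree = {"usr": {"local": {"bin": "python"}}}

leaf = lookup(tree, "/usr/local/bin")
branch = lookup(tree, "usr/local")
```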

So it sort of raises the question of how tightly a path object should be
associated to a data structure? When and where should the path object
determine what the final path form should be? And how smart should it
be as a potential file-like object?

Currently the device name is considered part of the path, but if instead
you treat the device as an object, it could open up more options.

(Which would extend the pattern of your example above. I think.)

(also a sketch.. so something like...)

# Initiate a device path object.
apath = device('C:').path(initial_path)

# Use it to get and put data
afile = apath.open(mode,position,units) # defaults ('r','line',next)
aline = afile.read().next() # read next unit, or as an iterator.
afile.write(line)
afile.close()

# Manually manipulate the path
apath.append('something') # add to end of path
apath.remove() # remove end of path
alist = apath.split() # convert to a list
apath.join(alist) # convert list to a path
astring = str(apath()) # get string from path
apath('astring') # set path to string
apath.validate() # make sure it's valid

# Iterate and navigate the path
apath.next() # iterate path objects
apath.next(search_string) # iterate with search string
apath.previous() # go back
apath.enter() # enter directory
apath.exit() # exit directory

# Close it when done.
apath.close()

etc...

With this you can iterate a file system as well as its files. ;-)

(Add more or less methods as needed of course.)
apath = device(dev_obj).path(some_path_string)
apath.open().write(data).close()

or if you like...

device(dev_obj).append(path_string).open().write(data).close()

Just a few thoughts,

Cheers,
Ron

Jul 30 '05 #70

qvx

Ron Adam wrote:

Bengt Richter wrote:
<indulging what="my penchant for seeking the general behind the specific ;-)"

There is a thing called "Asynchronous pluggable protocol". It is
Microsoft's technology (don't flame me now):

"""
Asynchronous pluggable protocols enable developers to create pluggable
protocol handlers, MIME filters, and namespace handlers that work with
Microsoft® Internet Explorer...

Applications can use pluggable protocol handlers to handle a custom
Uniform Resource Locator (URL) protocol scheme or filter data for a
designated MIME type.
"""

In other words you can develop your own plugin which would allow
Internet Explorer to open URLs like "rar://c/my/doc/book.rar". (I was
going to write plugin for .rar in order to enable offsite browsing of
downloaded portions of web sites, all from an archive file).
You could give it a look. If only to see that it is Mycrosofthonic:

http://msdn.microsoft.com/workshop/n...iews_entry.asp.
Qvx

Aug 1 '05 #71
