PRE-PEP: new Path class

John Roth

I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

Frankly, I like the idea. It's about time that all of the file
and directory stuff in the os module got objectified
properly (or at least with some semblance of OO propriety!)

In the issues section:

1) Should path be a subclass of str?

No. Outside of the difficulty of deciding whether it's a
subclass of single byte or unicode strings, it's a pure
and simple case of Feature Envy. Granted, there will
be times a developer wants to use string methods, but
the most common operations should be supported directly.

2) virtual file system extensibility.

No opinion at this time. I'd like to see a couple
of attempts at an implementation first before
settling on a single design.

3) Should the / operator map joinpath.

I agree. No way. In the first place, that's a unixism
(Windows uses \, the Mac uses :) In the second
place it doesn't fit the common use of /, which is
to divide (separate) things. If we want an operator
for join (not a bad idea) I'd suggest using '+'. String
already overloads it for concatenation, and as I said
above, I'd just as soon *not* have this be a subclass
of string.

4) Should path expose an iterator for listdir(?)

I don't see why not, as long as the path is to a
directory.

5) Should == operator be the same as os.path.samefile()?

Why not...

6) Path.open()?

Of course.

7) Should the various gettime methods return Datetime
objects.

Of course.

8) Touch method?

Of course.

9) Current OS constants?

What are they? Are we talking about the four
constants in the access() function, or about something
else?

10) Commonprefix, walk and sameopenfile?

Commonprefix should be a string or list method,
it doesn't fit here.

walk is a nice function, but it should be redone to
use the visitor pattern directly, with different method
names for files, directories and whatever else a
particular file system has in its warped little mind.

sameopenfile doesn't belong in the os.path module
in the first place. It belongs in the os module under
6.1.3 - File Descriptor Operations.

11) rename join and split.

I wouldn't bother. Since I'm against making it a
subclass of str(), the issue doesn't arise.

12) Should == compare file sizes.

No. Might have a method to do that.

13) chdir, chmod, etc?

No. This has nothing to do with pathname.

14) Unicode filenames

Have to have them on Windows and probably
on the Mac.

15) Should files and directories be the same
class.

Probably not. While they share a lot of common
functionality (which should be spelled out as an
interface) they also have a lot of dissimilar
functionality. Separating them also makes it easy
to create objects for things like symbolic links.

In addition to this, we should have the ability
to update the other times (utime()) directly
using another file or directory object as well
as a Datetime object.
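
A minimal sketch of that last idea, assuming a hypothetical helper named set_times (the name and signature are illustrative and not from the pre-PEP). It only uses os.stat and os.utime:

```python
import os
import tempfile
from datetime import datetime


def set_times(target, source):
    """Set target's access and modification times from `source`.

    `source` is either a datetime (used for both times) or the path of
    another file or directory whose times are copied. Hypothetical helper.
    """
    if isinstance(source, datetime):
        stamp = source.timestamp()
        os.utime(target, (stamp, stamp))
    else:
        st = os.stat(source)
        # The nanosecond fields avoid float rounding when copying.
        os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns))


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    for name in (a, b):
        with open(name, "w") as fh:
            fh.write("x")
    set_times(a, datetime(2004, 1, 5, 10, 6, 59))
    set_times(b, a)  # b now carries a's timestamps
    times_match = os.path.getmtime(a) == os.path.getmtime(b)
```

On a path object this would presumably become a method, with the same two kinds of argument.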

John Roth

Jul 18 '05 #1

Just

In article <vv************@news.supernews.com>,
"John Roth" <ne********@jhrothjr.com> wrote:

I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

Frankly, I like the idea. It's about time that all of the file
and directory stuff in the os module got objectified
properly (or at least with some semblance of OO propriety!)

In the issues section:
[ snipping those points where I agree with John ]
4) Should path expose an iterator for listdir(?)

I don't see why not, as long as the path is to a
directory.
_An_ iterator, sure, but not __iter__. How about path.listdir()? :)
__iter__ could also iterate over the path elements, so it's ambiguous at
least.
15. Should files and directories be the same
class.

Probably not. While they share a lot of common
functionality (which should be spelled out as an
interface) they also have a lot of dissimilar
functionality. Separating them also makes it easy
to create objects for things like symbolic links.

But what about paths for not-yet-existing files or folders? I don't
think you should actually _hit_ the file system, if all you're doing is
path.join().

Just

Jul 18 '05 #2

John Roth

"Just" <ju**@xs4all.nl> wrote in message
news:ju************************@news1.news.xs4all.nl...

In article <vv************@news.supernews.com>,
"John Roth" <ne********@jhrothjr.com> wrote:
I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

Frankly, I like the idea. It's about time that all of the file
and directory stuff in the os module got objectified
properly (or at least with some semblance of OO propriety!)

In the issues section:
[ snipping those points where I agree with John ]
4) Should path expose an iterator for listdir(?)

I don't see why not, as long as the path is to a
directory.

_An_ iterator, sure, but not __iter__. How about path.listdir()? :)
__iter__ could also iterate over the path elements, so it's ambiguous at
least.

I see what you're saying. I'd argue (softly) that iterating over
the directory entries is the natural interpretation, though.

15. Should files and directories be the same
class.

Probably not. While they share a lot of common
functionality (which should be spelled out as an
interface) they also have a lot of dissimilar
functionality. Separating them also makes it easy
to create objects for things like symbolic links.

But what about paths for not-yet-existing files or folders? I don't
think you should actually _hit_ the file system, if all you're doing is
path.join().

I agree here. I haven't looked at any of the candidate implementations
yet, so I don't know what they're doing. I'm thinking of a
three class structure: the parent class is just the path manipulations;
it has two subclasses, one for real files and one for real directories.
That way they can not only inherit all of the common path manipulation
stuff, but the developer can instantiate a pure path manipulation
class as well.

There might also be a mixin that encapsulates the stuff that's common
to real files and directories like accessing and changing dates and
permissions.

I'm sure there are use cases that will throw a curve at that structure
as well.
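
A rough sketch of the three-class structure plus mixin described above. All class and method names here (BasePath, MetadataMixin, FilePath, DirPath) are made up for illustration; none of them come from the pre-PEP or from any candidate implementation:

```python
import os


class BasePath:
    """Pure path manipulation; never touches the file system."""

    def __init__(self, text):
        self.text = text

    def join(self, *parts):
        # Returns the base class: the joined path may not exist yet.
        return BasePath(os.path.join(self.text, *parts))

    def basename(self):
        return os.path.basename(self.text)

    def __str__(self):
        return self.text


class MetadataMixin:
    """Operations shared by real files and real directories."""

    def mtime(self):
        return os.path.getmtime(self.text)

    def chmod(self, mode):
        os.chmod(self.text, mode)


class FilePath(MetadataMixin, BasePath):
    def open(self, mode="r"):
        return open(self.text, mode)


class DirPath(MetadataMixin, BasePath):
    def listdir(self):
        return os.listdir(self.text)


# Joining never hits the disk, so the result is a plain BasePath.
candidate = DirPath(os.curdir).join("not", "created", "yet.txt")
```

The open question in this layout is who decides when a BasePath gets promoted to FilePath or DirPath, since that decision needs a file system access.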

Just

Jul 18 '05 #3

Just

In article <vv************@news.supernews.com>,
"John Roth" <ne********@jhrothjr.com> wrote:

4) Should path expose an iterator for listdir(?)

I don't see why not, as long as the path is to a
directory.

_An_ iterator, sure, but not __iter__. How about path.listdir()? :)
__iter__ could also iterate over the path elements, so it's ambiguous at
least.

I see what you're saying. I'd argue (softly) that iterating over
the directory entries is the natural interpretation, though.

It's far too implicit for my taste; for one thing, it's a folder-only
operation (and I don't see much merit in having separate classes for
folder and file paths). Would you also be in favor of iterating over
file-paths meaning iterating over the lines in the file?

Just

Jul 18 '05 #4

Oren Tirosh

On Mon, Jan 05, 2004 at 10:06:59AM -0500, John Roth wrote:

I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

Frankly, I like the idea. It's about time that all of the file
and directory stuff in the os module got objectified
properly (or at least with some semblance of OO propriety!)

"Properly"? There is nothing particularly "proper" or "improper"
about objects or any other programming paradigm supported by Python.
Objectifying is not a goal in itself. I like the Path object because
the interface is easier to learn and use, not because it is
"objectified".
5) Should == operator be the same as os.path.samefile()?

Why not...

No. Symbolic links are something you would sometimes want to treat as
distinct from the files they point to.

walk is a nice function, but it should be redone to
use the visitor pattern directly, with different method
names for files, directories and whatever else a
particular file system has in its warped little mind.

I find a generator and a couple of elifs are much easier to read. No
need to define a class, pass context information to the methods of
that class, etc.
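
Something along these lines, as a sketch only. The function name classify and the kind labels are invented for the example; os.walk is the existing generator:

```python
import os
import tempfile


def classify(top):
    """Yield (kind, path) pairs for everything below `top`."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                yield "link", full
            elif os.path.isdir(full):
                yield "dir", full
            elif os.path.isfile(full):
                yield "file", full
            else:
                yield "other", full


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "sub", "data.txt"), "w") as fh:
        fh.write("x")
    kinds = sorted(kind for kind, _ in classify(tmp))
```

The caller keeps its own local variables as context, which is the part a visitor class would otherwise have to carry around.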
13) chdir, chmod, etc?

No. This has nothing to do with pathname.

What's the difference between chmod and touch? They both affect the
file metadata in similar ways.

Oren

Jul 18 '05 #5

Gerrit Holl

John Roth wrote:

Subject: PRE-PEP: new Path class

I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

I will update the Pre-PEP tomorrow based on the comments I have already
seen in this thread. Note that it is very 'pre' and opinions expressed in
the PEP are not guaranteed to be consistent in any way ;)

I will comment on the comments later.

yours,
Gerrit.

--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

Jul 18 '05 #6

Mike C. Fletcher

John Roth wrote:

"Just" <ju**@xs4all.nl> wrote in message
news:ju************************@news1.news.xs4all.nl...

In article <vv************@news.supernews.com>,
"John Roth" <ne********@jhrothjr.com> wrote:

....

15. Should files and directories be the same
class.

Probably not. While they share a lot of common
functionality (which should be spelled out as an
interface) they also have a lot of dissimilar
functionality. Separating them also makes it easy
to create objects for things like symbolic links.

But what about paths for not-yet-existing files or folders? I don't
think you should actually _hit_ the file system, if all you're doing is
path.join().

I agree here. I haven't looked at any of the candidate implementations
yet, so I don't know what they're doing. I'm thinking of a
three class structure: the parent class is just the path manipulations;
it has two subclasses, one for real files and one for real directories.
That way they can not only inherit all of the common path manipulation
stuff, but the developer can instantiate a pure path manipulation
class as well.

There might also be a mixin that encapsulates the stuff that's common
to real files and directories like accessing and changing dates and
permissions.

I'm sure there are use cases that will throw a curve at that structure
as well.

My implementation combines the two into a single class. Here's the logic:

* There is no necessary distinction between files and directories at
the path level
o Particularly with upcoming ReiserFS 4, where structured
storage shows up, it's possible to have files behaving much
like directories.
o Zip files also come to mind if we have virtual file system
support eventually
* These objects represent paths, not the things pointed to by the paths.
o They allow you to operate on the path, which is "almost" the
filesystem, but not quite.
o In the space of "paths", there's no distinction between a
file and a directory, really.
o Even a path that traverses a few symbolic links, and drops
into a zip-file is still just a path, it's a way of
specifying something, similar to a "name" or "location" class.
o You can find out what the path-object points to via the path
methods, but the path itself isn't those objects.
* Don't want to have to explicitly cast your paths to file/directory
to get the basic file/directory operations when joining paths.
o Mix-ins require changing the class of the instance by
somehow figuring out that it's a file, that requires a
file-system access (to what may be a non-existent or very
expensive-to-access file).
o There's not much of a conflict between the file/directory
path operations

Enjoy,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/

Jul 18 '05 #7

Mike C. Fletcher

John Roth wrote:
....

1) Should path be a subclass of str?

No. Outside of the difficulty of deciding whether it's a
subclass of single byte or unicode strings, it's a pure
and simple case of Feature Envy. Granted, there will
be times a developer wants to use string methods, but
the most common operations should be supported directly.

It's not the methods that make me want to use str/unicode as the base,
it's the ability to pass the resulting instances to built-in methods
that explicitly expect/require str/unicode types. Not sure how many of
the core functions/libraries still have such requirements, but I'd guess
it's a few.

That said, I don't mind making path its own base-class, I just *really*
want to be able to pass them to path-unaware code without extra coercion
(otherwise switching a module to *producing* paths instead of raw
strings will cause the *clients* of that module to break, which is a
serious problem for rapid adoption).
3) Should the / operator map joinpath.

Agreed, no. As for using + for join, that will break a lot of code that
does things like this:

p = mymodule.getSomeFilename()
backup = p + '.bak'
copyfile( p, backup )
open( p, 'w').write( whatever )

i.e. we're thinking of returning these things in a lot of situations
where strings were previously returned, so string-like operations should,
IMO, be the norm. But then we disagree on that anyway ;) .
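
To make the breakage concrete, here is a small sketch. PlusJoinPath is a hypothetical class written only for this illustration, and it uses posixpath so the result is the same on every platform:

```python
import posixpath


class PlusJoinPath:
    """Hypothetical path type where '+' means join."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return PlusJoinPath(posixpath.join(self.text, str(other)))

    def __str__(self):
        return self.text


# Existing client code expects plain string concatenation ...
as_string = "/tmp/report.txt" + ".bak"
# ... but the same expression on a join-on-plus path yields something else.
as_path = str(PlusJoinPath("/tmp/report.txt") + ".bak")
```

Here as_string is '/tmp/report.txt.bak' while as_path is '/tmp/report.txt/.bak', so the backup snippet above would silently target a different name.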
4) Should path expose an iterator for listdir(?)

I don't see why not, as long as the path is to a
directory.

Seems ambiguous to me. Also seems silly to use a generator when we're
producing a list anyway from the underlying call, might as well return
the list to allow length checks and random access. Iterators for
"ancestors" might be useful, but again, doesn't really seem like it
needs to be __iter__ instead of "ancestors".
5) Should == operator be the same as os.path.samefile()?

Why not...

__eq__ sounds about right. I gather this call goes out to the
filesystem first, though. Might be good to first check for absolute
equality (i.e. the same actual path) before doing that.
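
Roughly like this, as a sketch under the assumption that == is meant to wrap os.path.samefile(). The class name SamePath is made up for the example:

```python
import os


class SamePath:
    """Hypothetical path whose == falls back to os.path.samefile()."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, SamePath):
            return NotImplemented
        # Cheap test first: identical text needs no file system access.
        if self.text == other.text:
            return True
        try:
            return os.path.samefile(self.text, other.text)
        except OSError:
            # At least one path does not exist, so equality can't be shown.
            return False

    # Defining __eq__ without __hash__ leaves instances unhashable in
    # Python 3; that is left as-is in this sketch.


missing_equal = SamePath("no/such/file") == SamePath("no/such/file")
missing_differ = SamePath("no/such/a") == SamePath("no/such/b")
cwd_equal = SamePath(os.curdir) == SamePath(os.getcwd())
```

The first comparison succeeds without the paths existing at all, which also answers part of the not-yet-existing-files concern.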
6) Path.open()?

Of course.

Ditto.
7) Should the various gettime methods return Datetime
objects.

Of course.

What are we doing for Python 2.2 then? I agree with the principle, but
we should likely have a fallback when datetime isn't available.
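
One possible shape for such a fallback, sketched with a plain function (getmtime here is a stand-in for whatever the path method ends up being called). The datetime module first shipped with Python 2.3:

```python
import os

try:
    from datetime import datetime
except ImportError:  # Python 2.2 and earlier have no datetime module
    datetime = None


def getmtime(path):
    """Return the modification time as a datetime when available,
    otherwise as the raw float timestamp."""
    stamp = os.path.getmtime(path)
    if datetime is None:
        return stamp
    return datetime.fromtimestamp(stamp)


modified = getmtime(os.curdir)
```

The obvious drawback is that the return type then depends on the Python version, which callers would have to cope with.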
8) Touch method?

Of course.

Neutral, seems fine.
9) Current OS constants?

What are they? Are we talking about the four
constants in the access() function, or about something
else?

Don't know myself.
10) Commonprefix, walk and sameopenfile?

Commonprefix should be a string or list method,
it doesn't fit here.

Path commonprefix is a different operation from str commonprefix. Paths
should only accept entire path-segments (names) as being equal, while
strings should accept any set of characters:

'/this/that/those/them'
'/this/thatly/those/them'

should see '/this/' as the commonprefix for the paths, not '/this/that'.
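
The difference can be shown with the standard library. os.path.commonprefix works character by character; os.path.commonpath, which only exists in much later Python versions (3.5 onward), works on whole segments. posixpath is used here so the result does not depend on the platform:

```python
import posixpath

paths = ["/this/that/those/them", "/this/thatly/those/them"]

# Character-wise comparison: stops in the middle of a segment.
by_character = posixpath.commonprefix(paths)

# Segment-wise comparison: only whole names count as equal.
by_segment = posixpath.commonpath(paths)
```

by_character comes out as '/this/that' and by_segment as '/this' (without the trailing slash used in the example above).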
walk is a nice function, but it should be redone to
use the visitor pattern directly, with different method
names for files, directories and whatever else a
particular file system has in it's warped little mind.

Reworking walk is probably a good idea. I'll let others worry about it,
as I've re-implemented the functionality so many times for my own code
that I'm just sick of it :) .
11) rename join and split.

I wouldn't bother. Since I'm against making it a
subclass of str(), the issue doesn't arise.

No real preference one way or another here. join -> "append" for paths
seems fine. split -> "elements" or "steps" for paths also seems fine.
12) Should == compare file sizes.

No. Might have a method to do that.

Agreed, though even then, if we have a method that returns file-sizes:

path( ... ).size() == path( ... ).size()

seems almost as reasonable as having a method for it?
13) chdir, chmod, etc?

No. This has nothing to do with pathname.

chmod has to do with altering the access mode of a file/directory by
specifying its path, no? Seems like it could readily be a method of
the path. chdir should accept a path, otherwise doesn't seem like it
should be a method.
14. Unicode filenames

Have to have them on Windows and probably
on the Mac.

Yes.
15. Should files and directories be the same
class.

Replied to this in the sub-thread...

Enjoy all,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/

Jul 18 '05 #8

John Roth

"Mike C. Fletcher" <mc******@rogers.com> wrote in message
news:ma**************************************@python.org...

John Roth wrote:
...
1) Should path be a subclass of str?

No. Outside of the difficulty of deciding whether it's a
subclass of single byte or unicode strings, it's a pure
and simple case of Feature Envy. Granted, there will
be times a developer wants to use string methods, but
the most common operations should be supported directly.

It's not the methods that make me want to use str/unicode as the base,
it's the ability to pass the resulting instances to built-in methods
that explicitly expect/require str/unicode types. Not sure how many of
the core functions/libraries still have such requirements, but I'd guess
it's a few.

That said, I don't mind making path its own base-class, I just *really*
want to be able to pass them to path-unaware code without extra coercion
(otherwise switching a module to *producing* paths instead of raw
strings will cause the *clients* of that module to break, which is a
serious problem for rapid adoption).

That's an excellent point, but it raises the question of which
string class it should subclass. Unless it's got some way of
changing its base class depending on the system it's running
on. That, in turn, probably violates the Principle of Least
Astonishment.

5) Should == operator be the same as os.path.samefile()?

Why not...

__eq__ sounds about right. I gather this call goes out to the
filesystem first, though. Might be good to first check for absolute
equality (i.e. the same actual path) before doing that.

I think this has to do with "conceptual integrity." Are we talking
about a path object that happens to have the ability to do file
system operations in appropriate circumstances, or are we talking
about a file system object that includes all of the usual path
manipulations? You seem to be thinking of the first approach,
and I'm thinking of the second. You're beginning to convince me,
though.

7) Should the various gettime methods return Datetime
objects.

Of course.

What are we doing for Python 2.2 then? I agree with the principle, but
we should likely have a fallback when datetime isn't available.

Do we care? If this is going into Python, it will be in 2.4 at the
earliest, with a possible addon to a late 2.3 release. I don't see
it going into 2.2 at all, although a backwards version would
be nice.

10) Commonprefix, walk and sameopenfile?

Commonprefix should be a string or list method,
it doesn't fit here.

Path commonprefix are different operations from str commonprefix. Paths
should only accept entire path-segments (names) as being equal, while
strings should accept any set of characters:

'/this/that/those/them'
'/this/thatly/those/them'

should see '/this/' as the commonprefix for the paths, not '/this/that'.

Good point if you're thinking of heterogeneous collections. If you're
thinking (as I am) that an object can represent a directory, then it
seems like a singularly useless method.

walk is a nice function, but it should be redone to
use the visitor pattern directly, with different method
names for files, directories and whatever else a
particular file system has in its warped little mind.

Reworking walk is probably a good idea. I'll let others worry about it,
as I've re-implemented the functionality so many times for my own code
that I'm just sick of it :) .

I can understand that. [grin]

13) chdir, chmod, etc?

No. This has nothing to do with pathname.

chmod has to do with altering the access mode of a file/directory by
specifying it's path, no? Seems like it could readily be a method of
the path.

Right. I forgot that these are two totally different issues.

chdir should accept a path, otherwise doesn't seem like it
should be a method.

If the path object describes a directory, then I'd see
a .chdir() method as useful. Otherwise, it belongs
somewhere else, although I don't have a clue where
at the moment.

Enjoy all,
Mike

John Roth

Jul 18 '05 #9

Dan Bishop

"John Roth" <ne********@jhrothjr.com> wrote in message news:<vv************@news.supernews.com>...

I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q
...

1) Should path be a subclass of str?

No.

So will the file constructor be "overloaded" to accept path objects?
What about all those functions in the os module?

Jul 18 '05 #10

Just

[Mike C. Fletcher]

That said, I don't mind making path its own base-class, I just *really*
want to be able to pass them to path-unaware code without extra coercion
(otherwise switching a module to *producing* paths instead of raw
strings will cause the *clients* of that module to break, which is a
serious problem for rapid adoption).

[John Roth]

That's an excellent point, but it raises the question of which
string class it should subclass. Unless it's got some way of
changing its base class depending on the system it's running
on. That, in turn, probably violates the Principle of Least
Astonishment.

That's in fact exactly what Jason Orendorff's path module does. But it's
buggy due to os.path.supports_unicode_filenames being buggy.

It would be interesting to figure out to what extent non-string (and
non-unicode) path objects can or can't be made to work for existing
string-accepting code. I would very much prefer a path _not_ to inherit
from str or unicode, but Mike's point is an important one. What is
missing in Python to allow non-string objects to act like (unicode)
strings?
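
For reference, much later Python versions (3.6 onward) addressed this with the os.PathLike protocol: an object that defines __fspath__ is accepted by open() and by the os and os.path functions. A minimal sketch, where Location is a hypothetical class that does not inherit from str:

```python
import os


class Location:
    """Hypothetical non-string path object."""

    def __init__(self, text):
        self.text = text

    def __fspath__(self):
        # Called by os.fspath() and by path-accepting library functions.
        return self.text


here = Location(os.curdir)
entries = os.listdir(here)                 # no str() call needed
joined = os.path.join(Location("a"), "b")  # result is a plain str
with open(Location(os.devnull)) as fh:
    content = fh.read()
```

This only helps code that goes through those library calls; client code that does its own string operations on the result still breaks.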

Just

Jul 18 '05 #11

Christoph Becker-Freyseng

John Roth wrote:

I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

Frankly, I like the idea. It's about time that all of the file
and directory stuff in the os module got objectified
properly (or at least with some semblance of OO propriety!)

Great, I've been thinking of something like this while writing
a "File-Class" (I'll open a thread for this ...)

[1] I think Path being a subclass of str is odd. There are a lot of
string operations that don't fit a path (some of them should be
implemented in a different way, e.g. __mul__, if at all).
However, the point about the old os functions etc. is very sound. So it
might be a good idea to have Path be a subclass of str *for
transition*. But eventually all those functions should call str(argument)
instead of demanding a str object as argument (if they don't already
today).
This requires Path.__str__ to return a valid path-string.
Path's constructor should be able to use such a string.
(when there are Path-Classes for other stuff like URLs we maybe need a
factory ...)
[21] I think file-paths and directory-paths shouldn't be the same class
(they have different meanings. Think about a "walk"-function for dirs)
But if the path doesn't exist, it might be hard to decide whether
it's a file-path or a dir-path.
You could do the following (for Unixes): if its string-representation
ends with '/' it's a directory, otherwise it's a file.
Bash autocompletes this way, but "cd /home" is valid too and would
cause trouble with this algorithm.
(If the path exists it's easier)

file-path and directory-path should have a common superclass.
There seems to be a distinction between existing and non-existing paths.
Probably a lot of things shared between file-paths and directory-paths
are valid for non-existing-paths.
"about a path object that happens to have the ability to do file
system operations in appropriate circumstances" [John Roth]
This is a good thing but I think there are problems: maybe the given
path does not exist.
This takes me to my last point:
What about invalid paths?
Should Path-Class take care of always being a valid path (this doesn't
necessarily mean a path of an existing file/directory)?
Especially if someone uses string-methods on a Path-object, invalid
paths could arise, even if the path is finally valid again.
The validity of filesystem-paths is os/filesystem dependent.

Christoph Becker-Freyseng

Jul 18 '05 #12

Christoph Becker-Freyseng

Dan Bishop wrote:

"John Roth" <ne********@jhrothjr.com> wrote in message news:<vv************@news.supernews.com>...
I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

...
1) Should path be a subclass of str?

No.

So will the file constructor be "overloaded" to accept path objects?
What about all those functions in the os module?

IMO this is the better way.
However, it might be useful to subclass str for the transition. (see my
other posting)

Jul 18 '05 #13

Gerrit Holl

Hi,

I have updated the PEP, partly based on comments by others, partly on my
own thoughts. Tinyurl seems down; the latest version is available at:

http://people.nl.linux.org/~gerrit/c.../pep-xxxx.html

yours,
Gerrit.

--
263. If he kill the cattle or sheep that were given to him, he shall
compensate the owner with cattle for cattle and sheep for sheep.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

Jul 18 '05 #14

Gerrit Holl

Mike C. Fletcher wrote:

9) Current OS constants?

What are they? Are we talking about the four
constants in the access() function, or about something
else?

Don't know myself.

I meant os.path constants: curdir, pathsep, defpath, etc.
They should be included.

use the visitor pattern directly, with different method
names for files, directories and whatever else a
particular file system has in it's warped little mind.

Reworking walk is probably a good idea. I'll let others worry about it,
as I've re-implemented the functionality so many times for my own code
that I'm just sick of it :) .

I think os.walk is good as it is.

yours,
Gerrit.

--
34. If a ... or a ... harm the property of a captain, injure the
captain, or take away from the captain a gift presented to him by the
king, then the ... or ... shall be put to death.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

Jul 18 '05 #15

Gerrit Holl

Christoph Becker-Freyseng wrote:

[1] I think Path being a subclass of str is odd. There are a lot of
string-operations that don't fit to path (some of them should be
implemented in a different way e.g. __mul__ if at all).
However the point with the old os function etc. is very sound. So it
might be a good idea to have Path being a subclass of str *for
transition*. But finally all those functions should call str(argument)
instead of demanding a str-object as argument (if they don't already
today).

Another possibility, which I have put in the Pre-PEP, is:

We can add a method .openwith(), which takes a callable as its first
argument: p.openwith(f, *args) would result in f(str(p), *args). This
would make p.open(*args) a shorthand for p.openwith(file, *args).

What do you think?
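
A sketch of how that could look, written for current Python where the builtin open stands in for file. The class body is illustrative only; the keyword-argument pass-through is an addition not mentioned in the Pre-PEP text:

```python
import os


class Path:
    """Minimal stand-in for the proposed path class."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def openwith(self, func, *args, **kwargs):
        """Call func(str(self), *args, **kwargs) and return the result."""
        return func(str(self), *args, **kwargs)

    def open(self, *args, **kwargs):
        # Shorthand for openwith() using the builtin open.
        return self.openwith(open, *args, **kwargs)


p = Path(os.curdir)
names = p.openwith(os.listdir)        # same as os.listdir(str(p))
exists = p.openwith(os.path.exists)   # any path-taking callable works

with Path(os.devnull).open("rb") as fh:
    data = fh.read()
```

The name is slightly misleading once it is used with callables that don't open anything, as the os.path.exists call shows.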
This takes me to my last point:
What about invalid paths?
Should Path-Class take care of always being a valid path (this doesn't
necessarily mean a path of an existing file/directory)

It may be a good idea to do so. At first, I didn't understand what an
'invalid path' meant, but let's define it as anything that triggers
a TypeError when passed to open or listdir. On POSIX, I know only one
case: \0 in path. It may be a lot more difficult on Windows or the Mac.
I'm not sure about this idea yet.

Especially if someone uses string-methods on a Path-object there could
arise invalid paths, even if finally the path is valid again.

Yes. But I can't really think of a use case for doing operations on a
path which make it invalid. Does it occur in practice?

yours,
Gerrit.

--
123. If he turn it over for safe keeping without witness or contract,
and if he to whom it was given deny it, then he has no legitimate claim.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

Jul 18 '05 #16

Gerrit Holl

Just wrote:

In article <vv************@news.supernews.com>,
"John Roth" <ne********@jhrothjr.com> wrote:
4) Should path expose an iterator for listdir(?)

I don't see why not, as long as the path is to a
directory.

_An_ iterator, sure, but not __iter__. How about path.listdir()? :)
__iter__ could also iterate over the path elements, so it's ambiguous at
least.

I think it should be called .list(): this way, it extends better
to archive files like zip and tar. Indeed: I know at least 3 different
possibilities for path.__iter__. Because of "In the face of ambiguity,
refuse the temptation to guess.", I think there should be no __iter__
(which is yet another reason not to subclass str, by the way) [0].
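
To illustrate the ambiguity, here is one guess at three plausible readings of iterating over a path, each spelled as an explicitly named function instead. The function names are invented for the example and the list is an assumption, not taken from the Pre-PEP:

```python
import os

text = os.path.join("usr", "lib")


def characters(path):
    """What a str subclass would give: one character at a time."""
    return iter(path)


def elements(path):
    """The components the path is made of."""
    return iter(path.split(os.sep))


def entries(path):
    """The contents of the directory the path names."""
    return iter(os.listdir(path))


first_char = next(characters(text))
parts = list(elements(text))
here = list(entries(os.curdir))
```

With explicit names the reader never has to guess which of the three a bare "for x in p" would have meant.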

15. Should files and directories be the same
class.

Probably not. While they share a lot of common
functionality (which should be spelled out as an
interface) they also have a lot of dissimilar
functionality. Separating them also makes it easy
to create objects for things like symbolic links.

But what about paths for not-yet-existing files or folders? I don't
think you should actually _hit_ the file system, if all you're doing is
path.join().

Another problem is that I may not know whether I have a file or a
directory. If a directory is a different type than a file, it would
probably have a different constructor as well, and I may end up doing:

p = path(foo)
if p.isdir():
    p = dirpath(foo)

If this is done implicitly, you can't create a path without
fs-interaction, which is bad for virtual-fs extensibility and confusing
if it doesn't mean a path always exists [1].

[0] http://people.nl.linux.org/~gerrit/c...-foo-in-mypath
[1] http://people.nl.linux.org/~gerrit/c...l#absent-paths

yours,
Gerrit.

--
257. If any one hire a field laborer, he shall pay him eight gur of
corn per year.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

Jul 18 '05 #17

Gerrit Holl

[PEP]

7) Should the various gettime methods return Datetime
objects.
[John Roth]
Of course.

I was hesitating because of backwards compatibility, but this should
not be a reason not to make things better, of course (TV isn't backward
compatible with radio either ;)

[Mike C. Fletcher]
What are we doing for Python 2.2 then? I agree with the principle, but
we should likely have a fallback when datetime isn't available.
[John Roth]
Do we care? If this is going into Python, it will be in 2.4 at the
earliest, with a possible addon to a late 2.3 release. I don't see
it going into 2.2 at all, although a backwards version would
be nice.

If the PEP is finished and accepted, I think the roadmap for
introducing the feature will be like that of sets: in 2.4, it's a
library, and if it's successful/popular, it may become a builtin in 2.5.

Path commonprefix is a different operation from str commonprefix. Paths
should only accept entire path-segments (names) as being equal, while
strings should accept any set of characters:

'/this/that/those/them'
'/this/thatly/those/them'

should see '/this/' as the commonprefix for the paths, not '/this/that'.

It should... it doesn't seem to do so currently.

Good point if you're thinking of heterogeneous collections. If you're
thinking (as I am) that an object can represent a directory, then it
seems like a singularly useless method.

The only place where I can think of a use is a tarfile/zipfile. For a
path, it means nothing. It can be useful but since it needs multiple
paths, it can't be a method. The only thing I can think of is a
classmethod, a constructor, but I don't really like the idea much.
13) chdir, chmod, etc?

No. This has nothing to do with pathname.

Is p.chdir() better or worse than chdir(p)?
The latter reads better, but that may be because we're used to it.
But on the other hand, that may be a convincing argument just as well :)
chdir should accept a path, otherwise doesn't seem like it
should be a method.

If the path object describes a directory, then I'd see
a .chdir() method as useful. Otherwise, it belongs
somewhere else, although I don't have a clue where
at the moment.

I think there should be no distinction between files and directories,
and that p.chdir() for a non-directory should raise the same exception
as currently (OSError Errno 20).

yours,
Gerrit.

--
27. If a chieftain or man be caught in the misfortune of the king
(captured in battle), and if his fields and garden be given to another and
he take possession, if he return and reaches his place, his field and
garden shall be returned to him, he shall take it over again.
-- 1780 BC, Hammurabi, Code of Law

Jul 18 '05 #18

Bernhard Herzog

"John Roth" <ne********@jhrothjr.com> writes:

5) Should == operator be the same as os.path.samefile()?

Why not...
ISTM, that it would essentially make path objects non-hashable.
posixpath.samefile compares the os.stat values for both filenames.
These values can change over time for various reasons so object equality
changes too. That it changes is desired, obviously, but what do you use
as hash value? p1 == p2 has to imply that hash(p1) == hash(p2) but the
only way to achieve that is to give all path objects the same hash
value. That would make dicts with path objects as keys very
inefficient, though.
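
A minimal sketch of that problem, assuming a hypothetical SameFilePath class (the name and the fallback for missing files are illustration only, not taken from the pre-PEP). Because equality is decided by os.path.samefile, the only hash that is always consistent with it is a constant:

```python
import os
import tempfile


class SameFilePath:
    """Hypothetical path type whose == means "same file on disk"."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, SameFilePath):
            return NotImplemented
        try:
            return os.path.samefile(self.name, other.name)
        except OSError:
            # Assumption: fall back to plain string comparison
            # when one of the files cannot be stat'ed.
            return self.name == other.name

    def __hash__(self):
        # Equal objects must hash equal, and equality can change with the
        # filesystem, so every instance gets the same hash value.
        return 0


def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        open(target, "w").close()
        alias = os.path.join(d, ".", "data.txt")  # different string, same file
        p1, p2 = SameFilePath(target), SameFilePath(alias)
        table = {p1: "found"}
        return p1 == p2, hash(p1) == hash(p2), table.get(p2)
```

The dict lookup works, but every key lands in the same hash bucket, so lookups degrade to a linear scan that calls stat for each comparison.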

As for the path objects in general: Cute idea, but IMO strings for
filenames work fine and there's nothing unpythonic about it. The
virtual filesystem bit seems like a good reason not to introduce the path
type just yet:
2) virtual file system extensibility.

No opinion at this time. I'd like to see a couple
of attempts at an implementation first before
settling on a single design.

At this point it doesn't seem clear what virtual filesystems would mean
for Python, so it's unclear, too, what it would mean for a Path class.
Introduce a Path class once there is a need for having several distinct
classes, not earlier.

Bernhard

--
Intevation GmbH http://intevation.de/
Sketch http://sketch.sourceforge.net/
Thuban http://thuban.intevation.org/

Jul 18 '05 #19

Gerrit Holl

Christoph Becker-Freyseng wrote:

openwith would be a nice add-on. I see two problems with it:
1.) it's long. I mean
f(str(path), *args)
is shorter than
path.openwith(f, *args)
This is indeed a disadvantage. On the other hand, although p.openwith is
longer, I do think it is more readable. It occurs often that shorter is
not more readable: just think of all those 'obfuscated-oneliners'
contests in C and Perl.
path > (f, arg1, arg2, ...)

(this works by overwriting path.__gt__)
I think this is not a good idea. In my opinion, any __gt__ method should
always compare, no more, no less. Further, it's very unusual to call
something like this.

Another possibility is defining __call__:

path(f, *args) == f(str(path), *args)

which may be inconvenient as well, however. Is it intuitive to let
calling mean opening?
2.) the position of the argument for the path can only be the first one.
(maybe one could misuse even more operators e.g. the __r...__ ones; But
I think this will result in obscure code)
Hm, I think almost all file constructors have the path as the first
argument. Are there counter-examples?
path.open shouldn't always call the ordinary file-constructor. I mean it
should be changed for special path-classes like FTPPath ...
(Of course for ordinary file-paths the ordinary file-class is the right
one.)
Yes... I was planning to define it in a class which is able to 'touch'
the filesystem, so an FTPPath could subclass basepath without the need
to overload open, or subclass ReadablePath with this need.
Additionally, path validity is filesystem-dependent. And worse, on systems
like Linux there can be more than one file system within the same root /
and they all could have different restrictions!
(if I were a vicious guy I could even umount and mount a different fs
making a valid path possibly invalid)
I think this makes validating a path essentially impossible to get
right. Let's say we can declare path to be invalid, but we can't declare
a path to be valid. Is it a good thing to add a method for it then? (I
think yes)
So I think the better solution would be to define a
path.isValid()
I agree that it's better. We should allow invalid paths after all.
We also need a
path.exists()
method.
Sure.
I'm not sure how both should interact ??
Well, if path.exists(), path.isValid(). The question is - should
path.isValid() read the filesystem?
Another Point:
Should Path be immutable like string?

I have thought about this, too. It should certainly not be fully mutable,
because if a path changes, it changes. But maybe we could have a
.normalise_inplace() which mutates the Path? What consequences would
this have for hashability?

I like paths to be hashable, so they probably should be immutable.

yours,
Gerrit.

--
16. If any one receive into his house a runaway male or female slave of
the court, or of a freedman, and does not bring it out at the public
proclamation of the major domus, the master of the house shall be put to
death.
-- 1780 BC, Hammurabi, Code of Law
--
PrePEP: Builtin path type
http://people.nl.linux.org/~gerrit/c.../pep-xxxx.html
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

Jul 18 '05 #20

Gerrit Holl

John Roth wrote:

I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

I have updated a lot of things on the PEP in the past few days.
There are still larger and smaller open issues, though, besides the
usual 'I-can-change-my-mind-and-the-PEP-can-change-its-mind' things:

Quoting my own PEP:

Should path.__eq__ match path.samefile?

There are at least 2 possible ways to do it:

- Normalize both operands by checking to which actual file they
point (same (l)stat).
- Try to find out whether the paths point to the same filesystem
entry, without doing anything with the filesystem.

pro
- A path usually points to a certain place on the filesystem, and
two paths with different string representations may point to the same place,
which means they are essentially equal in usage.
con
- We would have to choose a way, so we should first decide which is
better and whether the difference is intuitive enough.
- It makes hashing more difficult/impossible.
conclusion
- I don't know.
links
- Bernhard Herzog `points out
<http://mail.python.org/pipermail/python-list/2004-January/201857.html>`__
that it would essentially make path-objects non-hashable.
- `James Orendorff's Path`_ inherits str.__eq__.
- `Mike C. Fletcher's Path`_ chooses the first variant.

Do we need to treat unicode filenames specially?

I have no idea.

links
- An `explanation
<http://mail.python.org/pipermail/python-list/2004-January/201418.html>`__
by Martin von Loewis.

- should os.tempnam be included?
- can normpath be coded using only os.path constants (if so, it belongs in
the 'platform-independent' class)? (I think not)
- Should normalize be called normalized or not?
- Should stat be defined in the platform-dependent or -independent class?
- Should we include chdir and chroot? If so, where?
- Should rename return a new path object?
- Should renames be included?

And one meta-question:

Shall I submit this as an official PEP? Or shall I first fill in more
open issues and perhaps give the possibility to change "closed" issues?

See also: http://people.nl.linux.org/~gerrit/c.../pep-xxxx.html

yours,
Gerrit.

--
202. If any one strike the body of a man higher in rank than he, he
shall receive sixty blows with an ox-whip in public.
-- 1780 BC, Hammurabi, Code of Law

Jul 18 '05 #21

Christoph Becker-Freyseng

Gerrit Holl wrote:

Christoph Becker-Freyseng wrote:
openwith would be a nice add-on. I see two problems with it:
1.) it's long. I mean
f(str(path), *args)
is shorter than
path.openwith(f, *args)

This is indeed a disadvantage. On the other hand, although p.openwith is
longer, I do think it is more readable. It occurs often that shorter is
not more readable: just think of all those 'obfuscated-oneliners'
contests in C and Perl.

Sure. However this case seems to be quite obvious both ways.
path > (f, arg1, arg2, ...)

(this works by overwriting path.__gt__)

I think this is not a good idea. In my opinion, any __gt__ method should
always compare, no more, no less. Further, it's very unusual to call
something like this.

Additionally, getting the right documentation for these "operator-tricks"
is harder.
Another possibility is defining __call__:

path(f, *args) == f(str(path), *args)

which may be inconvenient as well, however. Is it intuitive to let
calling mean opening?

I like this one. What else could calling a path mean?
2.) the position of the argument for the path can only be the first one.
(maybe one could misuse even more operators e.g. the __r...__ ones; But
I think this will result in obscure code)

Hm, I think almost all file constructors have the path as the first
argument. Are there counter-examples?

The whole openwith (other than path.open) is IMO mainly for
"backward-compatibility" if the function doesn't know the path-class.
I think openwith or better __call__ could be used for other things, too
--- not only for opening a file. E.g. there could be some
"FileWatcher-Modules" that might only accept strings and have a call like:
watchFile(onChangeFunc, path_string)

For a different position of the path_string we could make a special case
if the path-object is given as an argument of the call.
path(f, arg1, arg2, path, arg3, arg4, ...)
results in: f(arg1, arg2, str(path), arg3, arg4, ...)

Changing old code to use the new Path-class could be done with a minimal
amount of work then.
OLD: result= f(arg1, arg2, path, arg3, arg4, ...) # path is/was a string here
..... result= f,arg1, arg2, path, arg3, arg4, ...)
..... result= (f, arg1, arg2, path, arg3, arg4, ...)
NEW: result= path(f, arg1, arg2, path, arg3, arg4, ...)
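
A rough sketch of how the openwith / __call__ idea with the special case could look. SketchPath and record are made-up names for illustration; nothing here is from the pre-PEP itself.

```python
class SketchPath:
    """Hypothetical stand-in for the proposed path class."""

    def __init__(self, name):
        self._name = name

    def __str__(self):
        return self._name

    def openwith(self, f, *args):
        # Hand the path to a string-only function as its first argument.
        return f(str(self), *args)

    def __call__(self, f, *args):
        # Special case: if this path object appears among the arguments,
        # replace it in place with its string form.
        if any(a is self for a in args):
            return f(*[str(a) if a is self else a for a in args])
        # Otherwise behave like openwith: path string goes first.
        return self.openwith(f, *args)


def record(*args):
    """Stand-in for an old function that only understands strings."""
    return args


p = SketchPath("/tmp/example.txt")
print(p(record, "r"))       # ('/tmp/example.txt', 'r')
print(p(record, 1, p, 2))   # (1, '/tmp/example.txt', 2)
```

One consequence of the special case is that a single call can mean two different things depending on its arguments, which may be just the kind of obscurity worried about above.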

path.open shouldn't always call the ordinary file-constructor. I mean it
should be changed for special path-classes like FTPPath ...
(Of course for ordinary file-paths the ordinary file-class is the right
one.)

Yes... I was planning to define it in a class which is able to 'touch'
the filesystem, so an FTPPath could subclass basepath without the need
to overload open, or subclass ReadablePath with this need.

Fine :-)
Christoph Becker-Freyseng
P.S.: ((I'll post the other things in different Re:'s))

Jul 18 '05 #22

Christoph Becker-Freyseng

Gerrit Holl wrote:

Christoph Becker-Freyseng wrote:
Additionally, path validity is filesystem-dependent. And worse, on systems
like Linux there can be more than one file system within the same root /
and they all could have different restrictions!
(if I were a vicious guy I could even umount and mount a different fs
making a valid path possibly invalid)

I think this makes validating a path essentially impossible to get
right. Let's say we can declare path to be invalid, but we can't declare
a path to be valid. Is it a good thing to add a method for it then? (I
think yes)

So I think the better solution would be to define a
path.isValid()

I agree that it's better. We should allow invalid paths after all.

Yes. path.isValid would make it possible to check better for situations
where calling things like mkdir and mkdirs (they directly depend on the
path being valid) causes trouble.
We could also add an InvalidPathException, which will at least help
debugging. isValid could have a default argument "raiseExc=False" to
make checking in these functions convenient, e.g.
def mkdir(self):
    self.isValid(raiseExc=True)
    moreStuff ...
If the path is invalid it will stop with an InvalidPathException.

Also path.exists should depend on path.isValid (not the other way round).
If the full path doesn't exist one can go up all the parent-dirs until
one exists. Here we can check if the specified sub-path is valid by
getting some information about the filesystem where the parent-dir is
stored. *This implicitly makes isValid a reading method* --- however AFAIK
isValid is only needed for reading and writing methods.
We also need a
path.exists()
method.

Sure.

I'm not sure how both should interact ??

Well, if path.exists(), path.isValid(). The question is - should
path.isValid() read the filesystem?

Yes as stated above.

path.exists should first check whether the path isValid at all. If it
isn't, a statement about its existence is senseless. In this case it
should return None, which also evaluates to False but is different (it's a
"dreiwertige Logik", three-valued logic in English: three states, namely
true, false and unknown).

FIXME: We have to finetune the "recursive" behavior of isValid and
exists otherwise we have a lot of unnecessary calls as exists and
isValid call each other going one dir up ...
Christoph Becker-Freyseng

Jul 18 '05 #23

Christoph Becker-Freyseng

Gerrit Holl wrote:

Christoph Becker-Freyseng wrote:
Should Path be immutable like string?

I have thought about this, too. It should certainly not be fully mutable,
because if a path changes, it changes. But maybe we could have a
.normalise_inplace() which mutates the Path? What consequences would
this have for hashability?

I like paths to be hashable, so they probably should be immutable.

Yes. (already in the PEP)
While paths aren't strings they have a lot in common because paths (as I
now think of them) are not directly associated with files. (Paths can be
nonexistent or even invalid)

Moreover the basic operations like __eq__ shouldn't be reading methods!

__hash__ has to be compatible with __eq__.
hash(p1) == hash(p2) <<<=== p1 == p2

Also
hash(p1) == hash(p2) ===>>> p1 == p2
should be true as far as possible.

I think
def __hash__(self):
    return hash(str(self.normalized()))
would do this fine.

So for __eq__ it follows naturally
def __eq__(self, other):
    # FIXME: isinstance checking
    return (str(self.normalized()) == str(other.normalized()))
It cares about nonexistent paths, too. (The samefile-solution won't ---
we might code a special case for it ...)
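
Put together as a runnable sketch, under the assumption that normalized() behaves like os.path.normpath (the post does not say so). NormPath is a made-up class name, and posixpath is used only to keep the results platform-independent.

```python
import posixpath


class NormPath:
    """Hypothetical path whose equality and hash use the normalized string."""

    def __init__(self, name):
        self._name = name

    def __str__(self):
        return self._name

    def normalized(self):
        # Purely textual clean-up; the filesystem is never read.
        return NormPath(posixpath.normpath(self._name))

    def __eq__(self, other):
        if not isinstance(other, NormPath):
            return NotImplemented
        return str(self.normalized()) == str(other.normalized())

    def __hash__(self):
        # Built from the same normalized string as __eq__, so
        # p1 == p2 implies hash(p1) == hash(p2).
        return hash(str(self.normalized()))


a = NormPath("docs//api/./index.txt")
b = NormPath("docs/api/index.txt")
print(a == b, hash(a) == hash(b), len({a, b}))  # True True 1
```

Neither file has to exist for this to work, which is the advantage over a samefile-based comparison.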
What about __cmp__?
I have to admit that __cmp__ comparing the file-sizes is nice (__eq__=
samefile is attractive, too --- they're both evil temptations :-) )

However __eq__ and __cmp__ returning possibly different results is odd.
Finally, implementing __cmp__ that way would make it a reading method and
is problematic for nonexistent paths.

I'd like an implementation of __cmp__ which is more path specific than
just string.__cmp__. But it should be consistent with __eq__.
Could we do something about parent and sub dirs?

Christoph Becker-Freyseng

Jul 18 '05 #24

Christoph Becker-Freyseng

I think the implementation should be changed for the "NormalFSPath".

import errno
import os

# Both functions are meant as methods of the path class; InvalidPath is
# an exception class assumed to be defined in the path module.
def exists(self):
    try:
        os.stat(str(self))
        return True
    except OSError as exc:  # Couldn't stat so what's up
        if exc.errno == errno.ENOENT:  # it simply doesn't exist
            return False
        return None  # the path is invalid

def isValid(self, raiseExc=False):
    if self.exists() is None:
        if raiseExc:
            raise InvalidPath
        return False
    return True

Christoph Becker-Freyseng

Jul 18 '05 #25

Bernhard Herzog

Christoph Becker-Freyseng <we*******@beyond-thoughts.com> writes:

So for __eq__ it follows naturally
def __eq__(self, other):
    # FIXME: isinstance checking
    return (str(self.normalized()) == str(other.normalized()))
It cares about nonexistent paths, too. (The samefile-solution won't ---
we might code a special case for it ...)

What exactly does normalized() do? If it's equivalent to
os.path.normpath, then p1 == p2 might be true even though they refer to
different files (on posix, a/../b is not necessarily the same file as
b). OTOH, if it also called os.path.realpath to take symlinks into
account, __eq__ would depend on the state of the filesystem which is
also bad.

IMO __eq__ should simply compare the strings without any modification.
If you want to compare normalized paths you should have to normalize
them explicitly.
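
The a/../b case can be reproduced with a small experiment. This is only a sketch and assumes a POSIX system where symlinks can be created; the directory layout and the function name are made up for the demonstration.

```python
import os
import tempfile


def normpath_can_change_target():
    """Return (literal_exists, normalized_exists) for root/a/../b."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        # Layout: root/real/sub/ (dir), root/real/b (file),
        #         root/a -> root/real/sub (symlink)
        os.makedirs(os.path.join(root, "real", "sub"))
        with open(os.path.join(root, "real", "b"), "w") as fh:
            fh.write("x")
        os.symlink(os.path.join(root, "real", "sub"), os.path.join(root, "a"))

        # The OS follows the symlink first, so ".." means root/real.
        literal = os.path.join(root, "a", "..", "b")
        # normpath only edits the string and yields root/b.
        normalized = os.path.normpath(literal)
        return os.path.exists(literal), os.path.exists(normalized)


print(normpath_can_change_target())  # (True, False)
```

The literal path reaches root/real/b, while the normalized string names root/b, which does not exist. An __eq__ based on normpath would therefore treat two paths as equal although they refer to different places.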
Bernhard


Jul 18 '05 #26

Christoph Becker-Freyseng

Bernhard Herzog wrote:

Christoph Becker-Freyseng <we*******@beyond-thoughts.com> writes:

So for __eq__ it follows naturally
def __eq__(self, other):
    # FIXME: isinstance checking
    return (str(self.normalized()) == str(other.normalized()))
It cares about nonexistent paths, too. (The samefile-solution won't ---
we might code a special case for it ...)

What exactly does normalized() do? If it's equivalent to
os.path.normpath, then p1 == p2 might be true even though they refer to

IMO yes.

different files (on posix, a/../b is not necessarily the same file as
b). OTOH, if it also called os.path.realpath to take symlinks into
account, __eq__ would depend on the state of the filesystem which is
also bad.

IMO __eq__ should simply compare the strings without any modification.
If you want to compare normalized paths you should have to normalize
them explicitly.

I agree with that. While it would be nice if __eq__ could match such
things it is ambiguous.
So better let __eq__ be a bit strict than faulty.

Christoph Becker-Freyseng

Jul 18 '05 #27

Christoph Becker-Freyseng

As I pointed out path.__cmp__ should not be used for e.g. comparing
filesizes.

But features like sorting on filesizes are very useful.
I'm not sure if Gerrit Holl already meant this in his conclusion on
"Comparing files" in the PEP.
I'll outline it a bit ...

I propose a callable singleton class whose only instance we assign to
sort_on (defined in the path-module).
It will have methods like: filesize, extension, filename, etc.
They will all be defined like:
def filesize(self, path1, path2):
    try:
        return path1._cmp_filesize(path2)
    except XXX:  # catch exceptions that are raised because path1 doesn't
                 # know how to compare with path2 (for different path-subclasses)
        XXX
    try:
        return (-1) * path2._cmp_filesize(path1)  # is this the best way to do this?
    except XXX:
        XXX
    raise YYY  # "path1 and path2 can't be compared on filesize; class1 and
               # class2 are not compatible"

And
def __call__(self, *args):
    if len(args) == 0:
        return self.filesize  # example!
    elif len(args) == 1:  # allow comparing uncommon things for subclasses
                          # of path e.g. ServerName/IPs for FTPPath ...
        def cmp_x(path1, path2, what=str(args[0])):
            # like filesize but
            pathCmpFunc = getattr(path1, '_cmp_' + what)
            return pathCmpFunc(path2)
            # Catch exceptions ...
        return cmp_x
    elif len(args) == 2:  # default comparison
        return self.filesize(*args)  # example!
    else:
        raise TypeError("Won't work ... FIXME")

Then we can have things like:

l= [path1, path2, path3]
l.sort(path.sort_on.filesize)
l.sort(path.sort_on.extension)

.....
I like this :-)

What do You think?

Christoph Becker-Freyseng

Jul 18 '05 #28

Jp Calderone

On Fri, Jan 09, 2004 at 07:41:32PM +0100, Christoph Becker-Freyseng wrote:

As I pointed out path.__cmp__ should not be used for e.g. comparing
filesizes.

But features like sorting on filesizes are very useful.
I'm not sure if Gerrit Holl already meant this in his conclusion on
"Comparing files" in the PEP.
I'll outline it a bit ...

This seems to be covered by the new builtin DSU support which will exist
in 2.4. See the (many, many) posts on python-dev on the "groupby" iterator:

http://mail.python.org/pipermail/pyt...er/thread.html

In particular, the ones talking about `attrget'.
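
For reference, the DSU support mentioned here ended up in Python 2.4 as the key argument to list.sort() and sorted(), together with operator.attrgetter. A minimal sketch of sorting on file size and extension with key functions, using plain path strings since the proposed Path class does not exist (the helper names are made up):

```python
import os
import tempfile


def sort_by_size(paths):
    # The key function is called once per path (decorate-sort-undecorate).
    return sorted(paths, key=os.path.getsize)


def sort_by_extension(paths):
    return sorted(paths, key=lambda p: os.path.splitext(p)[1])


def demo():
    """Create three files of different sizes and return their names by size."""
    with tempfile.TemporaryDirectory() as d:
        sizes = {"big.txt": 30, "small.txt": 1, "medium.txt": 10}
        paths = []
        for name, size in sizes.items():
            full = os.path.join(d, name)
            with open(full, "wb") as fh:
                fh.write(b"x" * size)
            paths.append(full)
        return [os.path.basename(p) for p in sort_by_size(paths)]


print(demo())  # ['small.txt', 'medium.txt', 'big.txt']
```

With key functions there is no need for a separate comparison function per criterion, and sorts based on comparison functions such as l.sort(cmpfunc) were later removed in Python 3 anyway.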

Jp

Jul 18 '05 #29

Christoph Becker-Freyseng

Gerrit Holl wrote:

John Roth wrote:
I'm adding a thread for comments on Gerrit Holl's pre-pep, which
[...] Shall I submit this as an official PEP? Or shall I first fill in more
open issues and perhaps give the possibility to change "closed" issues?

I think there are still a lot of issues. I think letting things settle
down first is wiser, and then presenting a PEP on which many (even better:
all) contributors agree. (I didn't like the result of PEP308 very much ...)

In addition there are still good points in the older discussions and
existing modules that should be integrated (you linked them in the prePEP).
Moreover I'd like to extend the pre-PEP (and PEP):
"PEP xxx: new path module"
Because there is more than just the Path-Class: BaseClasses, Exceptions,
Helper-Functions ...

Christoph Becker-Freyseng

Jul 18 '05 #30

Gerrit Holl

Christoph Becker-Freyseng wrote:

Gerrit Holl wrote:
John Roth wrote:
I'm adding a thread for comments on Gerrit Holl's pre-pep, which
[...]
Shall I submit this as an official PEP? Or shall I first fill in more
open issues and perhaps give the possibility to change "closed" issues?

I think there are still a lot of issues. I think letting things settle
down first is wiser, and then presenting a PEP on which many (even better:
all) contributors agree. (I didn't like the result of PEP308 very much ...)

Yes. But of course, a PEP being an official PEP does not mean there
can't be any more changes to it. So the question is, at what point does
a pre-PEP become a PEP? Some PEPs have a $Revision: 1.20$, after all.
In addition there are still good points in the older discussions and
existing modules that should be integrated (you linked them in the prePEP).

Yes, that's true. I'll do that.

yours,
Gerrit.

--
49. If any one take money from a merchant, and give the merchant a
field tillable for corn or sesame and order him to plant corn or sesame in
the field, and to harvest the crop; if the cultivator plant corn or sesame
in the field, at the harvest the corn or sesame that is in the field shall
belong to the owner of the field and he shall pay corn as rent, for the
money he received from the merchant, and the livelihood of the cultivator
shall he give to the merchant.
-- 1780 BC, Hammurabi, Code of Law

Jul 18 '05 #31

Christoph Becker-Freyseng

I've found some links that might be interesting.

http://www.w3.org/2000/10/swap/uripath.py
http://dev.w3.org/cvsweb/2000/10/swa...h.html?rev=1.9 [the doc for it]

Links of the operator.attrgetter / "groupby" iterator discussion in
python-dev (Jp Calderone posted a hint on it already --- I chose some
exemplary posts)

http://mail.python.org/pipermail/pyt...er/040614.html
http://mail.python.org/pipermail/pyt...er/040628.html
http://mail.python.org/pipermail/pyt...er/040643.html

Christoph Becker-Freyseng

Jul 18 '05 #32
