[path-PEP] Path inherits from basestring again

Reinhold Birkenfeld

Hi,

the arguments in the previous thread were convincing enough, so I made the
Path class inherit from str/unicode again.

It still can be found in CVS: /python/nondist/sandbox/path/{path.py,test_path.py}

One thing is still different, though: a Path instance won't compare to a regular
string.

Other minor differences, as requested on python-dev, are:

* size property -> getsize() method.
* atime/mtime/ctime properties -> atime()/mtime()/ctime() methods

* dirname() method -> directory property
* no parent property
* basename() method -> basename property
* no name property

* listdir() method -> children() method
* there is still a listdir() method, but with the semantics of os.listdir
* dirs() method -> subdirs() method
* joinpath() method -> added alias joinwith()
* splitall() method -> parts() method

* Default constructor: Path() == Path(os.curdir)
* staticmethod Path.getcwd() -> Path.cwd()

* bytes() / lines() / text() -> read_file_{bytes,lines,text} methods
* write_{bytes,lines,text} -> write_file_{bytes,lines,text} methods

These may be removed though.

Reinhold

Jul 23 '05 #1
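
A minimal sketch of two of the changes listed above, the default constructor and the cwd() staticmethod. The class body is hypothetical and written for current Python 3 (where str replaces the str/unicode pair); it is not the code from the CVS sandbox.

```python
import os


class Path(str):
    """Hypothetical stand-in for the sandbox Path class (a str subclass)."""

    def __new__(cls, value=os.curdir):
        # With no argument, fall back to the current-directory marker.
        return super().__new__(cls, value)

    @staticmethod
    def cwd():
        # Shorter spelling of the older getcwd() staticmethod.
        return Path(os.getcwd())


print(Path() == Path(os.curdir))    # default constructor
print(repr(Path("")))               # an empty path can still be built explicitly
print(isinstance(Path.cwd(), str))  # usable wherever a str is expected
```

Because the class only adds behaviour on top of str, every inherited string method keeps working unchanged.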


Peter Hansen

Reinhold Birkenfeld wrote:

One thing is still different, though: a Path instance won't compare to a regular
string.

Could you please expand on what this means? Are you referring to doing
< and >= type operations on Paths and strings, or == and != or all those
or something else entirely?

Other minor differences, as requested on python-dev, are:

* size property -> getsize() method.
* atime/mtime/ctime properties -> atime()/mtime()/ctime() methods

What does this mean? The .size property and a getsize() method both
already exist (in my copy of path.py anyway) and do the same thing.
Same with the other ones mentioned above. Is someone working from an
out-of-date copy of path.py?

* dirs() method -> subdirs() method

Given that .files() exists, and returns a list of the files contained in
a path which represents a folder, why would one want to use subdirs()
instead of just dirs() to do the same operation for contained folders?
If subdirs() is preferred, then I suggest subfiles() as well. Otherwise
the change seems arbitrary and ill-conceived.

* joinpath() method -> added alias joinwith()
* splitall() method -> parts() method

This reminds me of the *one* advantage I can think of for not
subclassing basestring, though it still doesn't make the difference in
my mind: strings already have "split()", so Jason had to go with
"splitpath()" for the basic split operation to avoid a conflict. A
minor wart I guess.

* Default constructor: Path() == Path(os.curdir)

To construct an empty path then one can still do Path('') ?

* staticmethod Path.getcwd() -> Path.cwd()

* bytes() / lines() / text() -> read_file_{bytes,lines,text} methods
* write_{bytes,lines,text} -> write_file_{bytes,lines,text} methods

Under Linux isn't it possible to open and read from directories much as
with files? If that's true, the above would seem to conflict with that
in some way. As with the .subdirs() suggestion above, these changes
seem to me somewhat arbitrary. .bytes() and friends have felt quite
friendly in actual use, and I suspect .read_file_bytes() will feel quite
unwieldy. Not a show-stopper however.

-Peter

Jul 23 '05 #2

Robert Kern

Peter Hansen wrote:

Under Linux isn't it possible to open and read from directories much as
with files?

Not really, no.

Python 2.3.4 (#2, Jan 5 2005, 08:24:51)
[GCC 3.3.5 (Debian 1:3.3.5-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> d = open('/usr/bin')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: [Errno 21] Is a directory

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Jul 23 '05 #3
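
The same check can be sketched for current Python 3, where the failure surfaces as an OSError subclass rather than IOError. The helper name can_open_as_file is made up for this illustration, and the exact subclass raised may differ by platform, so the sketch only catches OSError.

```python
import os
import tempfile


def can_open_as_file(name):
    """Return True if open() accepts name, False if it raises OSError."""
    try:
        with open(name, "rb"):
            return True
    except OSError:
        return False


directory = tempfile.mkdtemp()
try:
    # A freshly created directory is refused by open().
    print(can_open_as_file(directory))
finally:
    os.rmdir(directory)
```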

Reinhold Birkenfeld

Peter Hansen wrote:

Reinhold Birkenfeld wrote:
One thing is still different, though: a Path instance won't compare to a regular
string.

Could you please expand on what this means? Are you referring to doing
< and >= type operations on Paths and strings, or == and != or all those
or something else entirely?

All of these. Do you need them?

Other minor differences, as requested on python-dev, are:

* size property -> getsize() method.
* atime/mtime/ctime properties -> atime()/mtime()/ctime() methods

What does this mean? The .size property and a getsize() method both
already exist (in my copy of path.py anyway) and do the same thing.
Same with the other ones mentioned above. Is someone working from an
out-of-date copy of path.py?

No. But the size of a file is somewhat volatile, and does not feel like
a "property" of the path to it. Remember: the path is not the file. Same
goes with the xtime() methods.

The basename, directory, etc. are different: as long as the path stays the same,
these properties will stay the same.

* dirs() method -> subdirs() method

Given that .files() exists, and returns a list of the files contained in
a path which represents a folder, why would one want to use subdirs()
instead of just dirs() to do the same operation for contained folders?
If subdirs() is preferred, then I suggest subfiles() as well. Otherwise
the change seems arbitrary and ill-conceived.

Well, I think that's right. Will change back to dirs().

* joinpath() method -> added alias joinwith()
* splitall() method -> parts() method

This reminds me of the *one* advantage I can think of for not
subclassing basestring, though it still doesn't make the difference in
my mind: strings already have "split()", so Jason had to go with
"splitpath()" for the basic split operation to avoid a conflict. A
minor wart I guess.

At the moment, I think about overriding certain string methods that make
absolutely no sense on a path and raising an exception from them.

* Default constructor: Path() == Path(os.curdir)

To construct an empty path then one can still do Path('') ?

Yes.

* staticmethod Path.getcwd() -> Path.cwd()

* bytes() / lines() / text() -> read_file_{bytes,lines,text} methods
* write_{bytes,lines,text} -> write_file_{bytes,lines,text} methods

Under Linux isn't it possible to open and read from directories much as
with files? If that's true, the above would seem to conflict with that
in some way. As with the .subdirs() suggestion above, these changes
seem to me somewhat arbitrary. .bytes() and friends have felt quite
friendly in actual use, and I suspect .read_file_bytes() will feel quite
unwieldy. Not a show-stopper however.

It has even been suggested to throw them out, as they don't have so much to
do with a path per se. When the interface is too burdened, we'll have less
chance to be accepted. Renaming these makes clear that they are not operations
on the path, but on a file the path points to.

Phillip J. Eby suggested these to be set_file_xxx and get_file_xxx to demonstrate
that they do not read or write a stream; how about that?

Reinhold

Jul 23 '05 #4
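
The distinction drawn in this post, string-derived values as properties and filesystem queries as methods, can be sketched as follows. This is a hypothetical Python 3 stand-in using the names from the thread (basename, getsize), not the sandbox implementation.

```python
import os
import tempfile


class Path(str):
    """Hypothetical stand-in for the sandbox Path class."""

    @property
    def basename(self):
        # Computed from the string alone: never touches the filesystem.
        return Path(os.path.basename(self))

    def getsize(self):
        # Asks the filesystem: the answer can change, or the call can fail.
        return os.path.getsize(self)


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(os.path.join(tmp, "report.txt"))
    print(missing.basename)  # works although no file exists at this path
    try:
        missing.getsize()
    except OSError as exc:
        print("getsize() failed:", exc.__class__.__name__)
```

The property always succeeds for a given path string, while the method depends on what currently exists on disk.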

Peter Hansen

Reinhold Birkenfeld wrote:

Peter Hansen wrote (on Paths not allowing comparison with strings):
Could you please expand on what this means? Are you referring to doing
< and >= type operations on Paths and strings, or == and != or all those
or something else entirely?

All of these. Do you need them?

I believe so. If they are going to be basestring subclasses, why should
they be restricted in any particular way? I suppose that if you wanted
to compare a Path to a string, you could just wrap the string in a Path
first, but if the Path is already a basestring subclass, why make
someone jump through that particular hoop?

Other minor differences, as requested on python-dev, are:

* size property -> getsize() method.
* atime/mtime/ctime properties -> atime()/mtime()/ctime() methods

What does this mean? The .size property and a getsize() method both
already exist (in my copy of path.py anyway) and do the same thing.
Same with the other ones mentioned above. Is someone working from an
out-of-date copy of path.py?

No. But the size of a file is somewhat volatile, and does not feel like
a "property" of the path to it. Remember: the path is not the file. Same
goes with the xtime() methods.

Oh, so your original text was meant to imply that those properties *were
being removed*. That wasn't at all clear to me.

I understand the reasoning, but I'm unsure I agree with it. I fully
accept that the path is not the file, and yet I have a feeling this is a
pedanticism: most of the time when one is dealing with the _file_ one is
concerned with the content, and not much else. When one is dealing with
the _path_ one often wants to check the size, the modification time, and
so forth. For example, once one has the file open, one very rarely is
interested in when it was last modified.

In other words, I feel once again that Jason's original intuition here
was excellent, and that he chose practicality over purity in appropriate
ways, in a very Pythonic fashion. I confess to feeling that the
suggested changes are being proposed by those who have never actually
tried to put path.py to use in practical code, though I'm sure that's
not the case for everyone making those suggestions.

Still, once again this doesn't seem a critical issue to me and I'm happy
with either approach, if it means Path gets accepted in the stdlib.

At the moment, I think about overriding certain string methods that make
absolutely no sense on a path and raising an exception from them.

That would seem reasonable. It seems best to be very tolerant about
what "makes no sense", though istitle() would surely be one of those to
go first. Also capitalize() (in spite of what Windows Explorer seems to
do sometimes), center(), expandtabs(), ljust(), rjust(), splitlines(),
title(), and zfill(). Hmm... maybe not zfill() actually. I could
imagine an actual (if rare) use for that.

.bytes() and friends have felt quite
friendly in actual use, and I suspect .read_file_bytes() will feel quite
unwieldy. Not a show-stopper however.

It has even been suggested to throw them out, as they don't have so much to
do with a path per se. When the interface is too burdened, we'll have less
chance to be accepted. Renaming these makes clear that they are not operations
on the path, but on a file the path points to.

Here again I would claim the "practicality over purity" argument. When
one has a Path, it is very frequently because one intends to open a file
object using it and do reads and writes (obviously). Also very often,
the type of reading and writing one wants to do is an "all at once" type
of thing, as those methods support. They're merely a convenience, to
save one doing the Path(xxx).open('rb').read thing when one can merely
do Path(xxx).bytes(), in much the same way that the whole justification
for Path() is that it bundles useful and commonly used operations
together into one place.

Phillip J. Eby suggested these to be set_file_xxx and get_file_xxx to demonstrate
that they do not read or write a stream; how about that?

If they are there, they do exactly what they do, don't they? And they
do file.read() and file.write() operations, with slight nuances in the
mode passed to open() or the way the data is manipulated. Why would one
want to hide that, making it even harder to tie these operations
together with what is really going on under the covers? I think the
existing names, or at least ones with _read_ and _write_ in them
somewhere are better than set/get alternatives. It's just rare in
Python to encounter names quite as cumbersome as _write_file_bytes().

It might be good for those involved to discuss and agree on the
philosophy/principles behind using Path in the first place. If it's one
of pragmatism, then the arguments in favour of strictly differentiating
between path- and file- related operations should probably not be given
as much weight as those in favour of simple and convenient access to
commonly needed functionality. If, on the other hand, Path is seen as
some kind of a Java-esque universal path object which is cleanly and
tightly decoupled from everything else, then it would probably be best
to eliminate things like .getsize() and .read_file_bytes()/.bytes()
entirely and leave those in the hands of the cleanly defined and tightly
decoupled File object (currently spelled "file"?), again in a Java-esque
fashion. IMHO. :-)

(I'd like to say for the record that I feel that just about *any* form
of Path with even just the basics, basestring-based or not, would be a
huge improvement over the status quo, and I'm not trying to make a big
war out of this. Just offering my own view as a recent (a month or two
ago) but very enthusiastic convert to path.py.)

-Peter


Jul 23 '05 #5
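
The convenience argument in this post can be sketched as a thin wrapper around open(). The class below is a hypothetical Python 3 stand-in; the method names bytes() and write_bytes() are the ones mentioned in the thread, but the bodies are assumptions rather than the path.py source.

```python
import os
import tempfile


class Path(str):
    """Hypothetical stand-in for the sandbox Path class."""

    def open(self, mode="r"):
        return open(self, mode)

    def bytes(self):
        # Whole-file read in one call; the file is closed before returning.
        with self.open("rb") as handle:
            return handle.read()

    def write_bytes(self, data):
        # Whole-file write in one call, replacing any existing content.
        with self.open("wb") as handle:
            handle.write(data)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(os.path.join(tmp, "demo.bin"))
    target.write_bytes(b"spam")
    print(target.bytes())
```

Whatever name is chosen, the method is only a shortcut for opening the file, reading or writing everything at once, and closing it again.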

Mike Meyer

Peter Hansen <pe***@engcorp.com> writes:

* staticmethod Path.getcwd() -> Path.cwd()
* bytes() / lines() / text() -> read_file_{bytes,lines,text} methods
* write_{bytes,lines,text} -> write_file_{bytes,lines,text} methods

Under Linux isn't it possible to open and read from directories much
as with files?

The OS doesn't matter - python won't let you open a directory as a
file, even if the underlying OS will. The comment in
Objects/fileobject.c is:

/* On Unix, fopen will succeed for directories.
In Python, there should be no file objects referring to
directories, so we need a check. */

I think - but I'm not positive, and don't have a Linux box handy to
check on - that this comment is false if your Unix is really Linux.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jul 23 '05 #6

Reinhold Birkenfeld

Peter Hansen wrote:

Reinhold Birkenfeld wrote:
Peter Hansen wrote (on Paths not allowing comparison with strings):
Could you please expand on what this means? Are you referring to doing
< and >= type operations on Paths and strings, or == and != or all those
or something else entirely?

All of these. Do you need them?

I believe so. If they are going to be basestring subclasses, why should
they be restricted in any particular way? I suppose that if you wanted
to compare a Path to a string, you could just wrap the string in a Path
first, but if the Path is already a basestring subclass, why make
someone jump through that particular hoop?

Do you have a use case for the comparison? Paths should be compared only
with other paths.

Other minor differences, as requested on python-dev, are:

* size property -> getsize() method.
* atime/mtime/ctime properties -> atime()/mtime()/ctime() methods

What does this mean? The .size property and a getsize() method both
already exist (in my copy of path.py anyway) and do the same thing.
Same with the other ones mentioned above. Is someone working from an
out-of-date copy of path.py?

No. But the size of a file is somewhat volatile, and does not feel like
a "property" of the path to it. Remember: the path is not the file. Same
goes with the xtime() methods.

Oh, so your original text was meant to imply that those properties *were
being removed*. That wasn't at all clear to me.

I understand the reasoning, but I'm unsure I agree with it. I fully
accept that the path is not the file, and yet I have a feeling this is a
pedanticism: most of the time when one is dealing with the _file_ one is
concerned with the content, and not much else. When one is dealing with
the _path_ one often wants to check the size, the modification time, and
so forth. For example, once one has the file open, one very rarely is
interested in when it was last modified.

My line of thought is that a path may, but does not need to refer to an
existing, metadata-readable file. For this, I think a property is not
proper.

In other words, I feel once again that Jason's original intuition here
was excellent, and that he chose practicality over purity in appropriate
ways, in a very Pythonic fashion. I confess to feeling that the
suggested changes are being proposed by those who have never actually
tried to put path.py to use in practical code, though I'm sure that's
not the case for everyone making those suggestions.

Still, once again this doesn't seem a critical issue to me and I'm happy
with either approach, if it means Path gets accepted in the stdlib.

At the moment, I think about overriding certain string methods that make
absolutely no sense on a path and raising an exception from them.

That would seem reasonable. It seems best to be very tolerant about
what "makes no sense", though istitle() would surely be one of those to
go first. Also capitalize() (in spite of what Windows Explorer seems to
do sometimes), center(), expandtabs(), ljust(), rjust(), splitlines(),
title(), and zfill(). Hmm... maybe not zfill() actually. I could
imagine an actual (if rare) use for that.

I'll look into it. What about iteration and indexing? Should it support
"for element in path" or "for char in path" or nothing?

.bytes() and friends have felt quite
friendly in actual use, and I suspect .read_file_bytes() will feel quite
unwieldy. Not a show-stopper however.

It has even been suggested to throw them out, as they don't have so much to
do with a path per se. When the interface is too burdened, we'll have less
chance to be accepted. Renaming these makes clear that they are not operations
on the path, but on a file the path points to.

Here again I would claim the "practicality over purity" argument. When
one has a Path, it is very frequently because one intends to open a file
object using it and do reads and writes (obviously). Also very often,
the type of reading and writing one wants to do is an "all at once" type
of thing, as those methods support. They're merely a convenience, to
save one doing the Path(xxx).open('rb').read thing when one can merely
do Path(xxx).bytes(), in much the same way that the whole justification
for Path() is that it bundles useful and commonly used operations
together into one place.

Phillip J. Eby suggested these to be set_file_xxx and get_file_xxx to demonstrate
that they do not read or write a stream; how about that?

If they are there, they do exactly what they do, don't they? And they
do file.read() and file.write() operations, with slight nuances in the
mode passed to open() or the way the data is manipulated. Why would one
want to hide that, making it even harder to tie these operations
together with what is really going on under the covers? I think the
existing names, or at least ones with _read_ and _write_ in them
somewhere are better than set/get alternatives. It's just rare in
Python to encounter names quite as cumbersome as _write_file_bytes().

I think it is not exactly bad that these names are somehow outstanding,
as that demonstrates that something complex and special happens.

It might be good for those involved to discuss and agree on the
philosophy/principles behind using Path in the first place. If it's one
of pragmatism, then the arguments in favour of strictly differentiating
between path- and file- related operations should probably not be given
as much weight as those in favour of simple and convenient access to
commonly needed functionality. If, on the other hand, Path is seen as
some kind of a Java-esque universal path object which is cleanly and
tightly decoupled from everything else, then it would probably be best
to eliminate things like .getsize() and .read_file_bytes()/.bytes()
entirely and leave those in the hands of the cleanly defined and tightly
decoupled File object (currently spelled "file"?), again in a Java-esque
fashion. IMHO. :-)

Hm. No, that's not my intention either. I think that path as it is is already
very good. The PEP must follow, and stress this point.

(I'd like to say for the record that I feel that just about *any* form
of Path with even just the basics, basestring-based or not, would be a
huge improvement over the status quo, and I'm not trying to make a big
war out of this. Just offering my own view as a recent (a month or two
ago) but very enthusiastic convert to path.py.)

That's a basis we can build on. ;)

Reinhold

Jul 23 '05 #7

John Roth

"Reinhold Birkenfeld" <re************************@wolke7.net> wrote in
message news:3k************@individual.net...

I'll look into it. What about iteration and indexing? Should it support
"for element in path" or "for char in path" or nothing?

I frankly can't think of a use for iterating over the characters in
the path, but I have a number of programs that check elements,
iterate over them and index them (frequently backwards).

I also like to know the number of elements, which seems to make
sense as len(path). Again, the number of characters in the path seems
to be utterly useless information - at least, I can't imagine a use for
it.

John Roth


Jul 24 '05 #8
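
A rough sketch of element-wise iteration on a str subclass, and of the mismatch it creates with len() and indexing, which str still answers per character. The class is hypothetical, assumes '/'-separated paths for simplicity, and is written for Python 3.

```python
class Path(str):
    """Hypothetical stand-in; assumes '/'-separated paths for simplicity."""

    def parts(self):
        # Drop empty pieces produced by leading or doubled separators.
        pieces = [piece for piece in str(self).split("/") if piece]
        if self.startswith("/"):
            pieces.insert(0, "/")
        return pieces

    def __iter__(self):
        # Iterate over path elements instead of characters.
        return iter(self.parts())


p = Path("/usr/lib/python")
print(list(p))         # elements, via the overridden __iter__
print(len(p))          # still the character count inherited from str
print(len(p.parts()))  # number of elements
print(p[-1])           # indexing is also still per character
```

Making len() and indexing element-based as well would need further overrides, and each one moves the class further from plain string behaviour.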

Peter Hansen

Reinhold Birkenfeld wrote:
[on comparing Paths and strings]

Do you have a use case for the comparison? Paths should be compared only
with other paths.

I can think of lots, though I don't know that I've used any in my
existing (somewhat limited) code that uses Path, but they all involve
cases where I would expect, if comparisons were disallowed, to just wrap
the string in a Path first, even though to me that seems like it should
be an unnecessary step:

if mypath.splitpath()[0] == 'c:/temp':

if 'tests' in mypath.dirs():

and lots of other uses which start by treating a Path as a string
first, such as by doing .endswith('_unit.py')

Any of these could be resolved by ensuring both are Paths, but then I'm
not sure there's much justification left for using a baseclass of
basestring in the first place:

if mypath.splitpath()[0] == Path('c:/temp'):

if Path('tests') in mypath.dirs():

Question: would this latter one actually work? Would this check items
in the list using comparison or identity? Identity would simply be
wrong here.

[on removing properties in favour of methods for volatile data]

My line of thought is that a path may, but does not need to refer to an
existing, metadata-readable file. For this, I think a property is not
proper.

Fair enough, though in either case an attempt to access that information
leads to the same exception. I can't make a strong argument in favour
of properties (nor against them, really).

What about iteration and indexing? Should it support
"for element in path" or "for char in path" or nothing?

As John Roth suggests, the former seems a much more useful thing to do.
The latter is probably as rarely needed as it is with regular strings
(which I believe is roughly "never" in Python).

[on .read_file_bytes() etc]

I think it is not exactly bad that these names are somehow outstanding,
as that demonstrates that something complex and special happens.

Point taken. What about ditching the "file" part, since it is redundant
and obvious that a file is in fact what is being accessed. Thus:
.read_bytes(), .read_text(), .write_lines() etc.

-Peter

Jul 24 '05 #9
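
On the question of whether "in" uses comparison or identity: list membership accepts an item if it is the same object or compares equal. The sketch below, in Python 3, contrasts a plain str subclass with a hypothetical StrictPath that refuses to equal ordinary strings; how the sandbox class actually restricts comparison is not shown in the thread, so StrictPath is only an assumption for illustration.

```python
class Path(str):
    """Plain str subclass: comparisons are inherited unchanged."""


class StrictPath(str):
    """Hypothetical variant that only equals other StrictPath objects."""

    def __eq__(self, other):
        if not isinstance(other, StrictPath):
            return False
        return str.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)

    # Defining __eq__ removes the inherited hash, so restore it.
    __hash__ = str.__hash__


dirs = [Path("src"), Path("tests")]
print(Path("tests") in dirs)                  # equal, although a distinct object
print(any(d is Path("tests") for d in dirs))  # identity alone would not match
print("tests" in dirs)                        # plain strings match a plain subclass

strict_dirs = [StrictPath("src"), StrictPath("tests")]
print("tests" in strict_dirs)                 # rejected by the strict __eq__
print(StrictPath("tests") in strict_dirs)     # wrapping the string first works
```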

Steven D'Aprano

On Sat, 23 Jul 2005 17:51:31 -0600, John Roth wrote:

I also like to know the number of elements, which seems to make
sense as len(path). Again, the number of characters in the path seems
to be utterly useless information - at least, I can't imagine a use for
it.

There are (were?) operating systems that could only deal with a maximum
length for pathnames. If I recall correctly, and I probably don't, Classic
Mac (pre-OS X) was limited to file names of 31 or fewer characters and no
more than 250-odd for the entire pathname. At the very least, some file
manager routines would work and some would not.

If you are printing the pathname, you may care about the length so that
you can truncate it:

# raw strings keep the backslashes literal instead of forming \r and \n escapes
longname = r"C:\really\really\really\really\really\long\path\name.txt"
if len(longname) > 30:
    # do magic here
    print(r"C:\really\ ... \path\name.txt")
else:
    print(longname)

--
Steven.

Jul 24 '05 #10
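
One possible version of the "magic" step, sketched under the assumption that keeping the first two and last two components is good enough. The function name shorten and its parameters are invented for this example.

```python
def shorten(pathname, limit=30, sep="\\"):
    """Replace the middle components with ' ... ' when pathname is too long."""
    if len(pathname) <= limit:
        return pathname
    parts = pathname.split(sep)
    if len(parts) <= 4:
        # Too few components to drop any; return the name unchanged.
        return pathname
    return sep.join(parts[:2] + [" ... "] + parts[-2:])


longname = r"C:\really\really\really\really\really\long\path\name.txt"
print(shorten(longname))
print(shorten(r"C:\short.txt"))
```

This decision depends on the character length of the whole path, which is the use case for a character-based len() being described here.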

Michael Hoffman

Peter Hansen wrote:

Point taken. What about ditching the "file" part, since it is redundant
and obvious that a file is in fact what is being accessed. Thus:
.read_bytes(), .read_text(), .write_lines() etc.

+1. Although I've always been somewhat -0 on these methods to start with.
--
Michael Hoffman

Jul 24 '05 #11

Reinhold Birkenfeld

Peter Hansen wrote:

Reinhold Birkenfeld wrote:
[on comparing Paths and strings]

Do you have a use case for the comparison? Paths should be compared only
with other paths.

I can think of lots, though I don't know that I've used any in my
existing (somewhat limited) code that uses Path, but they all involve
cases where I would expect, if comparisons were disallowed, to just wrap
the string in a Path first, even though to me that seems like it should
be an unnecessary step:

if mypath.splitpath()[0] == 'c:/temp':

if 'tests' in mypath.dirs():

and lots of other uses which start by treating a Path as a string
first, such as by doing .endswith('_unit.py')

endswith is okay, since it is an inherited method from str.

Any of these could be resolved by ensuring both are Paths, but then I'm
not sure there's much justification left for using a baseclass of
basestring in the first place:

if mypath.splitpath()[0] == Path('c:/temp'):

But you must admit that that's the cleaner solution.

if Path('tests') in mypath.dirs():

Question: would this latter one actually work? Would this check items
in the list using comparison or identity? Identity would simply be
wrong here.

Yes, it works. I didn't do anything to make it work, but Path seems to inherit
the immutableness from str.

[on removing properties in favour of methods for volatile data]
My line of thought is that a path may, but does not need to refer to an
existing, metadata-readable file. For this, I think a property is not
proper.

Fair enough, though in either case an attempt to access that information
leads to the same exception. I can't make a strong argument in favour
of properties (nor against them, really).

Okay.

What about iteration and indexing? Should it support
"for element in path" or "for char in path" or nothing?

As John Roth suggests, the former seems a much more useful thing to do.
The latter is probably as rarely needed as it is with regular strings
(which I believe is roughly "never" in Python).

[on .read_file_bytes() etc]
I think it is not exactly bad that these names are somehow outstanding,
as that demonstrates that something complex and special happens.

Point taken. What about ditching the "file" part, since it is redundant
and obvious that a file is in fact what is being accessed. Thus:
.read_bytes(), .read_text(), .write_lines() etc.

Hm. Is it so clear that it's about a file? A path can point to anything,
so I think it's better to clearly state that this is only for a file at the
path, if it exists.

Reinhold

Jul 24 '05 #12

Reinhold Birkenfeld

Reinhold Birkenfeld wrote:

Hi,

the arguments in the previous thread were convincing enough, so I made the
Path class inherit from str/unicode again.

Further changes by now:

* subdirs() is now dirs().
* fixed compare behaviour for unicode base (unicode has no rich compare)
* __iter__() iterates over the parts().
* the following methods raise NotImplemented:
capitalize, expandtabs, join, splitlines, title, zfill

Open issues:

What about the is* string methods?

What about __contains__ and __getitem__?

What about path * 4?

Reinhold

Jul 24 '05 #13

Michael Hoffman

Reinhold Birkenfeld wrote:

* __iter__() iterates over the parts().
* the following methods raise NotImplemented:
capitalize, expandtabs, join, splitlines, title, zfill

Why? They *are* implemented. I do not understand this desire to wantonly
break basestring compatibility for the sake of breaking compatibility.

Once you break compatibility with basestring you can no longer use a
path anywhere that you could have used a str or unicode before. With
compatibility broken, the only possible supported way of passing paths
to third-party functions will be to cast the path with
path.__bases__[0](mypath) before passing it anywhere else. You can't
even use str() because you don't know what the base class of the path
is. What a pain.

From the original path.py documentation:

"""
os.path.join doesn't map to path.join(), because there's a string method
with that name. Instead it's path.joinpath(). This is a nuisance, but
changing the semantics of base class methods is worse. (I know, I tried
it.) The same goes for split().
"""

It ain't broke. Please stop breaking it.
--
Michael Hoffman

Jul 24 '05 #14
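
The compatibility problem described in this post can be sketched in Python 3. The sketch assumes the exception meant by "raise NotImplemented" is NotImplementedError, since NotImplemented itself is not an exception class. NoJoinPath and third_party_join are made-up names standing in for the modified Path and for library code that expects ordinary strings.

```python
class NoJoinPath(str):
    """Hypothetical str subclass that disables an inherited method."""

    def join(self, iterable):
        raise NotImplementedError("join() is disabled on paths")


def third_party_join(separator, items):
    """Stand-in for library code that accepts any str as separator."""
    return separator.join(items)


print(third_party_join("/", ["usr", "lib"]))  # a plain str works

sep = NoJoinPath("/")
try:
    third_party_join(sep, ["usr", "lib"])
except NotImplementedError as exc:
    print("broken:", exc)

# The workaround from the post: convert back to the base class first.
plain = type(sep).__bases__[0](sep)
print(third_party_join(plain, ["usr", "lib"]))
```

The subclass still passes isinstance(sep, str), so the called code has no way to know in advance that the method will fail.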

Ivan Van Laningham

Hi All--

Reinhold Birkenfeld wrote:

Reinhold Birkenfeld wrote:
Hi,

the arguments in the previous thread were convincing enough, so I made the
Path class inherit from str/unicode again.

Thanks.

* the following methods raise NotImplemented:
capitalize, expandtabs, join, splitlines, title, zfill

If path inherits from str or unicode, why not leave these? I can
certainly see uses for capitalize(), title() and zfill() when trying to
coerce Windows to let me use the case that I put there in the first
place;-) What if I wanted to take a (legitimate) directory name
'parking\tlot' and change it to 'parking lot'?

Open issues:

What about the is* string methods?

What about them? What makes you think these wouldn't be useful?
Imagine directory names made up of all numbers; wouldn't it be useful to
know which directories in a tree of, say, digital camera images,
comprise all numbers, all hex numbers, or alpha only?

What about __contains__ and __getitem__?

I find it hard to imagine what would be returned when asking a path for
say, path["c:"], other than the index. n=path["c:"] = 0 ?

What about path * 4?

This one makes my brain hurt, I admit;-)

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/worksh...oceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

Jul 24 '05 #15

Reinhold Birkenfeld

Michael Hoffman wrote:

Reinhold Birkenfeld wrote:
* __iter__() iterates over the parts().
* the following methods raise NotImplemented:
capitalize, expandtabs, join, splitlines, title, zfill

Why? They *are* implemented. I do not understand this desire to wantonly
break basestring compatibility for the sake of breaking compatibility.

Once you break compatibility with basestring you can no longer use a
path anywhere that you could have used a str or unicode before. With
compatibility broken, the only possible supported way of passing paths
to third-party functions will be to cast the path with
path.__bases__[0](mypath) before passing it anywhere else. You can't
even use str() because you don't know what the base class of the path
is. What a pain.

From the original path.py documentation:

"""
os.path.join doesn't map to path.join(), because there's a string method
with that name. Instead it's path.joinpath(). This is a nuisance, but
changing the semantics of base class methods is worse. (I know, I tried
it.) The same goes for split().
"""

It ain't broke. Please stop breaking it.

Okay. While a path has its clear use cases and those don't need the above methods,
it may be that some brain-dead function needs them.

Reinhold

Jul 24 '05 #16

Peter Hansen

Reinhold Birkenfeld wrote:

Peter Hansen wrote:
if mypath.splitpath()[0] == 'c:/temp':
vs.
if mypath.splitpath()[0] == Path('c:/temp'):

But you must admit that that's the cleaner solution.

"Cleaner"? Not at all. I'd say it's the more expressive solution,
perhaps, but I definitely wouldn't choose the word "cleaner" for
something which, to me, adds fairly unnecessary text.

But it's clearly a subjective matter, and as the one of us not involved
in doing the real work here, I'll bow to your judgement on the matter. ;-)

-Peter

Jul 24 '05 #17

Mike Meyer

Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:

On Sat, 23 Jul 2005 17:51:31 -0600, John Roth wrote:
I also like to know the number of elements, which seems to make
sense as len(path). Again, the number of characters in the path seems
to be utterly useless information - at least, I can't imagine a use for
it.

There are (were?) operating systems that could only deal with a maximum
length for pathnames. If I recall correctly, and I probably don't, Classic
Mac (pre-OS X) was limited to file names of 31 or fewer characters and no
more than 250-odd for the entire pathname. At the very least, some file
manager routines would work and some would not.

Are. But I think they're a lot longer now.

bhuda% grep PATH_MAX /usr/include/sys/syslimits.h
#define PATH_MAX 1024 /* max bytes in pathname */

<mike

--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jul 24 '05 #18

Reinhold Birkenfeld

Peter Hansen wrote:

Reinhold Birkenfeld wrote:
Peter Hansen wrote:
if mypath.splitpath()[0] == 'c:/temp':
vs.
if mypath.splitpath()[0] == Path('c:/temp'):

But you must admit that that's the cleaner solution.

"Cleaner"? Not at all. I'd say it's the more expressive solution,
perhaps, but I definitely wouldn't choose the word "cleaner" for
something which, to me, adds fairly unnecessary text.

But it's clearly a subjective matter, and as the one of us not involved
in doing the real work here, I'll bow to your judgement on the matter. ;-)

I'm in no way the last instance on this.
For example, everyone with CVS access is free to change the files ;)

Honestly, I'm in constant fear that allowing too much and loading too many
features won't increase the acceptance of python-dev <wink>

Reinhold

Jul 24 '05 #19

Michael Hoffman

Reinhold Birkenfeld wrote:

I'm in no way the last instance on this.
For example, everyone with CVS access is free to change the files ;)
I don't have CVS write access :(, so I'll have to keep kibitzing for now.
Honestly, I'm in constant fear that allowing too much and loading too many
features won't increase the acceptance of python-dev <wink>

What do you mean by this? To me code like this:

if _base is str:
    def __eq__(self, other):
        return isinstance(other, Path) and _base.__eq__(self, other)
    [...]
else:
    # Unicode has no rich compare methods
    def __cmp__(self, other):
        if isinstance(other, Path):
            return _base.__cmp__(self, other)
        return NotImplemented

is the feature that you do not need: the feature of not returning True.
You don't need this feature, and I consider it to be harmful. It breaks
duck-typing unnecessarily, and means that people who want to use some
other path library, or just str/unicode as they do today cannot compare
those paths against stdlib Paths.

We should retain the design principle of the original path.py that Path
objects should be drop-in replacements for the str or unicode objects
they replace, as much as possible. We cannot predict all the things
people are doing with strings today, and attempting to do so can only
lead to bugs.

In the current implementation, the only cases where a path object cannot
be used as a drop-in replacement for a string are (a) some extension
modules, and (b) code that tests the object class using type() instead
of using isinstance(). I think these are unavoidable but other
incompatibilities, like changing the semantics of comparisons or join()
are avoidable.
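
The difference between case (b) and an isinstance() test can be sketched like this. It is a Python 3 illustration in which str plays the role of basestring; the Path class and the two helper functions are hypothetical names made up for the example.

```python
class Path(str):
    """Hypothetical str subclass standing in for the sandbox Path."""


def describe_with_isinstance(arg):
    # Accepts str and any subclass of it.
    return "name" if isinstance(arg, str) else "other"


def describe_with_type(arg):
    # Exact type test: rejects subclasses of str.
    return "name" if type(arg) is str else "other"


p = Path("/tmp/spam")
result_isinstance = describe_with_isinstance(p)
result_type = describe_with_type(p)
```

Here result_isinstance is "name" and result_type is "other", and p still compares equal to the plain string "/tmp/spam" because the subclass leaves comparison untouched.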

I've started a Wiki page for design principles and discussion here:

http://wiki.python.org/moin/PathClass
--
Michael Hoffman

Jul 24 '05 #20

Andrew Dalke

Reinhold Birkenfeld wrote:

Okay. While a path has its clear use cases and those don't need above methods,
it may be that some brain-dead functions need them.

"brain-dead"?

Consider this code, which I think is not atypical.

import sys

def _read_file(filename):
    if filename == "-":
        # Can use '-' to mean stdin
        return sys.stdin
    else:
        return open(filename, "rU")

def file_sum(filename):
    total = 0
    for line in _read_file(filename):
        total += int(line)
    return total

(Actually, I would probably write it

def _read_file(file):
    if isinstance(file, basestring):
        if file == "-":
            # Can use '-' to mean stdin
            return sys.stdin
        else:
            return open(file, "rU")
    return file

)

Because the current sandbox Path doesn't support
the is-equal test with strings, the above function
won't work with a filename = path.Path("-"). It
will instead raise an exception saying
IOError: [Errno 2] No such file or directory: '-'
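
The failure mode can be reproduced without touching any file. The sketch below is written for Python 3 and uses two hypothetical classes: StrictPath, which imitates the restricted comparison discussed in this thread, and PlainPath, which inherits comparison from str unchanged. Neither is the sandbox code.

```python
class StrictPath(str):
    """Hypothetical: equal only to other StrictPath instances."""

    def __eq__(self, other):
        return isinstance(other, StrictPath) and str.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)

    # Defining __eq__ clears the inherited __hash__ in Python 3.
    __hash__ = str.__hash__


class PlainPath(str):
    """Hypothetical: inherits str comparison unchanged."""


def wants_stdin(filename):
    # Same test as _read_file() above, without opening anything.
    return filename == "-"


strict_result = wants_stdin(StrictPath("-"))
plain_result = wants_stdin(PlainPath("-"))
string_result = wants_stdin("-")
```

With the strict class the test is False, so a function like _read_file() falls through to open() and fails; with the plain subclass or a regular string the test is True.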

(Yes, the code as-is can't handle a file named '-'.
The usual workaround (and there are many programs
which support '-' as an alias for stdin) is to use "./-"

% cat > './-'
This is a file
% cat ./-
This is a file
% cat -
I'm typing directly into stdin.
^D
I'm typing directly into stdin.
%
)
If I start using the path.Path then in order to use
this function my upstream code must be careful on
input to distinguish between filenames which are
really filenames and which are special-cased pseudo
filenames.

Often the code using the API doesn't even know which
names are special. Even if it is documented,
the library developer may decide in the future to
extend the list of pseudo filenames to include, say,
environment variable style expansion, as
$HOME/.config

Perhaps the library developer should have come up
with a new naming system to include both types of
file naming schemes, but that's rather overkill.

As a programmer calling the API should I convert
all my path.Path objects to strings before using it?
Or to Unicode? How do I know which filenames will
be treated specially through time?

Is there a method to turn a path.Path into the actual
string? str() and unicode() don't work because I
want the result to be unicode if the OS&Python build
support it, otherwise string.

Is that library example I mentioned "brain-dead"?
I don't think so. Instead I think you are pushing
too much for purity and making changes that will
cause problems - and hard to fix problems - with
existing libraries.

Here's an example of code from an existing library
which will break in several ways if it's passed a
path object instead of a string. It comes from
spambayes/mboxutils.py

#################

This is mostly a wrapper around the various useful classes in the
standard mailbox module, to do some intelligent guessing of the
mailbox type given a mailbox argument.

+foo -- MH mailbox +foo
+foo,bar -- MH mailboxes +foo and +bar concatenated
+ALL -- a shortcut for *all* MH mailboxes
/foo/bar -- (existing file) a Unix-style mailbox
/foo/bar/ -- (existing directory) a directory full of .txt and .lorien
files
/foo/bar/ -- (existing directory with a cur/ subdirectory)
Maildir mailbox
/foo/Mail/bar/ -- (existing directory with /Mail/ in its path)
alternative way of spelling an MH mailbox

....

def getmbox(name):
    """Return an mbox iterator given a file/directory/folder name."""

    if name == "-":
        return [get_message(sys.stdin)]

    if name.startswith("+"):
        # MH folder name: +folder, +f1,f2,f2, or +ALL
        name = name[1:]
        import mhlib
        mh = mhlib.MH()
        if name == "ALL":
            names = mh.listfolders()
        elif ',' in name:
            names = name.split(',')
        else:
            names = [name]
        mboxes = []
        mhpath = mh.getpath()
        for name in names:
            filename = os.path.join(mhpath, name)
            mbox = mailbox.MHMailbox(filename, get_message)
            mboxes.append(mbox)
        if len(mboxes) == 1:
            return iter(mboxes[0])
        else:
            return _cat(mboxes)

    if os.path.isdir(name):
        # XXX Bogus: use a Maildir if /cur is a subdirectory, else a MHMailbox
        # if the pathname contains /Mail/, else a DirOfTxtFileMailbox.
        if os.path.exists(os.path.join(name, 'cur')):
            mbox = mailbox.Maildir(name, get_message)
        elif name.find("/Mail/") >= 0:
            mbox = mailbox.MHMailbox(name, get_message)
        else:
            mbox = DirOfTxtFileMailbox(name, get_message)
    else:
        fp = open(name, "rb")
        mbox = mailbox.PortableUnixMailbox(fp, get_message)
    return iter(mbox)

It breaks with the current sandbox path because:
- a path can't be compared to "-"
- slicing isn't supported, as in "name = name[1:]"

note that this example uses __contains__ ("," in name)
Is this function brain-dead? Is it reasonable that people might
want to pass a path.Path() directly to it? If not, what's
the way to convert the path.Path() into the correct string
object?

Andrew
da***@dalkescientific.com

Jul 24 '05 #21

Carl Banks

Reinhold Birkenfeld wrote:

Peter Hansen wrote:
Reinhold Birkenfeld wrote:
One thing is still different, though: a Path instance won't compare to a regular
string.
Could you please expand on what this means? Are you referring to doing
< and >= type operations on Paths and strings, or == and != or all those
or something else entirely?

All of these. Do you need them?

[snip]
At the moment, I think about overriding certain string methods that make
absolutely no sense on a path and raising an exception from them.

Ick. This reeks of the sort of hubris from people who think they
anticipate all valid uses of something.

Is it a basestring or not? If it is, then let it be a basestring. Is
it unreasonable to want to format a pathname for printing? We might
want to retain ljust and friends. Maybe there's a filenaming scheme
where files are related by having a character changed here or there.
So we might want to iterate through the characters in a pathname. How
do you know how people are going to use it? We're all supposed to be
adults here.

Let me suggest that wanting to remove all these methods/operations
suggests that one doesn't really think it ought to be a basestring.
The way I see it, the only compelling reason for it to be a basestring
is to accommodate poorly designed functions that test whether an
argument is a filename or a file object using isinstance(x, basestring)
on it. But the best thing to do is fix those interfaces, and let path
be what it should be, and not a hack to accommodate poor code.
--
CARL BANKS

Jul 24 '05 #22

Reinhold Birkenfeld

Andrew Dalke wrote:

Reinhold Birkenfeld wrote:
Okay. While a path has its clear use cases and those don't need above methods,
it may be that some brain-dead functions need them.

"brain-dead"?

Consider this code, which I think is not atypical.

Okay, convinced.

Reinhold

Jul 25 '05 #23

Reinhold Birkenfeld

Reinhold Birkenfeld wrote:

Hi,

the arguments in the previous thread were convincing enough, so I made the
Path class inherit from str/unicode again.

Current change:

* Add base() method for converting to str/unicode.
* Allow compare against normal strings.

Reinhold

Jul 25 '05 #24

Michael Hoffman

Reinhold Birkenfeld wrote:

* Add base() method for converting to str/unicode.

+1
--
Michael Hoffman

Jul 25 '05 #25

Peter Hansen

Reinhold Birkenfeld wrote:

Current change:

* Add base() method for converting to str/unicode.

Would basestring() be a better name? Partly because that seems to be
exactly what it's doing, but more because there are (or used to be?)
other things in Path that used the word "base", such as "basename".

-1 on that specific name if it could be easily confused with "basename"
types of things.

-Peter

Jul 25 '05 #26

Reinhold Birkenfeld

Peter Hansen wrote:

Reinhold Birkenfeld wrote:
Current change:

* Add base() method for converting to str/unicode.

Would basestring() be a better name? Partly because that seems to be
exactly what it's doing, but more because there are (or used to be?)
other things in Path that used the word "base", such as "basename".

-1 on that specific name if it could be easily confused with "basename"
types of things.

Right, that was a concern of mine, too.
"tobase"?
"tostring"?
"tobasestring"?

Alternative is to set a class attribute "Base" of the Path class. Or export
PathBase as a name from the module (but that's not quite useful, because I
expect Path to be imported via "from os.path import Path").

Reinhold

Jul 25 '05 #27

Peter Hansen

Reinhold Birkenfeld wrote:

Peter Hansen wrote:
Would basestring() be a better name?
"tobase"?
"tostring"?
"tobasestring"?
Of these choices, the latter would be preferable.
Alternative is to set a class attribute "Base" of the
Path class. Or export PathBase as a name from the module
(but that's not quite useful, because I
expect Path to be imported via "from os.path import Path").

I don't understand how that would work. An attribute on the *class*?
What would it be, a callable? So mypath.Base(mypath) or something?
Please elaborate...

What about just .basestring, as a read-only attribute on the Path object?

-Peter

Jul 25 '05 #28

Reinhold Birkenfeld

Peter Hansen wrote:

Reinhold Birkenfeld wrote:
> Peter Hansen wrote:
>> Would basestring() be a better name?
> "tobase"?
> "tostring"?
> "tobasestring"?

Of these choices, the latter would be preferable.
> Alternative is to set a class attribute "Base" of the
> Path class. Or export PathBase as a name from the module
> (but that's not quite useful, because I
> expect Path to be imported via "from os.path import Path").

I don't understand how that would work. An attribute on the *class*?
What would it be, a callable? So mypath.Base(mypath) or something?
Please elaborate...

[_base is str or unicode]

class Path:
    Base = _base
    [...]

So you could do "Path.Base(mypath)" or "mypath.Base(mypath)".
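
A runnable version of that idea might look like the sketch below. It assumes Python 3, so _base is simply str rather than a choice between str and unicode, and it assumes Path derives from _base, which the fragment above leaves out.

```python
_base = str  # the thread picks str or unicode; Python 3 has only str


class Path(_base):
    # Expose the underlying string type as a class attribute.
    Base = _base


mypath = Path("/tmp/spam")
via_class = Path.Base(mypath)
via_instance = mypath.Base(mypath)
# The spelling that Base is meant to replace.
via_bases = Path.__bases__[0](mypath)
```

All three spellings give back a plain string with the same contents as mypath.
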
What about just .basestring, as a read-only attribute on the Path object?

Reasonable, though the term as such is preoccupied too.

Reinhold

Jul 25 '05 #29

skip

Reinhold> Right, that was a concern of mine, too.
Reinhold> "tobase"?
Reinhold> "tostring"?
Reinhold> "tobasestring"?

If we're on a filesystem that understands unicode, would somepath.tostring()
return a unicode object or a string object encoded with some
to-be-determined encoding?

Why not just add __str__ and __unicode__ methods to the class and let the
user use str(somepath) or unicode(somepath) as needed?

Or am I missing something fundamental about what the base() method is
supposed to do?

Skip

Jul 25 '05 #30

Reinhold Birkenfeld

sk**@pobox.com wrote:

Reinhold> Right, that was a concern of mine, too.
Reinhold> "tobase"?
Reinhold> "tostring"?
Reinhold> "tobasestring"?

If we're on a filesystem that understands unicode, would somepath.tostring()
return a unicode object or a string object encoded with some
to-be-determined encoding?
Whatever the base of the Path object is. It selects its base class based on
os.path.supports_unicode_filenames.
Why not just add __str__ and __unicode__ methods to the class and let the
user use str(somepath) or unicode(somepath) as needed?

Or am I missing something fundamental about what the base() method is
supposed to do?

It should provide an alternative way of spelling Path.__bases__[0](path).

Reinhold

Jul 25 '05 #31

Andrew Dalke

> Reinhold Birkenfeld wrote:

Current change:

* Add base() method for converting to str/unicode.

Now that [:] slicing works, and returns a string,
another way to convert from path.Path to str/unicode
is path[:]
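
As a small check of this behaviour, here is a sketch for CPython 3, where slicing a str subclass that does not override __getitem__ gives back a plain str. The Path class is a hypothetical stand-in, not the sandbox code, whose slicing may be implemented differently.

```python
class Path(str):
    """Hypothetical str subclass that does not override slicing."""


p = Path("/tmp/spam")
whole = p[:]   # full slice
tail = p[1:]   # the kind of slice getmbox() uses
```

Both whole and tail are plain strings rather than Path instances.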

Andrew
da***@dalkescientific.com

Jul 25 '05 #32

Bengt Richter

On Mon, 25 Jul 2005 17:33:51 +0200, Reinhold Birkenfeld <re************************@wolke7.net> wrote:

Peter Hansen wrote:
Reinhold Birkenfeld wrote:
Current change:

* Add base() method for converting to str/unicode.
Would basestring() be a better name? Partly because that seems to be
exactly what it's doing, but more because there are (or used to be?)
other things in Path that used the word "base", such as "basename".

-1 on that specific name if it could be easily confused with "basename"
types of things.

Right, that was a concern of mine, too.
"tobase"?
-1
"tostring"?
+1
"tobasestring"?
-0
Alternative is to set a class attribute "Base" of the Path class. Or export
PathBase as a name from the module (but that's not quite useful, because I
expect Path to be imported via "from os.path import Path").

Reinhold

Regards,
Bengt Richter

Jul 25 '05 #33

NickC

[Re: how to get at the base class]

Do you really want to have a "only works for Path" way to get at the
base class, rather than using the canonical Path.__bases__[0]?

How about a new property in the os.path module instead? Something like
os.path.path_type.

Then os.path.path_type is unicode if and only if
os.path.supports_unicode_filenames is True. Otherwise,
os.path.path_type is str.

Then converting a Path to str or unicode is possible using:

as_str_or_unicode = os.path.path_type(some_path)

The other thing is that you can simply make Path inherit from
os.path.path_type.

Regards,
Nick C.

Jul 30 '05 #34

Reinhold Birkenfeld

NickC wrote:

[Re: how to get at the base class]

Do you really want to have a "only works for Path" way to get at the
base class, rather than using the canonical Path.__bases__[0]?

How about a new property in the os.path module instead? Something like
os.path.path_type.

Then os.path.path_type is unicode if and only if
os.path.supports_unicode_filenames is True. Otherwise,
os.path.path_type is str.

Then converting a Path to str or unicode is possible using:

as_str_or_unicode = os.path.path_type(some_path)

The other thing is that you can simply make Path inherit from
os.path.path_type.

That's what I suggested with Path.Base. It has the advantage that you don't have
to import os.path to get at it (Path is meant so that you can avoid os.path).

Reinhold

Jul 30 '05 #35
