473,769 Members | 2,134 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

PEP on path module for standard library

Many of you are familiar with Jason Orendorff's path module
<http://www.jorendorff. com/articles/python/path/>, which is frequently
recommended here on c.l.p. I submitted an RFE to add it to the Python
standard library, and Reinhold Birkenfeld started a discussion on it in
python-dev
<http://mail.python.org/pipermail/python-dev/2005-June/054438.html>.

The upshot of the discussion was that many python-dev'ers wanted path
added to the stdlib, but Guido was not convinced and said it must have a
PEP. So Reinhold and I are going to work on one. Reinhold has already
made some changes to the module to fit the python-dev discussion and put
it in CPython CVS at nondist/sandbox/path.

For the PEP, do any of you have arguments for or against including path?
Code samples that are much easier or more difficult with this class
would also be most helpful.

I use path in more of my modules and scripts than any other third-party
module, and I know it will be very helpful when I no longer have to
worry about deploying it.

Thanks in advance,
--
Michael Hoffman
Jul 21 '05
70 4115
Peter Hansen wrote:
When files are opened through a "path" object -- e.g.
path('name').op en() -- then file.name returns the path object that was
used to open it.


Also works if you use file(path('name ')) or open(path('name ')).
--
Michael Hoffman
Jul 22 '05 #51
Andrew Dalke wrote:
Duncan Booth wrote:
Personally I think the concept of a specific path type is a good one, but
subclassing string just cries out to me as the wrong thing to do.
I disagree. I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input. There is no
automatic coercion.


Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.

So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.
class Spam: ... def __getattr__(sel f, name):
... print "Want", repr(name)
... raise AttributeError, name
... open(Spam()) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found import os
os.mkdir(Spam() ) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
The solutions to this are:
1) make the path object be derived from str or unicode. Doing
this does not conflict with any OO design practice (eg, Liskov
substitution).

2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like
if isinstance(inpu t, basestring):
input = open(input, "rU")
for line in input:
print line

I showed several places in the stdlib and in 3rd party packages
where this is used.


That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.
You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.


You are broadening the definition of a file path to include URIs?
That's making life more complicated. Eg, the rules for joining
file paths may be different than the rules for joining URIs.
Consider if I have a file named "mail:da***@exa mple.com" and I
join that with "file://home/dalke/badfiles/".

Additionally, the actions done on URIs are different than on file
paths. What should os.listdir("htt p://www.python.org/") do?


I agree. Path is only for local filesystem paths (well, in UNIX they could
as well be remote, but that's thanks to the abstraction the filesystem
provides, not Python).
As I mentioned, I tried some classes which emulated file
paths. One was something like

class TempDir:
"""removes the directory when the refcount goes to 0"""
def __init__(self):
self.filename = ... use a function from the tempfile module
def __del__(self):
if os.path.exists( self.filename):
shutil.rmtree(s elf.filename)
def __str__(self):
return self.filename

I could do

dirname = TempDir()

but then instead of

os.mkdir(dirnam e)
tmpfile = os.path.join(di rname, "blah.txt")

I needed to write it as

os.mkdir(str(di rname))
tmpfile = os.path.join(st r(dirname), "blah.txt") )

or have two variables, one which could delete the
directory and the other for the name. I didn't think
that was good design.
I can't follow. That's clearly not a Path but a custom object of yours.
However, I would have done it differently: provide a "name" property
for the object, and don't call the variable "dirname", which is confusing.
If I had derived from str/unicode then things would
have been cleaner.

Please note, btw, that some filesystems are unicode
based and others are not. As I recall, one nice thing
about the path module is that it chooses the appropriate
base class at import time. My "str()" example above
does not and would fail on a Unicode filesystem aware
Python build.


There's no difference. The only points where the type of a Path
object' underlying string is decided are Path.cwd() and the
Path constructor.
Reinhold
Jul 22 '05 #52
Reinhold Birkenfeld wrote:
Andrew Dalke wrote:
I disagree. I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input. There is no
automatic coercion.
Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.


Except when you pass the path to a function written by someone else
So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.


Where uncommon uses include passing the path object to any code you
don't control? The stdlib can be fixed, this other stuff can't.
The solutions to this are:
1) make the path object be derived from str or unicode. Doing
this does not conflict with any OO design practice (eg, Liskov
substitution) .

2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like
if isinstance(inpu t, basestring):
input = open(input, "rU")
for line in input:
print line

I showed several places in the stdlib and in 3rd party packages
where this is used.


That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.


But many functions were written expecting lists as arguments but also
work for strings, and do not require an explicit list(mystring) before
calling the function.
--
Michael Hoffman
Jul 22 '05 #53
Peter Hansen wrote:
Most prominent change is that it doesn't inherit from str/unicode
anymore.
I found this distinction important, because as a str subclass the Path
object
has many methods that don't make sense for it.


On this topic, has anyone ask the original author (Jason Orendorff)
whether he has some background on this decision that might benefit the
discussion?


My impression is that he doesn't have a lot of spare cycles for this. He
didn't have anything to add to the python-dev discussion when I informed
him of it. I'd love to hear what he had to say about the design.
--
Michael Hoffman
Jul 22 '05 #54
Daniel Dittmar wrote:
Duncan Booth wrote:
I would have expected a path object to be a sequence of path elements
rather than a sequence of characters.

Maybe it's nitpicking, but I don't think that a path object should be a
'sequence of path elements' in an iterator context.

This means that

for element in pathobject:

has no intuitive meaning for me, so it shouldn't be allowed.


Try this:

A file-system is a maze of twisty little passages, all alike. Junction
== directory. Cul-de-sac == file. Fortunately it is signposted. You are
dropped off at one of the entrance points ("current directory", say).
You are given a route (a "path") to your destination. The route consists
of a list of intermediate destinations.

for element in pathobject:
follow_sign_pos t_to(element)

Exception-handling strategy: Don't forget to pack a big ball of string.
Anecdotal evidence is that breadcrumbs are unreliable.

Cheers,
John
Jul 22 '05 #55
Michael Hoffman wrote:
John Roth wrote:
However, a path as a sequence of characters has even less
meaning - I can't think of a use, while I have an application
where traversing a path as a sequence of path elements makes
perfect sense: I need to descend the directory structure, directory
by directory, looking for specific files and types.

I *have* used a path as a sequence of characters before. I had to deal
with a bunch of filenames that were formatted like "file_02832.a.t xt"


Ya, ya , ya, only two days ago I had to lash up a quick script to find
all files matching r"\d{8,8}[A-Za-z]{0,3}$" and check that the expected
number were present in each sub-directory (don't ask!).

BUT you are NOT using "a path as a sequence of characters". Your
filename is a path consisting of one element. The *element* is an
instance of basestring, to which you can apply all the string methods
and the re module.


I can see the case for a path as a sequence of elements, although in
practice, drive letters, extensions, and alternate streams complicate
things.


Jul 22 '05 #56
Michael Hoffman wrote:
Peter Hansen wrote:
When files are opened through a "path" object -- e.g.
path('name').op en() -- then file.name returns the path object that was
used to open it.


Also works if you use file(path('name ')) or open(path('name ')).


Since that's exactly what the path module does, it's not surprising.
Practically everything that path does, with a few useful exceptions, is
a thin wrapper around the existing calls. path.open, for example is
merely this:

def open(self, mode='r'):
return file(self, mode)

-Peter
Jul 22 '05 #57
"Andrew Dalke" <da***@dalkesci entific.com> wrote:
George Sakkis wrote:
You're right, conceptually a path
HAS_A string description, not IS_A string, so from a pure OO point of
view, it should not inherit string.
How did you decide it's "has-a" vs. "is-a"?

All C calls use a "char *" for filenames and paths,
meaning the C model file for the filesystem says
paths are strings.


Bringing up how C models files (or anything else other than primitive types for that matter) is not
a particularly strong argument in a discussion on OO design ;-)
Paths as strings fit the Liskov substitution principle
in that any path object can be used any time a
string is used (eg, "loading from " + filename)
Liskov substitution principle imposes a rather weak constraint on when inheritance should not be
used, i.e. it is a necessary condition, but not sufficient. Take for example the case where a
PhoneNumber class is subclass of int. According to LSP, it is perfectly ok to add phone numbers
together, subtract them, etc, but the result, even if it's a valid phone number, just doesn't make
sense.
Good information hiding suggests that a better API
is one that requires less knowledge. I haven't
seen an example of how deriving from (unicode)
string makes things more complicated than not doing so.
I wouldn't say more complicated, but perhaps less intuitive in a few cases, e.g.:
path(r'C:\Docum ents and Settings\Guest\ Local Settings').spli t()

['C:\\Documents' , 'and', 'Settings\\Gues t\\Local', 'Settings']
instead of
['C:', 'Documents and Settings', 'Guest', 'Local Settings']

I just noted that conceptually a path is a composite object consisting of many properties (dirname,
extension, etc.) and its string representation is just one of them. Still, I'm not suggesting that a
'pure' solution is better that a more practical that covers most usual cases.

George
Jul 22 '05 #58
Peter Hansen wrote:
Practically everything that path does, with a few useful exceptions, is
a thin wrapper around the existing calls.


If the implementation is easy to explain, it may be a good idea.

OT: I just realized you can now type in "python -m this" at the command
line, which is convenient, but strange.
--
Michael Hoffman
Jul 23 '05 #59
Scott David Daniels:
Isn't it even worse than this?
On Win2K & XP, don't the file systems have something to do with the
encoding? So D: (a FAT drive) might naturally be str, while C:
(an NTFS drive) might naturally be unicode.
This is generally safe as Windows is using unicode internally and
provides full-fidelity access to the FAT drive using unicode strings.
You can produce failures if you try to create files with names that can
not be represented but you would see a similar failure with byte string
access.
Even worse, would be a
path that switches in the middle (which it might do if we get to a
ZIP file or use the newer dir-in-file file systems.


If you are promoting from byte strings with a known encoding to
unicode path objects then this should always work.

Neil
Jul 23 '05 #60

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
24852
by: Jeff Wagner | last post by:
Is it possible to append a new Path to some file permanently? It seems like a sys.path.append('/my/new/path') statement is temporary. In other words, where and in what file on a Win32 box or Linux box is the sys.path info kept. I have a couple of paths I keep practice files in I want to add to the path permanently. Thanks
3
9734
by: Stephen Ferg | last post by:
I need a little help here. I'm developing some introductory material on Python for non-programmers. The first draft includes this statement. Is this correct? ----------------------------------------------------------------- When loading modules, Python looks for modules in the following places in the following order:
5
1762
by: chirayuk | last post by:
Hi, I am trying to treat an environment variable as a python list - and I'm sure there must be a standard and simple way to do so. I know that the interpreter itself must use it (to process $PATH / %PATH%, etc) but I am not able to find a simple function to do so. os.environ.split(os.sep) is wrong on Windows for the case when PATH="c:\\A;B";c:\\D; where there is a ';' embedded in the quoted path.
34
3267
by: Reinhold Birkenfeld | last post by:
Hi, the arguments in the previous thread were convincing enough, so I made the Path class inherit from str/unicode again. It still can be found in CVS: /python/nondist/sandbox/path/{path.py,test_path.py} One thing is still different, though: a Path instance won't compare to a regular string.
17
1742
by: chris.atlee | last post by:
Hi there, I haven't seen this topic pop up in a while, so I thought I'd raise it again... What is the status of the path module/class PEP? Did somebody start writing one, or did it die? I would really like to see something like Jason Orendorff's path class make its way into the python standard library.
11
26638
by: cybervigilante | last post by:
I can't seem to change the include path on my local winmachine no matter what I do. It comes up as includ_path .;C:\php5\pear in phpinfo() but there is no such file. I installed the WAMP package and PEAR is in c:\wamp\php\pear I modified php.ini in the c:\wamp\php directory to reflect the actual path, but even stopping and restarting my server shows the c: \php5\pear path. I can't change it no matter what I do I also tried the...
0
9589
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
1
9997
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8873
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6675
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5310
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5448
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3965
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3565
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2815
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.