
python24.zip

Investigating a query about the python path I see that my win32 installation has
c:/windows/system32/python24.zip (which is non-existent) second on sys.path
before the actual python24/lib etc etc.

Firstly should python start up with non-existent entries on the path?
Secondly is this entry the default for some other kind of python installation?
--
Robin Becker

Jul 19 '05 #1
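
The observation in the question can be reproduced with a few lines. This is only a sketch, assuming a current Python 3 interpreter (where the corresponding entry is named after the running version rather than python24.zip); the helper name `classify_sys_path` is made up for illustration.

```python
import os
import sys


def classify_sys_path(entries=None):
    """Split path entries into (existing, missing) lists, keeping order."""
    entries = sys.path if entries is None else entries
    existing, missing = [], []
    for entry in entries:
        # An empty string on sys.path stands for the current directory.
        target = entry or os.getcwd()
        if os.path.exists(target):
            existing.append(entry)
        else:
            missing.append(entry)
    return existing, missing


if __name__ == "__main__":
    _, missing = classify_sys_path()
    for entry in missing:
        print("on sys.path but not on disk:", entry)
```

Note that "missing on disk" is not the same as "useless": as discussed later in the thread, an entry may be served by an import hook.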
24 Replies


Robin Becker wrote:
> Firstly should python start up with non-existent entries on the path?

Yes, this is by design.

> Secondly is this entry the default for some other kind of python
> installation?


Yes. People can package everything they want in python24.zip (including
site.py). This can only work if python24.zip is already on the path
(and I believe it will always be sought in the directory where
python24.dll lives).

Regards,
Martin
Jul 19 '05 #2

"Martin v. Löwis" <ma****@v.loewis.de> writes on Fri, 20 May 2005 18:13:56 +0200:
> Robin Becker wrote:
> > Firstly should python start up with non-existent entries on the path?
>
> Yes, this is by design.
>
> > Secondly is this entry the default for some other kind of python
> > installation?
>
> Yes. People can package everything they want in python24.zip (including
> site.py). This can only work if python24.zip is already on the path
> (and I believe it will always be sought in the directory where
> python24.dll lives).


The question was:

"should python start up with **non-existent** objects on the path".

I think there is no reason why path needs to contain an object
which does not exist (at the time the interpreter starts).

In your use case, "python24.zip" does exist and therefore may
be on the path. When "python24.zip" does not exist, it does
not contain anything and especially not "site.py".

I recently analysed excessive import times and
saw thousands of costly and unnecessary filesystem operations due to:

* long "sys.path", especially containing non-existing objects

Although non-existent, about 5 filesystem operations are
tried on them for any module not yet located.

* a severe weakness in Python's import hook treatment

When there is an importer "i" for a path "p" and
this importer cannot find module "m", then "p" is
treated as a directory and 5 file system operations
are tried to locate "p/m". Of course, all of them fail
when "p" happens to be a zip archive.
Dieter
Jul 19 '05 #3

Dieter Maurer wrote:
> ......
>
> The question was:
>
> "should python start up with **non-existent** objects on the path".
>
> I think there is no reason why path needs to contain an object
> which does not exist (at the time the interpreter starts).
>
> In your use case, "python24.zip" does exist and therefore may
> be on the path. When "python24.zip" does not exist, it does
> not contain anything and especially not "site.py".

I think this was my intention, but also I think I have some concern over
having two possible locations for the standard library. It seems non pythonic
and liable to cause confusion if some package should manage to install
python24.zip while I believe that python24\lib is being used.

> I recently analysed excessive import times and
> saw thousands of costly and unnecessary filesystem operations due to:
>
> * long "sys.path", especially containing non-existing objects
>
>   Although non-existent, about 5 filesystem operations are
>   tried on them for any module not yet located.
>
> * a severe weakness in Python's import hook treatment
>
>   When there is an importer "i" for a path "p" and
>   this importer cannot find module "m", then "p" is
>   treated as a directory and 5 file system operations
>   are tried to locate "p/m". Of course, all of them fail
>   when "p" happens to be a zip archive.
>
> Dieter


I suppose that's a reason for eliminating duplicates and non-existent entries.

--
Robin Becker
Jul 19 '05 #4
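
The clean-up suggested here could look roughly like the following. It is a sketch only, assuming Python 3 and assuming that every entry is meant to be a real file or directory; the function name `pruned_path` is hypothetical. Entries that are served by an import hook rather than the filesystem would be dropped by this check, so it is not safe when such hooks are in use.

```python
import os


def pruned_path(entries):
    """Return entries without duplicates and without non-existent paths.

    Order is preserved; the first occurrence of a duplicate wins.
    """
    seen = set()
    result = []
    for entry in entries:
        # Normalise so that "lib" and "./lib" count as the same entry.
        key = os.path.normcase(os.path.abspath(entry or os.curdir))
        if key in seen:
            continue
        seen.add(key)
        if os.path.exists(key):
            result.append(entry)
    return result
```

A caller would apply it with something like `sys.path[:] = pruned_path(sys.path)` early in start-up, before other code appends entries that are created later.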

Dieter Maurer wrote:
> The question was:
>
> "should python start up with **non-existent** objects on the path".
>
> I think there is no reason why path needs to contain an object
> which does not exist (at the time the interpreter starts).

There is. When the interpreter starts, it doesn't know which objects
do or do not exist. So it must put python24.zip on the path
just in case.

> In your use case, "python24.zip" does exist and therefore may
> be on the path. When "python24.zip" does not exist, it does
> not contain anything and especially not "site.py".

Yes, but the interpreter cannot know in advance whether
python24.zip will be there when it starts.

> I recently analysed excessive import times and
> saw thousands of costly and unnecessary filesystem operations due to:

Hmm. In my Python 2.4 installation, I only get 154 open calls, and
63 stat calls on an empty Python file. So somebody must have messed
with sys.path really badly if you saw thousands of file operations
(although I wonder what operating system you use so that failing
open operations are costly; most operating systems should do them
very efficiently).

Regards,
Martin
Jul 19 '05 #5

Robin Becker wrote:
> Dieter Maurer wrote: [...]
>
> I think this was my intention, but also I think I have some concern over
> having two possible locations for the standard library. It seems non pythonic
> and liable to cause confusion if some package should manage to install
> python24.zip while I believe that python24\lib is being used.
>
> > I recently analysed excessive import times and
> > saw thousands of costly and unnecessary filesystem operations due to:
> >
> > * long "sys.path", especially containing non-existing objects
> >
> >   Although non-existent, about 5 filesystem operations are
> >   tried on them for any module not yet located.
> >
> > * a severe weakness in Python's import hook treatment
> >
> >   When there is an importer "i" for a path "p" and
> >   this importer cannot find module "m", then "p" is
> >   treated as a directory and 5 file system operations
> >   are tried to locate "p/m". Of course, all of them fail
> >   when "p" happens to be a zip archive.
> >
> > Dieter
>
> I suppose that's a reason for eliminating duplicates and non-existent entries.

There are some aspects of Python's initialization that are IMHO a bit
too filesystem-dependent. I mentioned one in
http://sourceforge.net/tracker/index...70&atid=105470

but I'd appreciate further support. Ideally there should be some means
for hooked import mechanisms to provide answers that are currently
sought from the filestore.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 19 '05 #6

"Martin v. Löwis" <ma****@v.loewis.de> writes on Sat, 21 May 2005 23:53:31 +0200:
> Dieter Maurer wrote:
> ...
> > The question was:
> >
> > "should python start up with **non-existent** objects on the path".
> >
> > I think there is no reason why path needs to contain an object
> > which does not exist (at the time the interpreter starts).
>
> There is. When the interpreter starts, it doesn't know which objects
> do or do not exist. So it must put python24.zip on the path
> just in case.

Really?

Is the interpreter unable to call "C" functions ("stat" for example)
to determine whether an object exists before it puts it on "path"?

> Yes, but the interpreter cannot know in advance whether
> python24.zip will be there when it starts.

Thus, it checks dynamically when it starts.

> > I recently analysed excessive import times and
> > saw thousands of costly and unnecessary filesystem operations due to:
>
> Hmm. In my Python 2.4 installation, I only get 154 open calls, and
> 63 stat calls on an empty Python file. So somebody must have messed
> with sys.path really badly if you saw thousands of file operations
> (although I wonder what operating system you use so that failing
> open operations are costly; most operating systems should do them
> very efficiently).

The application was Zope importing about 2,500 modules
from 2 zip files "zope.zip" and "python24.zip".
This resulted in about 12,500 opens -- about 4 times more
than would be expected -- about 10,000 of them failing opens.

Dieter
Jul 19 '05 #7

Steve Holden <st***@holdenweb.com> writes on Sun, 22 May 2005 09:14:43 -0400:
> ...
> There are some aspects of Python's initialization that are IMHO a bit
> too filesystem-dependent. I mentioned one in
> http://sourceforge.net/tracker/index...70&atid=105470
> but I'd appreciate further support. Ideally there should be some means
> for hooked import mechanisms to provide answers that are currently
> sought from the filestore.


There are such hooks. See e.g. the "meta_path" hooks as
described by PEP 302.
Jul 19 '05 #8
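
For readers who have not seen a "meta_path" hook, here is a minimal sketch. It assumes Python 3 and therefore uses the `find_spec` protocol from `importlib` rather than the original PEP 302 `find_module`/`load_module` methods; the class `DictFinder`, the function `install_dict_finder` and the module names are invented for the example.

```python
import importlib.abc
import importlib.util
import sys


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve top-level modules from an in-memory {name: source} mapping."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # not ours; let the next finder try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        # Run the stored source in the new module's namespace.
        exec(self.sources[module.__name__], module.__dict__)


def install_dict_finder(sources):
    """Put a DictFinder in front of the regular finders."""
    finder = DictFinder(sources)
    sys.meta_path.insert(0, finder)
    return finder
```

Because the finder sits on `sys.meta_path`, it is consulted before `sys.path` is searched at all, so no filesystem lookups happen for the modules it knows about.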

Dieter Maurer wrote:
> Really?
>
> Is the interpreter unable to call "C" functions ("stat" for example)
> to determine whether an object exists before it puts it on "path"?

What do you mean, "unable to"? It just doesn't.

Could it? Perhaps, if somebody wrote a patch.
Would the patch be accepted? Perhaps, if it didn't break something
else.

In the past, there was a silent guarantee that you could add
items to sys.path, and only later create the directories behind
these items. I don't know whether people rely on this guarantee.

> The application was Zope importing about 2,500 modules
> from 2 zip files "zope.zip" and "python24.zip".
> This resulted in about 12,500 opens -- about 4 times more
> than would be expected -- about 10,000 of them failing opens.

I see. Out of curiosity: how much startup time was saved
when sys.path was explicitly stripped to only contain these
two zip files?

I would expect that importing 2,500 modules takes *way*
more time than doing 10,000 failed opens.

Regards,
Martin
Jul 19 '05 #9

Dieter Maurer wrote:
> Steve Holden <st***@holdenweb.com> writes on Sun, 22 May 2005 09:14:43 -0400:
> > ...
> > There are some aspects of Python's initialization that are IMHO a bit
> > too filesystem-dependent. I mentioned one in
> > http://sourceforge.net/tracker/index...70&atid=105470
> > but I'd appreciate further support. Ideally there should be some means
> > for hooked import mechanisms to provide answers that are currently
> > sought from the filestore.
>
> There are such hooks. See e.g. the "meta_path" hooks as
> described by PEP 302.


Indeed I have written PEP 302-based code to import from a relational
database, but I still don't believe there's any satisfactory way to have
[such a hooked import mechanism] be a first-class component of an
architecture that specifically requires an os.py to exist in the file
store during initialization.

I wasn't asking for an import hook mechanism (since I already knew these
to exist), but for a way to allow such mechanisms to be the sole import
support for certain implementations.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 19 '05 #10

Martin v. Löwis wrote:
> Dieter Maurer wrote:
> > Really?
> >
> > Is the interpreter unable to call "C" functions ("stat" for example)
> > to determine whether an object exists before it puts it on "path"?
>
> What do you mean, "unable to"? It just doesn't.

In fact, the interpreter doesn't necessarily know when it is
affecting the path.

> Could it? Perhaps, if somebody wrote a patch.
> Would the patch be accepted? Perhaps, if it didn't break something
> else.
>
> In the past, there was a silent guarantee that you could add
> items to sys.path, and only later create the directories behind
> these items. I don't know whether people rely on this guarantee.


If you only checked "lost" files/directories on the path a few
seconds later than the last time you checked, you might be able
to drive this "failed open" time down drastically without seriously
affecting those who care. Such an implementation should have a
call which allowed you to "clear" the timestamps for the "known bad"
entries.

--Scott David Daniels
Sc***********@Acm.Org
Jul 19 '05 #11
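
The idea of remembering "known bad" entries for a few seconds, with a call to clear them, can be sketched as below. This is not how the interpreter works; it is an illustration in Python 3 with invented names (`MissingPathCache`, `ttl`), and the clock and existence check are injectable only to make the behaviour easy to test.

```python
import os
import time


class MissingPathCache:
    """Remember paths recently seen missing to avoid repeated filesystem hits."""

    def __init__(self, ttl=5.0, clock=time.monotonic, exists=os.path.exists):
        self.ttl = ttl                # seconds a negative result stays valid
        self._clock = clock
        self._exists = exists
        self._missing_since = {}      # path -> time of last failed check

    def exists(self, path):
        now = self._clock()
        checked = self._missing_since.get(path)
        if checked is not None and now - checked < self.ttl:
            return False              # recently missing: skip the filesystem
        if self._exists(path):
            self._missing_since.pop(path, None)
            return True
        self._missing_since[path] = now
        return False

    def clear(self):
        """Forget all negative results, e.g. after creating directories."""
        self._missing_since.clear()
```

The trade-off is the one raised in the thread: a path created within the `ttl` window is not noticed until the window expires or `clear()` is called.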

Scott David Daniels wrote:
> > > Is the interpreter unable to call "C" functions ("stat" for example)
> > > to determine whether an object exists before it puts it on "path"?
> >
> > What do you mean, "unable to"? It just doesn't.
>
> In fact, the interpreter doesn't necessarily know when it is
> affecting the path.

Now I remember what makes this stuff really difficult: PEP 302
introduces path hooks (sys.path_hooks), allowing imports from
other sources than files. So the items on sys.path don't have
to be directory or file names at all, and importing from them
may still succeed even though stat fails.

Regards,
Martin
Jul 19 '05 #12
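
To make the point concrete, the following sketch registers a path hook for a `sys.path` entry that is not a file name at all. It assumes Python 3's `importlib` (the modern form of the PEP 302 machinery); the `memory:demo` entry, the `MODULES` table and all class and function names are hypothetical.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical store: path entry -> {module name: source code}.
MODULES = {"memory:demo": {"in_memory_mod": "ANSWER = 42\n"}}


class MemoryFinder(importlib.abc.PathEntryFinder, importlib.abc.Loader):
    """Finder and loader for one non-filesystem sys.path entry."""

    def __init__(self, entry):
        self.modules = MODULES[entry]

    def find_spec(self, fullname, target=None):
        if fullname in self.modules:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        exec(self.modules[module.__name__], module.__dict__)


def memory_hook(entry):
    """Path hook: claim only the entries listed in MODULES."""
    if entry not in MODULES:
        raise ImportError("not an in-memory entry: %r" % (entry,))
    return MemoryFinder(entry)


def install(entry="memory:demo"):
    if memory_hook not in sys.path_hooks:
        sys.path_hooks.append(memory_hook)
    if entry not in sys.path:
        sys.path.append(entry)
    # Forget any earlier negative lookup recorded for this entry.
    sys.path_importer_cache.pop(entry, None)
```

After `install()`, `import in_memory_mod` succeeds although `os.path.exists("memory:demo")` is false, which is why a plain `stat` cannot decide whether an entry belongs on the path.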

Martin v. Löwis wrote:
> .....
>
> Now I remember what makes this stuff really difficult: PEP 302
> introduces path hooks (sys.path_hooks), allowing imports from
> other sources than files. So the items on sys.path don't have
> to be directory or file names at all, and importing from them
> may still succeed even though stat fails.

..... so is there an implication of multiplicative behaviour?

ie if we have N importers and F leading failure syspath entries before the
correct one is found do we get order N*F failed stats/opens etc etc?

--
Robin Becker

Jul 19 '05 #13

Robin Becker wrote:
> ie if we have N importers and F leading failure syspath entries before
> the correct one is found do we get order N*F failed stats/opens etc etc?


No. Each path hook is supposed to provide a decision as to whether this
is a useful item on sys.path only once; the importer objects themselves
are then cached (with some operation to clear the cache). Each path hook
may apply its own algorithm, e.g. looking at the syntactical structure
or the type of the sys.path item, so not all of them need stat/open
to determine whether they support the item.

The multiplicative behaviour results rather from the different types of
module files: each path item may carry .py, .pyc, .so, module.so, etc.

Regards,
Martin

Jul 19 '05 #14
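
The multiplication described here (path entries times file suffixes) can be made visible with a short sketch. It assumes Python 3, where `importlib.machinery.all_suffixes()` lists the recognised suffixes; the function `candidate_files` is made up, and it only enumerates names that a naive search could probe. Current CPython finders list each directory once and cache the result, so they do not necessarily touch every candidate.

```python
import importlib.machinery


def candidate_files(module_name, path_entries):
    """Enumerate file names a naive file-based search might try."""
    suffixes = importlib.machinery.all_suffixes()
    candidates = []
    for entry in path_entries:
        # A package is a directory with an __init__ file ...
        candidates.append(f"{entry}/{module_name}/__init__.py")
        # ... and a plain module is one file per recognised suffix.
        candidates.extend(f"{entry}/{module_name}{suffix}" for suffix in suffixes)
    return candidates
```

With F useless entries ahead of the right one, the number of failed probes in this model grows as F times the number of suffixes, independent of how many importers are registered.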

Martin v. Löwis wrote:
> Robin Becker wrote:
> > ie if we have N importers and F leading failure syspath entries before
> > the correct one is found do we get order N*F failed stats/opens etc etc?
>
> No. Each path hook is supposed to provide a decision as to whether this
> is a useful item on sys.path only once; the importer objects themselves
> are then cached (with some operation to clear the cache). Each path hook
> may apply its own algorithm, e.g. looking at the syntactical structure
> or the type of the sys.path item, so not all of them need stat/open
> to determine whether they support the item.
>
> The multiplicative behaviour rather results from the different type of
> modules: each path item may carry .py, .pyc, .so, module.so, etc.
>
> Regards,
> Martin

if the importers are tested statically how does a filesystem path ever manage
to get back into the loop if it was ever found missing? In other words if
things (eg python24.zip) are found not importable/usable in one pass how do
they get reinstated?
--
Robin Becker
Jul 19 '05 #16

Robin Becker wrote:
> if the importers are tested statically how does a filesystem path ever
> manage to get back into the loop if it was ever found missing? In other
> words if things (eg python24.zip) are found not importable/usable in one
> pass how do they get reinstated?


I think (but see the code yourself) that only the successful importers
are cached.

Regards,
Martin
Jul 19 '05 #17
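
What the cache records is easy to check rather than guess. The sketch below assumes a current CPython 3 and its `sys.path_importer_cache`; it says nothing certain about Python 2.4, and the helper name `probe_missing_entry` is invented. It adds a non-existent entry to `sys.path`, triggers a failed import so the path is walked, and reports what was cached for that entry.

```python
import importlib
import sys


def probe_missing_entry(entry, module_name="no_such_module_for_probe"):
    """Return (is_cached, cached_value) for a non-existent sys.path entry."""
    sys.path.append(entry)
    try:
        try:
            importlib.import_module(module_name)  # expected to fail
        except ImportError:
            pass
        return entry in sys.path_importer_cache, sys.path_importer_cache.get(entry)
    finally:
        # Undo the changes so the probe leaves no trace.
        sys.path.remove(entry)
        sys.path_importer_cache.pop(entry, None)
```

Removing the cache entry, as the `finally` block does, is what lets the entry be examined afresh the next time an import walks the path.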

Steve Holden <st***@holdenweb.com> writes on Sun, 22 May 2005 16:19:10 -0400:
> ...
> Indeed I have written PEP 302-based code to import from a relational
> database, but I still don't believe there's any satisfactory way to
> have [such a hooked import mechanism] be a first-class component of an
> architecture that specifically requires an os.py to exist in the file
> store during initialization.
>
> I wasn't asking for an import hook mechanism (since I already knew
> these to exist), but for a way to allow such mechanisms to be the sole
> import support for certain implementations.


We do not have "os.py" (directly) on the file system.
It lives (like everything else) in a zip archive.

This works because the "zipimporter" is put on
"sys.path_hooks" before the interpreter starts executing Python code.

Thus, all you have to do: use a different Python startup
and ensure that your special importer (able to import e.g. "os")
is already set up, before you start executing Python code.

Dieter
Jul 19 '05 #18
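
The everyday half of this, importing a module that lives only inside a zip archive, can be tried with the sketch below. It assumes Python 3 and uses an invented helper `import_from_zip`; it does not show the customised interpreter start-up Dieter refers to, only the zip import mechanism that such a start-up relies on.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source):
    """Write `source` into a fresh zip archive and import it from there."""
    archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(module_name + ".py", source)
    # Putting the archive itself on sys.path is enough: the zip importer
    # registered in sys.path_hooks claims it.
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)
```

The imported module's `__file__` points inside the archive, which is a quick way to confirm where it was loaded from.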

"Martin v. Löwis" <ma****@v.loewis.de> writes on Sun, 22 May 2005 21:24:41 +0200:
> ...
> What do you mean, "unable to"? It just doesn't.

The original question was: "why does Python put non-existing
entries on 'sys.path'".

Your answer seems to be: "it just does not do it -- but it might
be changed if someone does the work".

This is fine with me.

> ...
> In the past, there was a silent guarantee that you could add
> items to sys.path, and only later create the directories behind
> these items. I don't know whether people rely on this guarantee.

I do not argue that Python should prevent adding non-existing
items on "path". This would not work as Python may not
know what "existing" means (due to "path_hooks").

I only argue that it should not *itself* (automatically) put items on path
where it knows the responsible importers and knows (or can
easily determine) that they are non existing for them.

> ...
> > The application was Zope importing about 2,500 modules
> > from 2 zip files "zope.zip" and "python24.zip".
> > This resulted in about 12,500 opens -- about 4 times more
> > than would be expected -- about 10,000 of them failing opens.
>
> I see. Out of curiosity: how much startup time was saved
> when sys.path was explicitly stripped to only contain these
> two zip files?

I cannot tell you precisely because it is very time consuming
to analyse cold start timing behavior (it requires a reboot for
each measurement).

We essentially have the following numbers only:

                      warm start            cold start
                      (filled OS caches)    (empty OS caches)

from file system      5s                    13s
from ZIP archives     4s                    8s
frozen                3s                    5s

The ZIP archive time was measured after a patch to "import.c"
that prevents Python from viewing a ZIP archive member as a directory
when it cannot find the currently looked for module (of course,
this lookup fails also when the archive member is viewed as a directory).
Furthermore, all C-extensions were loaded via a "meta_path" hook (and
not "sys.path") and "sys.path" contained just the two Zip archives.
These optimizations led to about 3,000 opens (down from originally 12,500).

> I would expect that importing 2,500 modules takes *way*
> more time than doing 10,000 failed opens.

You may be wrong: searching for non existing files may cause
disk io which is several orders of magnitude slower than
CPU activities.

The comparison between warm start (few disc io) and cold start
(much disc io) tells you that the import process is highly
io dominated (for cold starts).

I know that this does not prove that the failing opens contribute
significantly. However, a colleague reported that the
"import.c" patch (essential for the reduction of the number of opens)
resulted in significant (but not specified) improvements.
Dieter
Jul 19 '05 #19

Dieter Maurer wrote:
> The comparison between warm start (few disc io) and cold start
> (much disc io) tells you that the import process is highly
> io dominated (for cold starts).

Correct. However, I would expect that the contents of existing
directories are cached, and it might be that the absence of a directory
on sys.path is also cached (I know Linux does negative dentry caching).

> I know that this does not prove that the failing opens contribute
> significantly. However, a colleague reported that the
> "import.c" patch (essential for the reduction of the number of opens)
> resulted in significant (but not specified) improvements.

When I experimented with startup time for 2.4, I found that these
calls don't matter at all in any significant way (at least not for
warm starts). Instead, I found that reducing the size of .pyc files,
by sharing interned strings, gives more speedup (and indeed, 2.4
changed the marshal format to accommodate shared interned strings).

So I would agree that IO makes a significant part of startup, but
I doubt it is directory reading (unless perhaps you have an
absent NFS server or some such).

Regards,
Martin
Jul 19 '05 #20

Dieter Maurer wrote:
> "Martin v. Löwis" <ma****@v.loewis.de> writes on Sun, 22 May 2005 21:24:41 +0200:
> > ...
> > > The application was Zope importing about 2,500 modules
> > > from 2 zip files "zope.zip" and "python24.zip".
> > > This resulted in about 12,500 opens -- about 4 times more
> > > than would be expected -- about 10,000 of them failing opens.


I'll bet this means that the 'zope.zip', 'python24.zip' would drop
you to "about 12500 - 10000 = 2500" failing opens. That should be
an easy test: sys.path.insert(0, 'zope.zip') or whatever.
If that works and you want to drop even more, make a copy of zope.zip,
update it with python24.zip, and call the result python24.zip.

--Scott David Daniels
Sc***********@Acm.Org
Jul 19 '05 #21
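
Combining the two archives into one, as suggested, might be done along these lines. The sketch assumes Python 3's `zipfile` module and reads "update it with python24.zip" as "members from later archives replace same-named members from earlier ones"; that reading, and the function name `merge_archives`, are assumptions. Everything is held in memory, which is acceptable for a sketch but not for very large archives.

```python
import zipfile


def merge_archives(sources, destination):
    """Merge several zip archives into `destination`; later sources win."""
    members = {}  # member name -> (ZipInfo, bytes), insertion-ordered
    for source in sources:
        with zipfile.ZipFile(source) as src:
            for info in src.infolist():
                members[info.filename] = (info, src.read(info.filename))
    with zipfile.ZipFile(destination, "w") as out:
        for info, data in members.values():
            # Reusing the ZipInfo keeps timestamps and compression type.
            out.writestr(info, data)
    return destination
```

For the case in the thread the call would be something like `merge_archives(["zope.zip", "python24.zip"], "merged.zip")`, followed by renaming the result.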

"Martin v. Löwis" <ma****@v.loewis.de> writes on Tue, 24 May 2005 23:58:03 +0200:
> ... 10,000 failing opens -- a cause for significant IO during startup? ...
>
> So I would agree that IO makes a significant part of startup, but
> I doubt it is directory reading (unless perhaps you have an
> absent NFS server or some such).


We noticed the large difference between warm and cold start even when
we run from a zip archive. We expected that the only relevant IO would
go to the zip archives and therefore, we preloaded them to the
OS cache (by reading them sequentially) before the Python start.
To our great surprise, this did not significantly reduce Python's
(cold) startup time. We concluded that there must be other IO
not directed to the zip archives, started investigating and found
the 10,000 opens to non-existing files as the only other
significant IO contingent....
Dieter

Jul 19 '05 #22

Scott David Daniels <Sc***********@Acm.Org> writes on Wed, 25 May 2005 07:10:00 -0700:
> ...
> I'll bet this means that the 'zope.zip', 'python24.zip' would drop
> you to "about 12500 - 10000 = 2500" failing opens. That should be
> an easy test: sys.path.insert(0, 'zope.zip') or whatever.
> If that works and you want to drop even more, make a copy of zope.zip,
> update it with python24.zip, and call the result python24.zip.


We cannot significantly reduce the number of opens further:

Each module import from a zip archive opens the archive.
As we have about 2,500 modules, we will get this order of opens
(as long as we use Python's "zipimporter").

The "zipimporter" uses a sequence of "stat"s to determine
whether it can handle a path item: it drops the last
component until it gets an existing file object and then
checks that it is indeed a zip archive.
Adding a cache for this check could save an additional few
hundred opens.
Dieter

Jul 19 '05 #23
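
The check described in the last paragraph can be imitated in pure Python. This is a sketch of the described strategy, not the actual "zipimporter" code: it assumes Python 3, and the function name `split_archive_path` is invented. It strips trailing components until something exists on disk and then tests whether that something is a zip file.

```python
import os
import zipfile


def split_archive_path(path_item):
    """Return (archive, inner_prefix), or None if no zip archive is found."""
    candidate, inner = path_item, []
    while candidate:
        if os.path.exists(candidate):          # one "stat" per component
            if os.path.isfile(candidate) and zipfile.is_zipfile(candidate):
                return candidate, "/".join(reversed(inner))
            return None                         # exists, but is not a zip
        parent, tail = os.path.split(candidate)
        if parent == candidate:                 # reached the root
            return None
        if tail:
            inner.append(tail)
        candidate = parent
    return None
```

The cache suggested above could be approximated by wrapping the function in `functools.lru_cache`, at the cost of not noticing archives that appear later.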

Dieter Maurer wrote:
> Steve Holden <st***@holdenweb.com> writes on Sun, 22 May 2005 16:19:10 -0400:
> > ...
> > Indeed I have written PEP 302-based code to import from a relational
> > database, but I still don't believe there's any satisfactory way to
> > have [such a hooked import mechanism] be a first-class component of an
> > architecture that specifically requires an os.py to exist in the file
> > store during initialization.
> >
> > I wasn't asking for an import hook mechanism (since I already knew
> > these to exist), but for a way to allow such mechanisms to be the sole
> > import support for certain implementations.
>
> We do not have "os.py" (directly) on the file system.
> It lives (like everything else) in a zip archive.
>
> This works because the "zipimporter" is put on
> "sys.path_hooks" before the interpreter starts executing Python code.
>
> Thus, all you have to do: use a different Python startup
> and ensure that your special importer (able to import e.g. "os")
> is already set up, before you start executing Python code.

It might help others like me if you were to document this setup, as I
was unable to persuade the interpreter to start without producing the
dire-sounding warning messages I mentioned in the bug report.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 19 '05 #24

Martin v. Löwis wrote:
> Scott David Daniels wrote:
> > > > Is the interpreter unable to call "C" functions ("stat" for example)
> > > > to determine whether an object exists before it puts it on "path"?
> > >
> > > What do you mean, "unable to"? It just doesn't.
> >
> > In fact, the interpreter doesn't necessarily know when it is
> > affecting the path.
>
> Now I remember what makes this stuff really difficult: PEP 302
> introduces path hooks (sys.path_hooks), allowing imports from
> other sources than files. So the items on sys.path don't have
> to be directory or file names at all, and importing from them
> may still succeed even though stat fails.

This new feature also makes the strategy of looking in the filestore for
"os.py" somewhat dubious, hence my bug report.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 19 '05 #25
