Deprecating reload() ???

On Thu, 11 Mar 2004 15:10:59 -0500, "Ellinghaus, Lance"
<la**************@eds.com> wrote:

> Other surprises: Deprecating reload()

Reload doesn't work the way most people think
it does: if you've got any references to the old module,
they stay around. They aren't replaced.

It was a good idea, but the implementation simply
doesn't do what the idea promises.

I agree that it does not really work as most people think it does, but how
would you perform the same task as reload() without the reload()?

Seems like reload() could be *made* to work by scanning currently
loaded modules for all references to objects in the reloaded module,
and resetting them to the new objects. If an object is missing from
the new module, throw an exception.

GvR suggests using 'exec' as an alternative.
http://python.org/doc/essays/ppt/regrets/6
I don't see how that solves the problem of eliminating references to
old objects.

-- Dave

Jul 18 '05 #3

user

Ellinghaus, Lance wrote:

I agree that it does not really work as most people think it does, but how
would you perform the same task as reload() without the reload()?

Would this be done by "del sys.modules['modulename']" and then perform an
'import'?

Ah, interesting! I've been doing this:

del modulename; import modulename

but it doesn't pick up any recent changes to the
modulename.py file.

That's a better solution than exiting Python and
starting it up again. Thanks!

--
Steven D'Aprano

Jul 18 '05 #4

us**@domain.invalid wrote:

Ellinghaus, Lance wrote:
I agree that it does not really work as most people think it does, but
how
would you perform the same task as reload() without the reload()?
Would this be done by "del sys.modules['modulename']" and then perform an
'import'?

Ah, interesting! I've been doing this:

del modulename; import modulename

but it doesn't pick up any recent changes to the modulename.py file.

That's a better solution than exiting Python and starting it up again.

Just to clarify, *neither* of the above solutions does anything useful,
as far as I know. Only reload() will really reload a module. Certainly
the del mod/import mod approach is practically a no-op...

-Peter

Jul 18 '05 #5

On Fri, 12 Mar 2004 16:27:36 +1100, us**@domain.invalid wrote:

Ellinghaus, Lance wrote:
I agree that it does not really work as most people think it does, but how
would you perform the same task as reload() without the reload()?

Would this be done by "del sys.modules['modulename']" and then perform an
'import'?

Ah, interesting! I've been doing this:

del modulename; import modulename

but it doesn't pick up any recent changes to the
modulename.py file.

That's a better solution than exiting Python and
starting it up again. Thanks!

This doesn't work. It doesn't matter if you first "del modulename".
All that does is remove the reference to "modulename" from the current
namespace.

testimport.py
a = 8
print 'a =', a

>>> import testimport
a = 8
## ==> change a to 9 in testimport.py
>>> testimport.a
8
>>> dir()
['__builtins__', '__doc__', '__name__', 'testimport']
>>> del testimport
>>> dir()
['__builtins__', '__doc__', '__name__']
>>> import testimport
>>> dir()
['__builtins__', '__doc__', '__name__', 'testimport']
>>> testimport.a
8    <== The old module is still in memory !!!
>>> a = testimport.a
>>> dir()
['__builtins__', '__doc__', '__name__', 'a', 'testimport']
>>> reload(testimport)
a = 9
>>> testimport.a
9    <== reload() changes new fully-qualified references
>>> a
8    <== but not any other references that were set up before.
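
A minimal sketch of the same experiment for Python 3, where reload() lives in importlib rather than being a builtin. The module name testimport_demo, the temporary directory and the write_module() helper are assumptions made up for this illustration; bytecode caching is switched off so that the quick rewrite of the file cannot be masked by a stale .pyc.

```python
import importlib
import os
import sys
import tempfile

# Assumption: a throwaway module called "testimport_demo" in a temp directory.
sys.dont_write_bytecode = True      # never leave a .pyc behind for this demo
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "testimport_demo.py")


def write_module(value):
    """Write a one-line module that binds the name `a` to `value`."""
    with open(path, "w") as f:
        f.write("a = %d\n" % value)


write_module(8)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import testimport_demo
first = testimport_demo.a               # value from the first import

write_module(9)                         # "edit" the file on disk

del testimport_demo                     # only unbinds the name here
import testimport_demo                  # served from sys.modules, file not re-read
after_del_import = testimport_demo.a

a = testimport_demo.a                   # extra reference made before the reload
importlib.reload(testimport_demo)       # re-executes the file in the same module
after_reload = testimport_demo.a
alias_after_reload = a                  # the separate name is not touched
```

Under these assumptions the del/import pair leaves the old value in place, and only the reload changes what the qualified name testimport_demo.a refers to.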

I know this is the *intended* behavior of reload(), but it has always
seemed to me like a bug. Why would you *ever* want to keep pieces of
an old module when that module is reloaded?

Seems to me we should *fix* reload, not deprecate it.

-- Dave

Jul 18 '05 #6

David MacQuigg wrote:

> I know this is the *intended* behavior of reload(), but it has always
> seemed to me like a bug. Why would you *ever* want to keep pieces of
> an old module when that module is reloaded?

If you wanted a dynamic module, where changes would be picked up by the
application from time to time, but didn't want to terminate existing
instances of the old version. There's nothing wrong with the idea that
both old and new versions of something could exist in memory, at least
for a while until the old ones are finished whatever they are doing.

Basically, envision an application update mechanism for long-running
applications, and its special needs.

> Seems to me we should *fix* reload, not deprecate it.

Reload is not broken, and certainly shouldn't be deprecated at least
until there's a better solution that won't suffer from reload's one
problem, IMHO, which is that it surprises some people by its behaviour.
I think that when you consider Python's namespace mechanism, you can't
avoid the possibility of situations like the ones reload can now lead to.

-Peter

Jul 18 '05 #7

"Ellinghaus, Lance" <la**************@eds.com> writes:

Other surprises: Deprecating reload()
Reload doesn't work the way most people think
it does: if you've got any references to the old module,
they stay around. They aren't replaced.

It was a good idea, but the implementation simply
doesn't do what the idea promises.

I missed these mails...
I agree that it does not really work as most people think it does, but how
would you perform the same task as reload() without the reload()?

Would this be done by "del sys.modules['modulename']" and then perform an
'import'?

I use reload() for many purposes, knowing how it works/does not work.

To paraphrase Tim Peters, you can have reload() when I'm dead. Just
because it doesn't and never has done what some people expect is no
argument at all for deprecating it.

Cheers,
mwh

--
But maybe I've just programmed in enough different languages to
assume that they are, in fact, different.
-- Tony J Ibbs explains why Python isn't Java on comp.lang.python

Jul 18 '05 #8

Regarding

del sys.modules['modulename']; import modulename
vs.

del modulename ; import modulename

Peter sez:

Peter> Just to clarify, *neither* of the above solutions does anything
Peter> useful, as far as I know. Only reload() will really reload a
Peter> module. Certainly the del mod/import mod approach is practically
Peter> a no-op...

Not so. del sys.modules['mod']/import mod is effectively what reload()
does. Given foo.py with this content:

a = 1
print a

here's a session which shows that it is outwardly the same as reload(...).
That is, the module-level code gets reexecuted (precisely because there's
not already a reference in sys.modules).

>>> import foo
1
>>> reload(foo)
1
<module 'foo' from 'foo.pyc'>
>>> import foo
>>> import sys
>>> del sys.modules['foo']
>>> import foo
1

David MacQuigg's contention that it doesn't work is missing the fact that
when he executed

a = testimport.a

he simply created another reference to the original object which was outside
the scope of the names in testimport. Reloading testimport created a new
object and bound "testimport.a" to it. That has nothing to do with the
object to which "a" is bound. This is the problem reload() doesn't solve.

Given Python's current object model it would be an interesting challenge to
write a "super reload" which could identify all the objects created as a
side effect of importing a module and for those with multiple references,
locate those other references by traversing the known object spaces, then
perform the import and finally rebind the references found in the first
step. I thought I saw something like this posted in a previous thread on
this subject.

In case you're interested in messing around with this, there are three
object spaces to be careful of, builtins (which probably won't hold any
references), all the imported module globals (which you can find in
sys.modules) and all the currently active local Python functions (this is
where it gets interesting ;-). There's a fourth set of references
- those held by currently active functions which were defined in
extension modules - but I don't think you can get to them from pure Python
code. Whether or not that's a serious enough problem to derail the quest
for super_reload() function is another matter. Most of the time it probably
won't matter. Every once in a great while it will probably bite you in the
ass. Accordingly, if super_reload() can't track down all the references
which need rebinding it should probably raise a warning/exception or return
a special value to indicate that.

Skip

Jul 18 '05 #9

On Fri, 12 Mar 2004 08:45:24 -0500, Peter Hansen <pe***@engcorp.com>
wrote:

David MacQuigg wrote:
I know this is the *intended* behavior of reload(), but it has always
seemed to me like a bug. Why would you *ever* want to keep pieces of
an old module when that module is reloaded?
If you wanted a dynamic module, where changes would be picked up by the
application from time to time, but didn't want to terminate existing
instances of the old version. There's nothing wrong with the idea that
both old and new versions of something could exist in memory, at least
for a while until the old ones are finished whatever they are doing.

OK, I agree this is a valid mode of operation.
Basically, envision an application update mechanism for long-running
applications, and its special needs.
I think a good example might be some objects that need to retain the
state they had before the reload.

Seems to me we should *fix* reload, not deprecate it.

Reload is not broken, and certainly shouldn't be deprecated at least
until there's a better solution that won't suffer from reload's one
problem, IMHO, which is that it surprises some people by its behaviour.

It's worse than just a surprise. It's a serious problem when what you
need to do is what most people are expecting -- replace every
reference to objects in the old module with references to the new
objects. The problem becomes a near impossibility when those
references are scattered throughout a multi-module program.

For the *exceptional* case where we want to pick and choose which
objects to update, seems like the best solution would be to add some
options to the reload function.

def reload(<module>, objects = '*'):

The default second argument '*' updates all references. '?' prompts
for each object. A list of objects updates references to just those
objects automatically. 'None' would replicate the current behavior,
updating only the name of the module itself.

The '?' mode would normally ask one question for each object in the
module to be reloaded, and then update all references to the selected
objects. We could even have a '??' mode that would prompt for each
*reference* to each object.

In a typical debug session, there is only one object that you have
updated, so I think there would seldom be a need to enter more than
one name as the second argument.
I think that when you consider Python's namespace mechanism, you can't
avoid the possibility of situations like the ones reload can now lead to.

I don't understand. My assumption is you would normally update all
references to the selected objects in all namespaces.

-- Dave

Jul 18 '05 #10

Mel Wilson

In article <2j********************************@4ax.com>,
David MacQuigg <dm*@gain.com> wrote:

On Thu, 11 Mar 2004 15:10:59 -0500, "Ellinghaus, Lance"
<la**************@eds.com> wrote:
> Other surprises: Deprecating reload()

Reload doesn't work the way most people think
it does: if you've got any references to the old module,
they stay around. They aren't replaced.

It was a good idea, but the implementation simply
doesn't do what the idea promises.

I agree that it does not really work as most people think it does, but how
would you perform the same task as reload() without the reload()?

Seems like reload() could be *made* to work by scanning currently
loaded modules for all references to objects in the reloaded module,
and resetting them to the new objects. If an object is missing from
the new module, throw an exception.

I don't quite get this. I don't see objects _in_ a
module. I see objects referenced from the module's
namespace, but they can be referenced from other modules'
namespaces at the same time. Who is to say which module the
objects are *in*?

e.g. (untested)

#===============================
"M1.py"

f1 = file ('a')
f2 = file ('b')
#===============================
"main.py"
import M1

a_file = M1.f1
another = file ('c')
M1.f2 = another
reload (M1)

Regards. Mel.

Jul 18 '05 #11

On Fri, 12 Mar 2004 08:20:14 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

Given Python's current object model it would be an interesting challenge to
write a "super reload" which could identify all the objects created as a
side effect of importing a module and for those with multiple references,
locate those other references by traversing the known object spaces, then
perform the import and finally rebind the references found in the first
step. I thought I saw something like this posted in a previous thread on
this subject.

I'm not familiar with the internals of Python, but I was assuming that
all objects could be easily associated with the module which created
them. If not, maybe what we need is the ability to put a selected
module in "debug" mode and keep a list of all objects created by that
module and all references to those objects. That would add a little
overhead, but avoid the difficulties of searching all object spaces
with every reload.
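
A small Python 3 sketch of how far "associating an object with its module" goes today. Functions and classes record the name of the module that defined them in __module__; plain data objects such as lists and numbers record nothing. The helper name defining_module() is made up for this illustration.

```python
import json


def defining_module(obj):
    """Return the module name recorded on obj, or None if there is none."""
    return getattr(obj, "__module__", None)


function_origin = defining_module(json.dumps)        # recorded at definition time
class_origin = defining_module(json.JSONDecoder)     # recorded at definition time
list_origin = defining_module([1, 2, 3])             # nothing recorded
number_origin = defining_module(42)                  # nothing recorded
```

So the association exists for the definitions a module makes, but not for arbitrary objects it happens to create.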

-- Dave

Jul 18 '05 #12

John Roth

"David MacQuigg" <dm*@gain.com> wrote in message
news:hg********************************@4ax.com...

On Fri, 12 Mar 2004 08:20:14 -0600, Skip Montanaro <sk**@pobox.com>
wrote:
Given Python's current object model it would be an interesting challenge to
write a "super reload" which could identify all the objects created as a
side effect of importing a module and for those with multiple references,
locate those other references by traversing the known object spaces, then
perform the import and finally rebind the references found in the first
step. I thought I saw something like this posted in a previous thread on
this subject.
I'm not familiar with the internals of Python, but I was assuming that
all objects could be easily associated with the module which created
them. If not, maybe what we need is the ability to put a selected
module in "debug" mode and keep a list of all objects created by that
module and all references to those objects. That would add a little
overhead, but avoid the difficulties of searching all object spaces
with every reload.

That's actually the wrong end of the problem. Even if you
could associate all the objects with the module that created
them, you would still have to find all the references to those
modules. That's the harder of the two tasks.

It's actually relatively easy to find the objects that would have
to be replaced: it's all of the objects that are bound at the module
level in the module you're replacing. Since CPython uses memory
addresses as IDs, it's trivially easy to stick them in a dictionary and
compare them while scanning all the places they could possibly be
bound.
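
A rough Python 3 sketch of that id()-dictionary idea, limited to reporting where the objects are bound. It is an illustration under stated assumptions, not a complete tool: find_references() is a made-up name, only functions and classes are tracked, and only the global namespaces of modules in sys.modules are scanned. The two modules provider_demo and consumer_demo are built in memory purely for the demonstration.

```python
import sys
import types


def find_references(target_module):
    """List (module_name, global_name, original_name) for module globals
    elsewhere that are bound to a function or class bound in target_module."""
    name_by_id = {}
    for name, obj in vars(target_module).items():
        if isinstance(obj, (types.FunctionType, type)):
            name_by_id[id(obj)] = name

    hits = []
    for mod_name, mod in list(sys.modules.items()):
        if mod is target_module or not isinstance(mod, types.ModuleType):
            continue
        for global_name, value in list(vars(mod).items()):
            if id(value) in name_by_id:
                hits.append((mod_name, global_name, name_by_id[id(value)]))
    return hits


# Hypothetical modules, created in memory for the demonstration only.
def helper():
    return "old"


provider = types.ModuleType("provider_demo")
provider.helper = helper
consumer = types.ModuleType("consumer_demo")
consumer.h = provider.helper            # a second binding under another name
sys.modules["provider_demo"] = provider
sys.modules["consumer_demo"] = consumer

hits = find_references(provider)
```

The scan may also report other modules that happen to bind the same object; references held in locals, containers or instance attributes are not seen at all.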

John Roth

Jul 18 '05 #13

Skip Montanaro wrote:

Regarding
>> del sys.modules['modulename']; import modulename
vs.
>> del modulename ; import modulename

Peter sez:

Peter> Just to clarify, *neither* of the above solutions does anything
Peter> useful, as far as I know. Only reload() will really reload a
Peter> module. Certainly the del mod/import mod approach is practically
Peter> a no-op...

Not so. del sys.modules['mod']/import mod is effectively what reload()
does.

Oops, sorry. On reflection that seems logical, but I thought there was
a little more to reload() than just this. :-(
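
One observable difference, sketched for Python 3 (importlib.reload): reload() re-executes the code inside the existing module object, while deleting the sys.modules entry and importing again builds a new module object. The module name foo_demo and the temporary directory are assumptions for this illustration.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True          # keep the demo free of cached bytecode
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "foo_demo.py"), "w") as f:
    f.write("a = 1\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import foo_demo
original = foo_demo                     # keep a reference to the first module object

reloaded = importlib.reload(foo_demo)
same_after_reload = reloaded is original        # same object, refreshed in place

del sys.modules["foo_demo"]
import foo_demo                         # executes the file again in a new module
same_after_del_import = foo_demo is original    # a different module object
```

Anything that held on to the module object itself therefore sees the new contents after reload(), but not after the del/import route.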

-Peter

Jul 18 '05 #14

John Roth

"David MacQuigg" <dm*@gain.com> wrote in message
news:8o********************************@4ax.com...

On Fri, 12 Mar 2004 08:45:24 -0500, Peter Hansen <pe***@engcorp.com>
wrote:
David MacQuigg wrote:
I know this is the *intended* behavior of reload(), but it has always
seemed to me like a bug. Why would you *ever* want to keep pieces of
an old module when that module is reloaded?
If you wanted a dynamic module, where changes would be picked up by the
application from time to time, but didn't want to terminate existing
instances of the old version. There's nothing wrong with the idea that
both old and new versions of something could exist in memory, at least
for a while until the old ones are finished whatever they are doing.

OK, I agree this is a valid mode of operation.
Basically, envision an application update mechanism for long-running
applications, and its special needs.

I think a good example might be some objects that need to retain the
state they had before the reload.
Seems to me we should *fix* reload, not deprecate it.

Reload is not broken, and certainly shouldn't be deprecated at least
until there's a better solution that won't suffer from reload's one
problem, IMHO, which is that it surprises some people by its behaviour.

It's worse than just a surprise. It's a serious problem when what you
need to do is what most people are expecting -- replace every
reference to objects in the old module with references to the new
objects. The problem becomes a near impossibility when those
references are scattered throughout a multi-module program.

For the *exceptional* case where we want to pick and choose which
objects to update, seems like the best solution would be to add some
options to the reload function.

def reload(<module>, objects = '*'):

The default second argument '*' updates all references. '?' prompts
for each object. A list of objects updates references to just those
objects automatically. 'None' would replicate the current behavior,
updating only the name of the module itself.

The '?' mode would normally ask one question for each object in the
module to be reloaded, and then update all references to the selected
objects. We could even have a '??' mode that would prompt for each
*reference* to each object.

In a typical debug session, there is only one object that you have
updated, so I think there would seldom be a need to enter more than
one name as the second argument.
I think that when you consider Python's namespace mechanism, you can't
avoid the possibility of situations like the ones reload can now lead to.
I don't understand. My assumption is you would normally update all
references to the selected objects in all namespaces.
I can certainly see wanting to do that in a debug session - or even
with an editor that's broken the way PythonWin is broken (it doesn't
reload the top level module when you say Run unless it's been updated,
so it never reloads the lower level modules even if they've been updated.)

However, I don't remember the last time I used the debugger,
it's been so long. When you use TDD to develop, you find that
the debugger becomes a (thankfully) long ago memory.

However, if you want to be able to replace a module in a
running program, I suspect you'd be much better off designing
your program to make it easy, rather than depending on the
system attempting to find all the references.

John Roth

Jul 18 '05 #15

David MacQuigg wrote:

On Fri, 12 Mar 2004 08:45:24 -0500, Peter Hansen <pe***@engcorp.com>
wrote:
Reload is not broken, and certainly shouldn't be deprecated at least
until there's a better solution that won't suffer from reload's one
problem, IMHO, which is that it surprises some people by its behaviour.

It's worse than just a surprise. It's a serious problem when what you
need to do is what most people are expecting -- replace every
reference to objects in the old module with references to the new
objects. The problem becomes a near impossibility when those
references are scattered throughout a multi-module program.

I don't consider this a problem with reload, I consider it a design
defect. If there's a need for such a thing, it should be designed in to
the application, and certainly one would remove the "scattering" of
objects such as these which are about to be replaced en masse.

I think many applications would be inherently broken if a programmer
thought a simple "reload" of the style you envision would work without
serious but possibly quite subtle side effects.

I think that when you consider Python's namespace mechanism, you can't
avoid the possibility of situations like the ones reload can now lead to.

I don't understand. My assumption is you would normally update all
references to the selected objects in all namespaces.

I guess we're coming at this from different viewpoints. My comments
above should probably explain why I said that. Basically, it seems to
me very unlikely there are good use cases for wanting to update the
classes behind the backs of objects regardless of where references to
them are bound. I'm open to suggestions though.

-Peter

Jul 18 '05 #16

Reload is not broken, and certainly shouldn't be deprecated at least
until there's a better solution that won't suffer from reload's one
problem, IMHO, which is that it surprises some people by its
behaviour.

David> It's worse than just a surprise. It's a serious problem when
David> what you need to do is what most people are expecting -- replace
David> every reference to objects in the old module with references to
David> the new objects. The problem becomes a near impossibility when
David> those references are scattered throughout a multi-module program.

This is where I think your model of how Python works has broken down.
Objects don't live within modules. References to objects do. All objects
inhabit a space not directly associated with any particular Python
namespace. If I execute

import urllib
quote = urllib.quote
reload(urllib)

quote and urllib.quote refer to different objects and compare False. (I
suppose the cmp() routine for functions could compare code objects and other
attributes which might change.) The object referred to by urllib.quote is
not bound in any other way to the urllib module other than the reference
named "quote" in the urllib module's dict. reload() simply decrements the
reference count to urllib, which decrements the reference count to its dict,
which causes it to clean up and decrement the reference count for all its
key/value pairs.

David> For the *exceptional* case where we want to pick and choose which
David> objects to update, seems like the best solution would be to add
David> some options to the reload function.

David> def reload(<module>, objects = '*'):

David> The default second argument '*' updates all references. '?'
David> prompts for each object. A list of objects updates references to
David> just those objects automatically. 'None' would replicate the
David> current behavior, updating only the name of the module itself.

If you want to implement this and all you're interested in are class,
function and method definitions and don't care about local variables I think
you can make a reasonable stab at this. I wouldn't recommend you overload
reload() with this code, at least not initially. Write a super_reload() in
Python. If/when you get that working, then rewrite it in C, then get that
working, then you're free to recommend that the builtin reload() function be
modified.

David> I don't understand. My assumption is you would normally update
David> all references to the selected objects in all namespaces.

As Denis Otkidach pointed out in response to an earlier post of mine,
objects on shared free lists can't be rebound using this mechanism. I think
it's safe initially to restrict super_reload() to classes, functions and
methods.
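
A very rough Python 3 sketch of such a restricted super_reload(), offered only as an illustration. Assumptions: it handles functions and classes defined in the reloaded module, it rebinds only module-level names found through sys.modules, and it ignores locals, containers, instances and bound methods. The modules plugin_demo and client_demo are hypothetical and exist only for the demonstration.

```python
import importlib
import os
import sys
import tempfile
import types


def super_reload(module):
    """Reload module, then rebind stale module-level aliases elsewhere."""
    # Holding the old objects in this dict keeps their id() values unique.
    old = {name: obj for name, obj in vars(module).items()
           if isinstance(obj, (types.FunctionType, type))
           and getattr(obj, "__module__", None) == module.__name__}
    name_by_id = {id(obj): name for name, obj in old.items()}

    importlib.reload(module)

    rebound = []
    for mod_name, other in list(sys.modules.items()):
        if other is module or not isinstance(other, types.ModuleType):
            continue
        for global_name, value in list(vars(other).items()):
            name = name_by_id.get(id(value))
            if name is not None and hasattr(module, name):
                setattr(other, global_name, getattr(module, name))
                rebound.append((mod_name, global_name))
    return rebound


# --- demonstration with two hypothetical modules ---
sys.dont_write_bytecode = True
tmpdir = tempfile.mkdtemp()
plugin_path = os.path.join(tmpdir, "plugin_demo.py")
with open(plugin_path, "w") as f:
    f.write("def greet():\n    return 'old'\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
import plugin_demo

client = types.ModuleType("client_demo")    # stands in for another module
client.greet = plugin_demo.greet            # the alias reload() would miss
sys.modules["client_demo"] = client

with open(plugin_path, "w") as f:           # "edit" the plugin
    f.write("def greet():\n    return 'new'\n")

rebound = super_reload(plugin_demo)
result = client.greet()
```

A real version would at least need to report what it could not reach, as suggested earlier in the thread.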

Skip

Jul 18 '05 #17

On Fri, 12 Mar 2004 13:34:36 -0500, Peter Hansen <pe***@engcorp.com>
wrote:

David MacQuigg wrote:
On Fri, 12 Mar 2004 08:45:24 -0500, Peter Hansen <pe***@engcorp.com>
wrote:
Reload is not broken, and certainly shouldn't be deprecated at least
until there's a better solution that won't suffer from reload's one
problem, IMHO, which is that it surprises some people by its behaviour.

It's worse than just a surprise. It's a serious problem when what you
need to do is what most people are expecting -- replace every
reference to objects in the old module with references to the new
objects. The problem becomes a near impossibility when those
references are scattered throughout a multi-module program.

I don't consider this a problem with reload, I consider it a design
defect. If there's a need for such a thing, it should be designed in to
the application, and certainly one would remove the "scattering" of
objects such as these which are about to be replaced en masse.

I agree, most programs should not have 'reload()' designed in, and
those that do, should be well aware of its limitations. I'm concerned
more about interactive use, specifically of programs which cannot be
conveniently restarted from the beginning. I guess I'm spoiled by HP
BASIC, where you can change the program statements while the program
is running! (half wink)

I think that when you consider Python's namespace mechanism, you can't
avoid the possibility of situations like the ones reload can now lead to.

I don't understand. My assumption is you would normally update all
references to the selected objects in all namespaces.

I guess we're coming at this from different viewpoints. My comments
above should probably explain why I said that. Basically, it seems to
me very unlikely there are good use cases for wanting to update the
classes behind the backs of objects regardless of where references to
them are bound. I'm open to suggestions though.

Objects derived from classes are a different, and probably unsolvable
problem. Attempting to update those would be like trying to put the
program in the state it would have been, had the module changes been
done some time in the past. We would have to remember the values at
the time the object was created of all variables that went into the
__init__ call. Classes don't have this problem, and they should be
updatable.

Here is a use-case for classes. I've got hundreds of variables in a
huge hierarchy of "statefiles". In my program, that hierarchy is
handled as a hierarchy of classes. If I want to access a particular
variable, I say something like:
wavescan.window1.plot2.xaxis.label.font.size = 12
These classes have no methods, just names and values and other
classes.

If I reload a module that changes some of those variables, I would
like to not have to hunt down every reference in the running program
and change it manually.
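
A small Python 3 sketch of what happens to class references and instances that exist before a reload. The module state_demo and its Font class are made up for the illustration; they are not the statefile classes described above.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True          # avoid stale bytecode in this demo
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "state_demo.py")
with open(path, "w") as f:
    f.write("class Font:\n    size = 10\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
import state_demo

held_class = state_demo.Font            # class reference stored somewhere else
font = state_demo.Font()                # instance created before the reload

with open(path, "w") as f:              # "edit" the module
    f.write("class Font:\n    size = 12\n")
importlib.reload(state_demo)

fresh_size = state_demo.Font.size       # value from the new class
held_size = held_class.size             # the stored reference is the old class
instance_size = font.size               # the instance still uses the old class
still_instance = isinstance(font, state_demo.Font)

font.__class__ = state_demo.Font        # manual re-pointing of one instance
repointed_size = font.size
```

The manual __class__ assignment works here because both classes are plain Python classes with the same layout; it says nothing about instance state that the new class might expect.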

-- Dave

Jul 18 '05 #18

On Fri, 12 Mar 2004 11:42:06 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

David> It's worse than just a surprise. It's a serious problem when
David> what you need to do is what most people are expecting -- replace
David> every reference to objects in the old module with references to
David> the new objects. The problem becomes a near impossibility when
David> those references are scattered throughout a multi-module program.

This is where I think your model of how Python works has broken down.
Objects don't live within modules. References to objects do. All objects
inhabit a space not directly associated with any particular Python
namespace. If I execute

import urllib
quote = urllib.quote
reload(urllib)

quote and urllib.quote refer to different objects and compare False. (I
suppose the cmp() routine for functions could compare code objects and other

[ snip ]

Understood. When I said "objects in the old module", I should have
said "objects from the old module". I wasn't making any assumption
about where these objects reside once loaded. I'm still assuming it
is possible ( even if difficult ) to locate and change all current
references to these objects. This may require a special "debug" mode
to keep track of this information.

Another "brute force" kind of solution would be to replace the old
objects with links to the new. Every reference, no matter where it came
from, would be re-routed. The inefficiency would only last until you
restart the program.

-- Dave

Jul 18 '05 #19

Paul Miller

It's worse than just a surprise. It's a serious problem when what you
need to do is what most people are expecting -- replace every
reference to objects in the old module with references to the new
objects. The problem becomes a near impossibility when those
references are scattered throughout a multi-module program.

....
I don't consider this a problem with reload, I consider it a design
defect. If there's a need for such a thing, it should be designed in to

....
I agree, most programs should not have 'reload()' designed in, and
those that do, should be well aware of its limitations. I'm concerned

I've been working around the problem with reload by loading my modules into
self-contained interpreters, using the multiple interpreter API. This all
worked wonderfully until Python 2.3, where it all sort of broke.

I would be able to do it all with one interpreter, *if* reload did "what I
expect".

Clearly, there needs to be SOME solution to this problem, as I'm definitely
not the only person trying to do this.

(the context comes from wanting to use Python modules as "plugins" to a C++
application - and to aid in development, be able to reload plugins on the
fly if their code is changed).

Jul 18 '05 #20

Dave> Another "brute force" kind of solution would be to replace the old
Dave> objects with links to the new. Every reference, no matter where it
Dave> came from, would be re-routed. The inefficiency would only last
Dave> until you restart the program.

That would require that you be able to transmogrify an object into a proxy
of some sort without changing its address. In theory I think this could be
done, but not in pure Python. It would require a helper written in C.
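
What pure Python can do is a weaker variant: hand out a forwarding object from the start and swap its target later. This is not the in-place transmogrification described above, only an indirection that has to be designed in beforehand. The class name Forwarder and its retarget() method are made up for this Python 3 sketch.

```python
from types import SimpleNamespace


class Forwarder:
    """Stand-in whose attribute lookups go to a replaceable target."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Called only for names that are not found on the Forwarder itself.
        return getattr(self._target, name)

    def retarget(self, new_target):
        """Point every existing reference to this Forwarder at new_target."""
        self._target = new_target


old_settings = SimpleNamespace(label="old", size=10)
new_settings = SimpleNamespace(label="new", size=12)

handle = Forwarder(old_settings)
alias = handle                      # a second reference held somewhere else
size_before = alias.size

handle.retarget(new_settings)       # one swap, seen through every alias
size_after = alias.size
label_after = alias.label
```

Every attribute access pays for the extra hop, which matches the inefficiency mentioned in the proposal.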

Skip

Jul 18 '05 #21

On Fri, 12 Mar 2004 11:42:06 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

>> Reload is not broken, and certainly shouldn't be deprecated at least
>> until there's a better solution that won't suffer from reload's one
>> problem, IMHO, which is that it surprises some people by its
>> behaviour.

I've written a short description of what reload() does to try and help
reduce the confusion. This is intended for EEs who are new to Python.
Please see
http://ece.arizona.edu/~edatools/Python/Reload.htm

I've also started a new thread to discuss this. See "Reload()
Confusion". Comments are welcome.

-- Dave

Jul 18 '05 #22

Just an FYI, I didn't write this statement:

Dave> On Fri, 12 Mar 2004 11:42:06 -0600, Skip Montanaro <sk**@pobox.com>
Dave> wrote:

>> Reload is not broken, and certainly shouldn't be deprecated at least
>> until there's a better solution that won't suffer from reload's one
>> problem, IMHO, which is that it surprises some people by its
>> behaviour.

Dave> I've written a short description of what reload() does to try and
Dave> help reduce the confusion. This is intended for EEs who are new
Dave> to Python.

I'm not sure why you're planning to teach them reload(). I've used it
rarely in about ten years of Python programming. Its basic semantics are
straightforward, but as we've seen from the discussions in this thread
things can go subtly awry. Just tell people to either not create references
which refer to globals in other modules (e.g. "quote = urllib.quote") if
they intend to use reload() or tell them to just exit and restart their
application, at least until they understand the limitations of trying to
modify a running Python program.

Skip

Jul 18 '05 #23

On Fri, 12 Mar 2004 21:56:24 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

Dave> I've written a short description of what reload() does to try and
Dave> help reduce the confusion. This is intended for EEs who are new
Dave> to Python.

I'm not sure why you're planning to teach them reload(). I've used it
rarely in about ten years of Python programming. Its basic semantics are
straightforward, but as we've seen from the discussions in this thread
things can go subtly awry. Just tell people to either not create references
which refer to globals in other modules (e.g. "quote = urllib.quote") if
they intend to use reload() or tell them to just exit and restart their
application, at least until they understand the limitations of trying to
modify a running Python program.

I don't think we can avoid reload(). A typical design session has
several tools running, and it is a real pain to restart. Design
engineers often leave sessions open for several days.

What I will try to do is write the modules that are likely to be
reloaded in a way that minimizes the problems, accessing objects in
those modules *only* via their fully-qualified names, etc.
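
A minimal sketch of why fully-qualified access helps, assuming Python 3 (where reload() is importlib.reload()) and a throwaway module named tools_demo_a that is made up for this illustration and written to a temporary directory:

```python
import importlib
import pathlib
import sys
import tempfile

# Skip .pyc caching so the edited source is always recompiled on reload.
sys.dont_write_bytecode = True

# Hypothetical throwaway module, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
src = pathlib.Path(tmp_dir) / "tools_demo_a.py"
src.write_text("def greet():\n    return 'old'\n")
importlib.invalidate_caches()

import tools_demo_a

greet = tools_demo_a.greet        # direct reference (the risky kind)

# "Edit" the module, then reload it in the running session.
src.write_text("def greet():\n    return 'new'\n")
importlib.reload(tools_demo_a)

stale = greet()                   # still bound to the old function object
fresh = tools_demo_a.greet()      # fully-qualified lookup sees the new one
print(stale, fresh)
```

The alias keeps the old function alive, while the lookup through the module name picks up the reloaded definition.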

Again, these are interactive sessions. I don't think I will need
reload() as part of the code.

-- Dave

Jul 18 '05 #24

On Fri, 12 Mar 2004 15:17:54 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

Dave> Another "brute force" kind of solution would be to replace the old
Dave> objects with links to the new. Every reference, no matter where it
Dave> came from, would be re-routed. The inefficiency would only last
Dave> until you restart the program.

That would require that you be able to transmogrify an object into a proxy
of some sort without changing its address. In theory I think this could be
done, but not in pure Python. It would require a helper written in C.

How about if we could just show the reference counts on all of the
reloaded objects? That way we could know if we've missed one in our
manual search and update. Could avoid the need for transmogrification
of objects. :>)

-- Dave

Jul 18 '05 #25

Dave> How about if we could just show the reference counts on all of the
Dave> reloaded objects?

That wouldn't work for immutable objects which can be shared. Ints come to
mind, but short strings are interned, some tuples are shared, maybe some
floats, and of course None, True and False are. You will have to define a
subset of object types to display.
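
A small sketch of the point about shared immutables, assuming CPython (sys.getrefcount is CPython-specific and the exact numbers vary widely between interpreter versions). The Marker class is a made-up stand-in for an object owned by a single module attribute:

```python
import sys


class Marker:
    """Stand-in for an object that only one module attribute refers to."""


private = Marker()

private_count = sys.getrefcount(private)   # small: one name plus the call itself
none_count = sys.getrefcount(None)         # shared by the whole interpreter
one_count = sys.getrefcount(1)             # small ints are shared as well

print("private object:", private_count)
print("None:", none_count)
print("int 1:", one_count)
```

Because None and 1 are referenced from all over the interpreter, their counts say nothing about how many references came from one particular module.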

Skip

Jul 18 '05 #26

On Sat, 13 Mar 2004 10:30:04 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

Dave> How about if we could just show the reference counts on all of the
Dave> reloaded objects?

That wouldn't work for immutable objects which can be shared. Ints come to
mind, but short strings are interned, some tuples are shared, maybe some
floats, and of course None, True and False are. You will have to define a
subset of object types to display.

Just to make sure I understand this, I think what you are saying is
that if I have a module M1 that defines a value x = 3.1, it will be
impossible to keep track of the number of references to M1.x because
the object '3.1' may have other references to it from other modules
which use the same constant 3.1. This really does make it impossible
to do a complete reload.

I'm not sure at this point if an improved reload() is worth pursuing,
but perhaps we could do something with a "debug" mode in which the
minuscule benefit of creating these shared references is bypassed, at
least for the modules that are in "debug mode". Then it would be
possible to check after each reload and pop a warning:

>>> reload(M1)
<module 'M1' from 'M1.pyc'>
Warning: References to objects in the old module still exist:
    M1.x (3)
    M1.y (2)

Even if we don't update all the references after a reload, it would
sure be nice to have a warning like this. We could then avoid
creating direct (not fully-qualified) references to objects within any
module that is likely to be reloaded, and be assured that we will get
a warning if we miss one.

-- Dave

Jul 18 '05 #27

David> I'm not sure at this point if an improved reload() is worth
David> pursuing, ...

I wrote something and threw it up on my Python Bits page:

http://www.musi-cal.com/~skip/python/

See if it suits your needs.

Skip

Jul 18 '05 #28

Terry Reedy

"David MacQuigg" <dm*@gain.com> wrote in message
news:6k********************************@4ax.com...

Just to make sure I understand this, I think what you are saying is
that if I have a module M1 that defines a value x = 3.1, it will be
impossible to keep track of the number of references to M1.x because
the object '3.1' may have other references to it from other modules
which use the same constant 3.1. This really does make it impossible
to do a complete reload.
Currently, this is possible but not actual for floats, but it is actual, in
CPython, for some ints and strings. For a fresh 2.2.1 interpreter

>>> sys.getrefcount(0)
52
>>> sys.getrefcount(1)
50
>>> sys.getrefcount('a')
7

Warning: References to objects in the old module still exist:
[snip]
We could then avoid creating direct (not fully-qualified) references to objects within any
module that is likely to be reloaded, and be assured that we will get
a warning if we miss one.

A module is a namespace created by external code, resulting in a namespace
with a few special attributes like __file__, __name__, and __doc__. A
namespace contains names, or if you will, name bindings. It does not,
properly speaking, contain objects -- which are in a separate, anonymous
data space. One can say, reasonably, that functions and classes defined in
a module 'belong' to that module, and one could, potentially, track down
and replace all references to such.

As you have already noticed, you can make this easier by always accessing
the functions and classes via the module (mod.fun(), mod.clas(), etc.) --
which means no anonymous references via tuple, list, or dict slots, etc.

However, there is still the problem of instances and their __class__
attribute. One could, I believe (without trying it) give each class in a
module an __instances__ list that is updated by each call to __init__.
Then super_reload() could grab the instances lists, do a normal reload, and
then update the __instances__ attributes of the reloaded classes and the
__class__ attributes of the instances on the lists. In other words,
manually rebind instances to new classes and vice versa.

Another possibility (also untried) might be to reimport the module as
'temp' (for instance) and then manually replace, in the original module,
each of the function objects and other constants and manually update each
of the class objects. Then instance __class__ attributes would remain
valid.
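
A rough sketch of the "update each of the class objects" idea, under the assumption that copying methods and plain attributes from the new class onto the old class object is enough. The helper name patch_class and the Plot class are hypothetical, and the two class statements stand in for the old module and its reimported copy:

```python
def patch_class(old_cls, new_cls):
    """Copy methods and plain attributes from new_cls onto old_cls in place."""
    for name, value in vars(new_cls).items():
        is_dunder = name.startswith("__") and name.endswith("__")
        if is_dunder and not callable(value):
            continue  # leave __dict__, __weakref__, __module__, __doc__ alone
        setattr(old_cls, name, value)


class Plot:                      # "old" version, already instantiated
    def label(self):
        return "old label"


existing = Plot()
OldPlot = Plot


class Plot:                      # "new" version, as a reimport would produce
    def label(self):
        return "new label"


patch_class(OldPlot, Plot)
result = existing.label()
still_old_class = existing.__class__ is OldPlot
print(result, still_old_class)
```

The instance never changes its __class__, so nothing has to be tracked, but attributes deleted in the new version are not removed by this sketch.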

Either method is obviously restricted to modules given special treatment
and planning. Either update process also needs to be 'atomic' from the
viewpoint of Python code. A switch to another Python thread that accesses
the module in the middle of the process would not be good. There might
also be other dependencies I am forgetting, but the above should be a
start.

Terry J. Reedy

Jul 18 '05 #29

On Sat, 13 Mar 2004 14:27:00 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

David> I'm not sure at this point if an improved reload() is worth
David> pursuing, ...

I wrote something and threw it up on my Python Bits page:

http://www.musi-cal.com/~skip/python/

I get AttributeErrors when I try the super_reload function. Looks like
sys.modules has a bunch of items with no '__dict__'.

I'll work with Skip via email.

-- Dave

Jul 18 '05 #30

I wrote something and threw it up on my Python Bits page:

http://www.musi-cal.com/~skip/python/

Dave> I get AttributeErrors when I try the super_reload function. Looks
Dave> like sys.modules has a bunch of items with no '__dict__'.

You can put objects in sys.modules which are not module objects. I updated
the code to use getattr() and setattr() during the rebinding step. I think
that will help, though of course this entire exercise is obviously only an
approximation to a solution.
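
For illustration only, here is a sketch of a rebinding pass that tolerates non-module entries. It is not the code from the page linked above; the function name rebind, its id-keyed mapping, and the demo names are all assumptions, and it only looks at top-level names:

```python
import sys
import types


def rebind(old_to_new, modules=None):
    """Rebind top-level names that still point at old objects.

    old_to_new maps id(old_object) -> new_object. Returns the list of
    "module.name" strings that were updated.
    """
    if modules is None:
        modules = sys.modules
    updated = []
    for mod_name, mod in list(modules.items()):
        namespace = getattr(mod, "__dict__", None)
        if not isinstance(namespace, dict):
            continue                      # e.g. a None entry, not a module
        for attr, value in list(namespace.items()):
            new = old_to_new.get(id(value))
            if new is not None:
                setattr(mod, attr, new)
                updated.append(f"{mod_name}.{attr}")
    return updated


def old_func():
    return "old"


def new_func():
    return "new"


client = types.ModuleType("client_demo")
client.helper = old_func                  # like "from M1 import helper"

fake_sys_modules = {"client_demo": client, "blocked_demo": None}
updated = rebind({id(old_func): new_func}, fake_sys_modules)
print(updated, client.helper())
```

References held inside lists, dicts, closures, or instances are not touched, which is why this remains an approximation.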

Skip

Jul 18 '05 #31

On Sun, 14 Mar 2004 02:51:13 -0500, "Terry Reedy" <tj*****@udel.edu>
wrote:

"David MacQuigg" <dm*@gain.com> wrote in message
news:6k********************************@4ax.com.. .
Just to make sure I understand this, I think what you are saying is
that if I have a module M1 that defines a value x = 3.1, it will be
impossible to keep track of the number of references to M1.x because
the object '3.1' may have other references to it from other modules
which use the same constant 3.1. This really does make it impossible
to do a complete reload.
Currently, this is possible but not actual for floats, but it is actual, in
CPython, for some ints and strings. For a fresh 2.2.1 interpreter
>>> sys.getrefcount(0)
52
>>> sys.getrefcount(1)
50
>>> sys.getrefcount('a')
7

I'm amazed how many of these shared references there are.

[snip]
However, there is still the problem of instances and their __class__
attribute. One could, I believe (without trying it) give each class in a
module an __instances__ list that is updated by each call to __init__.
Then super_reload() could grab the instances lists, do a normal reload, and
then update the __instances__ attributes of the reloaded classes and the
__class__ attributes of the instances on the lists. In other words,
manually rebind instances to new classes and vice versa.

We need to draw a clean line between what gets updated and what
doesn't. I would not update instances, because in general, that will
be impossible. Here is a section from my update on Reload Basics at
http://ece.arizona.edu/~edatools/Python/Reload.htm I need to provide
my students with a clear explanation, hopefully with sensible
motivation, for what gets updated and what does not. Comments are
welcome.

Background on Reload
Users often ask why reload doesn't just "do what we expect" and update
everything. The fundamental problem is that the current state of
objects in a running program can be dependent on the conditions which
existed when the object was created, and those conditions may have
changed. Say you have in your reloaded module:

class C1:
    def __init__(self, x, y):
        ...

Say you have an object x1 created from an earlier version of class C1.
The current state of x1 depends on the values of x and y at the time
x1 was created. Asking reload to "do what we expect" in this case, is
asking to put the object x1 into the state it would be now, had we
made the changes in C1 earlier.

If you are designing a multi-module program, *and* users may need to
reload certain modules, *and* re-starting everything may be
impractical, then you should avoid any direct references to objects
within the modules to be reloaded. Direct references are created by
statements like 'x = M1.x' or 'from M1 import x'. Always access these
variables via the fully-qualified names, like M1.x, and you will avoid
leftover references to old objects after a reload. This won't solve
the object creation problem, but at least it will avoid some surprises
when you re-use the variable x.
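
The section above can be tried out with a short sketch, assuming Python 3 (importlib.reload) and a throwaway module m1_demo_f invented for this purpose:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True            # always recompile the edited source

tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
src = pathlib.Path(tmp_dir) / "m1_demo_f.py"
src.write_text(
    "x = 1\n"
    "class C1:\n"
    "    def describe(self):\n"
    "        return 'version 1'\n"
)
importlib.invalidate_caches()

import m1_demo_f
from m1_demo_f import x                   # direct reference to the value
x1 = m1_demo_f.C1()                       # instance of the original class

# Edit the module and reload it.
src.write_text(
    "x = 2\n"
    "class C1:\n"
    "    def describe(self):\n"
    "        return 'version 2'\n"
)
importlib.reload(m1_demo_f)

print(x, m1_demo_f.x)                     # direct name vs fully-qualified name
print(x1.describe())                      # old instance, old behaviour
print(isinstance(x1, m1_demo_f.C1))       # x1 belongs to the old class object
```

The direct name x and the instance x1 both keep pointing at objects from the first version, while m1_demo_f.x and m1_demo_f.C1 refer to the new ones.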

--- end of section ---

I *would* like to do something about numbers and strings and other
shared objects not getting updated, because that is going to be hard
to explain. Maybe we could somehow switch off the generation of
shared objects for modules in a 'debug' mode.

-- Dave

Jul 18 '05 #32

John Roth

"David MacQuigg" <dm*@gain.com> wrote in message
news:rh********************************@4ax.com...

I *would* like to do something about numbers and strings and other
shared objects not getting updated, because that is going to be hard
to explain. Maybe we could somehow switch off the generation of
shared objects for modules in a 'debug' mode.
It doesn't matter if numbers and strings get updated. They're
immutable objects, so one copy of a number is as good as
another. In fact, that poses a bit of a problem since quite
a few of them are singletons. There's only one object that
is an integer 1 in the system, so if the new version changes
it to, say 2, and you go around and rebind all references to
1 to become references to 2, you might have a real mess
on your hands.

On the other hand, if you don't rebind the ones that came out
of the original version of the module, you've got a different
mess on your hands.

John Roth
-- Dave

Jul 18 '05 #33

Greg Ewing (using news.cis.dfn.de)

Skip Montanaro wrote:

Not so. del sys.modules['mod']/import mod is effectively what reload()
does.

Not quite -- reload() keeps the existing module object and changes
its contents, whereas the above sequence creates a new module
object.

The difference will be apparent if any other modules have done
'import mod' before the reload.
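
A quick sketch of that difference, assuming Python 3 (importlib.reload) and a made-up module mod_demo_g in a temporary directory:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True            # always recompile the edited source

tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
src = pathlib.Path(tmp_dir) / "mod_demo_g.py"
src.write_text("value = 1\n")
importlib.invalidate_caches()

import mod_demo_g
other_ref = mod_demo_g                    # what another module holds after "import mod"

# Case 1: reload() refills the existing module object.
src.write_text("value = 2\n")
reloaded = importlib.reload(mod_demo_g)
same_after_reload = reloaded is other_ref

# Case 2: deleting the sys.modules entry and importing builds a new object.
src.write_text("value = 3\n")
del sys.modules["mod_demo_g"]
import mod_demo_g
same_after_reimport = mod_demo_g is other_ref

print(same_after_reload, same_after_reimport)
print(other_ref.value, mod_demo_g.value)
```

After the second case, other_ref is stuck on the module object that last saw value = 2, while the freshly imported module has value = 3.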

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #34

Hung Jung Lu

> >> On Fri, 12 Mar 2004 08:45:24 -0500, Peter Hansen <pe***@engcorp.com>

wrote:

It's worse than just a surprise. It's a serious problem when what you
need to do is what most people are expecting -- replace every
reference to objects in the old module with references to the new
objects. The problem becomes a near impossibility when those
references are scattered throughout a multi-module program.

You could use a class instead of a module. I have done that kind of
thing with classes and weakrefs. By the way, it kind of surprises me
that no one has mentioned weakref in this thread. It's not too hard to
keep a list of weakrefs, and every time an object is created, you
register it with that list. Now, when the new class comes in (e.g. via
reload()), you get the list of weakrefs of the existing objects, and
re-assign their __class__, and voila, dynamic change of class
behavior. Of course, if you spend some time and push this feature into
the metaclass, everything becomes even easier.
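
A minimal sketch of the weakref approach, with the registry layout and the helper names track and upgrade chosen purely for illustration. The two class statements stand in for the class before and after a reload:

```python
import weakref
from collections import defaultdict

# class name -> weak set of live instances; entries vanish when instances die
_registry = defaultdict(weakref.WeakSet)


def track(instance):
    """Register a new instance under its class name."""
    _registry[type(instance).__name__].add(instance)


def upgrade(new_cls):
    """Point every tracked instance of the same-named class at new_cls."""
    count = 0
    for instance in list(_registry[new_cls.__name__]):
        instance.__class__ = new_cls
        count += 1
    return count


class Scope:                         # version 1
    def __init__(self):
        track(self)

    def render(self):
        return "v1"


a, b = Scope(), Scope()


class Scope:                         # version 2, as if freshly reloaded
    def __init__(self):
        track(self)

    def render(self):
        return "v2"


upgraded = upgrade(Scope)
print(upgraded, a.render(), b.render())
```

Weak references keep the registry from holding instances alive. This only changes behaviour (methods); the stored state of each instance is left exactly as it was.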

But it is true that in Python you have to implement dynamic refreshing
of behavior (module or class) explicitly, whereas in Ruby, as I
understand, class behavior refreshing is automatic.

David MacQuigg <dm*@gain.com> wrote in message news:<ba********************************@4ax.com>. ..
I agree, most programs should not have 'reload()' designed in, and
those that do, should be well aware of its limitations. I'm concerned
more about interactive use, specifically of programs which cannot be
conveniently restarted from the beginning. I guess I'm spoiled by HP
BASIC, where you can change the program statements while the program
is running! (half wink)
Edit-and-continue. Which is kind of important. For instance, I often
have to load in tons of data from the database, do some initial
processing, and then do the actual calculations. Or in game
programming, where you have to load up a lot of things, play quite a
few initial steps, before you arrive at the point of your interest.
Now, in these kinds of programs, where the initial state preparation
takes a long time, you really would like some "edit-and-continue"
feature while developing the program.

For GUI programs, edit-and-continue is also very helpful during
development.

And for Web applications, actually most CGI-like programs (EJB in Java
jargon, or external methods in Zope) are all reloadable while the
web/app server is running. Very often you can do open-heart surgery on
web/app servers, while the website is running live.
Here is a use-case for classes. I've got hundreds of variables in a
huge hierarchy of "statefiles". In my program, that hierarchy is
handled as a hierarchy of classes. If I want to access a particular
variable, I say something like:
wavescan.window1.plot2.xaxis.label.font.size = 12
These classes have no methods, just names and values and other
classes.
"State file" reminds me of a programming paradigm based on REQUEST and
RESPONSE. (Sometimes REQUEST alone.) Basically, your program's
information is stored in a single "workspace" object. The advantage of
this approach is: (1) All function/method calls at higher level have
unique "header", something like f(REQUEST, RESPONSE), or f(REQUEST),
and you will never need to worry about header changes. (2) The REQUEST
and/or RESPONSE object could be serialized and stored on disk, or
passed via remote calls to other computers. Since they can be
serialized, you can also intercept/modify the content and do unit
testing. This is very important in programs that take a long time to
build up initial states. Basically, once you are able to serialize and
cache the state on disk (and even modify the states offline), then you
can unit test various parts of your program WITHOUT having to start
from scratch. Some people use XML for serialization to make state
modification even easier, but any other serialization format is just
as fine. This approach is also good when you want/need some
parallel/distributed computing in the future, since the serialized
states could potentially be dispatched independently.
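
As a sketch of the serialized-workspace idea, assuming JSON as the format and two hypothetical functions, expensive_setup and calculate, standing in for the slow preparation and a higher-level f(REQUEST) step:

```python
import json
import pathlib
import tempfile


def expensive_setup():
    """Stand-in for slow initial processing (database loads, warm-up steps)."""
    return {"samples": [1.0, 2.5, 4.0], "gain": 2}


def calculate(request):
    """Higher-level step with the uniform f(REQUEST) signature."""
    return {"total": sum(request["samples"]) * request["gain"]}


# Run the slow part once and cache the resulting state on disk.
cache = pathlib.Path(tempfile.mkdtemp()) / "workspace.json"
cache.write_text(json.dumps(expensive_setup()))

# Later (or in a unit test): restore the state, tweak it offline, and
# exercise one step without repeating the setup.
request = json.loads(cache.read_text())
request["gain"] = 3
response = calculate(request)
print(response)
```

Once the state lives in a file, the code under test can be edited and rerun against it without needing reload() at all.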

Today's file access is so fast that disk operations are often
underutilized. In heavy numerical crunching, having a "workspace"
serialization can make development and debugging a lot less painful.
If I reload a module that changes some of those variables, I would
like to not have to hunt down every reference in the running program
and change it manually.

In Python the appropriate tool is weakref. For each class that
matters, keep a weakref list of the instances. This way you can
automate the refreshing. I've done that before.

regards,

Hung Jung

Jul 18 '05 #35

Hung Jung> But it is true that in Python you have to implement dynamic
Hung Jung> refreshing of behavior (module or class) explicitly, whereas
Hung Jung> in Ruby, as I understand, class behavior refreshing is
Hung Jung> automatic.

That has its own attendant set of problems. If an instance's state is
created with an old version of a class definition, then updated later to
refer to a new version, who's to say that the current state of the instance
is what you would have obtained had the instance been created using the new
class from the start?
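
A small sketch of that hazard, using a made-up Axis class whose new version adds a units attribute. The __class__ assignment stands in for whatever automatic refreshing mechanism is used:

```python
class Axis:                          # old definition
    def __init__(self, label):
        self.label = label


old_axis = Axis("time")              # state built by the old __init__


class Axis:                          # new definition adds a 'units' field
    def __init__(self, label, units="s"):
        self.label = label
        self.units = units

    def caption(self):
        return f"{self.label} [{self.units}]"


old_axis.__class__ = Axis            # refresh the behaviour, not the state

try:
    old_axis.caption()
    failed = False
except AttributeError:               # the old instance never got 'units'
    failed = True

new_caption = Axis("time").caption()
print(failed, new_caption)
```

The refreshed instance has the new methods but the old state, so it is not the object you would have had if it had been created from the new class.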

Skip

Jul 18 '05 #36

Dave> Maybe we could somehow switch off the generation of shared objects
Dave> for modules in a 'debug' mode.

You'd have to disable the integer free list. There's also code in
tupleobject.c to recognize and share the empty tuple. String interning
could be disabled as well. Everybody's ignored the gorilla in the room:

>>> sys.getrefcount(None)
1559

In general, I don't think that disabling immutable object sharing would be
worth the effort. Consider the meaning of module level integers. In my
experience they are generally constants and are infrequently changed once
set. Probably the only thing worth tracking down during a super reload
would be function, class and method definitions.

Skip

Jul 18 '05 #37

Skip Montanaro <sk**@pobox.com> writes:

Not so. del sys.modules['mod']/import mod is effectively what
reload() does.

It's more like 'exec mod.__file__[:-1] in mod.__dict__', actually.
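
In Python 3 terms, a rough analogue of that description might look like the sketch below. The helper exec_reload is hypothetical and takes the source as a string rather than reading a file; the real reload() does more work (finding the source, handling bytecode, and so on):

```python
import types


def exec_reload(module, source):
    """Re-run source inside the existing module namespace."""
    code = compile(source, "<demo>", "exec")
    exec(code, module.__dict__)
    return module


mod = types.ModuleType("mod_demo_h")
exec_reload(mod, "a = 1\nb = 2\n")
kept = mod                            # some other reference to the module

exec_reload(mod, "a = 10\n")          # the "edited" source no longer defines b

same_object = mod is kept
print(same_object, mod.a, mod.b)
```

The module object stays the same and only its namespace is rewritten, so names missing from the new source (b here) simply keep their old values.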

Cheers,
mwh

--
I don't have any special knowledge of all this. In fact, I made all
the above up, in the hope that it corresponds to reality.
-- Mark Carroll, ucam.chat

Jul 18 '05 #38

David MacQuigg <dm*@gain.com> writes:

On Sat, 13 Mar 2004 14:27:00 -0600, Skip Montanaro <sk**@pobox.com>
wrote:
David> I'm not sure at this point if an improved reload() is worth
David> pursuing, ...

I wrote something and threw it up on my Python Bits page:

http://www.musi-cal.com/~skip/python/

I get AttributeErrors when I try the super_reload function. Looks like
sys.modules has a bunch of items with no '__dict__'.

They'll be None, mostly.

Cheers,
mwh

--
C++ is a siren song. It *looks* like a HLL in which you ought to
be able to write an application, but it really isn't.
-- Alain Picard, comp.lang.lisp

Jul 18 '05 #39

>I wrote something and threw it up on my Python Bits page:
>
> http://www.musi-cal.com/~skip/python/

I get AttributeErrors when I try the super_reload function. Looks
like sys.modules has a bunch of items with no '__dict__'.

Michael> They'll be None, mostly.

What's the significance of an entry in sys.modules with a value of None?
That is, how did they get there and why are they there?

Skip

Jul 18 '05 #40

Skip Montanaro <sk**@pobox.com> writes:

>> >I wrote something and threw it up on my Python Bits page:
>> >
>> > http://www.musi-cal.com/~skip/python/
>>
>> I get AttributeErrors when I try the super_reload function. Looks
>> like sys.modules has a bunch of items with no '__dict__'.
Michael> They'll be None, mostly.

What's the significance of an entry in sys.modules with a value of None?
That is, how did they get there and why are they there?

Something to do with packages and things that could have been but
weren't relative imports, I think...

>>> from distutils.core import setup
>>> import sys
>>> for k,v in sys.modules.items():
...     if v is None:
...         print k
...
distutils.distutils
distutils.getopt
encodings.encodings
distutils.warnings
distutils.string
encodings.codecs
encodings.exceptions
distutils.types
encodings.types
distutils.os
distutils.re
distutils.sys
distutils.copy

Cheers,
mwh

--
It's actually a corruption of "starling". They used to be carried.
Since they weighed a full pound (hence the name), they had to be
carried by two starlings in tandem, with a line between them.
-- Alan J Rosenthal explains "Pounds Sterling" on asr

Jul 18 '05 #41

On Sun, 14 Mar 2004 19:49:08 -0500, "John Roth"
<ne********@jhrothjr.com> wrote:

"David MacQuigg" <dm*@gain.com> wrote in message
news:rh********************************@4ax.com.. .

I *would* like to do something about numbers and strings and other
shared objects not getting updated, because that is going to be hard
to explain. Maybe we could somehow switch off the generation of
shared objects for modules in a 'debug' mode.
It doesn't matter if numbers and strings get updated. They're
immutable objects, so one copy of a number is as good as
another. In fact, that poses a bit of a problem since quite
a few of them are singletons. There's only one object that
is an integer 1 in the system, so if the new version changes
it to, say 2, and you go around and rebind all references to
1 to become references to 2, you might have a real mess
on your hands.

The immutability of numbers and strings is referring only to what you
can do via executable statements. If you use a text editor on the
original source code, clearly you can change any "immutable".

You do raise a good point, however, about the need to avoid changing
*all* references to a shared object. The ones that need to change are
those that were created via a reference to an earlier version of the
reloaded module.
On the other hand, if you don't rebind the ones that came out
of the original version of the module, you've got a different
mess on your hands.

True.

-- Dave

Jul 18 '05 #42

On Mon, 15 Mar 2004 05:49:58 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

Dave> Maybe we could somehow switch off the generation of shared objects
Dave> for modules in a 'debug' mode.

You'd have to disable the integer free list. There's also code in
tupleobject.c to recognize and share the empty tuple. String interning
could be disabled as well. Everybody's ignored the gorilla in the room:
>>> sys.getrefcount(None)
1559

Implementation detail. ( half wink )
In general, I don't think that disabling immutable object sharing would be
worth the effort. Consider the meaning of module level integers. In my
experience they are generally constants and are infrequently changed once
set. Probably the only thing worth tracking down during a super reload
would be function, class and method definitions.

If you reload a module M1, and it has an attribute M1.x, which was
changed from '1' to '2', we want to change also any references that
may have been created with statements like 'x = M1.x', or 'from M1
import *'. If we don't do this, reload() will continue to baffle and
frustrate new users. Typically, they think they have just one
variable 'x'.

It's interesting to see how Ruby handles this problem.
http://userlinux.com/cgi-bin/wiki.pl?RubyPython I'm no expert on
Ruby, but it is my understanding that there *are* no types which are
implicitly immutable (no need for tuples vs lists, etc.). If you
*want* to make an object (any object) immutable, you do that
explicitly with a freeze() function.

I'm having trouble understanding the benefit of using shared objects
for simple numbers and strings. Maybe you can save a significant
amount of memory by having all the *system* modules share a common
'None' object, but when a user explicitly says 'M1.x = None', surely
we can afford a few bytes to provide a special None for that
reference. The benefit is that when you change None to 'something' by
editing and reloading M1, all references that were created via a
reference to M1.x will change automatically.

We should at least have a special 'debug' mode in which the hidden
sharing of objects is disabled for selected modules. You can always
explicitly share an object by simply referencing it, rather than
typing in a fresh copy.

x = "Here is a long string I want to share."
y = x
z = "Here is a long string I want to share."

In any mode, x and y will be the same object. In debug mode, we
allocate a little extra memory to make z a separate object from x, as
the user apparently intended.

If we do the updates for just certain types of objects, we will have a
non-intuitive set of rules that will be difficult for users to
understand. I would like to make things really simple and say:
"""
If you have a direct reference to an object in a reloaded module, that
reference will be updated. If the reference is created by some other
process (e.g. copying a string, or instantiation of a new object based
on a class in the reloaded module) then that reference will not be
updated. Only references to objects from the old module are updated.
The old objects are then garbage collected.
"""

We may have to pay a price in implementation cost and a little extra
storage to make things simple for the user.

-- Dave

Jul 18 '05 #43

John Roth

"David MacQuigg" <dm*@gain.com> wrote in message
news:tu********************************@4ax.com...

On Mon, 15 Mar 2004 05:49:58 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

I'm having trouble understanding the benefit of using shared objects
for simple numbers and strings. Maybe you can save a significant
amount of memory by having all the *system* modules share a common
'None' object, but when a user explicitly says 'M1.x = None', surely
we can afford a few bytes to provide a special None for that
reference. The benefit is that when you change None to 'something' by
editing and reloading M1, all references that were created via a
reference to M1.x will change automatically.
I believe it's a performance optimization; the memory savings
are secondary.
We should at least have a special 'debug' mode in which the hidden
sharing of objects is disabled for selected modules. You can always
explicitly share an object by simply referencing it, rather than
typing in a fresh copy.
That would have rather disastrous consequences, since
some forms of comparison depend on there only being
one copy of the object.

-- Dave

Jul 18 '05 #44

Jeff Epler

On Mon, 15 Mar 2004 05:49:58 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

You'd have to disable the integer free list. There's also code in
tupleobject.c to recognize and share the empty tuple. String interning
could be disabled as well. Everybody's ignored the gorilla in the room:
>>> sys.getrefcount(None)
1559

On Mon, Mar 15, 2004 at 10:15:33AM -0700, David MacQuigg wrote:
Implementation detail. ( half wink )

I'd round that down from half to None, personally.

This is guaranteed to work:
    x = None
    y = None
    assert x is y
by the following text in the language manual:
    None
        This type has a single value. There is a single object with
        this value. This object is accessed through the built-in
        name None. It is used to signify the absence of a value in
        many situations, e.g., it is returned from functions that
        don't explicitly return anything. Its truth value is false.
There are reams of code that rely on the object identity of None, so a
special debug mode where "x = <some literal>" makes x refer to something
that has a refcount of 1 will break code.

The 'is' guarantee applies to at least these built-in values:
None Ellipsis NotImplemented True False
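
A short sketch of code that leans on those identity guarantees. The function first_or_none is made up for the example:

```python
def first_or_none(items):
    """Return the first item, or None (implicitly) when items is empty."""
    for item in items:
        return item


x = None
y = None
result = first_or_none([])

none_is_shared = x is y
implicit_return_is_none = result is None
false_is_shared = bool(0) is False
not_implemented_is_shared = int.__eq__(1, "a") is NotImplemented

print(none_is_shared, implicit_return_is_none)
print(false_is_shared, not_implemented_is_shared)
```

Every one of these 'is' tests would become unreliable if a debug mode handed out private copies of such values.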

The only problem I can see with reload() is that it doesn't do what you
want. But on the other hand, what reload() does is perfectly well
defined, and at least the avenues I've seen explored for "enhancing" it
look, well, like a train wreck.

Jeff

Jul 18 '05 #45

In general, I don't think that disabling immutable object sharing
would be worth the effort. Consider the meaning of module level
integers. In my experience they are generally constants and are
infrequently changed once set. Probably the only thing worth
tracking down during a super reload would be function, class and
method definitions.

Dave> If you reload a module M1, and it has an attribute M1.x, which was
Dave> changed from '1' to '2', we want to change also any references
Dave> that may have been created with statements like 'x = M1.x', or
Dave> 'from M1 import *' If we don't do this, reload() will continue to
Dave> baffle and frustrate new users. Typically, they think they have
Dave> just one variable 'x'

Like I said, I think that sort of change will be relatively rare. Just tell
your users, "don't do that".

Dave> I'm having trouble understanding the benefit of using shared
Dave> objects for simple numbers and strings.

Can you say "space and time savings"? Ints are 12 bytes, strings are 24
bytes (plus the storage for the string), None is 8 bytes. It adds up. More
importantly, small ints, interned strings and None would constantly be
created and freed. The performance savings of sharing them are probably
much more important.

Finally, from a semantic viewpoint, knowing that None is defined by the
language to be a singleton object allows the more efficient "is" operator to
be used when testing objects against None for equality. If you allowed many
copies of that object that wouldn't work.

Dave> We should at least have a special 'debug' mode in which the hidden
Dave> sharing of objects is disabled for selected modules.

It might help with your problem but would change the semantics of the
language.

Skip

Jul 18 '05 #46

On Mon, 15 Mar 2004 11:33:04 -0600, Jeff Epler <je****@unpythonic.net>
wrote:

This is guaranteed to work:
    x = None
    y = None
    assert x is y
by the following text in the language manual:
    None
        This type has a single value. There is a single object with
        this value. This object is accessed through the built-in
        name None. It is used to signify the absence of a value in
        many situations, e.g., it is returned from functions that
        don't explicitly return anything. Its truth value is false.
There are reams of code that rely on the object identity of None, so a
special debug mode where "x = <some literal>" makes x refer to something
that has a refcount of 1 will break code.

The 'is' guarantee applies to at least these built-in values:
None Ellipsis NotImplemented True False
This certainly complicates things. I *wish* they had not made this
"single object" statement. Why should how things are stored
internally matter to the user? We could have just as easily worked
with x == y, but now, as you say, it may be too late.

The same problem occurs with strings (some strings at least):
>>> x = 'abcdefghighklmnop'
>>> y = 'abcdefghighklmnop'
>>> x is y
True
>>> x = 'abc xyz'
>>> y = 'abc xyz'
>>> x is y
False

Since there is no simple way for the user to distinguish these cases,
it looks like we might break some code if the storage of equal objects
changes. The change would have to be for "debug" mode only, and for
only the modules the user specifically imports in debug mode. We
would need a big, bold warning that you should not use 'is'
comparisons in cases like the above, at least for any objects from
modules that are imported in debug mode.
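
A sketch of the safer habit, assuming Python 3 (where interning is exposed as sys.intern). Whether two equal strings happen to be the same object is an implementation detail, so the example relies only on '==' and on explicit interning:

```python
import sys

parts = ["abc", " xyz"]
x = "".join(parts)              # built at run time
y = "".join(parts)              # equal text, built separately

equal = (x == y)                # always a safe comparison
same_object = (x is y)          # implementation detail; do not rely on it

ix = sys.intern(x)
iy = sys.intern(y)
interned_same = (ix is iy)      # explicit interning does guarantee identity

print(equal, same_object, interned_same)
```

Code that compares strings with '==' keeps working no matter how the interpreter chooses to share or not share equal objects.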
The only problem I can see with reload() is that it doesn't do what you
want. But on the other hand, what reload() does is perfectly well
defined, and at least the avenues I've seen explored for "enhancing" it
look, well, like a train wreck.

It's worse than just a misunderstanding. It's a serious limitation on
what we can do with editing a running program. I don't agree that
what it does now is well defined (at least not in the documentation).
The discussion in Learning Python is totally misleading. We should at
least update the description of the reload function in the Python
Library Reference. See the thread "Reload Confusion" for some
suggested text.

-- Dave

Jul 18 '05 #47

On Mon, 15 Mar 2004 12:28:50 -0600, Skip Montanaro <sk**@pobox.com>
wrote:

>> In general, I don't think that disabling immutable object sharing
>> would be worth the effort. Consider the meaning of module level
>> integers. In my experience they are generally constants and are
>> infrequently changed once set. Probably the only thing worth
>> tracking down during a super reload would be function, class and
>> method definitions.

Dave> If you reload a module M1, and it has an attribute M1.x, which was
Dave> changed from '1' to '2', we want to change also any references
Dave> that may have been created with statements like 'x = M1.x', or
Dave> 'from M1 import *' If we don't do this, reload() will continue to
Dave> baffle and frustrate new users. Typically, they think they have
Dave> just one variable 'x'

Like I said, I think that sort of change will be relatively rare.

I think wanting to change numbers in a reloaded module is very common.

Just tell your users, "don't do that".

The problem is the complexity of "that" which they can and cannot do.
Even renowned text authors don't seem to explain it clearly. I'm
opting now for "don't do anything" to try and make it simple. By that
I mean - Don't expect reloads to update anything but the reference to
the reloaded module itself. This is simple, just not very convenient.
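A minimal sketch of that rule, assuming Python 3, where reload() is spelled importlib.reload(). The module name m1_demo and the temporary directory are hypothetical, made up for this illustration. It shows that the attribute reached through the module is updated, while a name bound earlier with 'x = M1.x' still points at the old value.

```python
import importlib
import os
import sys
import tempfile

# Assumption for this sketch: skip .pyc files so a quick edit to the
# source cannot be masked by cached bytecode.
sys.dont_write_bytecode = True


def write_module(directory, value):
    """Write a one-line module 'm1_demo' (hypothetical name) defining x."""
    with open(os.path.join(directory, "m1_demo.py"), "w") as f:
        f.write(f"x = {value!r}\n")


tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)

write_module(tmpdir, "1")
importlib.invalidate_caches()
m1_demo = importlib.import_module("m1_demo")

x = m1_demo.x                 # a second reference, like 'x = M1.x'

write_module(tmpdir, "2")     # edit the source on disk
importlib.reload(m1_demo)     # re-executes the module in the same module object

print("m1_demo.x =", m1_demo.x)   # sees the new value
print("x         =", x)           # still the old value
```

Only names looked up through the module object on each use follow the reload; anything copied out beforehand has to be rebound by hand.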
Dave> I'm having trouble understanding the benefit of using shared
Dave> objects for simple numbers and strings.

Can you say "space and time savings"? Ints are 12 bytes, strings are 24
bytes (plus the storage for the string), None is 8 bytes. It adds up.
Maybe you can save a significant amount of memory by having all the
*system* modules share a common 'None' object, but when a user
explicitly says 'M1.x = None', surely we can afford 8 bytes to
provide a special None for that reference.

I'm no expert on these implementation issues, but these numbers seem
small compared to the 512MB in a typical modern PC. I suppose there
are some rare cases where you need to create an array of millions of
references to a single constant. In those cases the debug mode may be
too much of a burden. In general, we ought to favor simplicity over
efficiency.
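For anyone who wants to check the sizes on their own machine, here is a small sketch using sys.getsizeof. The figures quoted above are for the interpreter of the day; current builds will report different numbers, so no expected output is shown.

```python
import sys

# Report the per-object size of a few small immutable values.
samples = {"None": None, "small int": 2, "short string": "abc"}
sizes = {label: sys.getsizeof(obj) for label, obj in samples.items()}

for label, size in sizes.items():
    print(label, size, "bytes")

# Two names bound to None refer to the same object, so only the
# references cost extra memory, not the object itself.
a = None
b = None
shared = a is b
print("shared None object:", shared)
```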
More importantly, small ints, interned strings and None would constantly be
created and freed. The performance savings of sharing them are probably
much more important.

Again, as a non-expert, this seems strange. The burden of comparing a
new object to what is already in memory, using an '==' type of
comparison, must be comparable to simply creating a new object.

Finally, from a semantic viewpoint, knowing that None is defined by the
language to be a singleton object allows the more efficient "is" operator to
be used when testing objects against None for equality. If you allowed many
copies of that object that wouldn't work.

We would lose some, assuming '==' for these small objects is slower
than 'is', but then you would not have to test '==' on a large number
of objects already in memory each time you define a new integer or
small string.

Dave> We should at least have a special 'debug' mode in which the hidden
Dave> sharing of objects is disabled for selected modules.

It might help with your problem but would change the semantics of the
language.

I assume you are referring to the semantics of 'is' when working with
small objects like None, 2, 'abc'. I agree, that is a problem for the
proposed debug mode. I don't see a way around it, other than warning
users not to expect modules imported in the debug mode to optimize the
sharing of small objects in memory. Use 'x == 2' rather than 'x is 2'
if you intend to use the debug mode.
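A short sketch of the distinction, in today's Python. The class AlwaysEqual and the helper is_missing are hypothetical names made up for this illustration. It compares numbers by value with '==' and keeps 'is' for the one case the language guarantees, the None singleton, where '==' can be fooled by a class that overrides __eq__.

```python
class AlwaysEqual:
    """Hypothetical class whose == answers True for everything."""

    def __eq__(self, other):
        return True


def is_missing(value):
    """Identity test: true only for the None singleton itself."""
    return value is None


n = int("2")               # a number produced at run time
value_matches = (n == 2)   # compare by value, not by identity
print(value_matches)

odd = AlwaysEqual()
print(odd == None)         # '==' is answered by the object's own __eq__
print(is_missing(odd))     # 'is' cannot be overridden
print(is_missing(None))
```

Under the proposed debug mode the first comparison would keep working unchanged, which is the reason for preferring 'x == 2' there.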

-- Dave

Jul 18 '05 #48