change of random state when pyc created??

Alan Isaac

This may seem very strange, but it is true.
If I delete a .pyc file, my program executes with a different state!

In a single directory I have
module1 and module2.

module1 imports random and MyClass from module2.
module2 does not import random.

module1 sets a seed like this::

if __name__ == "__main__":
random.seed(314)
main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.

Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module.pyc again,
in which case I once again get result1.

Can someone explain this to me?

Thank you,
Alan Isaac

May 5 '07 #1

Subscribe Post Reply

2414

Dustan

On May 4, 10:48 pm, "Alan Isaac" <ais...@american.eduwrote:

This may seem very strange, but it is true.
If I delete a .pyc file, my program executes with a different state!

In a single directory I have
module1 and module2.

module1 imports random and MyClass from module2.
module2 does not import random.

module1 sets a seed like this::

if __name__ == "__main__":
random.seed(314)
main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.

Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module.pyc again,
in which case I once again get result1.

Can someone explain this to me?

Thank you,
Alan Isaac

I can't imagine why that would be, and I was unable to reproduce that
behavior, using Microsoft Windows XP and Python 2.5:

<module1.py>
import module2
import random

def main():
for i in range(10): print module2.aRandom()

if __name__ == '__main__':
random.seed(314)
main()
</module1.py>

<module2.py>
import random
print "module2 imported"

def aRandom():
return random.randrange(1000000)
</module2.py>
C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

I deleted module2.pyc right before that last call.

May 5 '07 #2

Alan Isaac

I have documented this behavior
on two completely different systems
(Win 2000 and Win XP SP2), using Python 2.5.1.

It two modules where this happens,
as described before.
If it should not happen, there is a bug.
I am looking for potential explanation,
since I realize that finding bugs is unlikely.

Alan Isaac

May 5 '07 #3

John Machin

On May 6, 9:00 am, "Alan Isaac" <ais...@american.eduwrote:

I have documented this behavior
on two completely different systems
(Win 2000 and Win XP SP2), using Python 2.5.1.

You can't say that you have "documented" the behaviour when you
haven't published files that exhibit the alleged behaviour.

May 5 '07 #4

Alan Isaac

"John Machin" <sj******@lexicon.netwrote in message
news:11**********************@y80g2000hsf.googlegr oups.com...

You can't say that you have "documented" the behaviour when you
haven't published files that exhibit the alleged behaviour.

Fine. I have "observed" this behavior.
The files are not appropriate for posting.
I do not yet have a "minimum" case.
But surely I am not the first to notice this!
Alan Isaac
PS I'll send you the files off list.

May 5 '07 #5

John Machin

On May 5, 1:48 pm, "Alan Isaac" <ais...@american.eduwrote:

This may seem very strange, but it is true.
If I delete a .pyc file, my program executes with a different state!

In a single directory I have
module1 and module2.

module1 imports random and MyClass from module2.

That's rather ambiguous. Do you mean
(a) module1 imports random and (MyClass from module2)
or
(b) module1 imports (random and MyClass) from module2

module2 does not import random.

This statement would *appear* to rule out option (b) but appearances
can be deceptive :-)

It's a bit of a worry that you call the first file "module1" and not
"the_script". Does module2 import module1, directly or indirectly?

>
module1 sets a seed like this::

if __name__ == "__main__":
random.seed(314)
main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.

Stop right there. Never mind what happens when you delete module2.pyc.
Should you not expect to get the same result each time? Is that not
the point of setting a constant seed each time you run the script?
====>>Problem 1.

>
Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module.pyc again,
in which case I once again get result1.

Can someone explain this to me?

Thank you,
Alan Isaac

Compiling module2 is causing code to be executed that probably
shouldn't be executed. ===>>Problem 2.

With all due respect to your powers of description :-) no, it can't be
explained properly, without seeing the contents of the source files. I
strongly suggest that if you continue to experience Problem1 and/or
Problem 2, you cut your two files down to the bare minima and post
them here.

Meanwhile, my deja-vu detector is kicking in ...

uh-huh (1), from 25 April:
===
%%%%% test2.py %%%%%%%%%%%%%
from random import seed
seed(314)
class Trivial:
pass
===
Is module2 (still) doing that?
Is module1 importing itself (directly or indirectly)?

uh-huh (2), the long thread about relative imports allegedly being
broken ...

It appears to me that you need to divorce the two concepts "module"
and "script" in your mind.

Modules when executed should produce only exportables: classes,
functions, NAMED_CONSTANTS, etc. It is OK to do things like process
the easier-to-create
_ds = """\
foo 1
bar 42
zot 666"""
into the easier-to-use
USEFUL_DICT = {'foo': 1, 'bar': 42, zot: 666}
but not to change global state.

Scripts which use functions etc from a module or package should be
independent of the module/package such that they don't need anything
more complicated than simple importing of the module/package. The
notion of inspecting the script's path to derive the module/package
path and then stuffing that into sys.paths is mind boggling. Are
module1/script1 and module2 parts of a package?

Here's a suggestion for how you should structure scripts:

def main():
# All productive code is inside a function to take advantage
# of access to locals being faster than access to globals
import mymodule
mymodule.do_something()
if __name__ == "__main__":
main()
else:
raise Exception("Attempt to import script containing nothing
importable")

and your modules should *start* with:
if __name__ == "__main__":
raise Exception("Attempt to execute hopefully-pure module as a
script")

HTH,
John

May 5 '07 #6

Alan Isaac

"John Machin" <sj******@lexicon.netwrote in message
news:11**********************@l77g2000hsb.googlegr oups.com...

(a) module1 imports random and (MyClass from module2)

Right.

It's a bit of a worry that you call the first file "module1" and not
"the_script". Does module2 import module1, directly or indirectly?

No.
I call a module any file meant to be imported by others.
Many of my modules include a "main" function,
which allow the module to be executed as a script.
I do not think this is unusual, even as terminology.

Should you not expect to get the same result each time? Is that not
the point of setting a constant seed each time you run the script?

Yes. That is the problem.
If I delete module2.pyc,
I do not get the same result.

With all due respect to your powers of description :-) no, it can't be
explained properly, without seeing the contents of the source files.

I sent them to you.
What behavior did you see?

from random import seed
seed(314)
class Trivial:
pass
===
Is module2 ... doing that?
Is module1 importing itself (directly or indirectly)?

No.

Separate issue
==============

Here's a suggestion for how you should structure scripts:

def main():
# All productive code is inside a function to take advantage
# of access to locals being faster than access to globals
import mymodule
mymodule.do_something()
if __name__ == "__main__":
main()
else:
raise Exception("Attempt to import script containing nothing
importable")

and your modules should *start* with:
if __name__ == "__main__":
raise Exception("Attempt to execute hopefully-pure module as a
script")

I'm not going to call this a bad practice, since it has clear virtues.
I will say that it does not seem to be a common practice, although that
may be my lack of exposure to other's code. And it still does not
address the common need of playing with a "package in progress"
or a "package under consideration" without installing it.

Cheers,
Alan Isaac

May 6 '07 #7

Dustan

On May 5, 6:30 pm, "Alan Isaac" <ais...@american.eduwrote:

"John Machin" <sjmac...@lexicon.netwrote in message

news:11**********************@y80g2000hsf.googlegr oups.com...

You can't say that you have "documented" the behaviour when you
haven't published files that exhibit the alleged behaviour.

Fine. I have "observed" this behavior.
The files are not appropriate for posting.
I do not yet have a "minimum" case.
But surely I am not the first to notice this!
Alan Isaac
PS I'll send you the files off list.

I got the files and tested them, and indeed got different results
depending on whether or not there was a pyc file. I haven't looked at
the source files in great detail yet, but I will. I would certainly
agree that there's a bug going on here; we just need to narrow down
the problem (ie come up with a "minimum" case).

May 6 '07 #8

Steven D'Aprano

On Sun, 06 May 2007 00:20:04 +0000, Alan Isaac wrote:

>Should you not expect to get the same result each time? Is that not
the point of setting a constant seed each time you run the script?

Yes. That is the problem.
If I delete module2.pyc,
I do not get the same result.

I think you have missed what John Machin is pointing out. According to
your original description, you get different results even if you DON'T
delete module2.pyc.

According to your original post, you get the _same_ behaviour the first
time you run the script, regardless of the pyc file being deleted or not.
You wrote:

[quote]
module1 sets a seed like this::

if __name__ == "__main__":
random.seed(314)
main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.
[end quote]

So, with module2.pyc file existing, you get result1 the first time you
execute module1.py, and then you get result2 every time from then onwards.

How is that different from what you wrote next?

[quote]
Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module.pyc again,
in which case I once again get result1.
[end quote]

You get the same behaviour with or without module2.pyc: the first run of
the script gives different results from subsequent runs. You can reset
that first run by deleting module2.pyc.

I'm still perplexed how this is possible, but now I'm more perplexed.

If you want to send me the modules, I will have a look at them as well.
Many eyes make for shallow bugs...

--
Steven.

May 6 '07 #9

Dustan

On May 6, 1:00 am, Steven D'Aprano
<s...@REMOVE.THIS.cybersource.com.auwrote:[quote]
On Sun, 06 May 2007 00:20:04 +0000, Alan Isaac wrote:

Should you not expect to get the same result each time? Is that not
the point of setting a constant seed each time you run the script?

Yes. That is the problem.
If I delete module2.pyc,
I do not get the same result.

module1 sets a seed like this::

if __name__ == "__main__":
random.seed(314)
main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.
[end quote]

So, with module2.pyc file existing, you get result1 the first time you
execute module1.py, and then you get result2 every time from then onwards.

Umm... no.

module2.pyc is created by the first run.
[quote]
How is that different from what you wrote next?

Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module.pyc again,
in which case I once again get result1.
[end quote]

You get the same behaviour with or without module2.pyc: the first run of
the script gives different results from subsequent runs. You can reset
that first run by deleting module2.pyc.

I'm still perplexed how this is possible, but now I'm more perplexed.

If you want to send me the modules, I will have a look at them as well.
Many eyes make for shallow bugs...

--
Steven.

May 6 '07 #10

John Machin

On May 6, 9:41 pm, Dustan <DustanGro...@gmail.comwrote:[quote]
On May 6, 1:00 am, Steven D'Aprano

<s...@REMOVE.THIS.cybersource.com.auwrote:

On Sun, 06 May 2007 00:20:04 +0000, Alan Isaac wrote:
>Should you not expect to get the same result each time? Is that not
>the point of setting a constant seed each time you run the script?

Yes. That is the problem.
If I delete module2.pyc,
I do not get the same result.

I think you have missed what John Machin is pointing out. According to
your original description, you get different results even if you DON'T
delete module2.pyc.

According to your original post, you get the _same_ behaviour the first
time you run the script, regardless of the pyc file being deleted or not.

You wrote:

module1 sets a seed like this::

if __name__ == "__main__":
random.seed(314)
main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.
[end quote]

So, with module2.pyc file existing, you get result1 the first time you
execute module1.py, and then you get result2 every time from then onwards.

Umm... no.

module2.pyc is created by the first run.

Yes, I've realised that too.

Some news:
(1) Alan has sent Dustan and me a second smaller version of the two
files.
I'll forward them on to Steven -- they're now called test.py and
test1.py, but I'll continue to call them module[12].py
(2) The problem is definitely reproducible -- whether or not
module2.pyc has been deleted or retained from the previous run is
affecting the results. [Windows XP SP2; Python 2.5.1]
(3) module2.py appears to me not to be guilty of causing any changes
in state; it contains only rather inocuous functions and classes.
(4) I have put
print random.getstate()
before and after the call to main() in the executed script
(module1.py). Diffing the stdout of a no-pyc run and a with-pyc run
shows differences in Alan's output but NO DIFFERENCES in either the
"before" or the "after" random.getstate() output. Looks like the
problem is nothing to do with the random module.
(5) I have backported the files to Python 2.4 by replacing the use of
defaultdict in module2.py with explicit "if key in the_dict" code --
now the problem is reproducible with Python 2.4.3 as well as with
2.5.1.
(6) I've changed the 'from module2 import foo, bar, zot; foo()" to use
the "import module2; module2.foo()" style -- no effect; still has the
problem.

Cheers,
John

May 6 '07 #11

Alan Isaac

"Steven D'Aprano" <st***@REMOVE.THIS.cybersource.com.auwrote in message
news:pa****************************@REMOVE.THIS.cy bersource.com.au...

If you want to send me the modules, I will have a look at them as well.
Many eyes make for shallow bugs...

Dustan and John Machin have confirmed the
apparent bug, and I have sent you the files.
Explanation welcome!!

Cheers,
Alan Isaac

May 8 '07 #12

Steven D'Aprano

On Tue, 08 May 2007 02:12:27 +0000, Alan Isaac wrote:

"Steven D'Aprano" <st***@REMOVE.THIS.cybersource.com.auwrote in
message
news:pa****************************@REMOVE.THIS.cy bersource.com.au...
>If you want to send me the modules, I will have a look at them as well.
Many eyes make for shallow bugs...

Dustan and John Machin have confirmed the apparent bug, and I have sent
you the files. Explanation welcome!!

My testing suggests the bug is *not* to do with pyc files at all. I'm
getting different results when running the files, even when the directory
is read-only (and therefore no pyc files can be created).

My results suggest that setting the seed to the same value does NOT give
identical results, *even though* the random number generator is giving
the same results.

So I think we can discount the issue being anything to do with either
the .pyc files or the random number generator.
--
Steven.

May 8 '07 #13

Alan Isaac

"Steven D'Aprano" <st****@REMOVE.THIS.cybersource.com.auwrote in message
news:pa*********************@REMOVE.THIS.cybersour ce.com.au...

My testing suggests the bug is *not* to do with pyc files at all. I'm
getting different results when running the files, even when the directory
is read-only (and therefore no pyc files can be created).

My results suggest that setting the seed to the same value does NOT give
identical results, *even though* the random number generator is giving
the same results.

So I think we can discount the issue being anything to do with either
the .pyc files or the random number generator.

I do not know how Python handles your use of a readonly directory.
What I have seen is:

- when a test1.pyc file is present, I always get the
same outcome (result1)
- when a test1.pyc file is NOT present, I always get
the same outcome (result2)
- the two outcomes are different (result1 != result2)

Do you see something different than this if you run the
test as I suggested? If not, how can in not involve the
..pyc file (in some sense)?

Cheers,
Alan Isaac

May 8 '07 #14

Gabriel Genellina

En Tue, 08 May 2007 14:59:27 -0300, Alan Isaac <ai****@american.edu>
escribió:

What I have seen is:

- when a test1.pyc file is present, I always get the
same outcome (result1)
- when a test1.pyc file is NOT present, I always get
the same outcome (result2)
- the two outcomes are different (result1 != result2)

I've logged all Random calls (it appears to be only one shuffle call,
after the initial seed) and in both cases they get the same numbers. So
the program always starts with the same "shuffled" values.

Perhaps there is a tiny discrepancy in the marshal representation of some
floating point values. When there is no .pyc, Python parses the literal
from source; when a .pyc is found, Python loads the value from there; they
could be slightly different.
I'll investigate further... tomorrow.

--
Gabriel Genellina

May 9 '07 #15

Peter Otten

Alan Isaac wrote:

This may seem very strange, but it is true.
If I delete a .pyc file, my program executes with a different state!

Can someone explain this to me?

There is nothing wrong with the random module -- you get the same numbers on
every run. When there is no pyc-file Python uses some RAM to create it and
therefore your GridPlayer instances are located in different memory
locations and get different hash values. This in turn affects the order in
which they occur when you iterate over the GridPlayer.players_played set.

Here is a minimal example:

import test # sic

class T:
def __init__(self, name):
self.name = name
def __repr__(self):
return "T(name=%r)" % self.name

if __name__ == "__main__":
print set(T(i) for i in range(4))

$ python2.5 test.py
set([T(name=2), T(name=1), T(name=0), T(name=3)])
$ python2.5 test.py
set([T(name=3), T(name=1), T(name=0), T(name=2)])
$ python2.5 test.py
set([T(name=3), T(name=1), T(name=0), T(name=2)])
$ rm test.pyc
$ python2.5 test.py
set([T(name=2), T(name=1), T(name=0), T(name=3)])

Peter

May 9 '07 #16

Alan Isaac

"Peter Otten" <__*******@web.dewrote in message
news:f1*************@news.t-online.com...

Alan Isaac wrote:
There is nothing wrong with the random module -- you get the same numbers

every run. When there is no pyc-file Python uses some RAM to create it and
therefore your GridPlayer instances are located in different memory
locations and get different hash values. This in turn affects the order in
which they occur when you iterate over the GridPlayer.players_played set.

Thanks!!
This also explains Steven's results.

If I sort the set before iterating over it,
the "anomaly" disappears.

This means that currently the use of sets
(and, I assume, dictionaries) as iterators
compromises replicability. Is that a fair
statement?

For me (and apparently for a few others)
this was a very subtle problem. Is there
a warning anywhere in the docs? Should
there be?

Thanks again!!

Alan Isaac

May 9 '07 #17

Diez B. Roggisch

Alan Isaac wrote:

>
"Peter Otten" <__*******@web.dewrote in message
news:f1*************@news.t-online.com...
>Alan Isaac wrote:
There is nothing wrong with the random module -- you get the same numbers
on
>every run. When there is no pyc-file Python uses some RAM to create it
and therefore your GridPlayer instances are located in different memory
locations and get different hash values. This in turn affects the order
in which they occur when you iterate over the GridPlayer.players_played
set.

Thanks!!
This also explains Steven's results.

If I sort the set before iterating over it,
the "anomaly" disappears.

This means that currently the use of sets
(and, I assume, dictionaries) as iterators
compromises replicability. Is that a fair
statement?

Yes.

For me (and apparently for a few others)
this was a very subtle problem. Is there
a warning anywhere in the docs? Should
there be?

Not really, but that depends on what you know about the concept of sets and
maps as collections of course.

The contract for sets and dicts doesn't imply any order whatsoever. Which is
essentially the reason why

set(xrange(10))[0]

doesn't exist, and quite a few times cries for an ordered dictionary as part
of the standard libraries was made.

Diez

May 9 '07 #18

Alan G Isaac

Diez B. Roggisch wrote:

Not really, but that depends on what you know about the concept of sets and
maps as collections of course.

The contract for sets and dicts doesn't imply any order whatsoever. Which is
essentially the reason why

set(xrange(10))[0]

doesn't exist, and quite a few times cries for an ordered dictionary as part
of the standard libraries was made.

It seems to me that you are missing the point,
but maybe I am missing your point.

The question of whether a set or dict guarantees
some order seems quite different from the question
of whether rerunning an **unchanged program** yields the
**unchanged results**. The latter question is the question
of replicability.

Again I point out that some sophisticated users
(among which I am not numbering myself) did not
see into the source of this "anomaly". This
suggests that an explicit warning is warranted.

Cheers,
Alan Isaac

PS I know ordered dicts are under discussion;
what about ordered sets?

May 9 '07 #19

Robert Kern

Alan G Isaac wrote:

Diez B. Roggisch wrote:
>Not really, but that depends on what you know about the concept of sets and
maps as collections of course.

The contract for sets and dicts doesn't imply any order whatsoever. Which is
essentially the reason why

set(xrange(10))[0]

doesn't exist, and quite a few times cries for an ordered dictionary as part
of the standard libraries was made.

It seems to me that you are missing the point,
but maybe I am missing your point.

The question of whether a set or dict guarantees
some order seems quite different from the question
of whether rerunning an **unchanged program** yields the
**unchanged results**. The latter question is the question
of replicability.

Again I point out that some sophisticated users
(among which I am not numbering myself) did not
see into the source of this "anomaly". This
suggests that an explicit warning is warranted.

http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""

The sets documentation is a bit less explicit, though.

http://docs.python.org/lib/types-set.html
"""
Like other collections, sets support x in set, len(set), and for x in set. Being
an unordered collection, sets do not record element position or order of insertion.
"""

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

May 9 '07 #20

Alan G Isaac

Robert Kern wrote:

http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""

Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

Alan Isaac

May 9 '07 #21

Chris Mellon

On 5/9/07, Alan G Isaac <ai****@american.eduwrote:

Robert Kern wrote:
http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""

Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

Well, now you know. I'm not sure why you expect any given program to
be idempotent unless you take specific measures to ensure that anyway.

May 9 '07 #22

Robert Kern

Alan G Isaac wrote:

Robert Kern wrote:
>http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""

Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

The last clause does tell me that.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

May 9 '07 #23

Carsten Haese

On Wed, 2007-05-09 at 15:35 -0500, Alan G Isaac wrote:

Robert Kern wrote:
http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""

Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

It doesn't say that rerunning the program won't produce a change in
results. It doesn't say that the order depends *only* on those factors
in a deterministic and reproducible manner.

The documentation shouldn't be expected to list every little thing that
might change the order of keys in a dictionary. The documentation does
say explicitly what *is* guaranteed: Order of keys is preserved as long
as no intervening modifications happen to the dictionary. Tearing down
the interpreter, starting it back up, and rebuilding the dictionary from
scratch is very definitely an intervening modification.

Regards,

--
Carsten Haese
http://informixdb.sourceforge.net

May 9 '07 #24

Alan Isaac

>Robert Kern wrote:

>>http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random,

varies

>>across Python implementations, and depends on the dictionary's history

>>insertions and deletions.
"""

Alan G Isaac wrote:
>Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

"Robert Kern" <ro*********@gmail.comwrote in message
news:ma***************************************@pyt hon.org...

The last clause does tell me that.

1. About your reading of the current language:
I believe you, of course, but can you tell me **how** it tells you that?
To be concrete, let us suppose parallel language were added to
the description of sets. What about that language should allow
me to anticipate Peter's example (in this thread)?

2. About possibly changing the docs:
You are much more sophisticated than ordinary users.
Did this thread not demonstrate that even sophisticated users
do not see into this "implication" immediately? Replicability
of results is a huge deal in some circles. I think the docs
for sets and dicts should include a red flag: do not use
these as iterators if you want replicable results.
(Side note to Carsten: this does not require listing "every little thing".)

Cheers,
Alan Isaac

May 10 '07 #25

Robert Kern

Alan Isaac wrote:

>>Robert Kern wrote:
http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random,

varies

>>>across Python implementations, and depends on the dictionary's history

of

>>>insertions and deletions.
"""

>Alan G Isaac wrote:
>>Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

"Robert Kern" <ro*********@gmail.comwrote in message
news:ma***************************************@pyt hon.org...
>The last clause does tell me that.

1. About your reading of the current language:
I believe you, of course, but can you tell me **how** it tells you that?
To be concrete, let us suppose parallel language were added to
the description of sets. What about that language should allow
me to anticipate Peter's example (in this thread)?

Actually, the root cause of Peter's specific example is the fact that the
default implementation of __hash__() and __eq__() rely on identity comparisons.
Two separate invocations of the same script give different objects by identity
and thus the "history of insertions and deletions" is different.

2. About possibly changing the docs:
You are much more sophisticated than ordinary users.
Did this thread not demonstrate that even sophisticated users
do not see into this "implication" immediately?

Well, if you had a small test case that demonstrated the problem, we would have.
Your example was large, complicated, and involved other semi-deterministic red
herrings (the PRNG). It's quite easy to see the problem with Peter's example.

Replicability
of results is a huge deal in some circles. I think the docs
for sets and dicts should include a red flag: do not use
these as iterators if you want replicable results.
(Side note to Carsten: this does not require listing "every little thing".)

They do. They say very explicitly that they are not ordered and that the
sequence of iteration should not be relied upon. The red flags are there.

But I'm not going to stop you from writing up something that's even more explicit.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

May 10 '07 #26

Carsten Haese

On Thu, 2007-05-10 at 01:25 +0000, Alan Isaac wrote:

Did this thread not demonstrate that even sophisticated users
do not see into this "implication" immediately?

Knowing that maps don't have reproducible ordering is one thing.
Realizing that that's the cause of the problem that's arbitrarily and
wrongly attributed to the 'random' module, in a piece of code that's not
posted to the public, and presumably not trimmed down to the shortest
possible example of the problem, is quite another.

I'll venture the guess that most Python programmers with a modicum of
experience will, when asked point blank if it's safe to rely on a
dictionary to be iterated in a particular order, answer no.

Replicability
of results is a huge deal in some circles.

Every software engineer wants their results to be replicable. Software
engineers also know that they can only expect their results to be
replicable if they use deterministic functions. You wouldn't expect
time.time() to return the same result just because you're running the
same code, would you?

I think the docs
for sets and dicts should include a red flag: do not use
these as iterators if you want replicable results.

It does, at least for dicts: "Keys and values are listed in an arbitrary
order." If this wording is not present for sets, something to this
effect should be added.

Regards,

--
Carsten Haese
http://informixdb.sourceforge.net

May 10 '07 #27

Steven D'Aprano

On Wed, 09 May 2007 16:01:02 -0500, Robert Kern wrote:

Alan G Isaac wrote:
>Robert Kern wrote:
>>http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""

Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

The last clause does tell me that.

Actually it doesn't. If you run a program twice, with the same inputs,
and no other source of randomness (or at most have pseudo-randomness
starting with the same seed), then the dictionary will have the same
history of insertions and deletions from run to run.

Go back to Peter Otten's diagnosis of the issue:

"... your GridPlayer instances are located in different memory locations
and get different hash values. This in turn affects the order in which
they occur when you iterate over the GridPlayer.players_played set."

There is nothing in there about the dictionary having a different history
of insertions and deletions. It is having the same insertions and
deletions each run, but the items being inserted are located at different
memory locations, and _that_ changes their hash value and hence the order
they occur in when you iterate over the set.

That's quite a subtle thread to follow, and with all respect Robert, it's
easy to say it is obvious in hindsight, but I didn't notice you solving
the problem in the first place. Maybe you would have, if you had tried...
and maybe you would have scratched your head too. Who can tell?

As Carsten Haese says in another post:

"The documentation shouldn't be expected to list every little thing that
might change the order of keys in a dictionary. The documentation does say
explicitly what *is* guaranteed: Order of keys is preserved as long as no
intervening modifications happen to the dictionary. Tearing down the
interpreter, starting it back up, and rebuilding the dictionary from
scratch is very definitely an intervening modification."

That's all very true, but nevertheless it is a significant gotcha. It is
natural to expect two runs of any program to give the same result if there
are (1) no random numbers involved; (2) the same input data; (3) and no
permanent storage from run to run. One doesn't normally expect the output
of a well-written, bug-free program to depend on the memory location of
objects. And that's the gotcha -- with dicts and sets, they can.

--
Steven.

May 10 '07 #28

Steven D'Aprano

On Wed, 09 May 2007 21:18:25 -0500, Robert Kern wrote:

Actually, the root cause of Peter's specific example is the fact that the
default implementation of __hash__() and __eq__() rely on identity comparisons.
Two separate invocations of the same script give different objects by identity
and thus the "history of insertions and deletions" is different.

The history is the same. The objects inserted are the same (by equality).
The memory address those objects are located at is different.

Would you expect that "hello world".find("w") should depend on the address
of the string "w"? No, of course not. Programming in a high level language
like Python, we hope to never need to think about memory addresses. And
that's the gotcha.

--
Steven.

May 10 '07 #29

Alan Isaac

"Robert Kern" <ro*********@gmail.comwrote in message
news:ma***************************************@pyt hon.org...

Actually, the root cause of Peter's specific example is the fact that the
default implementation of __hash__() and __eq__() rely on identity

comparisons.

Two separate invocations of the same script give different objects by

identity

and thus the "history of insertions and deletions" is different.

OK. Thank you.
Alan

May 10 '07 #30

Alan Isaac

"Carsten Haese" <ca*****@uniqsys.comwrote in message
news:ma***************************************@pyt hon.org...

Knowing that maps don't have reproducible ordering is one thing.
Realizing that that's the cause of the problem that's arbitrarily and
wrongly attributed to the 'random' module, in a piece of code that's not
posted to the public, and presumably not trimmed down to the shortest
possible example of the problem, is quite another.

There is no reason to be unfriendly about this.
I posted an observation about my code behavior
and my best understanding of it. I asked for an
explanation and did not assert a bug, although when
someone doubted that the presence or absence of the
..pyc file mattered for the results I said that *if* it should
not matter *then* there was a bug. I offered the code
to all that asked for it. I did not post it **because**
I had not adequately isolated the problem. (But indeed,
I was not isolating the problem due to misconceptions.)

I'll venture the guess that most Python programmers with a modicum of
experience will, when asked point blank if it's safe to rely on a
dictionary to be iterated in a particular order, answer no.

Again, that misses the point. This is clearly documented.
I would have said the same thing: no, that's not safe. But
the question is whether the same people will be surprised when
*unchanged* code rerun with an *unchanged* implementation
produces *changed* results. I do not see how a reader of
this thread cannot conclude that yes, even some sophisticated
users (who received my code) will be surprised. The docs
should not be useful only to the most sophisticated users.

It does, at least for dicts: "Keys and values are listed in an arbitrary
order." If this wording is not present for sets, something to this
effect should be added.

Even Robert did not claim that *that* phrase was adequate.
I note that you cut off "which is non-random"!

Alan Isaac

May 10 '07 #31

Robert Kern

Steven D'Aprano wrote:

On Wed, 09 May 2007 21:18:25 -0500, Robert Kern wrote:

>Actually, the root cause of Peter's specific example is the fact that the
default implementation of __hash__() and __eq__() rely on identity comparisons.
Two separate invocations of the same script give different objects by identity
and thus the "history of insertions and deletions" is different.

The history is the same. The objects inserted are the same (by equality).

No, they *were* different by equality (identity being the default implementation
equality that was not overridden in either Peter's code nor Alan's).

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

May 10 '07 #32

Carsten Haese

On Thu, 10 May 2007 02:50:49 GMT, Alan Isaac wrote

"Carsten Haese" <ca*****@uniqsys.comwrote in message
news:ma***************************************@pyt hon.org...
Knowing that maps don't have reproducible ordering is one thing.
Realizing that that's the cause of the problem that's arbitrarily and
wrongly attributed to the 'random' module, in a piece of code that's not
posted to the public, and presumably not trimmed down to the shortest
possible example of the problem, is quite another.

There is no reason to be unfriendly about this.

I did not mean this to be unfriendly. I'm sorry if you got that impression. I
was simply pointing out all the ways in which you made it difficult for the
community to explain your problem.

I'll venture the guess that most Python programmers with a modicum of
experience will, when asked point blank if it's safe to rely on a
dictionary to be iterated in a particular order, answer no.

Again, that misses the point. This is clearly documented.
I would have said the same thing: no, that's not safe. But
the question is whether the same people will be surprised when
*unchanged* code rerun with an *unchanged* implementation
produces *changed* results.

That only means that a program can behave non-deterministically if you're not
carefully restricting it to functions that are guaranteed to be deterministic.
No experienced software engineer, whether they are experienced in Python or
some other programming language should be surprised by this notion.

I don't think that the cause of non-determinism in your case was exceptionally
subtle, you just made it harder to find.

The docs
should not be useful only to the most sophisticated users.

Please feel free to suggest specific wording changes to make the documentation
more useful.

It does, at least for dicts: "Keys and values are listed in an arbitrary
order." If this wording is not present for sets, something to this
effect should be added.

Even Robert did not claim that *that* phrase was adequate.
I note that you cut off "which is non-random"!

In my opinion, that phrase is adequate. I did cut off the non-random part
because it's irrelevant. Non-random doesn't mean deterministic.

Regards,

Carsten.

May 10 '07 #33

Carsten Haese

On Thu, 10 May 2007 12:46:05 +1000, Steven D'Aprano wrote

It is natural to expect two runs of any program to give the same
result if there are (1) no random numbers involved; (2) the same
input data; (3) and no permanent storage from run to run.

Which of those three categories does time.time() fall into? What about
id("hello")?

-Carsten

May 10 '07 #34

Steven D'Aprano

On Thu, 10 May 2007 01:06:33 -0400, Carsten Haese wrote:

On Thu, 10 May 2007 12:46:05 +1000, Steven D'Aprano wrote
>It is natural to expect two runs of any program to give the same
result if there are (1) no random numbers involved; (2) the same
input data; (3) and no permanent storage from run to run.

Which of those three categories does time.time() fall into? What about
id("hello")?

I didn't say there were no exceptions to the heuristic "expect any
computer program to do the same thing on subsequent runs". I said it was a
natural expectation.

Obviously one of the differences between a naive programmer and a
sophisticated programmer is that the sophisticated programmer has learnt
more exceptions to the rule.

And that's why I have described this behaviour as a gotcha, not as a bug
or a mis-feature or anything else.
--
Steven.

May 10 '07 #35

Steven D'Aprano

On Wed, 09 May 2007 23:10:19 -0500, Robert Kern wrote:

Steven D'Aprano wrote:
>On Wed, 09 May 2007 21:18:25 -0500, Robert Kern wrote:

>>Actually, the root cause of Peter's specific example is the fact that the
default implementation of __hash__() and __eq__() rely on identity comparisons.
Two separate invocations of the same script give different objects by identity
and thus the "history of insertions and deletions" is different.

The history is the same. The objects inserted are the same (by equality).

No, they *were* different by equality (identity being the default implementation
equality that was not overridden in either Peter's code nor Alan's).

Ah yes, you are right in the sense that Python's notion of equality for
class instances is to fall back on identity by default.

But in the vernacular human sense, an instance X with the same state as an
instance Y is "equal", despite being at another memory address. I was
using equality in the sense that two copies of the same edition of a book
are the same, despite being in different places.

For the record, and for the avoidance of all confusion, I'm not suggesting
that Python's default behaviour is "wrong" or even "bad", merely pointing
out to all those wise in hindsight that the behaviour was extremely
puzzling for the reasons I've given. But you can be sure that I'll never
forget this lesson :)

--
Steven.

May 10 '07 #36

Alan Isaac

"Carsten Haese" <ca*****@uniqsys.comwrote in message
news:ma***************************************@pyt hon.org...

I was simply pointing out all the ways in which you made it difficult for

the

community to explain your problem.

And without that community, I would still not have a clue.
Thanks to all!

Please feel free to suggest specific wording changes to make the

documentation

more useful.

I'm sure my first pass will be flawed, but here goes:

http://docs.python.org/lib/typesmapping.html:
to footnote (3), add phrase "which may depend on the memory location of the
keys" to get:

Keys and values are listed in an arbitrary order,
which may depend on the memory location of the keys.
This order is non-random, varies across Python implementations,
and depends on the dictionary's history of insertions and deletions.

http://docs.python.org/lib/types-set.html: append a new sentence to 2nd
paragraph

Iteration over a set returns elements in an arbitrary order,
which may depend on the memory location of the elements.

fwiw,
Alan Isaac

May 10 '07 #37

Raymond Hettinger

On May 9, 6:42 am, "Alan Isaac" <ais...@american.eduwrote:

Is there
a warning anywhere in the docs? Should
there be?

I do not think additional documentation here would be helpful. One
could note that the default hash value is the object id. Somewhere
else you could write that the placement of objects in memory is
arbitrary and can be affected by a number of factors not explicity
under user control.

With those notes scattered throughout the documentation, I'm not sure
that you would have found them and recognized the implications with
respect to your design and with respect to the deletion of pyc files
(which is just one factor among many that could cause different
placements in memory).

Also, the existing docs describe behavior at a more granular level.
How the parts interact is typically left to third-party documentation
(i.e. the set docs say what the set methods do but do not give advice
on when to use them instead of a dict or list).

Out of this thread, the more important lesson is that the docs
intentionally do not comment on implemation specific details. When the
docs do not make specific guarantees and behavior is left undefined,
it is not a good practice to make assumptions about invariants that
may or may not be true (in your case, you assumed that objects would
be located in the same memory locations between runs -- while that
sometimes happens to be true, it is certainly not guaranteed behavior
as you found out -- moreover, you've made non-guaranteed assumptions
about the arbitrary ordering of an unordered collection -- a definite
no-no).
Raymond Hettinger

May 10 '07 #38

Robert Kern

Alan Isaac wrote:

"Carsten Haese" <ca*****@uniqsys.comwrote in message
news:ma***************************************@pyt hon.org...
>I was simply pointing out all the ways in which you made it difficult for
the
>community to explain your problem.

And without that community, I would still not have a clue.
Thanks to all!

>Please feel free to suggest specific wording changes to make the
documentation
>more useful.

I'm sure my first pass will be flawed, but here goes:

http://docs.python.org/lib/typesmapping.html:
to footnote (3), add phrase "which may depend on the memory location of the
keys" to get:

Keys and values are listed in an arbitrary order,
which may depend on the memory location of the keys.
This order is non-random, varies across Python implementations,
and depends on the dictionary's history of insertions and deletions.

http://docs.python.org/lib/types-set.html: append a new sentence to 2nd
paragraph

Iteration over a set returns elements in an arbitrary order,
which may depend on the memory location of the elements.

It's misleading. It only depends on the memory location of the elements if
__hash__() is implemented as id() (the default).

How about this?

"""Never rely on the order of dictionaries and sets."""

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

May 10 '07 #39

Carsten Haese

On Thu, 2007-05-10 at 11:29 -0500, Robert Kern wrote:

"""Never rely on the order of dictionaries and sets."""

Easy, Robert, there's a baby in that bathwater.

I think it's useful to note that the arbitrary ordering returned by
dict.keys() et al. is locally stable in the absence of intervening
modifications, as long as the guarantee is worded in a way that prevents
overly optimistic reliance on that ordering.

-Carsten

May 10 '07 #40

change of random state when pyc created??

Similar topics