Speed quirk: redundant line gives six-fold speedup

Mark Dickinson

I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

I'm running Python 2.4.1 on a 1.2Ghz iBook G4:

Python 2.4.1 (#1, May 21 2005, 19:56:42)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin

I've posted the code below, with some trepidation, since it's not a
work of art and wasn't really intended to be seen by other human
beings. It's necessarily quite long: any attempt to shorten it
significantly seems to cancel the speed gain. Any clues as to what
might be going on would be greatly appreciated!

Mark

# code starts here
dummy0 = 47
dummy1 = 47
dummy2 = 47
dummy3 = 47
dummy4 = 47
dummy5 = 47
dummy6 = 47
dummy7 = 47
dummy8 = 47
dummy9 = 47
dummy10 = 47
dummy11 = 47
dummy12 = 47
dummy13 = 47
dummy14 = 47
dummy15 = 47
dummy16 = 47
dummy17 = 47
dummy18 = 47
dummy19 = 47
dummy20 = 47
dummy21 = 47
dummy22 = 47
dummy23 = 47
dummy24 = 47
dummy25 = 47
dummy26 = 47
dummy27 = 47
dummy28 = 47

# Sudoku puzzle solver via Knuth's method of `dancing links'.

# Initial data: list of constraints, list of moves, and map from moves to lists of constraints

template = (" | %s %s %s | %s %s %s | %s %s %s |\n" * 3).join([" +-------+-------+-------+\n"] * 4)
div_nums = range(9)
symbols = "123456789"

constraints = ["position %d, %d" % (i, j) for i in div_nums for j in div_nums] + \
["%s in %s %d" % (i, j, k) for i in symbols for j in ["row", "column", "block"] for k in div_nums]

satisfies = dict(((s, i, j),
["position %d, %d" % (i, j),
"%s in row %d" % (s, i),
"%s in column %d" % (s, j),
"%s in block %d" % (s, i-i%3+j//3)]) for s in symbols for i in div_nums for j in div_nums)

moves = satisfies.keys()

class LLentry(object): pass

# First set up the data objects and column objects

def rowhead(obj):
obj.L = obj.R = obj.M = obj

def colhead(obj):
obj.U = obj.D = obj.C = obj
obj.S = 0

def rowinsert(obj, pos):
# insert into doubly-linked list with header pos
posL = pos.L
obj.R = pos
pos.L = obj
obj.L = posL
posL.R = obj
obj.M = pos

def colinsert(obj, pos):
# as above
posU = pos.U
obj.D = pos
pos.U = obj
obj.U = posU
posU.D = obj
obj.C = pos
pos.S += 1

def rowitems(pos):
c = pos.R
while c is not pos:
yield c
c = c.R

def move(m):
cc = m.R
while cc is not m:
c = cc.C
c.R.L = c.L; c.L.R = c.R
r = c.D
while r is not c:
j = r.R
while j is not r:
j.D.U = j.U
j.U.D = j.D
j.C.S -= 1
j = j.R
r = r.D
cc = cc.R
moves_so_far.append(m)

h = LLentry()
rowhead(h); colhead(h)

constraint_from_name = {}
for name in constraints:
obj = LLentry()
obj.N = name; constraint_from_name[name] = obj
rowinsert(obj, h); colhead(obj)
obj.S = 0

move_from_name = {}
for m in satisfies.keys():
# we must assume that each move satisfies at least one constraint
obj = LLentry()
obj.N = m; move_from_name[m] = obj
colinsert(obj, h); rowhead(obj)

ones = [(move_from_name[m], constraint_from_name[c]) for m, cc in satisfies.items() for c in cc]
for m, c in ones:
obj = LLentry()
rowinsert(obj, m)
colinsert(obj, c)

moves_so_far = []
# everything's now set up to start the search

def search():
if h.L is h:
data = dict(((i, j), s) for s, i, j in (m.N for m in moves_so_far))
yield template % tuple(data[i, j] for i in div_nums for j in div_nums)
else:
mm = min((c.S, c) for c in rowitems(h))[1].D
while mm is not mm.C:
m = mm.M
cc = m.R
while cc is not m:
c = cc.C
c.R.L = c.L
c.L.R = c.R
r = c.D
while r is not c:
j = r.R
while j is not r:
j.D.U = j.U
j.U.D = j.D
j.C.S -= 1
j = j.R
r = r.D
cc = cc.R
moves_so_far.append(m)
for solution in search():
yield solution
m = moves_so_far.pop()
cc = m.L
while cc is not m:
c = cc.C
r = c.U
while r is not c:
j = r.L
while j is not r:
j.D.U = j.U.D = j
j.C.S += 1
j = j.L
r = r.U
c.R.L = c.L.R = c
cc = cc.L
mm = mm.D

rows = [
"7......19",
"46.19....",
"...6827.4",
".9......7",
"...3..4.5",
"..67.....",
"..1......",
"2...74...",
"...2..3.."]

for r, row in enumerate(rows):
for c, entry in enumerate(row):
if entry != '.':
move(move_from_name[(entry, r, c)])

import time
t = time.time()
for i in range(10):
for solution in search():
print solution
print "Total time taken: %s seconds" % (time.time() - t)

Aug 25 '05 #1

Subscribe Post Reply

2607

Jack Diederich

On Thu, Aug 25, 2005 at 04:44:24PM +0000, Mark Dickinson wrote:

I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
I get the same thing.
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
It seems to be related to the number of globals. I get the "fast"
version with 30 to 120 globals and the "slow" version with less than 30
or more than 130. It actually gets even slower for higher numbers
of globals.

Here is a snippet to adjust the number of globals

for (i) in range(100):
globals()['dummy%d' % (i)] = 1
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

Yes, module level globals have bad lookup times compared to function
local names. If you refactor your code to pass around the data
currently at the global module level you should see times at least
as fast as the current 'fast' one.

That said, I'm very surprised that the lookup times jump around so much.
Your code does bazillions of namespace lookups, so a small difference
in lookup times is getting multiplied into some really big numbers.

-jackdied

Aug 25 '05 #2

Erik Max Francis

Mark Dickinson wrote:

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

I see no difference in execution times, as expected. The most likely
explanation is simply that other things were going on on your system
when you ran the first test, but not the second test, resulting in the
discrepancy. In other words, the speed change had nothing to do with
your dummy lines.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
There never was a good war or a bad peace.
-- Benjamin Franklin

Aug 25 '05 #3

Bill Mill

On 8/25/05, Mark Dickinson <di******@verizon.net> wrote:

I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

<snip>

One of my own: what in the world made you think "maybe I'll add 29
dummy global variables to speed things up?"

It seems to work (>19x speedup on my machine), I'm just curious what
path you followed to get there.

And, finally, you should forward this to the python-dev list, if
somebody hasn't already. There are more people who know a ton about
python internals there.

Peace
Bill Mill
bill.mill at gmail.com

Aug 25 '05 #4

jay graves

Mark Dickinson wrote:

Questions:
(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
yes. I get 7 sec vs 1 sec on my laptop.
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
I don't think there are any optimizations at play here.
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

When I start a python interpreter and import the module the speed
difference disappears and it actually gets about two times faster.
YMMV.

I don't have a solution but I admire the problem. [1]

....
jay

[1] Google tells me that this is probably attributable to Ashleigh
Brilliant.

Aug 25 '05 #5

Bill Mill

On 8/25/05, Erik Max Francis <ma*@alcyone.com> wrote:

Mark Dickinson wrote:

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

I see no difference in execution times, as expected. The most likely
explanation is simply that other things were going on on your system
when you ran the first test, but not the second test, resulting in the
discrepancy. In other words, the speed change had nothing to do with
your dummy lines.

Unlikely; 2 people have confirmed these results already.

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

I'm Investigating further...

Peace
Bill Mill
bill.mill at gmail.com

Aug 25 '05 #6

Erik Max Francis

Bill Mill wrote:

Unlikely; 2 people have confirmed these results already.

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

Yes, it's obviously a real effect given the other sightings. I don't
see any speed difference, myself (Pentium IV 3.0 GHz running Slackware
Linux).

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
There never was a good war or a bad peace.
-- Benjamin Franklin

Aug 25 '05 #7

Bill Mill

On 8/25/05, Erik Max Francis <ma*@alcyone.com> wrote:

Bill Mill wrote:

Unlikely; 2 people have confirmed these results already.

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

Yes, it's obviously a real effect given the other sightings. I don't
see any speed difference, myself (Pentium IV 3.0 GHz running Slackware
Linux).

Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
for fast and slow on my machine (these won't look decent except in a
fixed-width font):

Slow:

6766494 function calls (6737594 primitive calls) in 45.740 CPU seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
3322320 20.539 0.000 31.152 0.000 test.py:135(<generator expression>
)
27520 10.641 0.000 41.792 0.002 :0(min)
3322320 10.613 0.000 10.613 0.000 test.py:81(rowitems)
28100/20 3.620 0.000 45.633 2.282 test.py:130(search)
27545 0.113 0.000 0.113 0.000 :0(append)
27520 0.098 0.000 0.098 0.000 :0(pop)
1 0.041 0.041 45.736 45.736 test.py:36(?)

Fast:

540174 function calls (536514 primitive calls) in 3.506 CPU seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
259640 1.516 0.000 2.303 0.000 test.py:135(<generator expression>
)
2280 0.791 0.000 3.094 0.001 :0(min)
259640 0.788 0.000 0.788 0.000 test.py:81(rowitems)
2860/20 0.269 0.000 3.391 0.170 test.py:130(search)
1 0.045 0.045 3.499 3.499 test.py:2(?)
3645 0.021 0.000 0.021 0.000 test.py:71(colinsert)
3240 0.019 0.000 0.019 0.000 test.py:62(rowinsert)
2305 0.010 0.000 0.010 0.000 :0(append)

Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
slow version, does not appear at all in the fast profile. I can't
figure out why - both printed out their data, so template must have
been called somewhere.

Peace
Bill Mill
bill.mill at gmail.com

Aug 25 '05 #8

Bill Mill

Bill Mill wrote:

Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
for fast and slow on my machine (these won't look decent except in a
fixed-width font):
<snip profiles>
Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
slow version, does not appear at all in the fast profile. I can't
figure out why - both printed out their data, so template must have
been called somewhere.

OK, I'm getting somewhere now. When I replace:

template = (" | %s %s %s | %s %s %s | %s %s %s |\n" * 3).join(["
+-------+-------+-------+\n"] * 4)

wtih:

template = """ | %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n"""

Then the non-dummy version is faster than the dummy version (very
slightly, presumably because it doesn't need to allocate 28 dummy
variables).

Peace
Bill Mill
bill.mill at gmail.com

Aug 25 '05 #9

Jack Diederich

On Thu, Aug 25, 2005 at 01:35:04PM -0400, Bill Mill wrote:

On 8/25/05, Erik Max Francis <ma*@alcyone.com> wrote:
Mark Dickinson wrote:
Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

I'm Investigating further...

I'm getting similarly freakish results. I tried a little ghetto debugging
by putting a printf in dictobject.c's resize method and recompiling python.
Sadly I can't get the problem to reproduce itself with the new binary
(with or without the printf). The Ubuntu default 2.4.1 is sometimes fast,
my hand compiled one (./configure && make) is always slow.

There are some very arcane low level things going on here.

sprat:~/src/Python-2.4.1# time ./python /tmp/odd.py > /dev/null
7.876u 0.008s 0:07.91 99.4% 0+0k 0+0io 0pf+0w
sprat:~/src/Python-2.4.1# time python /tmp/odd.py > /dev/null
1.813u 0.004s 0:01.77 102.2% 0+0k 0+0io 0pf+0w

sprat:~/src/Python-2.4.1# ./python
Python 2.4.1 (#5, Aug 25 2005, 13:55:44)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

sprat:~/src/Python-2.4.1# python
Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

No-idea-ly,

-jackdied

Aug 25 '05 #10

Mark Dickinson

In article <ma***************************************@python. org>, Bill
Mill <bi*******@gmail.com> wrote:

One of my own: what in the world made you think "maybe I'll add 29
dummy global variables to speed things up?"

You mean this isn't a well-known optimization technique? :)

I was refactoring the code, and after making a particular function
redundant I noticed that removing the code for that function produced
the slow down described. Then I naturally experimented to try to figure
out what I had to put back in to recover the original speed.

Not that I really care about the speed itself, but I'd dearly like to
understand what's at work here.

Mark

Aug 25 '05 #11

Jack Diederich

On Thu, Aug 25, 2005 at 01:55:48PM -0400, Bill Mill wrote:

Bill Mill wrote:

Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
for fast and slow on my machine (these won't look decent except in a
fixed-width font):

<snip profiles>

Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
slow version, does not appear at all in the fast profile. I can't
figure out why - both printed out their data, so template must have
been called somewhere.

OK, I'm getting somewhere now. When I replace:

template = (" | %s %s %s | %s %s %s | %s %s %s |\n" * 3).join(["
+-------+-------+-------+\n"] * 4)

wtih:

template = """ | %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n"""

Then the non-dummy version is faster than the dummy version (very
slightly, presumably because it doesn't need to allocate 28 dummy
variables).

I think this is just another tweaking that hits the magic spot (or
avoids the bad spot) in some lookup table somewhere. Perhaps it
has to do with the number interned strings and some worst case
behavior? varnames are interened, so changing the number of globals
would change the number of interened strings as well.

-jackdied

Aug 25 '05 #12

Bill Mill

On 8/25/05, Jack Diederich <ja**@performancedrivers.com> wrote:

On Thu, Aug 25, 2005 at 01:35:04PM -0400, Bill Mill wrote:
On 8/25/05, Erik Max Francis <ma*@alcyone.com> wrote:
Mark Dickinson wrote:

> Questions:
>
> (1) Can anyone else reproduce this behaviour, or is it just some quirk
> of my setup?
> (2) Any possible explanations? Is there some optimization that kicks
> in at a certain number of lines, or at a certain length of
> bytecode?
> (3) If (2), is there some way to force the optimization, so that I can
> get the speed increase without having to add the extra lines?

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

I'm Investigating further...

I'm getting similarly freakish results. I tried a little ghetto debugging
by putting a printf in dictobject.c's resize method and recompiling python.
Sadly I can't get the problem to reproduce itself with the new binary
(with or without the printf). The Ubuntu default 2.4.1 is sometimes fast,
my hand compiled one (./configure && make) is always slow.

There are some very arcane low level things going on here.

agreed. Also, either I was temporarily insane, or the version with the
explicit template no longer runs faster for me, so I hope nobody
spends a lot of time on that.

Peace
Bill Mill
bill.mill at gmail.com

Aug 25 '05 #13

Stelios Xanthakis

Mark Dickinson wrote:

I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

Hi.

I haven't been able to reproduce this but I had a similar case
before (a program that some times crashed and some times worked
perfectly and the difference was whitespace in the comments!!!).

After lots of wondering and thinking that I must be dreaming
(luckily I had pyvm which also crashed but for different amounts
of whitespace), it was solved. The explanation is this: hash
and comparison of objects depends on the state of the memory
allocator. A sample case is this:

class A: pass
dummy0=47 # comment this to get a different result for min
a=A()
b=A()
print min (a, b)

the result of 'min' is not only non-deterministic but also depends
on whether other things have been allocated before. The same
thing can happen for 'dictionary.keys()' if the keys are objects
and 'iterate-over-set' when the set contains objects.

In the sudoku solver, there is a min (number, object) which is
probably what's affected by the extistance of the dummy variable.
Now, in sudoku puzzles some times the algorithm has to suppose
that in a box the right solution is one of the possible, try it
and if it fails then try the other one. I guess that the result
of from the different 'min' leads the solver to first try the
correct solution in the fast case and therefore need not attempt
the wrong one. Or at least this is what I think that happens.

By the way, since we started posting code, here is a more pythonic
sudoku solver.

#----The 'man with scissors runs around shifting barrels algorithm'

from sys import argv
SEMANTIC = 1'SEM' in argv

class Impossible:
pass

Patterns = []
for r in range (9):
for c in range (9):
p = 27*(r/3) + 3*(c/3)
pl = set (range (9*r, 9*r+9) + range (c, 81, 9) + [p+x for x in
(0,1,2,9,10,11,18,19,20)])
pl.remove (9*r+c)
Patterns.append (tuple (sorted (list (pl))))

def initboard ():
x = range (1, 10)
return [ x [:] for i in xrange (9*9) ]

def printboard (board):
if not SEMANTIC:
return
print 30*'-'
for i in range (9):
for j in board [9*i:9*(i+1)]:
if type (j) is list:
#print 'X',
print ''.join (map (str, j)),
else: print j,
print
print 30*'-'

def dupboard (board):
B = []
for i in board:
if type (i) is list:
B.append (i [:])
else:
B.append (i)
return B

def solve (board, coords):
while coords:
p, v = coords.pop ()
board [p] = v
for i in Patterns [p]:
if type (board [i]) is list:
if v in board [i]:
board [i].remove (v)
if len (board [i]) == 1:
board [i] = board [i][0]
coords.append ((i, board [i]))
else:
if board [i] == v:
raise Impossible
for p, i in enumerate (board):
if type (i) is list:
for j in i:
try:
return solve (dupboard (board), [(p, j)])
except Impossible:
pass
raise Impossible
return board
PP = [
[
"7xxxxxx19",
"46x19xxxx",
"xxx6827x4",
"x9xxxxxx7",
"xxx3xx4x5",
"xx67xxxxx",
"xx1xxxxxx",
"2xxx74xxx",
"xxx2xx3xx",
]
]

def puz2coord (P):
if len (P) != 9:
print "P must have 9 rows"
raise SystemExit
coords = []
for r, i in enumerate (P):
if len (i) != 9:
print "Row [%s] doesn't have 9 columns" %i
raise SystemExit
for c, j in enumerate (list (i)):
if j != 'x':
coords.append ((9*r + c, int (j)))
return coords

try:
if SEMANTIC:
for i in xrange (10):
for P in PP:
printboard (solve (initboard (), puz2coord (P)))
else:
for i in xrange (TIMES):
for P in PP:
printboard (solve (initboard (), puz2coord (P)))
except Impossible:
print "IMPOSSIBLY IMPOSSIBLE"

Aug 25 '05 #14

Jack Diederich

On Thu, Aug 25, 2005 at 09:23:09PM +0300, Stelios Xanthakis wrote:

The explanation is this: hash
and comparison of objects depends on the state of the memory
allocator. A sample case is this:

class A: pass
dummy0=47 # comment this to get a different result for min
a=A()
b=A()
print min (a, b)

the result of 'min' is not only non-deterministic but also depends
on whether other things have been allocated before. The same
thing can happen for 'dictionary.keys()' if the keys are objects
and 'iterate-over-set' when the set contains objects.

In the sudoku solver, there is a min (number, object) which is
probably what's affected by the extistance of the dummy variable.
Now, in sudoku puzzles some times the algorithm has to suppose

Doh, I feel silly. Without an 'import random' in the program I
assumed it was deterministic. I would have also expected the
comparison to be a TypeError, and it sometimes is.

mynum = 7
myob = object()
cmp(mynum, myob) -1 mynum.__cmp__(myob)

Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: int.__cmp__(x,y) requires y to be a 'int', not a 'object'
Thanks for the explanation,
-jackdied

Aug 25 '05 #15

Bill Mill

On 8/25/05, Jack Diederich <ja**@performancedrivers.com> wrote:

On Thu, Aug 25, 2005 at 09:23:09PM +0300, Stelios Xanthakis wrote:
The explanation is this: hash
and comparison of objects depends on the state of the memory
allocator. A sample case is this:

class A: pass
dummy0=47 # comment this to get a different result for min
a=A()
b=A()
print min (a, b)

the result of 'min' is not only non-deterministic but also depends
on whether other things have been allocated before. The same
thing can happen for 'dictionary.keys()' if the keys are objects
and 'iterate-over-set' when the set contains objects.

I'm also pretty sure I've caught a bug in his code, though I'm not
sure how it works exactly. I replaced the 'min' built-in with my own
min, and he's going to get nondeterministic results from this line:

mm = min((c.S, c) for c in rowitems(h))[1].D

because 'c' is often the exact same object. A snippet from my
debugging version of 'min', which prints out the tuple its handed:

(1, <__main__.LLentry object at 0x00969710>)
(1, <__main__.LLentry object at 0x00969710>)
(4, <__main__.LLentry object at 0x00969710>)
<snip more 4s from the same object>
(4, <__main__.LLentry object at 0x00969710>)
(3, <__main__.LLentry object at 0x00969710>)
(3, <__main__.LLentry object at 0x00969710>)
(3, <__main__.LLentry object at 0x00969710>)
(2, <__main__.LLentry object at 0x00969710>)
<snip more 2s, same object>

Although they appear in order here, they don't always. Often, multiple
objects have a value of 1, and he's going to get one of them at random
as the 'min' object. I'm pretty sure.

Mark, can you confirm that this is/isn't a bug?

(btw, it still runs fast with and slow without the dummies with my
custom min() func)

Peace
Bill Mill
bill.mill at gmail.com

Aug 25 '05 #16

Mark Dickinson

In article <ma***************************************@python. org>,
Stelios Xanthakis <sx****@ceid.upatras.gr> wrote:

In the sudoku solver, there is a min (number, object) which is
probably what's affected by the extistance of the dummy variable.
Now, in sudoku puzzles some times the algorithm has to suppose
that in a box the right solution is one of the possible, try it
and if it fails then try the other one. I guess that the result
of from the different 'min' leads the solver to first try the
correct solution in the fast case and therefore need not attempt
the wrong one. Or at least this is what I think that happens.

Thank you! That does indeed seem to be the explanation.
The min() line looks for the shortest `constraint'; that is, the box
with the fewest possible symbols, or the symbol with the smallest
number of possible positions within a given row, column or block; the
code below that line then tries all possibilities for this constraint.
The line should really be something like

min(c for c in rowitems(h), key = lambda c: c.S)

but that isn't going to work until Python 2.5 comes out, and I was
too lazy to expand the whole thing properly instead of using the
cheap trick I did.

Thanks again for clearing up this confusion.

Mark

Aug 25 '05 #17

Mark Dickinson

In article <ma***************************************@python. org>, Bill
Mill <bi*******@gmail.com> wrote:

I'm also pretty sure I've caught a bug in his code, though I'm not
sure how it works exactly. I replaced the 'min' built-in with my own
min, and he's going to get nondeterministic results from this line:

mm = min((c.S, c) for c in rowitems(h))[1].D

because 'c' is often the exact same object. A snippet from my
debugging version of 'min', which prints out the tuple its handed:

(1, < main .LLentry object at 0x00969710>)
(1, < main .LLentry object at 0x00969710>)
(4, < main .LLentry object at 0x00969710>)
<snip more 4s from the same object>
(4, < main .LLentry object at 0x00969710>)
(3, < main .LLentry object at 0x00969710>)
(3, < main .LLentry object at 0x00969710>)
(3, < main .LLentry object at 0x00969710>)
(2, < main .LLentry object at 0x00969710>)
<snip more 2s, same object>
The c's returned by any one call to rowitems(h) should all be distinct.
This seems to work in my code---I can't reproduce your results above, and
I don't *think* there's a bug there.
Although they appear in order here, they don't always. Often, multiple
objects have a value of 1, and he's going to get one of them at random
as the 'min' object. I'm pretty sure.

I agree entirely---this is where my bug, and the source of
all the confusion comes from. I was briefly aware as I wrote this that the
result would be nondeterministic, but persuaded myself after two second's
thought that it didn't matter, then forgot all about it.

So mild changes in the rest of the program give rise to a different `random'
choice in the min, and choosing to eunumerate all 3 possibilities for position 5,7
instead of the 3 possibilities for position 2, 1 (say) makes a huge diffference
to the running time. I'm still surprised by the magnitude of the differences, though.

I've learnt my lesson :) Thank you for your help, and apologies
for wasting other people's time with this as well as my own!

Mark

Aug 25 '05 #18

Raymond Hettinger

Mark Dickinson wrote:

I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

Thanks for your post. It is cute, brilliant, and interesting.

I haven't had time to investigate but can point at the likely cause.
All of the global names are stored in a single hash table. Search time
is dictated by two factors, the sparseness of the hash table and the
distribution of hash values. With respect to sparseness, whenever that
table becomes 2/3 full, it grows by a factor of four and becomes only
1/6 full (leading to many fewer collisions). With respect to
distribution, it should be noted that string hash values are decidely
non-random and your variable names likely congested consecutive spaces
in a nearly full table (resulting in seven times as many search probes
to find a global value).

When the extra value was added, it likely resized the table four-fold
and redistributed the hash values into fewer consecutive positions.
Raymond
P.S. To analyze it further, start with something like this:

len(set(hash('dummy%d' %i) & 31 for i in xrange(29))) 26 len(set(hash('dummy%d' %i) & 127 for i in xrange(29)))

Aug 25 '05 #19

rafi

Stelios Xanthakis wrote:

Mark Dickinson wrote:
I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

Hi.

I haven't been able to reproduce this but I had a similar case
before (a program that some times crashed and some times worked
perfectly and the difference was whitespace in the comments!!!).

After lots of wondering and thinking that I must be dreaming
(luckily I had pyvm which also crashed but for different amounts
of whitespace), it was solved. The explanation is this: hash
and comparison of objects depends on the state of the memory
allocator. A sample case is this:

class A: pass
dummy0=47 # comment this to get a different result for min
a=A()
b=A()
print min (a, b)

the result of 'min' is not only non-deterministic but also depends
on whether other things have been allocated before. The same
thing can happen for 'dictionary.keys()' if the keys are objects
and 'iterate-over-set' when the set contains objects.

I do not get the point here: isn't min comparing the adress in memory as
there is nothing else to compare?

[python 2.4.1 on ubuntu linux]

On 10 runs from within emacs I had about 50% for `a' and 50% for `b'
returned by min (a,b). It was the same without the dummy0=47.

On 10 runs from the command line I had always `a' returned (with and
without the dummy). The only difference was the address of b's object.
(For all the run of each case the two addresses were exactly the same)

For your first post:

On 10 tests run for each case (average time, but each time was never
different one from the other more that .02s)

0.554 when the 28 dummies are in place
6.679 when commented out, removed or put in a single multiline string
2.576 when putting the 28 dummies in a list or a dict using a for loop
7.195 when creating the 28 dummies using an exec in a for loop

What about memory allocation that would be performed chunk by chunk by
the interpreter? Then being at the end or at the beginning of a chunk
may not be the same for processing? All the instructions of the program
would then be in the cpu cache for example in the same block while in
the other case theyr would be in two distinct block -> thus less caching
for the cpu... [this particular problem arose when I was working several
year ago on a persistent object manager for a distributed operating system]

--
rafi

"Imagination is more important than knowledge."
(Albert Einstein)

Aug 25 '05 #20

MrJean1

Two observations:

1 - The difference in run time with and without the dummy* globals is
due to a difference in the number of invokations of the search()
function: 1,140 resp. 27,530 in my environment.

To verify, just change the line

def search():
....

to

searches = 0
def search():
global searches
searches += 1
....

and add at the very end

print searches, "searches"
2 - The run times with and without the dummy* variables is equal(ly
slow) if the LLentry() class and min() function call are modified to be
independent of the object value.

Change line

class LLentry: pass

to

LLinst = 0
class LLentry(object):
def __init__(self):
global LLinst
LLinst += 1
self.I = LLinst

and change line

mm = min((c.S, c) for c in rowitems(h))[1].D

to

mm = min((c.S, c.I, c) for c in rowitems(h))[2].D
/Jean Brouwers

Aug 25 '05 #21

Claudio Grondi

> I've learnt my lesson :) Thank you for your help, and apologies

for wasting other people's time with this as well as my own!

I've learnt my lesson reading through this thread, too.

I am glad to be given the chance of wasting my time
with it and very happy and thankful, that you posted
your problem here before I have bumped into similar
one myself. I suppose, that there are many others
on this newsgroup who appreciated your posting,
too.

Claudio

Aug 25 '05 #22

Benjamin Niemann

Raymond Hettinger wrote:

Mark Dickinson wrote:
I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

Thanks for your post. It is cute, brilliant, and interesting.

I haven't had time to investigate but can point at the likely cause.
All of the global names are stored in a single hash table. Search time
is dictated by two factors, the sparseness of the hash table and the
distribution of hash values. With respect to sparseness, whenever that
table becomes 2/3 full, it grows by a factor of four and becomes only
1/6 full (leading to many fewer collisions). With respect to
distribution, it should be noted that string hash values are decidely
non-random and your variable names likely congested consecutive spaces
in a nearly full table (resulting in seven times as many search probes
to find a global value).

When the extra value was added, it likely resized the table four-fold
and redistributed the hash values into fewer consecutive positions.

If that's the cause, then here's another reason to use long, descriptive
names instead of C64-BASIC style a, b, c, i, j... - with long names the
chances of hash collisions are pretty low.
Or everyone will start optimizing their programs by using long, *random*
names ;)
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/

Aug 26 '05 #23

Raymond Hettinger

[Raymond Hettinger]

With respect to
distribution, it should be noted that string hash values are decidely
non-random and your variable names likely congested consecutive spaces
in a nearly full table (resulting in seven times as many search probes
to find a global value).

When the extra value was added, it likely resized the table four-fold
and redistributed the hash values into fewer consecutive positions.

[Benjamin Niemann] If that's the cause, then here's another reason to use long, descriptive
names instead of C64-BASIC style a, b, c, i, j... - with long names the
chances of hash collisions are pretty low.
Or everyone will start optimizing their programs by using long, *random*
names ;)

The wink applied to second line but could have also applied to the
first. For the most part, the non-randomness of string hashes tends to
work in your favor -- it can result in collision free hash tables.

For optimization, the best approach is to use local variables in your
innermost loops.

Alternatively, if you want to be tricky, try something like this:

globals().update(globals())

What this does and why it can improve search times is left as an
exercise for the reader :-)
Raymond

Aug 26 '05 #24

Speed quirk: redundant line gives six-fold speedup

Similar topics