473,545 Members | 1,924 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Speed quirk: redundant line gives six-fold speedup

I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

I'm running Python 2.4.1 on a 1.2Ghz iBook G4:

Python 2.4.1 (#1, May 21 2005, 19:56:42)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin

I've posted the code below, with some trepidation, since it's not a
work of art and wasn't really intended to be seen by other human
beings. It's necessarily quite long: any attempt to shorten it
significantly seems to cancel the speed gain. Any clues as to what
might be going on would be greatly appreciated!

Mark

# code starts here
dummy0 = 47
dummy1 = 47
dummy2 = 47
dummy3 = 47
dummy4 = 47
dummy5 = 47
dummy6 = 47
dummy7 = 47
dummy8 = 47
dummy9 = 47
dummy10 = 47
dummy11 = 47
dummy12 = 47
dummy13 = 47
dummy14 = 47
dummy15 = 47
dummy16 = 47
dummy17 = 47
dummy18 = 47
dummy19 = 47
dummy20 = 47
dummy21 = 47
dummy22 = 47
dummy23 = 47
dummy24 = 47
dummy25 = 47
dummy26 = 47
dummy27 = 47
dummy28 = 47

# Sudoku puzzle solver via Knuth's method of `dancing links'.

# Initial data: list of constraints, list of moves, and map from moves to lists of constraints

template = (" | %s %s %s | %s %s %s | %s %s %s |\n" * 3).join([" +-------+-------+-------+\n"] * 4)
div_nums = range(9)
symbols = "123456789"

constraints = ["position %d, %d" % (i, j) for i in div_nums for j in div_nums] + \
["%s in %s %d" % (i, j, k) for i in symbols for j in ["row", "column", "block"] for k in div_nums]

satisfies = dict(((s, i, j),
["position %d, %d" % (i, j),
"%s in row %d" % (s, i),
"%s in column %d" % (s, j),
"%s in block %d" % (s, i-i%3+j//3)]) for s in symbols for i in div_nums for j in div_nums)

moves = satisfies.keys( )

class LLentry(object) : pass

# First set up the data objects and column objects

def rowhead(obj):
obj.L = obj.R = obj.M = obj

def colhead(obj):
obj.U = obj.D = obj.C = obj
obj.S = 0

def rowinsert(obj, pos):
# insert into doubly-linked list with header pos
posL = pos.L
obj.R = pos
pos.L = obj
obj.L = posL
posL.R = obj
obj.M = pos

def colinsert(obj, pos):
# as above
posU = pos.U
obj.D = pos
pos.U = obj
obj.U = posU
posU.D = obj
obj.C = pos
pos.S += 1

def rowitems(pos):
c = pos.R
while c is not pos:
yield c
c = c.R

def move(m):
cc = m.R
while cc is not m:
c = cc.C
c.R.L = c.L; c.L.R = c.R
r = c.D
while r is not c:
j = r.R
while j is not r:
j.D.U = j.U
j.U.D = j.D
j.C.S -= 1
j = j.R
r = r.D
cc = cc.R
moves_so_far.ap pend(m)

h = LLentry()
rowhead(h); colhead(h)

constraint_from _name = {}
for name in constraints:
obj = LLentry()
obj.N = name; constraint_from _name[name] = obj
rowinsert(obj, h); colhead(obj)
obj.S = 0

move_from_name = {}
for m in satisfies.keys( ):
# we must assume that each move satisfies at least one constraint
obj = LLentry()
obj.N = m; move_from_name[m] = obj
colinsert(obj, h); rowhead(obj)

ones = [(move_from_name[m], constraint_from _name[c]) for m, cc in satisfies.items () for c in cc]
for m, c in ones:
obj = LLentry()
rowinsert(obj, m)
colinsert(obj, c)

moves_so_far = []
# everything's now set up to start the search

def search():
if h.L is h:
data = dict(((i, j), s) for s, i, j in (m.N for m in moves_so_far))
yield template % tuple(data[i, j] for i in div_nums for j in div_nums)
else:
mm = min((c.S, c) for c in rowitems(h))[1].D
while mm is not mm.C:
m = mm.M
cc = m.R
while cc is not m:
c = cc.C
c.R.L = c.L
c.L.R = c.R
r = c.D
while r is not c:
j = r.R
while j is not r:
j.D.U = j.U
j.U.D = j.D
j.C.S -= 1
j = j.R
r = r.D
cc = cc.R
moves_so_far.ap pend(m)
for solution in search():
yield solution
m = moves_so_far.po p()
cc = m.L
while cc is not m:
c = cc.C
r = c.U
while r is not c:
j = r.L
while j is not r:
j.D.U = j.U.D = j
j.C.S += 1
j = j.L
r = r.U
c.R.L = c.L.R = c
cc = cc.L
mm = mm.D

rows = [
"7......19" ,
"46.19...." ,
"...6827.4" ,
".9......7" ,
"...3..4.5" ,
"..67....." ,
"..1......" ,
"2...74..." ,
"...2..3.."]

for r, row in enumerate(rows) :
for c, entry in enumerate(row):
if entry != '.':
move(move_from_ name[(entry, r, c)])

import time
t = time.time()
for i in range(10):
for solution in search():
print solution
print "Total time taken: %s seconds" % (time.time() - t)
Aug 25 '05 #1
23 2621
On Thu, Aug 25, 2005 at 04:44:24PM +0000, Mark Dickinson wrote:
I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
I get the same thing.
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
It seems to be related to the number of globals. I get the "fast"
version with 30 to 120 globals and the "slow" version with less than 30
or more than 130. It actually gets even slower for higher numbers
of globals.

Here is a snippet to adjust the number of globals

for (i) in range(100):
globals()['dummy%d' % (i)] = 1
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?


Yes, module level globals have bad lookup times compared to function
local names. If you refactor your code to pass around the data
currently at the global module level you should see times at least
as fast as the current 'fast' one.

That said, I'm very surprised that the lookup times jump around so much.
Your code does bazillions of namespace lookups, so a small difference
in lookup times is getting multiplied into some really big numbers.

-jackdied

Aug 25 '05 #2
Mark Dickinson wrote:
Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?


I see no difference in execution times, as expected. The most likely
explanation is simply that other things were going on on your system
when you ran the first test, but not the second test, resulting in the
discrepancy. In other words, the speed change had nothing to do with
your dummy lines.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
There never was a good war or a bad peace.
-- Benjamin Franklin
Aug 25 '05 #3
On 8/25/05, Mark Dickinson <di******@veriz on.net> wrote:
I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds. The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run. So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

<snip>

One of my own: what in the world made you think "maybe I'll add 29
dummy global variables to speed things up?"

It seems to work (>19x speedup on my machine), I'm just curious what
path you followed to get there.

And, finally, you should forward this to the python-dev list, if
somebody hasn't already. There are more people who know a ton about
python internals there.

Peace
Bill Mill
bill.mill at gmail.com
Aug 25 '05 #4
Mark Dickinson wrote:
Questions:
(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
yes. I get 7 sec vs 1 sec on my laptop.
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
I don't think there are any optimizations at play here.
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?


When I start a python interpreter and import the module the speed
difference disappears and it actually gets about two times faster.
YMMV.

I don't have a solution but I admire the problem. [1]

....
jay

[1] Google tells me that this is probably attributable to Ashleigh
Brilliant.

Aug 25 '05 #5
On 8/25/05, Erik Max Francis <ma*@alcyone.co m> wrote:
Mark Dickinson wrote:
Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?


I see no difference in execution times, as expected. The most likely
explanation is simply that other things were going on on your system
when you ran the first test, but not the second test, resulting in the
discrepancy. In other words, the speed change had nothing to do with
your dummy lines.


Unlikely; 2 people have confirmed these results already.

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

I'm Investigating further...

Peace
Bill Mill
bill.mill at gmail.com
Aug 25 '05 #6
Bill Mill wrote:
Unlikely; 2 people have confirmed these results already.

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?


Yes, it's obviously a real effect given the other sightings. I don't
see any speed difference, myself (Pentium IV 3.0 GHz running Slackware
Linux).

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
There never was a good war or a bad peace.
-- Benjamin Franklin
Aug 25 '05 #7
On 8/25/05, Erik Max Francis <ma*@alcyone.co m> wrote:
Bill Mill wrote:
Unlikely; 2 people have confirmed these results already.

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?


Yes, it's obviously a real effect given the other sightings. I don't
see any speed difference, myself (Pentium IV 3.0 GHz running Slackware
Linux).


Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
for fast and slow on my machine (these won't look decent except in a
fixed-width font):

Slow:

6766494 function calls (6737594 primitive calls) in 45.740 CPU seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno (function)
3322320 20.539 0.000 31.152 0.000 test.py:135(<ge nerator expression>
)
27520 10.641 0.000 41.792 0.002 :0(min)
3322320 10.613 0.000 10.613 0.000 test.py:81(rowi tems)
28100/20 3.620 0.000 45.633 2.282 test.py:130(sea rch)
27545 0.113 0.000 0.113 0.000 :0(append)
27520 0.098 0.000 0.098 0.000 :0(pop)
1 0.041 0.041 45.736 45.736 test.py:36(?)

Fast:

540174 function calls (536514 primitive calls) in 3.506 CPU seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno (function)
259640 1.516 0.000 2.303 0.000 test.py:135(<ge nerator expression>
)
2280 0.791 0.000 3.094 0.001 :0(min)
259640 0.788 0.000 0.788 0.000 test.py:81(rowi tems)
2860/20 0.269 0.000 3.391 0.170 test.py:130(sea rch)
1 0.045 0.045 3.499 3.499 test.py:2(?)
3645 0.021 0.000 0.021 0.000 test.py:71(coli nsert)
3240 0.019 0.000 0.019 0.000 test.py:62(rowi nsert)
2305 0.010 0.000 0.010 0.000 :0(append)

Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
slow version, does not appear at all in the fast profile. I can't
figure out why - both printed out their data, so template must have
been called somewhere.

Peace
Bill Mill
bill.mill at gmail.com
Aug 25 '05 #8
Bill Mill wrote:

Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
for fast and slow on my machine (these won't look decent except in a
fixed-width font):
<snip profiles>
Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
slow version, does not appear at all in the fast profile. I can't
figure out why - both printed out their data, so template must have
been called somewhere.


OK, I'm getting somewhere now. When I replace:

template = (" | %s %s %s | %s %s %s | %s %s %s |\n" * 3).join(["
+-------+-------+-------+\n"] * 4)

wtih:

template = """ | %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
| %s %s %s | %s %s %s | %s %s %s |\n
+-------+-------+-------+\n"""

Then the non-dummy version is faster than the dummy version (very
slightly, presumably because it doesn't need to allocate 28 dummy
variables).

Peace
Bill Mill
bill.mill at gmail.com
Aug 25 '05 #9
On Thu, Aug 25, 2005 at 01:35:04PM -0400, Bill Mill wrote:
On 8/25/05, Erik Max Francis <ma*@alcyone.co m> wrote:
Mark Dickinson wrote:
Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations? Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?


I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

I'm Investigating further...


I'm getting similarly freakish results. I tried a little ghetto debugging
by putting a printf in dictobject.c's resize method and recompiling python.
Sadly I can't get the problem to reproduce itself with the new binary
(with or without the printf). The Ubuntu default 2.4.1 is sometimes fast,
my hand compiled one (./configure && make) is always slow.

There are some very arcane low level things going on here.

sprat:~/src/Python-2.4.1# time ./python /tmp/odd.py > /dev/null
7.876u 0.008s 0:07.91 99.4% 0+0k 0+0io 0pf+0w
sprat:~/src/Python-2.4.1# time python /tmp/odd.py > /dev/null
1.813u 0.004s 0:01.77 102.2% 0+0k 0+0io 0pf+0w

sprat:~/src/Python-2.4.1# ./python
Python 2.4.1 (#5, Aug 25 2005, 13:55:44)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright" , "credits" or "license" for more information.

sprat:~/src/Python-2.4.1# python
Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright" , "credits" or "license" for more information.


No-idea-ly,

-jackdied
Aug 25 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

13
7413
by: Jordanakins | last post by:
Usenet, I am currently working on my website and am needing to detect the connection speed of the client. I would like to do this in PHP and not use any other languages. This makes it a bit more complicated. I know it is possiable to do this in PHP but I can't think of how. All I need to figure out is if they are on dial up or not. It...
2
5346
by: shaun roe | last post by:
As a follow up to my question about STL and bitset<64> I'd like to share a quirk (bug?) about unsigned long long support and the bitset. I'm using gcc 3.2 on linux or gcc 3.3 on mac, the answer is the same. unsigned long long int f; f=3; bitset<64> g(f); cout << f << endl; cout << g << endl; f<<=32;
1
1480
by: Peter Billam | last post by:
Greetings. This problem arises in the context of the CPAN module Crypt::Tea ... It works with IE, Mozilla, Netscape, Firefox, IE and Opera, but if you check out http://www.pjb.com.au/comp/test.html using Konqueror 5.0 (at least on linux) you will find: tea_code((2048299521,595110280), (-764348263,554905533,637549562,-283747546)) returns...
8
2970
by: Armando | last post by:
Here's one that's going to leave me bald before long - I have a report (no headers or footers, small detail section) that runs normally when I open it from the database window. When I run it from code: DoCmd.OpenReport "rpt_Name", acViewPreview I get five records on the first page, and only one record on the second
4
4673
by: mux | last post by:
Hi I found out that the following piece of code throws an error. 1 #include "stdio.h" 2 3 int main() 4 { 5 int a,b; 6 a= 10;
14
11941
by: Ian Pilcher | last post by:
It's pretty common to see declarations such as: static volatile sig_atomic_t caught_signal = 0; C99 defines sig_atomic_t as a "... (possibly volatile-qualified) integer type of an object that can be accessed as an atomic entity, even in the presence of asynchronous interrupts." Does this mean that the use of "volatile" in the above...
4
1287
by: Galen Somerville | last post by:
I have a real timing problem. This involves getting data from a USB device via an ActiveX dll and drawing trace lines on the screen via a UserControl. The USB device sends six sets of pixel info so twelve lines have to be drawn on the screen. The first line drawn blacks out a previously drawn line. The second line drawn is from the current...
9
2033
by: howarthc | last post by:
OK - here is what I want to do - but I am lost how to do it. I have a variable $mystring = "one two three four five six seven eight nine" This variable $mystring can be 4 words long or it could be 50 words long it is totally variable. What I want to do is take the number of words in $mystring, divide by three and insert a <brat the end...
19
2440
by: =?Utf-8?B?QnJpYW4gQ29vaw==?= | last post by:
This is an example of the data; 2007/07/27 11:00:03 ARES_INDICATION 010.050.016.002 404.2.01 (6511) RX 74 bytes 2007/07/27 11:00:03 65 11 26 02 BC 6C AA 20 76 93 51 53 50 76 13 48 2007/07/27 11:00:03 52 00 52 02 02 C7 83 D7 07 07 1B 0B 00 00 00 00 2007/07/27 11:00:03 28 0A 06 06 06 06 06 06 06 06 06 06 06 0A 06 06
2
1583
by: mde | last post by:
I'm wondering if there is a "best practice" for *creating doctests in methods* that reduces clutter/duplication of dummy instantiations. In the following code there are five (labeled 1-5) possible places to put a dummy mock instantiation. I have the impression that creating the dummies in every method is the common practice (1-3), but I'm...
0
7478
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7410
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
7668
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
7923
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7437
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
0
7773
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
1
5343
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
1
1025
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
722
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.