
# Why is this loop heavy code so slow in Python? Possible Project Euler spoilers

P: n/a I'm pretty new to python, but am very happy with it. As well as using it at work I've been using it to solve various puzzles on the Project Euler site - http://projecteuler.net. So far it has not let me down, but it has proved surprisingly slow on one puzzle. The puzzle is: p is the perimeter of a right angle triangle with integral length sides, {a,b,c}. For which value of p < 1000 is the number of solutions {a,b,c} maximised? Here's my python code:

    #!/usr/local/bin/python
    solutions = [0] * 1001
    p = 0
    for a in xrange(1, 1000):
        for b in xrange(1, 1000 - a):
            for c in xrange(1, 1000 - a - b):
                p = a + b + c
                if p < 1000:
                    if a ** 2 + b ** 2 == c ** 2:
                        solutions[p] += 1

    max = 0
    maxIndex = 0
    index = 0
    for solution in solutions:
        if solution > max:
            max = solution
            maxIndex = index
        index += 1

    print maxIndex

It takes 2 minutes and twelve seconds on a 2.4GHz Core2Duo MacBook Pro. Surprised at how slow it was, I implemented the same algorithm in C:

    #include <stdio.h>
    #include <stdlib.h>

    int main() {
        int* solutions = calloc(1000, sizeof(int));
        int p;
        for(int a = 1; a < 1000; ++a) {
            for(int b = 1; b < 1000 - a; ++b) {
                for(int c = 1; c < 1000 - a - b; ++c) {
                    p = a + b + c;
                    if(p < 1000) {
                        if(a * a + b * b == c * c) {
                            solutions[p] += 1;
                        }
                    }
                }
            }
        }
        int max = 0;
        int maxIndex = 0;
        for(int i = 0; i < 1000; ++i) {
            if(solutions[i] > max) {
                max = solutions[i];
                maxIndex = i;
            }
        }
        printf("%d\n", maxIndex);
        return 0;
    }

Compiled with:

    gcc -o 39 -std=c99 -O3 39.c

The resulting executable takes 0.24 seconds to run. I'm not expecting a scripting language to run faster than native code, but I was surprised at how much slower it was in this case. Any ideas as to what is causing python so much trouble in the above code? Sep 2 '07 #1
25 Replies

P: n/a On Sep 2, 12:51 pm, jwrweather...@gmail.com wrote: [snip original post and code] You can drop the innermost loop entirely: compute c from a and b and check whether it is an integer:

    from math import sqrt

    solutions = [0] * 1001
    for a in xrange(1, 1000):
        a2 = a*a
        for b in xrange(1, 1000 - a):
            c = sqrt(a2 + b*b)
            if c == int(c) and a+b+c < 1000:
                solutions[a+b+int(c)] += 1

    max = 0
    maxIndex = 0
    index = 0
    for solution in solutions:
        if solution > max:
            max = solution
            maxIndex = index
        index += 1

    print maxIndex

-- Arnaud Sep 2 '07 #2
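To see that this O(n²) search counts exactly the same triples as the O(n³) brute force, it helps to run both against a smaller perimeter bound and compare the tallies. The sketch below uses Python 3 syntax (`range`, `print(...)`) rather than the thread's Python 2; the function names and the bound `M` are illustrative, not from the thread:

```python
from math import sqrt

def count_bruteforce(M):
    # The OP's approach: try every (a, b, c) triple.
    solutions = [0] * M
    for a in range(1, M):
        for b in range(1, M - a):
            for c in range(1, M - a - b):
                if a * a + b * b == c * c:
                    solutions[a + b + c] += 1
    return solutions

def count_sqrt(M):
    # Arnaud's approach: derive c from a and b, keep it only when
    # c is integral and the perimeter stays under the bound.
    solutions = [0] * M
    for a in range(1, M):
        a2 = a * a
        for b in range(1, M - a):
            c = sqrt(a2 + b * b)
            if c == int(c) and a + b + int(c) < M:
                solutions[a + b + int(c)] += 1
    return solutions

print(count_bruteforce(200) == count_sqrt(200))  # True
```

Both tallies count (a, b) and (b, a) as separate solutions, so for example perimeter 12 scores 2 from the (3, 4, 5) triangle; what matters for the puzzle is that the argmax is unaffected.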

P: n/a On Sep 2, 7:20 am, Arnaud Delobelle wrote: [snip quoted code] For the curious:

    O          O + P      A          A + P
    =======    =======    =======    =======
    2:22.56    0:25.65    0:00.75    0:00.20

    O = Original Implementation
    P = Psyco (psyco.full())
    A = Arnaud's Revised Implementation

Sep 2 '07 #3

P: n/a jw***********@gmail.com wrote in news:11*********************@r34g2000hsd.googlegroups.com:

> The puzzle is: p is the perimeter of a right angle triangle with integral length sides, {a,b,c}. which value of p < 1000, is the number of solutions {a,b,c} maximised?
>
>     for c in xrange(1, 1000 - a - b):
>         p = a + b + c
>         if p < 1000:
>             if a ** 2 + b ** 2 == c ** 2:
>                 solutions[p] += 1

Once p >= 1000, it ain't goin' back. If you break out of the innermost loop here after that happens, you'll save a bunch of time. [snip rest of quoted post] Sep 2 '07 #4
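In the OP's exact loops the perimeter check never actually fails (c stops at 999 - a - b, so p stays below 1000), but the early-exit idea still pays off in a slightly different form: c*c grows monotonically, so you can leave the inner loop as soon as c*c reaches a*a + b*b. A sketch in Python 3 syntax; the function name and the placement of the breaks are mine, not from the thread:

```python
def count_solutions(M=1000):
    # Brute force with an early exit: once c*c reaches a*a + b*b,
    # either we found the unique matching c or no later c can match.
    solutions = [0] * M
    for a in range(1, M):
        a2 = a * a
        for b in range(1, M - a):
            s = a2 + b * b
            for c in range(1, M - a - b):
                c2 = c * c
                if c2 >= s:
                    if c2 == s:
                        solutions[a + b + c] += 1
                    break  # every later c*c is larger still
    return solutions
```

This keeps the tallies identical to the plain triple loop while skipping most of the innermost iterations.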

 P: n/a [snip code] Thanks for that. I realise that improving the algorithm will speed things up. I wanted to know why my less than perfect algorithm was so much slower in python than exactly the same algorithm in C. Even when turning off gcc's optimiser with the -O0 flag, the C version is still 100 times quicker. Sep 2 '07 #5

P: n/a On Sep 2, 9:45 am, jwrweather...@gmail.com wrote:
> [snip code] Thanks for that. I realise that improving the algorithm will speed things up. I wanted to know why my less than perfect algorithm was so much slower in python than exactly the same algorithm in C. Even when turning off gcc's optimiser with the -O0 flag, the C version is still 100 times quicker.

Well, for one thing, you're creating half a million xrange objects in the course of the search. All the C code has to do is increment a few integers. Mark Sep 2 '07 #6
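Mark's "half a million xrange objects" is easy to verify by counting rather than timing: the a loop builds one range, each value of a builds one for the b loop, and each (a, b) pair builds one for the c loop. A quick tally in Python 3 syntax (a count of object constructions, not a benchmark):

```python
# Count how many range objects the OP's triple loop constructs.
count = 1  # the range for the a loop
for a in range(1, 1000):
    count += 1  # one range per a for the b loop
    for b in range(1, 1000 - a):
        count += 1  # one range per (a, b) pair for the c loop
print(count)  # 499501 - just shy of half a million
```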

P: n/a On Sep 2, 9:45 pm, jwrweather...@gmail.com wrote: [snip quoted post] Maybe Python is the slowest programming language in the world. So there is a joke: some python hater said that "python" can only crawl rather than run. :) Python is slow because: (1) dynamic binding (2) it is an interpreted language. For example, in C code, an integer expression "a+b" will directly generate an "add" instruction for x86 processors. A Python interpreter, on the other hand, needs to detect the types of a and b first, then choose the right "+" operation for them, and use an evaluation stack to get the result. Psyco is a JIT compiler with just-in-time specialization which can somewhat address both problems. Sep 2 '07 #7
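The run-time dispatch described above is visible with CPython's own `dis` module: `a + b` compiles to a single generic add instruction whose meaning (int add, float add, string concatenation, a user-defined `__add__`) is only resolved when it executes. The opcode is named `BINARY_ADD` in older CPython releases and `BINARY_OP` from 3.11 on:

```python
import dis

def add(a, b):
    return a + b

dis.dis(add)  # shows one generic add instruction for the '+'

# The generic opcode is present regardless of CPython version:
ops = [ins.opname for ins in dis.get_instructions(add)]
print(any(op in ("BINARY_ADD", "BINARY_OP") for op in ops))  # True
```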

P: n/a On Sep 2, 12:51 pm, jwrweather...@gmail.com wrote: [...] The resulting executable takes 0.24 seconds to run. ... Sorry I didn't answer your question at all in my previous post (although my modified method gets the answer in about 6s on a measly PB G4 1GHz :). Your little code snippet is just about the worst bit of code for python to be compared to C. Three loops, only integer arithmetic: that's not going to make Python shine! Nevertheless, as you optimise your C snippet (-O3), there are probably a few things that the compiler does for you:

* as a*a + b*b always remains the same in the inner loop, it is probably stored in a register once and for all
* in the second loop, a*a remains the same, so it might be calculated once and for all as well

It gives this:

    M = 1000
    solutions = [0] * M

    def f1():
        "Your original implementation"
        for a in xrange(1, M):
            for b in xrange(1, M - a):
                for c in xrange(1, M - a - b):
                    if a**2 + b**2 == c**2:
                        solutions[a+b+c] += 1

    def f2():
        "a*a + b*b precalculated"
        for a in xrange(1, M):
            a2 = a*a
            for b in xrange(1, M - a):
                s = a2 + b*b
                for c in xrange(1, M - a - b):
                    if s == c*c:
                        solutions[a+b+c] += 1

It doesn't make it as quick as C of course, but it more than halves the execution time. -- Arnaud Sep 2 '07 #8

P: n/a In article <11*********************@g4g2000hsf.googlegroups.com>, Mark Dickinson wrote:

> Well, for one thing, you're creating half a million xrange objects in the course of the search. All the C code has to do is increment a few integers. Mark

Right: the original question is entirely legitimate, and it's not adequate to respond, as some follow-ups did, with ways to improve the Python-coded algorithm. The correct answer, which I want to reinforce, is that the exhibited Python and C versions are NOT "exactly the same algorithm", at least not without more qualification. Part of Python expertise is to recognize that creation of xrange objects, mentioned above, is far from free. Also, -O3 gives C the opportunity, also remarked in a follow-up, to factor calculations outside their loops. Sep 2 '07 #9

P: n/a Ivan Wang wrote:
> Python is slow because: (1) dynamic binding

Yes.

> (2) it is an interpreted language

Not quite. It's compiled to byte-code - just like Java (would you call Java an 'interpreted language'?) Sep 2 '07 #10

P: n/a al***@mac.com (Alex Martelli) writes:
> ...which suggests that creating an xrange object is _cheaper_ than indexing a list...

Why not re-use the xrange instead of keeping a list around?

    Python 2.4.4 (#1, Oct 23 2006, 13:58:00)
    >>> a = xrange(3)
    >>> print list(a)
    [0, 1, 2]
    >>> print list(a)
    [0, 1, 2]

Sep 2 '07 #11

P: n/a Paul Rubin wrote:
> >>> a = xrange(3)
> >>> print list(a)
> [0, 1, 2]
> >>> print list(a)
> [0, 1, 2]

Reusing xranges is exactly what my code was doing -- at each for loop you need an xrange(1, k) for a different value of k, which is why you need some container to keep them around (and a list of xrange objects is the simplest applicable container). Your suggestion doesn't appear to make any sense in the context of the optimization problem at hand -- what list(...) calls are you thinking of?! Please indicate how your suggestion would apply in the context of:

    def f3(M=M, solutions=solutions):
        "pull out all the stops"
        xrs = [xrange(1, k) for k in xrange(0, M+1)]
        for a in xrs[M]:
            a2 = a*a
            for b in xrs[M-a]:
                s = a2 + b*b
                for c in xrs[M-a-b]:
                    if s == c*c:
                        solutions[a+b+c] += 1

Alex Sep 2 '07 #12

P: n/a al***@mac.com (Alex Martelli) writes:
> Reusing xranges is exactly what my code was doing

Oh cool, I missed that, I was just going by the text description. Looking at the code, yes, it's re-using the xranges. Thanks. Sep 2 '07 #13

 P: n/a On Sep 2, 12:55 pm, al...@mac.com (Alex Martelli) wrote: Mark Dickinson

P: n/a
> > (2) it is an interpreted language
>
> Not quite. It's compiled to byte-code - just like Java (would you call Java an 'interpreted language'?)

Python is not implemented like Java. In Java (at least in HotSpot), the byte code is further compiled to machine code before execution; in Python, the byte code is interpreted. Whether this makes Python an interpreter or a compiler, I don't know. Regards, Martin Sep 2 '07 #15

 P: n/a Mark Dickinson

P: n/a Martin v. Löwis wrote:
> Python is not implemented like Java. In Java (at least in HotSpot), the byte code is further compiled to machine code before execution; in Python, the byte code is interpreted.

OK, good. Naive question now coming to mind: Why doesn't Python do the latter as well? /W Sep 2 '07 #17

P: n/a On Sun, 02 Sep 2007 21:00:45 +0200, Wildemar Wildenburger wrote:

> Martin v. Löwis wrote:
>> Python is not implemented like Java. In Java (at least in HotSpot), the byte code is further compiled to machine code before execution; in Python, the byte code is interpreted.
>
> OK, good. Naive question now coming to mind: Why doesn't Python do the latter as well? /W

There is no single version of Java, and the reference implementation runs on a virtual machine just like Python. Today there are virtual machine implementations of Java, native compilers, and Just In Time compilers for Java, including HotSpot mentioned by Martin, but Java the language was originally defined to run on a VM. See, for example, here: http://schmidt.devlib.org/java/compilers.html

There are costs to native compilation, the biggest one of course being the initial investment in time and effort in creating the native compiler. Sun and other commercial companies have invested a lot of money in Java, and I don't think the money invested in Python has come even close. Consequently, any work on JIT compilation for Python has been done by volunteers. Nevertheless, we have Psyco, which is a JIT compiler of sorts; work also continues on PyPy (Python implemented in Python) which, it is hoped, will lead to faster Python implementations.

Part of the reason that native compilation for Python is hard is that Python's dynamic object model makes it very difficult to apply the same sorts of compiler optimizations that C and Java allow. Comparatively little of the work done by Python can be safely pushed from run time to compile time, so it is unlikely that the average Python application will ever run as fast as the equivalent C code -- although that ignores the question of what "the equivalent C code" could possibly mean. (If the C code includes all the dynamic high-level features of Python, it too would surely run as slowly as Python, and if it didn't, it can hardly be said to be equivalent.) Nevertheless, by using something like Psyco, parts of your Python code can run at virtually the same speed as C.

A big question mark in my mind is Lisp, which according to aficionados is just as dynamic as Python, but has native compilers that generate code running as fast as highly optimized C. I'm not qualified to judge whether the lessons learnt from Lisp can be applied to Python, but in any case Lisp is an old, old language -- only Fortran is older. The amount of development effort and money put into Lisp dwarfs that put into Python by possibly a hundred times or more.

So... if you'd like to see Python run as fast as C or Lisp, and you have a few tens of millions of dollars spare to invest in development, I think the Python Software Foundation would love to hear from you.

-- Steven. Sep 2 '07 #18

P: n/a Wildemar Wildenburger wrote:
> OK, good. Naive question now coming to mind: Why doesn't Python do the latter as well?

Because of the dynamic nature of it. Java is statically typed, so the JIT can heavily optimize. OTOH, psyco IS a JIT-compiler to optimize certain calculations, which are mostly of a numerical nature. But this can't be done to the extent it is possible in Java. Diez Sep 2 '07 #19

P: n/a On 9/2/07, Diez B. Roggisch wrote:
> Because of the dynamic nature of it. Java is statically typed, so the JIT can heavily optimize. OTOH, psyco IS a JIT-compiler to optimize certain calculations, which are mostly of a numerical nature. But this can't be done to the extent it is possible in Java.

* Original code: 3 min, 9 seconds
* Original code with psyco: 30.28 seconds
* Original code, compiled with Pyrex: 1 min 39 seconds (moves the for loops into C)
* Same, with a, b, c declared with cdef int: 20 seconds (uses C pow() instead of going through Python)
* Same, with the for..xrange loops rewritten to use C integer loops: 13 seconds (saves xrange use, but since the Python loop was already gone, not too much savings)

With a small amount of work, you should be able to implement the C algorithm in Pyrex (or even just use the C algorithm, in a wrapper that converts the return value to an int) and get the same speed as the C version plus a constant marshalling factor. Adding psyco took all of 20 seconds (half that because I needed to move the module-scope code into a function), and re-writing with Pyrex took a minute or two. So, as a demonstration of numeric optimization, both of them can give you quite good rewards with minimal effort. Sep 3 '07 #20

P: n/a On 2007-09-02, Martin v. Löwis wrote:
> Python is not implemented like Java. In Java (at least in HotSpot), the byte code is further compiled to machine code before execution; in Python, the byte code is interpreted. Whether this makes Python an interpreter or a compiler, I don't know.

I'd call it an integrated compiler and virtual machine--a classic combination. -- Neil Cerutti Sep 3 '07 #21

 P: n/a Steven D'Aprano

P: n/a Thanks for all the answers to my question. I think what I need to take away from this is that xrange is an object - I thought it was just some loop construct - and that maths is slow in python, so avoid pathological looping. I remember the first time I tried Objective-C on OS X I used the NSNumber class for arithmetic - the results that time were pretty awful too! Sep 3 '07 #23

P: n/a On 3 Sep, 15:39, jwrweather...@gmail.com wrote:
> Thanks for all the answers to my question. I think what I need to take away from this is that xrange is an object

Indeed, but using xrange can be faster than converting your "for" loops to "while" loops plus a counter; I converted your code to use the latter arrangement and it was noticeably slower. Perhaps the repeated invocation of each xrange object's next method is less expensive than repeatedly obtaining new incremented int objects and testing them against other int objects.

> - I thought it was just some loop construct, and that maths is slow in python - so avoid pathological looping.

You'd be wise to avoid doing unnecessary work deep within nested loops in any programming language. Sadly, it's not possible for the compiler to work out that some calculations can be lifted out of the innermost loops in Python, since the various guarantees that make such operations possible in other languages are not offered by Python.

> I remember the first time I tried Objective-C on OS X I used the NSNumber class for arithmetic - the results that time were pretty awful too!

Obviously, it'd be a fairer test if you used comparable numeric types in both implementations, but a more capable numeric type would be overkill for the C implementation of this program, and a less capable numeric type doesn't exist for Python unless you're willing to use a dialect such as Pyrex (as others have shown). Paul Sep 3 '07 #24
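Paul's point that lifting calculations out of inner loops is the programmer's job in Python can be shown side by side: the two functions below count the same matches, but only the second hoists the invariant a*a + b*b the way gcc's -O3 does automatically for the C version. A sketch in Python 3 syntax with illustrative names:

```python
def inner_count_naive(a, b, limit):
    # a ** 2 + b ** 2 is re-evaluated on every iteration of c;
    # Python cannot prove it is loop-invariant, so it stays in the loop.
    return sum(1 for c in range(1, limit) if a ** 2 + b ** 2 == c ** 2)

def inner_count_hoisted(a, b, limit):
    # The invariant is computed once, by hand, before the loop.
    s = a * a + b * b
    return sum(1 for c in range(1, limit) if s == c * c)

print(inner_count_naive(3, 4, 100), inner_count_hoisted(3, 4, 100))  # 1 1
```

Both find the single c (here c = 5) that completes the triangle; only the amount of repeated arithmetic differs.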

P: n/a Martin v. Löwis wrote:
> Python is not implemented like Java. In Java (at least in HotSpot), the byte code is further compiled to machine code before execution;

This is a VM-specific feature.

> in Python, the byte code is interpreted.

Idem.

> Whether this makes Python an interpreter or a compiler, I don't know.

This is an old troll^Wdiscussion !-) Now IANAL, but AFAIK, the byte-code compilation stage can make a great difference performance-wise, and for the same language, a byte-compiled implementation is likely to be faster than a pure-interpreted one, at least because of the saving on parsing time (please someone correct me if I'm wrong)... Sep 3 '07 #25
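The parse-once, run-many split described above is directly visible in CPython: the `compile` built-in does the parsing and byte-compilation once, producing a code object that `exec` can then run repeatedly without any further parsing. A small sketch in Python 3 syntax (the snippet source is illustrative):

```python
source = "total = sum(i * i for i in range(10))"

# Parse and byte-compile once...
code = compile(source, "<snippet>", "exec")

# ...then the byte code can be executed any number of times
# with no parsing cost.
ns = {}
exec(code, ns)
print(ns["total"])  # 285 = 0^2 + 1^2 + ... + 9^2
```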

P: n/a On Sep 2, 12:51 pm, jwrweather...@gmail.com wrote:
> ... right-angled triangle puzzle solver
>
>     max = 0
>     maxIndex = 0
>     index = 0
>     for solution in solutions:
>         if solution > max:
>             max = solution
>             maxIndex = index
>         index += 1
>
>     print maxIndex

Not that it solves your slowness problem, or has anything to do with the question you asked :), but you can replace this last segment of your code with:

    print solutions.index(max(solutions))

-- Paul Hankin Sep 19 '07 #26
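The one-liner matches the OP's manual scan, including the tie-breaking: the strict '>' keeps the first maximal index, which is also what `list.index` returns. A quick check in Python 3 syntax (illustrative data; note both versions assume non-negative values, since the scan starts from max = 0):

```python
def argmax_loop(values):
    # The OP's manual scan: strict '>' keeps the first maximal index.
    best, best_index = 0, 0
    for index, value in enumerate(values):
        if value > best:
            best, best_index = value, index
    return best_index

data = [0, 3, 1, 3, 2]  # the maximum 3 appears twice
print(argmax_loop(data), data.index(max(data)))  # 1 1
```

Both pick index 1, the first occurrence of the maximum.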

### This discussion thread is closed

Replies have been disabled for this discussion. 