I posted this a few weeks ago (remember the C Sharp thread?) but it went
unnoticed in the large mass of posts, so let me retry. Here I get Python+
Psyco twice as fast as optimized C, so I would like to know if something
is wrong on my old laptop and if anybody can reproduce my results.
Here are my numbers for calling the error function a million times
(Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz):
$ time p23 erf.py
real 0m0.614s
user 0m0.551s
sys 0m0.029s
This is twice as fast as optimized C:
$ gcc erf.c -lm -o3
$ time ./a.out
real 0m1.125s
user 0m1.086s
sys 0m0.006s
Here is the situation for pure Python:
$ time p23 erf.jy
real 0m25.761s
user 0m25.012s
sys 0m0.049s
and, just for fun, here is Jython performance:
$ time jython erf.jy
real 0m42.979s
user 0m41.430s
sys 0m0.361s
The source code follows (copied from Alex Martelli's post):
----------------------------------------------------------------------
$ cat erf.py
import math
import psyco
psyco.full()

def erfc(x):
    exp = math.exp
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p*x)
    erfcx = ( (a1 + (a2 + (a3 +
              (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
    return erfcx

def main():
    erg = 0.0
    for i in xrange(1000000):
        erg += erfc(0.456)

if __name__ == '__main__':
    main()
--------------------------------------------------------------------------
# python/jython version = same without "import psyco; psyco.full()"
--------------------------------------------------------------------------
$ cat erf.c
#include <stdio.h>
#include <math.h>

double erfc( double x )
{
    double p, a1, a2, a3, a4, a5;
    double t, erfcx;
    p = 0.3275911;
    a1 = 0.254829592;
    a2 = -0.284496736;
    a3 = 1.421413741;
    a4 = -1.453152027;
    a5 = 1.061405429;
    t = 1.0 / (1.0 + p*x);
    erfcx = ( (a1 + (a2 + (a3 +
              (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x);
    return erfcx;
}

int main()
{
    double erg=0.0;
    int i;
    for(i=0; i<1000000; i++)
    {
        erg = erg + erfc(0.456);
    }
    return 0;
}
Michele Simionato, Ph. D. Mi**************@libero.it http://www.phyast.pitt.edu/~micheles
--- Currently looking for a job ---
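For anyone trying to reproduce the numbers, it may be worth checking first that the routine computes what it should. The sketch below is a Python 3 port of the erfc function from erf.py (it does not use Psyco) and compares it with math.erfc from the standard library. The name erfc_approx and the sample points are assumptions made for this illustration, not part of the original post.

```python
import math

def erfc_approx(x):
    """Polynomial approximation of erfc(x), same coefficients as erf.py."""
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p * x)
    # Same nested (Horner) evaluation as in the post.
    poly = (a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t
    return poly * math.exp(-x * x)

# Arbitrary sample points; 0.456 is the argument used in the benchmark.
for x in (0.0, 0.456, 1.0, 2.0):
    approx = erfc_approx(x)
    exact = math.erfc(x)
    print("x=%.3f approx=%.9f exact=%.9f diff=%.2e"
          % (x, approx, exact, abs(approx - exact)))
```

The formula is intended for non-negative x, so negative arguments are left out of the check.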
Michele Simionato wrote:
I posted this a few weeks ago (remember the C Sharp thread?) but it went unnoticed in the large mass of posts, so let me retry. Here I get Python+Psyco twice as fast as optimized C, so I would like to know if something is wrong on my old laptop and if anybody can reproduce my results.
I can. :-)
I had to increase the loop counter by a factor of 10 because it
ran too fast on my machine (Celeron 533 MHz), and I added a print statement
for the accumulated sum (erg). These are my results:
[irmen@atlantis]$ gcc -O3 -march=pentium2 -mcpu=pentium2 -lm erf.c
[irmen@atlantis]$ time ./a.out
5190039.338694
4.11user 0.00system 0:04.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (103major+13minor)pagefaults 0swaps
[irmen@atlantis]$ time python2.3 erf.py
5190039.33869
2.91user 0.01system 0:02.92elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (544major+380minor)pagefaults 0swaps
This is with gcc 3.2.2 on Mandrake 9.1.
While Python + Psyco is not twice as fast as compiled & optimized C,
it's still faster by almost 30% on my system, which is still great!!
--Irmen
Michele Simionato wrote:
$ time p23 erf.py
real 0m0.614s
user 0m0.551s
sys 0m0.029s
This is twice as fast as optimized C:
$ gcc erf.c -lm -o3
$ time ./a.out
real 0m1.125s
user 0m1.086s
sys 0m0.006s
Here is the situation for pure Python
$ time p23 erf.jy
real 0m25.761s
user 0m25.012s
sys 0m0.049s
and, just for fun, here is Jython performance:
$ time jython erf.jy
real 0m42.979s
user 0m41.430s
sys 0m0.361s
Mmm...on my machine C is faster. What version of GCC do you have? I think
2.9x, right?
These are my timings (Debian GNU Linux Unstable, Duron 1300, Python2.3,
Psyco CVS, GCC 3.3.2, Java 1.4.1):
$ time python erf.py
real 0m0.251s
user 0m0.207s
sys 0m0.012s
$ gcc erf.c -lm -O3
$ time ./a.out
real 0m0.162s
user 0m0.157s
sys 0m0.001s
Notice that C is faster than Psyco + Python2.3 on my machine (it needs about
65% of the Psyco run time).
Without Psyco, Python2.3 takes about 6 seconds:
$ time python erf.jy
real 0m6.177s
user 0m6.040s
sys 0m0.010s
And Jython is definitely slower :)
$ time jython erf.jy
real 0m10.423s
user 0m9.506s
sys 0m0.197s
--
Lawrence "Rhymes" Oluyede http://loluyede.blogspot.com rh****@NOSPAMmyself.com
Michele Simionato wrote:
I posted this a few weeks ago (remember the C Sharp thread?) but it went unnoticed in the large mass of posts, so let me retry. Here I get Python+Psyco twice as fast as optimized C
$ gcc erf.c -lm -O3
Try a 3.x series gcc with the appropriate -march=pentium3.
You'll be pleasantly surprised. I can't understand why
the sudden improvement in gcc code generation lately hasn't
been hyped more. If you want to try different machines,
then http://www.pixelbeat.org/scripts/gcccpuopt will give
you the appropriate machine-specific gcc options to use.
Note also that -ffast-math might help a lot in this application.
cheers,
Pádraig.
Irmen de Jong <irmen@-NOSPAM-REMOVETHIS-xs4all.nl> writes:
P@draigBrady.com wrote:
try a 3.x series gcc with the appropriate -march=pentium3 You'll be pleasantly surprised.
In my other reply I mentioned that I still get a Python+Psyco advantage of 30% over a gcc 3.2.2 compiled version. My gcc is doing a lot better than Michele's reported 50% difference, but Python+Psyco still wins :-)
So, the interesting part is: why?
John
On Sun, 24 Aug 2003 00:31:15 +0100, John J. Lee wrote: Irmen de Jong <irmen@-NOSPAM-REMOVETHIS-xs4all.nl> writes:
P@draigBrady.com wrote:
.... but Python+Psyco still wins :-)
So, the interesting part is: why?
John
My suspicion is that when Psyco looks at erfc, it
finds that nothing changes and so replaces the
function call with the resulting number (am I right? it's the
same each time?). This is what a "specializing compiler"
would do, methinks. So, try using a different number
with each call.
Simon.
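Simon's suggestion can be tried with a harness along these lines. This is only a sketch in plain Python 3: it does not use Psyco, so it shows the shape of the experiment rather than Psyco's actual behaviour. The names bench_constant, bench_varying and timed, and the use of time.perf_counter, are choices made for this illustration.

```python
import math
import time

def erfc(x):
    """Same approximation as in erf.py."""
    p = 0.3275911
    a1, a2, a3 = 0.254829592, -0.284496736, 1.421413741
    a4, a5 = -1.453152027, 1.061405429
    t = 1.0 / (1.0 + p * x)
    return (a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t * math.exp(-x * x)

def bench_constant(n):
    """Call erfc n times with the same argument, as the original erf.py does."""
    erg = 0.0
    for _ in range(n):
        erg += erfc(0.456)
    return erg

def bench_varying(n):
    """Call erfc n times with a different argument on every iteration."""
    erg = 0.0
    for i in range(n):
        erg += erfc(i / float(n))
    return erg

def timed(func, n):
    """Return (result, elapsed seconds) for one run of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    n = 1000000
    for func in (bench_constant, bench_varying):
        result, seconds = timed(func, n)
        # Printing the sum keeps the computed value observable.
        print("%s: sum=%.6f time=%.3fs" % (func.__name__, result, seconds))
```

If the two variants take about the same time, the run time does not depend on the argument being constant; a large gap between them would support the specialization theory.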
Lawrence Oluyede wrote:
P@draigBrady.com wrote:
If you want to try different machines then http://www.pixelbeat.org/scripts/gcccpuopt will give you the appropriate machine specific gcc options to use.
Very cool script, thanks :) Anyway, it didn't change much with erf.c. erfCPU is compiled with the flags suggested by the gcccpuopt script:
$ gcccpuopt -march=athlon-xp -mfpmath=sse -msse -mmmx -m3dnow
You still need some -O optimization flags. The -m options just let gcc
generate some nice instructions specific to your Athlon CPU.
Also, I don't think that script is all that useful because at least some
(if not all) of those -m options are already implied by -march=athlon-xp
(I don't recall which ones off the top of my head but I'll find a
reference for anyone interested... you can also find out by looking at
the gcc command line option parsing code).
Anyone who wants some other good ideas for the best flags on their
machine should check out ccbench: http://www.rocklinux.net/packages/ccbench.html
The problem here of course is that not all applications behave like the
benchmarks :(
Van Gale
Van Gale wrote: You still need some -O optimization flags. The -m options just let gcc generate some nice instructions specific to your Athlon CPU.
I didn't mention it, but I also used the -O3 flag. I don't know why, but on my
machine the C code is faster than the Psyco code in this test.
--
Lawrence "Rhymes" Oluyede http://loluyede.blogspot.com rh****@NOSPAMmyself.com
Van Gale <ne**@exultants.org> wrote in message news:<XK*****************@newssvr27.news.prodigy.com>...
Michele Simionato wrote:
I posted this a few weeks ago (remember the C Sharp thread?) but it went unnoticed in the large mass of posts, so let me retry. Here I get Python+Psyco twice as fast as optimized C, so I would like to know if something is wrong on my old laptop and if anybody can reproduce my results. Here are my numbers for calling the error function a million times (Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz):
$ gcc erf.c -lm -o3
Did you really use "-o3" instead of "-O3"? The lowercase -o3 will produce object code file named "3" instead of doing optimization.
Yes, I used -O3; this was a misprint in the e-mail. The compiler was
gcc 2.96.
Michele Simionato, Ph. D. Mi**************@libero.it http://www.phyast.pitt.edu/~micheles
--- Currently looking for a job ---
I finally came to the conclusion that the exceedingly good performance
of Psyco was due to the fact that the function was called a million
times with the *same* argument. Evidently Psyco is smart enough to
notice that. Changing the argument at each call
(erfc(0.456) -> erfc(i/1000000.0)) slows Python+Psyco down to about 1/4 of C speed.
Psyco improves Python performance by an order of magnitude, but still it
is not enough :-(
I was too optimistic!
Here are my numbers for Python 2.3, Psyco 1.0, Red Hat Linux 7.3,
Pentium II 366 MHz:
$ time p23 erf.py
real 0m3.245s
user 0m3.164s
sys 0m0.037s
This is more than four times slower than optimized C:
$ gcc erf.c -lm -O3
$ time ./a.out
real 0m0.742s
user 0m0.725s
sys 0m0.002s
Here is the situation for pure Python:
$ time p23 erf.jy
real 0m27.470s
user 0m27.162s
sys 0m0.023s
and, just for fun, here is Jython performance:
$ time jython erf.jy
real 0m44.395s
user 0m42.602s
sys 0m0.389s
----------------------------------------------------------------------
$ cat erf.py
import math
import psyco
psyco.full()

def erfc(x):
    exp = math.exp
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p*x)
    erfcx = ( (a1 + (a2 + (a3 +
              (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
    return erfcx

def main():
    erg = 0.0
    for i in xrange(1000000):
        erg += erfc(i/1000000.0)

if __name__ == '__main__':
    main()
--------------------------------------------------------------------------
# python/jython version = same without "import psyco; psyco.full()"
--------------------------------------------------------------------------
$ cat erf.c
#include <stdio.h>
#include <math.h>

double erfc( double x )
{
    double p, a1, a2, a3, a4, a5;
    double t, erfcx;
    p = 0.3275911;
    a1 = 0.254829592;
    a2 = -0.284496736;
    a3 = 1.421413741;
    a4 = -1.453152027;
    a5 = 1.061405429;
    t = 1.0 / (1.0 + p*x);
    erfcx = ( (a1 + (a2 + (a3 +
              (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x);
    return erfcx;
}

int main()
{
    double erg=0.0;
    int i;
    for(i=0; i<1000000; i++)
    {
        erg = erg + erfc(i/1000000.0);
    }
    return 0;
}
Michele Simionato, Ph. D. Mi**************@libero.it http://www.phyast.pitt.edu/~micheles/
---- Currently looking for a job ----
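The ratios behind "more than four times slower" and "an order of magnitude" can be worked out directly from the wall-clock times above. The short Python 3 sketch below only does that arithmetic; the dictionary keys are labels chosen for this illustration, and the numbers are the "real" times copied from the post.

```python
# Wall-clock ("real") times in seconds, copied from the post above.
timings = {
    "C (gcc -O3)": 0.742,
    "Python + Psyco": 3.245,
    "pure Python": 27.470,
    "Jython": 44.395,
}

# Slowdown of each variant relative to the optimized C binary.
baseline = timings["C (gcc -O3)"]
slowdown = {name: seconds / baseline for name, seconds in timings.items()}
for name, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print("%-15s %6.1fx the C time" % (name, factor))

# Speedup that Psyco gives over the plain interpreter.
psyco_speedup = timings["pure Python"] / timings["Python + Psyco"]
print("Psyco speedup over pure Python: %.1fx" % psyco_speedup)
```

With these figures Python+Psyco comes out at roughly 4.4 times the C run time, and Psyco is roughly 8.5 times faster than pure Python on the same machine.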
Michele Simionato wrote:
I finally came to the conclusion that the exceedingly good performance of Psyco was due to the fact that the function was called a million times with the *same* argument. Evidently Psyco is smart enough to notice that. Changing the argument at each call (erfc(0.456) -> i/1000000.0) slows down Python+Psyco at 1/4 of C speed. Psyco improves Python performance by an order of magnitude, but still it is not enough :-(
This is not surprising. Last I checked, Psyco does not fully compile
floating point expressions. If I remember correctly (though every time I
try to delve too deeply into Psyco my brains start oozing out my ears),
there are three ways in which a given chunk of code is evaluated. At one
level, which I'll call #1, Psyco generates the machine code(*) for the
expression. At a second level, Psyco calls out to C helper functions,
but still works with unboxed values. At the third level, Psyco punts and
creates a Python object and hands things off to the interpreter.
Most integer functions operate at level #1, so they tend to be quite
fast. Most floating point operations operate at level #2, so they have a
certain amount of overhead, but are still much faster than unpsyco
(sane?) Python. I believe the reason for this is that x86 floating point
operations are very messy, so Armin punted...
(*) Armin is working on a virtual machine implementation of Psyco, so it
should be available on non-x86 machines soon.
FWIW,
-tim
mi**@pitt.edu (Michele Simionato) wrote in message news:<22**************************@posting.google.com>...
I finally came to the conclusion that the exceeding good performance of Psyco was due to the fact that the function was called a million times with the *same* argument. Evidently Psyco is smart enough to notice that. Changing the argument at each call (erfc(0.456) -> i/1000000.0) slows down Python+Psyco at 1/4 of C speed. Psyco improves Python performance by an order of magnitude, but still it is not enough :-(
It's plenty! A factor of 4 from optimized C, considering the newness
and limited resources behind psyco, is very encouraging, and good
enough for most tasks. Java JIT compilers are still around a factor
of 2 slower than C, and they've had at least 2 orders of magnitude
more whumpage.
This is a far cry from the factor of 10-30 I've been seeing with pure
python. For performance-critical code, this could be the difference
between hand-coding 5% versus 20% of your code.
Excellent news!!
da*******@yahoo.com (dan) writes:
mi**@pitt.edu (Michele Simionato) wrote in message news:<22**************************@posting.google.com>...
[...] This is a far cry from the factor of 10-30 I've been seeing with pure python. For performance-critical code, this could be the difference between hand-coding 5% versus 20% of your code.
Excellent news!!
If you care about this a lot, don't forget Pyrex.
John
right, pyrex -- looked at that a while ago. Compiled Python with
C-style type declarations, right? Kinda like common lisp??? (I'm
stretching my memory cells now)
will review