Python and STL efficiency

Licheng Fang

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
#include <iterator>
using namespace std;

int main(){
    vector<string> a;
    for (long int i=0; i<10000 ; ++i){
        a.push_back("What do you know?");
        a.push_back("so long...");
        a.push_back("chicken crosses road");
        a.push_back("fool");
    }
    set<string> b(a.begin(), a.end());
    unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

#python
def f():
    a = []
    for i in range(10000):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    b = set(a)
    for s in b:
        print s

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

Aug 21 '06 #1

Marc 'BlackJack' Rintsch

In <11**********************@i42g2000cwa.googlegroups .com>, Licheng Fang
wrote:

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
    vector<string> a;
    for (long int i=0; i<10000 ; ++i){
        a.push_back("What do you know?");
        a.push_back("so long...");
        a.push_back("chicken crosses road");
        a.push_back("fool");
    }
    set<string> b(a.begin(), a.end());
    unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

Why are you using `unique_copy` here?

#python
def f():
    a = []
    for i in range(10000):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    b = set(a)
    for s in b:
        print s

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

There's a difference in data structures at least. The Python `set` type
is implemented with a hash algorithm, so the equivalent STL type would be
`hash_set`. `set` in Python does not store its contents sorted.
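To make the ordering point concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The helper name `unique_sorted` is made up for this illustration. A Python `set` promises no particular iteration order, so matching the sorted output of a C++ `std::set` takes an explicit `sorted()` call.

```python
def unique_sorted(items):
    """Deduplicate with a hash-based set, then sort explicitly."""
    return sorted(set(items))


words = ['What do you know', 'so long...', 'chicken crosses road', 'fool'] * 3

# Iterating over set(words) directly may yield the strings in any order;
# sorted() gives the byte-wise ordering that std::set<string> would print.
print(unique_sorted(words))
```

Uppercase letters sort before lowercase ones, which is why 'What do you know' comes first in the sorted result, just as in the C++ output.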

Ciao,
Marc 'BlackJack' Rintsch

Aug 21 '06 #2

Licheng Fang

Marc 'BlackJack' Rintsch wrote:

In <11**********************@i42g2000cwa.googlegroups .com>, Licheng Fang
wrote:

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
    vector<string> a;
    for (long int i=0; i<10000 ; ++i){
        a.push_back("What do you know?");
        a.push_back("so long...");
        a.push_back("chicken crosses road");
        a.push_back("fool");
    }
    set<string> b(a.begin(), a.end());
    unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

Why are you using `unique_copy` here?

Sorry, that's a typo. Actually I used 'copy'.

#python
def f():
    a = []
    for i in range(10000):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    b = set(a)
    for s in b:
        print s

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

There's a difference in data structures at least. The Python `set` type
is implemented with a hash algorithm, so the equivalent STL type would be
`hash_set`. `set` in Python does not store its contents sorted.

Ciao,
Marc 'BlackJack' Rintsch

Thank you for your comments. I tested with hash_set, but I didn't see
much performance improvement. When I increased the loop to 1 million
times, the python code still ran reasonably fast and the C++ code got
stuck there. This totally surprised me, because according to this page
http://norvig.com/python-lisp.html, the speed of python is nowhere near
that of C++.

Aug 21 '06 #3

Tim N. van der Leeuw

Licheng Fang wrote:

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.
I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

Hi,

I'm no C++ guru so cannot comment on the C++ code itself, however I do
wonder if you tested your C++ code with another STL implementation, such
as the one that comes with gcc (gcc is available on Windows as well, in
various versions).

It could be that expanding the vector in C++ is done in very small
increments, leading to many re-allocations. Is it possible to
pre-allocate the vector<> with sufficient entries?
Also, your Python code as quoted, doesn't actually call your function
f(). If you say that you get results instantly, I assume that you mean
all 4 strings are actually printed to console?

(I'm surprised that the console prints things that fast).

btw, using range() in Python isn't very efficient, I think... Better to
use xrange().
Asked a C++ colleague of mine to comment, and he strongly suspects that
you're actually running it in the .Net runtime (your C++ code contains
some C#-isms, such as omitting the '.h' in the include <> statements).

Luck,

--Tim

Aug 21 '06 #4

Tim N. van der Leeuw

Marc 'BlackJack' Rintsch wrote:

In <11**********************@i42g2000cwa.googlegroups .com>, Licheng Fang
wrote:

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

[...]

There's a difference in data structures at least. The Python `set` type
is implemented with a hash algorithm, so the equivalent STL type would be
`hash_set`. `set` in Python does not store its contents sorted.

The set should be only 4 items in size, according to my reading of the
code, so set implementation differences shouldn't lead to drastic
performance differences.

Ciao,
Marc 'BlackJack' Rintsch

Cheers,

--Tim

Aug 21 '06 #5

Marc 'BlackJack' Rintsch

In <11*********************@p79g2000cwp.googlegroups. com>, Tim N. van der
Leeuw wrote:

(your C++ code contains some C#-isms, such as omitting the '.h' in the
include <> statements).

That's no C#-ism, that's C++. The standard C++ header names don't have a
trailing '.h'. ``gcc`` prints deprecation warnings if you write the
names with '.h'.

Ciao,
Marc 'BlackJack' Rintsch

Aug 21 '06 #6

Tim N. van der Leeuw

Marc 'BlackJack' Rintsch wrote:

In <11*********************@p79g2000cwp.googlegroups. com>, Tim N. van der
Leeuw wrote:

(your C++ code contains some C#-isms, such as omitting the '.h' in the
include <> statements).

That's no C#-ism, that's C++. The standard C++ header names don't have a
trailing '.h'. ``gcc`` prints deprecation warnings if you write the
names with '.h'.

Ciao,
Marc 'BlackJack' Rintsch

We stand corrected.

--Tim

Aug 21 '06 #7

Fredrik Lundh

Licheng Fang wrote:

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

in the Python example, the four strings in your example are shared, so
you're basically copying 40000 pointers to the list.

in the C++ example, you're creating 40000 string objects.
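The sharing described above can be checked directly. The following is a minimal sketch in Python 3 syntax, assuming CPython, where each string literal in a function is stored once as a constant of the code object; `build_list` is a made-up name that mirrors the original `f()`.

```python
def build_list(n):
    """Append the same four literals n times, as in the original f()."""
    a = []
    for _ in range(n):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    return a


a = build_list(10000)

# id() identifies the object, not its text, so this counts how many
# distinct string objects the 40000 list slots actually refer to.
distinct_objects = len({id(s) for s in a})
print(len(a), distinct_objects)
```

On CPython the list has 40000 entries that refer to only four string objects, which is the "copying pointers" behaviour the post describes.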

</F>

Aug 21 '06 #8

Ray

How did you compile the C++ executable? I assume that it is Release
mode? Then are the optimization switches enabled? Is it compiled as
Native Win32 or Managed application?

I suspect that other than what other posters have suggested about your
code, the difference in speed is due to the way you build your C++
executable...

HTH,
Ray

Licheng Fang wrote:

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
    vector<string> a;
    for (long int i=0; i<10000 ; ++i){
        a.push_back("What do you know?");
        a.push_back("so long...");
        a.push_back("chicken crosses road");
        a.push_back("fool");
    }
    set<string> b(a.begin(), a.end());
    unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

#python
def f():
    a = []
    for i in range(10000):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    b = set(a)
    for s in b:
        print s

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

Aug 21 '06 #9

Peter Otten

Licheng Fang wrote:

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

Just a guess: immutable strings might be Python's advantage. Due to your
"benchmark"'s simplicity you end up with 40000 string instances in C++ and
just four str-s (and a lot of pointers) in Python.

What happens if you replace 'string' with 'const char *' in C++ ?
(Note that this modification is a bit unfair to Python as it would not
detect equal strings in different memory locations)
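The caveat about comparing addresses instead of contents has a direct Python analogue, sketched here in Python 3 syntax under the assumption of CPython. The helper names `count_by_value` and `count_by_identity` are invented for this example. Deduplicating by `id()` plays the role of a set of `const char *`, while a normal `set` compares the text.

```python
def count_by_value(items):
    """Number of distinct strings when compared by content."""
    return len(set(items))


def count_by_identity(items):
    """Number of distinct string objects, ignoring their content."""
    return len({id(s) for s in items})


head, tail = 'fo', 'ol'
# Two strings with the same text, each built separately at run time.
items = [head + tail, head + tail]

print(count_by_value(items))     # content-based: duplicates collapse
print(count_by_identity(items))  # address-based: duplicates are missed
```

The content-based count sees one string, while the identity-based count sees two objects, so an address-based set would print 'fool' twice.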

Peter

Aug 21 '06 #10

Ray

Fredrik Lundh wrote:

in the Python example, the four strings in your example are shared, so
you're basically copying 40000 pointers to the list.

in the C++ example, you're creating 40000 string objects.

</F>

In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.
:)

Aug 21 '06 #11

Fredrik Lundh

Ray wrote:

in the C++ example, you're creating 40000 string objects.

In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.

in what way does that change the implementation of C++'s string type ?

</F>

Aug 21 '06 #12

Tim N. van der Leeuw

Ray wrote:

Fredrik Lundh wrote:
in the Python example, the four strings in your example are shared, so
you're basically copying 40000 pointers to the list.

in the C++ example, you're creating 40000 string objects.

</F>

In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.
:)

The code still creates a new string instance each time it tries to
append a const char* to the vector<string>...

You should instead create the string-objects ahead of time, outside of
the loop.

Regards,

--Tim

Aug 21 '06 #13

Jeremy Sanders

Licheng Fang wrote:

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

It must be the debugging, the compiler or a poor STL implementation. With
gcc 4 it runs instantly on my computer (using -O2), even with 10x the
number of values.

If the problem is that C++ has to make lots of new strings, as other posters
have suggested, then you could do something like

const string foo = "What do you know?";

for (long int i=0; i<10000 ; ++i){
    a.push_back(foo);
...
}

as many C++ implementations use reference counting for identical strings.

Jeremy

--
Jeremy Sanders
http://www.jeremysanders.net/

Aug 21 '06 #14

Christophe

Jeremy Sanders wrote:

Licheng Fang wrote:

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

It must be the debugging, the compiler or a poor STL implementation. With
gcc 4 it runs instantly on my computer (using -O2), even with 10x the
number of values.

If the problem is that C++ has to make lots of new strings, as other posters
have suggested, then you could do something like

const string foo = "What do you know?";

for (long int i=0; i<10000 ; ++i){
a.push_back(foo);
...
}

as many C++ implementations use reference counting for identical strings.

Jeremy

As a matter of fact, do not count on that. Use a vector<string*> just in
case.

Aug 21 '06 #15

Tim N. van der Leeuw

Tim N. van der Leeuw wrote:

Ray wrote:
Fredrik Lundh wrote:
in the Python example, the four strings in your example are shared, so
you're basically copying 40000 pointers to the list.

in the C++ example, you're creating 40000 string objects.

</F>

In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.
:)

The code still creates a new string instance each time it tries to
append a const char* to the vector<string>...

You should instead create the string-objects ahead of time, outside of
the loop.

Regards,

--Tim

Alternatively, slow down the Python implementation by making Python
allocate new strings each time round:

a.append('%s' % 'What do you know')
.... for each of your string-appends. But even then, the python-code is
still near-instant.

Cheers,

--Tim

Aug 21 '06 #16

Ray

Fredrik Lundh wrote:

Ray wrote:

in the C++ example, you're creating 40000 string objects.
In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.

in what way does that change the implementation of C++'s string type ?

Ah, yes what was I thinking? The fact that it stores std::string
objects escaped my mind somehow. /GF just pools the string literals.
Thanks for the correction.

</F>

Aug 22 '06 #17

Ray

Tim N. van der Leeuw wrote:

In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.
:)

The code still creates a new string instance each time it tries to
append a const char* to the vector<string>...

Yeah, you're right... I've been programming Java too long :)

You should instead create the string-objects ahead of time, outside of
the loop.

Regards,

--Tim

Aug 22 '06 #18

Tim N. van der Leeuw

Ray wrote:

Tim N. van der Leeuw wrote:

In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.
:)

The code still creates a new string instance each time it tries to
append a const char* to the vector<string>...

Yeah, you're right... I've been programming Java too long :)

Took me a while to see that too! Have been programming too much Java /
Python as well. Anyways, when changing the Python version so that it
adds 40.000 unique strings to the list (and proving that there are
indeed 40.000 unique ids in the list, by making a set of all id()s in
the list and taking the len() of that set), it still takes at most a
second. I cannot test the speed of the c++ version on my computer, so
nothing scientific here.
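The id()-based check described above could look roughly like this. It is a sketch in Python 3 syntax, assuming CPython, and `build_unique` is a made-up name. Note one deliberate change from the thread: in current CPython `'%s' % s` can hand back the very same object for a plain `str`, so this sketch forces a fresh object by concatenating two non-empty slices instead.

```python
WORDS = ['What do you know', 'so long...', 'chicken crosses road', 'fool']


def build_unique(n):
    """Append 4*n strings, each one a freshly allocated object."""
    a = []
    for _ in range(n):
        for w in WORDS:
            # Joining two non-empty pieces at run time creates a new str.
            a.append(w[:1] + w[1:])
    return a


a = build_unique(10000)

# All objects stay alive in the list, so their ids cannot be reused.
print(len({id(s) for s in a}))  # number of distinct objects
print(len(set(a)))              # number of distinct texts
```

The list then holds 40000 distinct objects but still only four distinct texts, which is the closer match to what the C++ version does with `std::string`.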

I'm curious though, if on the OP's machine the slowed-down Python
version is still faster than the C++ version.
Cheers,

--Tim

Aug 22 '06 #19

Mc Osten

Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

I'm curious though, if on the OP's machine the slowed-down Python
version is still faster than the C++ version.

I tested both on my machine (my other post in the thread)

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 22 '06 #20

Mc Osten

Jeremy Sanders <je*******************@jeremysanders.net> wrote:

It must be the debugging, the compiler or a poor STL implementation. With
gcc 4 it runs instantly on my computer (using -O2), even with 10x the
number of values.

$ gcc --version
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build
5363)

I adapted original poster's code and made a function that did not create
strings each time. The NoisyString is a class we can use to actually
track copying.

In fact Python here is faster. Suppose it has a really optimized set
class...
Here are some results (I know that the fpoint optimizations are useless...
it's my "prebuilt" full optimization macro :) ):

$ g++ -O3 -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.71

$ g++ -Os -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl

What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.71

$ g++ -O3 -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 0.47
Elapsed 0.18

$ g++ -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 0.63
Elapsed 0.33

$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.370000 seconds
Elapsed: 3.810000 seconds

------------------- PYTHON CODE ---------------------------------
#python

global size
size = 1000000

def f():
    a = []
    for i in range(size):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    b = set(a)
    for s in b:
        print s

def slow_f():
    a = []
    for i in range(size):
        a.append('%s' % 'What do you know')
        a.append('%s' % 'so long...')
        a.append('%s' % 'chicken crosses road')
        a.append('%s' % 'fool')
    b = set(a)
    for s in b:
        print s

import time
from time import clock

f_start = clock()
f()
f_end = clock()

slow_f_start = clock()
slow_f()
slow_f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)
print "Elapsed: %f seconds" % (slow_f_end - slow_f_start)

------------------------------------------------------------------
----------------- CPP CODE -------------------------------------
#include <iostream>
#include <ostream>
#include <iterator>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
#include <ctime>
using namespace std;
#define SIZE 1000000

class NoisyString : public std::string {
public:
NoisyString(const string& cp)
: string(cp)
{
cout << "Fuck I got copied!" << endl;
}

NoisyString(const char* s ) : string(s) {

}

};
// Slow variant: every push_back constructs a new string from a literal.
void f(){
    vector<string> a;
    for (long int i=0; i<SIZE ; ++i){
        a.push_back("What do you know?");
        a.push_back("so long...");
        a.push_back("chicken crosses road");
        a.push_back("fool");
    }
    set<string> b(a.begin(), a.end());
    copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

// Fast variant: the four strings are created once, outside the loop.
void fast_f(){
    vector<string> a;
    string s1 = "What do you know?" ;
    string s2 = "so long..." ;
    string s3 = "chicken crosses road";
    string s4 = "fool" ;
    for (long int i=0; i<SIZE ; ++i){
        a.push_back(s1);
        a.push_back(s2);
        a.push_back(s3);
        a.push_back(s4);
    }
    set<string> b(a.begin(), a.end());
    copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

int main(){
    clock_t f_start,
            f_end,
            fast_f_start,
            fast_f_end;

    f_start = clock();
    f();
    f_end = clock();

    fast_f_start = clock();
    fast_f();
    fast_f_end = clock();

    cout << "Elapsed " << (f_end - f_start) / double(CLOCKS_PER_SEC) << endl;
    cout << "Elapsed " << (fast_f_end - fast_f_start) / double(CLOCKS_PER_SEC) << endl;
}

-----------------------------------------------------------------------

Aug 22 '06 #21

Fredrik Lundh

"Mc Osten" wrote:

In fact Python here is faster. Suppose it has a really optimized set
class...

Python's memory allocator is also quite fast, compared to most generic
allocators...

</F>

Aug 22 '06 #22

Mc Osten

Fredrik Lundh <fr*****@pythonware.com> wrote:

Python's memory allocator is also quite fast, compared to most generic
allocators...

In fact also in the two "slow" versions Python outperforms C++.
I didn't notice it in the first place.

Aug 22 '06 #23

Tim N. van der Leeuw

Mc Osten wrote:

Fredrik Lundh <fr*****@pythonware.com> wrote:

Python's memory allocator is also quite fast, compared to most generic
allocators...

In fact also in the two "slow" versions Python outperforms C++.
I didn't notice it in the first place.

But your C++ program outputs times in seconds, right? So all
compilations except for the first two give results in less than a
second, right? (meaning the optimizations of your standard-compilation
give worse results than -O3?)

BTW, I don't quite understand your gcc optimizations for the first 2
compiles anyways: two -O options with different values. Doesn't that
mean the 2nd -O takes preference, and the compilation is at -O2 instead
of -O3?

Why both -O3 and -O2 at the command-line?

Cheers,

--Tim

Aug 22 '06 #24

Tim N. van der Leeuw

Mc Osten wrote:

Fredrik Lundh <fr*****@pythonware.com> wrote:

Python's memory allocator is also quite fast, compared to most generic
allocators...

In fact also in the two "slow" versions Python outperforms C++.
I didn't notice it in the first place.

Well, I guess I'm getting really obsessed with this. But anyways. I
installed MinGW on my Windows-XP (sp2) laptop. It is g++ version 3.4.5
-- ancient, yes, but on windows it's the latest available.

I compiled Mc Osten's C++ program (tweaked the output a little) and ran
it; ran his version of the python code too.
Oh boy; yes indeed the slow python is faster than the fast C++
version... Must be something really awful happening in the STL
implementation that comes with GCC 3.4!

Here's the output from my console:

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ g++ -O3 -march=pentium-m -o SpeedTest SpeedTest.cpp

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 40000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.037574 seconds
Slow - Elapsed: 0.081520 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 2.089 seconds
Slow - Elapsed: 6.303 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python
Cheers,

--Tim

Aug 22 '06 #25

Jeremy Sanders

Mc Osten wrote:

Here are some results (I know that the fpoint optimizations are useless...
it's my "prebuilt" full optimization macro :) ):

Interesting. The optimisation makes no difference to the speed of the C++ one
for me. I just get

xpc17:~> g++4 -O2 test2.cpp
xpc17:~> ./a.out
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 2.11
Elapsed 1.11

(This is with an Athlon 64 4600+ running Linux).

Unfortunately the Python on this computer doesn't have set as it is too old,
so I can't compare it.

--
Jeremy Sanders
http://www.jeremysanders.net/

Aug 22 '06 #26

Mc Osten

Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

But your C++ program outputs times in seconds, right? So all
compilations except for the first two give results in less than a
second, right? (meaning the optimizations of your standard-compilation
give worse results than -O3?)

Yes. It's in seconds, but the benchmarks that are one order of magnitude
faster than the others used a different "size" (100000 instead of
1000000). That is cut and paste from my terminal... I think it's a
mess. I'll do it all again from scratch.

BTW, I don't quite understand your gcc optimizations for the first 2
compiles anyways: two -O options with different values. Doesn't that
mean the 2nd -O takes preference, and the compilation is at -O2 instead
of -O3?
Why both -O3 and -O2 at the command-line?

I forgot I put -O2 in my $FAST_FLAGS. I don't know what I was thinking
about.

This is the correct version:

$ g++ -Os -pipe -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp

$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 6.3
Elapsed 2.1

$ g++ -O2 -pipe -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.7

$ g++ -O3 -pipe -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.79
Elapsed 1.72

$ g++ -pipe -march=pentium-m -msse3 -fomit-frame-pointer -mfpmath=sse
-o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 7.12
Elapsed 2.98

$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.370000 seconds
Elapsed: 3.800000 seconds

Aug 22 '06 #27

Mc Osten

Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

Oh boy; yes indeed the slow python is faster than the fast C++
version... Must be something really awful happening in the STL
implementation that comes with GCC 3.4!

And the Python version does the very same number of iterations as the
C++ one? I suppose they are looping on arrays of different sizes, just
like my "first version".

Aug 22 '06 #28

Tim N. van der Leeuw

Mc Osten wrote:

Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

Oh boy; yes indeed the slow python is faster than the fast C++
version... Must be something really awful happening in the STL
implementation that comes with GCC 3.4!

And the Python version does the very same number of iterations as the
C++ one? I suppose they are looping on arrays of different sizes, just
like my "first version".

Hmmm.. You're quite right. The C++ version had an array size 100.000
(your version), the Python version still had an array size 10.000 (as
in my modified copy of the original version).
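A mismatch like this is easier to spot when the size is passed in and printed with the result. Below is a minimal sketch in Python 3 syntax; `timed_run` and `WORDS` are made-up names, and `time.perf_counter()` is used because `time.clock()`, which the scripts in this thread rely on, was removed in Python 3.8.

```python
import time

WORDS = ('What do you know', 'so long...', 'chicken crosses road', 'fool')


def timed_run(size):
    """Build the list and the set for `size` rounds and time the work."""
    start = time.perf_counter()
    a = []
    for _ in range(size):
        a.extend(WORDS)
    unique = set(a)
    elapsed = time.perf_counter() - start
    return size, len(a), len(unique), elapsed


size, appended, unique, elapsed = timed_run(100000)
# Reporting the size next to the timing makes unequal runs obvious.
print("size=%d appended=%d unique=%d elapsed=%.3f s"
      % (size, appended, unique, elapsed))
```

The same idea applies to the C++ side: print SIZE together with the elapsed time so that two result listings can be compared safely.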

When fixing the Python version to have 100.000 items, like the C++
version, the Python timings are:

Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 400000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.512088 seconds
Slow - Elapsed: 1.139370 seconds

Still twice as fast as the fastest GCC 3.4.5 compiled version!

Incidentally, I also have a version compiled with VC++ 6 now... (not
yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
for speed, here's the result of VC++ 6:

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 4.481 seconds
Slow - Elapsed: 4.842 seconds

So you can see that its 'slow' version of the code is faster than the
'slow' version compiled with GCC, but the 'fast' code is barely faster
than the 'slow' code! And the 'fast' version compiled with GCC is much
faster than the 'fast' version compiled with VC++ 6!

My conclusion from that is, that the vector<> or set<> implementations
of GCC are far superior to those of VC++ 6, but that memory allocation
for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
(And Python still smokes them both).

Cheers,

--Tim

Aug 22 '06 #29

Tim N. van der Leeuw

Tim N. van der Leeuw wrote:

Mc Osten wrote:
Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

Oh boy; yes indeed the slow python is faster than the fast C++
version... Must be something really awful happening in the STL
implementation that comes with GCC 3.4!
And the Python version does the very same number of iterations as the
C++ one? I suppose they are looping on arrays of different sizes, just
like my "first version".

Hmmm.. You're quite right. The C++ version had an array size 100.000
(your version), the Python version still had an array size 10.000 (as
in my modified copy of the original version).

When fixing the Python version to have 100.000 items, like the C++
version, the Python timings are:

[...]

Fast - Elapsed: 0.512088 seconds
Slow - Elapsed: 1.139370 seconds

Still twice as fast as the fastest GCC 3.4.5 compiled version!

Incidentally, I also have a version compiled with VC++ 6 now... (not
yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
for speed, here's the result of VC++ 6:

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe

[...]

Fast - Elapsed: 4.481 seconds
Slow - Elapsed: 4.842 seconds

[...]

And the results of IronPython (1.0rc2) are just in as well:

IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.

>>> import sys
>>> sys.path.append('c:/documents and settings/leeuwt/my documents/python')
>>> import SpeedTest
>>> SpeedTest.run_test()

Begin Test
Number of unique string objects: 4
What do you know
so long...
chicken crosses road
fool
Number of unique string objects: 400000
What do you know
so long...
chicken crosses road
fool
Fast - Elapsed: 1.287923 seconds
Slow - Elapsed: 4.516272 seconds

>>>

And for Python 2.5:
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 400000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.440619 seconds
Slow - Elapsed: 1.095341 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python

But beware! For Python2.5 I had to change the code slightly, because it
already realized that the expression

'%s' % 'something'

will be a constant expression, and evaluates it once only... so I had
to replace '%s' with a variable, and I got the timings above which show
Python2.5 to be slightly faster than Python2.4.
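Whether the compiler has folded an expression into a constant can be inspected through the code object. This is a minimal sketch in Python 3 syntax, assuming CPython; `folded_constants` is a made-up name. Which expressions get folded depends on the interpreter version, so the sketch uses the concatenation of two literals rather than claiming anything about `'%s' % 'something'` on a current interpreter.

```python
def folded_constants(source):
    """Compile `source` and return the constants stored in its code object."""
    code = compile(source, '<example>', 'exec')
    return code.co_consts


# Two literals joined with +: CPython's compiler evaluates this once,
# at compile time, and stores only the result.
consts = folded_constants("s = 'What do ' + 'you know'")
print('What do you know' in consts)

# With a variable on one side, nothing can be folded ahead of time.
consts_var = folded_constants("s = prefix + 'you know'")
print('What do you know' in consts_var)
```

Routing one operand through a variable, as described above, is what keeps the work inside the timed loop.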

(Next step would be to create a VB version and a Java version of the
same program, oh and perhaps to try a version that would work with
Jython... perhaps somehow w/o the 'set')

Cheers,

--Tim

Aug 22 '06 #30

skip

Tim> But beware! For Python2.5 I had to change the code slightly,
Tim> because it already realized that the expression

Tim> '%s' % 'something'

Tim> will be a constant expression, and evaluates it once only... so I
Tim> had to replace '%s' with a variable, and I got the timings above
Tim> which show Python2.5 to be slightly faster than Python2.4.

Shouldn't you then get rid of any compiler optimizations your C++ compiler
does? Why penalize 2.5 because it recognizes a useful optimization?

Tim> (Next step would be to create a VB version and a Java version of
Tim> the same program, oh and perhaps to try a version that would work
Tim> with Jython... perhaps somehow w/o the 'set')

I don't recall the example exactly, but couldn't you just create a set class
that uses a dict under the covers and only implement the methods you need
for the test?
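A minimal sketch of that idea, assuming the test only needs construction from a sequence, add, membership, iteration and len. The class name DictSet is made up for illustration; it is not from the original test program.

```python
class DictSet(object):
    """Hypothetical stand-in for set: keys of a dict are the members."""

    def __init__(self, iterable=()):
        self._items = {}
        for item in iterable:
            self.add(item)

    def add(self, item):
        # The value is irrelevant; only the key matters.
        self._items[item] = None

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```

In the test program, b = set(a) would then become b = DictSet(a), and the "for s in b" loop stays as it is.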

Skip

Aug 22 '06 #31

Maric Michaud

On Tuesday 22 August 2006 12:55, Mc Osten wrote:

In fact Python here is faster. Suppose it has a really optimized set
class...

Maybe I'm missing something, but IMO the posted C++ code is not equivalent to
what Python is doing. I discarded the "slow" version and tried to write the
C++ equivalent of:

"""
#!/usr/bin/env python

size = 1000000

def f():
    a = []
    for i in range(size):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    b = set(a)
    for s in b:
        print s

import time
from time import clock

f_start = clock()
f()
f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)
"""
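For anyone re-running this on a current interpreter: the script above uses the Python 2 print statement and time.clock, which was removed in Python 3.8. Below is a rough Python 3 port, with the loop count passed as a parameter so the 10000 and 1000000 cases can be timed side by side. The function names unique_strings and timed are made up for this sketch, and it returns the set rather than printing it, so it measures slightly less work than the original.

```python
import time


def unique_strings(size):
    """Append the same four literals `size` times, then deduplicate with a set."""
    a = []
    for _ in range(size):
        a.append('What do you know')
        a.append('so long...')
        a.append('chicken crosses road')
        a.append('fool')
    return set(a)


def timed(size):
    """Return the resulting set and the wall-clock seconds it took to build."""
    start = time.perf_counter()
    result = unique_strings(size)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    for size in (10000, 1000000):
        result, elapsed = timed(size)
        print("size=%d: %d unique strings, elapsed %f seconds"
              % (size, len(result), elapsed))
```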

My first attempt was the following, which is still slower than the Python
version:

"""
void print_occurence_of_unique_strings(){
    vector<string> a;
    const string& s1 = "What do you know?";
    const string& s2 = "so long...";
    const string& s3 = "chicken crosses road";
    const string& s4 = "fool";
    for (long int i=0; i<SIZE; ++i){
        a.push_back(s1);
        a.push_back(s2);
        a.push_back(s3);
        a.push_back(s4);
    }
    set<string> b(a.begin(), a.end());
    copy(b.begin(), b.end(),
         ostream_iterator<string>(cout, "\n"));
}
"""

Here, all strings, although passed by reference to push_back, are copied one
by one into the vector.
Then I tried this; it just beats the Python code, but not by the margin I
expected:

"""
void print_occurence_of_unique_strings_compare_by_adresses(){
    vector<string*> a;
    string s1 = "What do you know?";
    string s2 = "so long...";
    string s3 = "chicken crosses road";
    string s4 = "fool";
    for (long int i=0; i<SIZE; ++i){
        a.push_back(&s1);
        a.push_back(&s2);
        a.push_back(&s3);
        a.push_back(&s4);
    }
    set<string> b;
    for (vector<string*>::iterator it=a.begin(); it!=a.end(); it++)
        b.insert(**it);
    copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}
"""

The problem here is that the strings in the set are compared by value, which
is not optimal, and I guess Python compares them by address ("s*n is s*n" has
the same complexity as "s*n == s*n" in CPython, right?).

So, finally, the C++ code, about ten times faster than the equivalent in
Python, must be:

"""
void print_occurence_of_unique_strings_compared_by_address(){
    cout << "print_occurence_of_unique_strings_compared_by_address" << endl;
    vector<string*> a;
    string s1 = "What do you know?";
    string s2 = "so long...";
    string s3 = "chicken crosses road";
    string s4 = "fool";
    for (long int i=0; i<SIZE; ++i){
        a.push_back(&s1);
        a.push_back(&s2);
        a.push_back(&s3);
        a.push_back(&s4);
    }
    set<string*> b(a.begin(), a.end());
    set<string> c; // well ordered set (b is ordered by address)
    for (set<string*>::iterator it=b.begin(); it!=b.end(); it++)
        c.insert(**it);
    copy(c.begin(), c.end(), ostream_iterator<string>(cout, "\n"));
}
"""

the result on my box is :

maric@redflag2 mar aoû 22 22:24:23:~$ g++ -O3 -o testcpp testcpp.cpp
maric@redflag2 mar aoû 22 22:24:29:~$ ./testcpp
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 1.89
unique strings : 0.87
compared by address : 0.18
maric@redflag2 mar aoû 22 22:24:38:~$ python2.4 testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.680000 seconds
maric@redflag2 mar aoû 22 22:24:51:~$ g++ -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,java,fortran,objc,obj-c++,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.1-1.4.2.0/jre --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20060613 (prerelease) (Debian 4.1.1-5)

I've attached the full C++ file.

--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097

Aug 22 '06 #32

Tim N. van der Leeuw

sk**@pobox.com wrote:

Tim> But beware! For Python2.5 I had to change the code slightly,
Tim> because it already realized that the expression

Tim> '%s' % 'something'

Tim> will be a constant expression, and evaluates it once only... so I
Tim> had to replace '%s' with a variable, and I got the timings above
Tim> which show Python2.5 to be slightly faster than Python2.4.

Shouldn't you then get rid of any compiler optimizations your C++ compiler
does? Why penalize 2.5 because it recognizes a useful optimization?

The point is that I was trying to create 400.000 string instances. The
extra optimization in 2.5 required an extra trick for that.
The idea is to compare a C++ version which creates 400.000 string
instances, with a Python version which creates 400.000 string
instances; then reduce those 400.000 instances to a set of only 4
unique strings.
(So I cannot just create a list with strings generated from numbers 1 -
400.000, and I didn't want to change the original code too much, so I
just added a trick to make Python allocate a new string each time
round.)
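The difference between the two cases can be made visible by counting distinct objects, which is presumably what the "Number of unique string objects" lines in the test output report. This is a sketch and not the SpeedTest.py source: the function names are made up, and it forces a fresh string by concatenating two non-empty pieces instead of the '%s' trick. That concatenation allocates a new object each time is CPython behaviour I am assuming here, not a language guarantee.

```python
WORDS = ['What do you know', 'so long...', 'chicken crosses road', 'fool']


def build_shared(n):
    # "Fast" case: every append stores a reference to one of four existing objects.
    a = []
    for _ in range(n):
        for w in WORDS:
            a.append(w)
    return a


def build_fresh(n):
    # "Slow" case: concatenating two non-empty pieces yields a new string
    # object on each pass (assumed CPython implementation detail).
    a = []
    for _ in range(n):
        for w in WORDS:
            a.append(w[:1] + w[1:])
    return a


def count_objects(strings):
    # The list keeps every object alive, so distinct ids mean distinct objects.
    return len(set(id(s) for s in strings))
```

With n = 100000, count_objects should report 4 for build_shared and 400000 for build_fresh on CPython, while set() reduces both lists to the same 4 values.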

I agree that Python2.5 recognized a useful optimization, and I didn't
wish to penalize it for that; however, the optimization was defeating
the purpose of my code in the first place!

Cheers,

--Tim

Aug 22 '06 #33

Fredrik Lundh

Maric Michaud wrote:

The problem here is that the strings in the set are compared by value, which
is not optimal, and I guess Python compares them by address ("s*n is s*n" has
the same complexity as "s*n == s*n" in CPython, right?).

wrong.

timeit -s"s='x'; n=1000" "s*n is n*s"

1000000 loops, best of 3: 1.9 usec per loop

timeit -s"s='x'; n=1000" "s*n == n*s"

100000 loops, best of 3: 4.5 usec per loop
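The same measurement can be repeated from a script with the standard timeit module. A small sketch; the helper name best_usec is made up, and the numbers will of course differ per machine and interpreter version, so none are quoted here.

```python
import timeit

SETUP = "s = 'x'; n = 1000"


def best_usec(stmt, number=100000, repeat=3):
    """Best-of-`repeat` time per loop, in microseconds."""
    timings = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    # Both statements build the two strings; only the comparison differs.
    print("is: %.2f usec per loop" % best_usec("s*n is n*s"))
    print("==: %.2f usec per loop" % best_usec("s*n == n*s"))
```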

</F>

Aug 22 '06 #34

Tim N. van der Leeuw

Maric Michaud wrote:

On Tuesday 22 August 2006 12:55, Mc Osten wrote:
In fact Python here is faster. Suppose it has a really optimized set
class...

Maybe I'm missing something, but IMO the posted C++ code is not equivalent to
what Python is doing. I discarded the "slow" version and tried to write the
C++ equivalent of:

Your C++ version got me the following timings (using gcc 3.4.5 as the
compiler, MinGW version, with -O6):

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 2.135
unique strings : 1.103
compared by address : 0.21
For reference, Python's best time was 0.39 seconds on the same computer
(in the 'fast' version, using only 4 unique string instances).

Hmmm... Can we conclude now that carefully crafted C++ code is about
twice as fast as casually and intuitively written Python code? ;) (Just
kidding here of course)

NB: Your code now tests for address-equality. Does it also still test
for string-equality? It looks to me that it does, but it's not quite
clear to me.

Cheers,

--Tim

Aug 22 '06 #35

Mc Osten

Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

NB: Your code now tests for address-equality. Does it also still test
for string-equality? It looks to me that it does, but it's not quite
clear to me.

It does it.

set<string*> b(a.begin(), a.end());
set<string> c; // well ordered set (b is ordered by address)
for (set<string*>::iterator it=b.begin(); it!=b.end(); it++)
    c.insert(**it);
copy(c.begin(), c.end(), ostream_iterator<string>(cout, "\n"));

When we populate the first set, we get rid of all strings with the same
object id/address (it tests equality of pointers). Then we populate
another set (and the default equality test is on strings).

However, I would have written the code using a proper compare function
rather than using two sets. In this particular case the number of
elements of the first set is negligible compared with the initial vector
size, thus copying it again does not take a lot of time.
But such code is optimized for the problem itself: in the real world I
suppose we would have passed set a proper comparison function that
checks address and then string equality.
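For comparison, "identity first, then value" is, as far as I know, what CPython's own set lookup does, which would explain why the Python version with only four string objects is so cheap. The sketch below makes this observable by counting __eq__ calls. The class Counted and the helper eq_calls_for are made up for illustration, and the zero-call result is CPython behaviour, not something the language promises.

```python
class Counted(object):
    """Hypothetical wrapper that counts how often == is actually evaluated."""
    eq_calls = 0

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return hash(self.text)

    def __eq__(self, other):
        Counted.eq_calls += 1
        return self.text == other.text


def eq_calls_for(items):
    """Build a set from items; return (set size, number of __eq__ calls)."""
    Counted.eq_calls = 0
    unique = set(items)
    return len(unique), Counted.eq_calls
```

On CPython, eq_calls_for([obj] * 1000) for a single obj = Counted('fool') needs no __eq__ calls at all, while eq_calls_for([Counted('fool') for i in range(1000)]) has to fall back to value comparison for the duplicates. Both end up with a set of size 1.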
--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 22 '06 #36

Mc Osten

Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

My conclusion from that is, that the vector<> or set<> implementations
of GCC are far superior to those of VC++ 6, but that memory allocation
for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
(And Python still smokes them both).

It would be interesting to test it with VC 8 (2005). I have it in my
Parallels VM, but something looks wrong: the very same code takes almost
a minute there (while Python in the VM is almost as fast as Python 2.4
on Mac OS).

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 22 '06 #37

Mc Osten

Tim N. van der Leeuw <ti*************@nl.unisys.com> wrote:

And the results of IronPython (1.0rc2) are just in as well:

I can't test this one.

And for Python 2.5:
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 400000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.440619 seconds
Slow - Elapsed: 1.095341 seconds

What the heck... you have a Cray, haven't you?
$ /opt/misc/bin/python2.5 -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.300000 seconds
Elapsed: 1.290000 seconds

Yes... good optimizer work. The 'slow' code here is faster than the fast
one.
$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.360000 seconds
Elapsed: 3.800000 seconds

(Next step would be to create a VB version and a Java version of the
same program, oh and perhaps to try a version that would work with
Jython... perhaps somehow w/o the 'set')

Ok. I can do the Java version. If I find a RealBasic Set class I can do
it. However, I don't remember anything about VB6, and have done nothing
with .Net.
But I don't think it is that interesting. Java strings are immutable
too: I expect it to outperform Python (unless Java Set class sucks). And
I don't see the point of taking in VB.
A good BASIC implementation is comparable with Pascal or C++ speedwise.
(At least, this is what results from the Great Language Shootout and Free
Basic suggest.)

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 22 '06 #38

Ray

Tim N. van der Leeuw wrote:

Incidentally, I also have a version compiled with VC++ 6 now... (not
yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
for speed, here's the result of VC++ 6:

<snip>

OK, now I'm getting obsessed with this too ;-)

I'm using VC++ Express, I didn't care to tweak the optimizations, I
merely chose the "Release" configuration for the executable. It's
blazing fast, taking only 30+ ms each run.

Here's the code:

int main(){
    DWORD begin = ::GetTickCount();
    vector<string> a;
    string c = "What do you know?";
    string d = "so long...";
    string e = "chicken crosses road";
    string f = "fool";
    for (long int i=0; i<10000; ++i){
        a.push_back(c);
        a.push_back(d);
        a.push_back(e);
        a.push_back(f);
    }
    set<string> b(a.begin(), a.end());
    unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
    DWORD end = ::GetTickCount();
    cout << "Ends in " << (end - begin) << " ms.";
}

And here's the result:

\TestSTL\release>TestSTL.exe
What do you know?
chicken crosses road
fool
so long...
Ends in 31 ms.

I tried the original version:

int main(){
    DWORD begin = ::GetTickCount();
    vector<string> a;
    for (long int i=0; i<10000; ++i){
        a.push_back("What do you know?");
        a.push_back("so long...");
        a.push_back("chicken crosses road");
        a.push_back("fool");
    }
    set<string> b(a.begin(), a.end());
    unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
    DWORD end = ::GetTickCount();
    cout << "Ends in " << (end - begin) << " ms.";
}

And the result is only 50% slower:

\TestSTL\release>TestSTL.exe
What do you know?
chicken crosses road
fool
so long...
Ends in 47 ms.

Aug 22 '06 #39

could.net

That's to say,
python is still much faster?

I am a c++ newbie but I think c++ should be faster here.
Maybe someone can post this to the C++ mailing list and they will tell us
how to speed it up.
Tim N. van der Leeuw wrote:

Mc Osten wrote:
Fredrik Lundh <fr*****@pythonware.com> wrote:

Python's memory allocator is also quite fast, compared to most generic
allocators...
In fact also in the two "slow" versions Python outperforms C++.
I didn't notice it in the first place.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Well, I guess I'm getting really obsessed with this. But anyways. I
installed MinGW on my Windows-XP (sp2) laptop. It is g++ version 3.4.5
-- ancient, yes, but on windows it's the latest available.

I compiled Mc Osten's C++ program (tweaked the output a little) and ran
it; ran his version of the python code too.
Oh boy; yes indeed the slow python is faster than the fast C++
version... Must be something really awful happening in the STL
implementation that comes with GCC 3.4!

Here's the output from my console:

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ g++ -O3 -march=pentium-m -o SpeedTest SpeedTest.cpp

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 40000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.037574 seconds
Slow - Elapsed: 0.081520 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 2.089 seconds
Slow - Elapsed: 6.303 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python
Cheers,

--Tim

Aug 23 '06 #40

Ray

co*******@gmail.com wrote:

That's to say,
python is still much faster?

Not really, see my test, in my other post in the same thread. I'm using
VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
fair that for C++ we're using the latest as well.

I am a c++ newbie but I think c++ should be faster here.

Same here, although that said, Python's implementation of those data
structures must already be as optimal as mortals can make it.

<snip>

Aug 23 '06 #41

Mc Osten

Ray <ra********@yahoo.com> wrote:

I'm using VC++ Express, I didn't care to tweak the optimizations, I
merely chose the "Release" configuration for the executable. It's
blazing fast, taking only 30+ ms each run.

Of course it is faster. We are looping 1000000 times, you just 10000.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 23 '06 #42

Mc Osten

Ray <ra********@yahoo.com> wrote:

Not really, see my test, in my other post in the same thread. I'm using
VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
fair that for C++ we're using the latest as well.

In your test, you are looping 10000 times, we looped 1000000.
In Python tests with 10000 elements, it was about 10 ms.

Moreover, we tried various Python and C++ configurations. Most of the
tests are done with Python 2.4, not 2.5.
And I used gcc4, that is to say the latest on my platform.

Same here, although that said, Python's implementation of those data
structures must already be as optimal as mortals can make it.

I think this is the rationale behind it.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 23 '06 #43

Mc Osten

<co*******@gmail.com> wrote:

That's to say,
python is still much faster?

Yes it is. But of course you can't say that "Python is faster than C++".
We found that the code to do this, written in the most natural way, is a
lot faster in Python. However, if you optimize the code, C++ gets almost
as fast.

In other benchmarks C++ outperforms Python and is 10 or 100 times
faster.

Maybe someone can post this to the C++ mailing list and they will tell us
how to speed it up.

There are enough C++ experts here to do it. But that is not the point.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 23 '06 #44

Ray

Mc Osten wrote:

Ray <ra********@yahoo.com> wrote:

I'm using VC++ Express, I didn't care to tweak the optimizations, I
merely chose the "Release" configuration for the executable. It's
blazing fast, taking only 30+ ms each run.

Of course it is faster. We are looping 1000000 times, you just 10000.

Certainly--I was not comparing 1000000 against 10000. Referring to the
OP's statement: "However, while the python code gave the result almost
instantly, the C++ code took several seconds to run!" 30ms sounds like
a definite improvement over several seconds!

I'll try to tweak it later at home and report here. I'll try out the
1000000 too.

Cheers
Ray


Aug 23 '06 #45

Ray

Mc Osten wrote:

In your test, you are looping 10000 times, we looped 1000000.
In Python tests with 10000 elements, it was about 10 ms.

Moreover, we tried various Python and C++ configurations. Most of the
tests are done with Python 2.4, not 2.5.
And I used gcc4, that is to say the latest on my platform.

Mine's VC 2005 Express--let me put the optimization parameters later
and measure again when I get home.

<snip>

Aug 23 '06 #46

GHUM

Mc Osten wrote:

Yes it is. But of course you can't say that "Python is faster than C++".

Of course not. Python is faster than assembler. Proved @ EuroPython
2006 at CERN, near the LHC Beta, in the same room where many Nobel
laureates gave their presentations before.

Harald

Aug 23 '06 #47

Mc Osten

Ray <ra********@yahoo.com> wrote:

Certainly--I was not comparing 1000000 against 10000. Referring to the
OP's statement: "However, while the python code gave the result almost
instantly, the C++ code took several seconds to run!" 30ms sounds like
a definite improvement over several seconds!

Of course. I suppose there's something broken in OP's C++ setup (in fact
the version I compiled with VCPP 2005 also takes a lot of seconds...
something like 20-30 seconds, but of course this makes me think I
haven't understood how it is supposed to work, since my gcc gives
results comparable to yours).

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 23 '06 #48

Mc Osten

GHUM <ha**************@gmail.com> wrote:

Proved @ EuroPython
2006 at CERN, near the LHC Beta, in the same room where many Nobel
laureates gave their presentations before.

Have you some link? I suppose it's kind of a joke they did or something
like that...

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Aug 23 '06 #49

Ray

Mc Osten wrote:

Of course. I suppose there's something broken in OP's C++ setup (in fact
the version I compiled with VCPP 2005 also takes a lot of seconds...
something like 20-30 seconds, but of course this makes me think I
haven't understood how it is supposed to work, since my gcc gives
results comparable to yours).

Yeah, my guess would be either he used the Debug configuration or he
actually created a Managed executable instead of a pure Win32
application. Sigh, now I can't wait to get home and try it out :)


Aug 23 '06 #50
