469,281 Members | 2,481 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,281 developers. It's quick & easy.

Python and STL efficiency

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<stringa;
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<stringb(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

#python
def f():
a = []
for i in range(10000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

Aug 21 '06
83 3156
Le mardi 22 août 2006 23:15, Fredrik Lundh a écrit*:
Maric Michaud wrote:
The problem here, is that the strings in the set are compared by value,
which is not optimal, and I guess python compare them by adress ("s*n is
s*n" has the same complexity than "s*n == s*n" in CPython, right ?).

wrong.
Ah ! wrong, thanks, but not in our case :

In [80]: for i in (10**e for e in range(5)) :
....: res1 = timeit.Timer('s == t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: res2 = timeit.Timer('s is t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: print res1/res2
....:
....:
1.10532866525
1.27507328965
1.90244004672
8.33974283485
89.5215441627

Indeed, it's wrong for two instances of str, but not if we compare an instance
to itself :

In [79]: for i in (10**e for e in range(9)) :
....: r1=timeit.Timer('s is p', "s, p = ('s'*%s,)*2" % i).timeit()
....: r2=timeit.Timer('s == p', "s, p = ('s'*%s,)*2" % i).timeit()
....: print r1/r2
....:
....:
0.854690643008
0.872682262181
0.851785060822
0.871193603744
0.890304121256
0.86925960859
0.846364097331
0.91614070798
0.825424114324
So my lastest c++ algorithm is the good one still, as the python code is
like :

In [29]: timeit.Timer('set(t)', 't=("e"*10,)*10').timeit()
Out[29]: 1.883868932723999

In [30]: timeit.Timer('set(t)', 't=("e"*100,)*10').timeit()
Out[30]: 1.8789899349212646

and not :

In [27]: timeit.Timer('set(t)', 't=["e"*10 for i in range(10)]').timeit()
Out[27]: 2.6000990867614746

In [28]: timeit.Timer('set(t)', 't=["e"*100 for i in range(10)]').timeit()
Out[28]: 4.1173238754272461
timeit -s"s='x'; n=1000" "s*n is n*s"

1000000 loops, best of 3: 1.9 usec per loop
timeit -s"s='x'; n=1000" "s*n == n*s"

100000 loops, best of 3: 4.5 usec per loop

</F>
--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Aug 23 '06 #51
Tim, sorry for I send it to you personally, I was abused by thunderbird.

Tim N. van der Leeuw a écrit :

Your C++ version got me the following timings (using gcc 3.4.5 as the
compiler, MinGW version, with -O6):

...
Hmmm... Can we conclude now that carefully crafted C++ code is about
twice as fast as casually and intuitively written Python code? ;) (Just
kidding here of course)
Every C++ code must be carefully crafted :).

But your benchmark surprise me, here is what I get on my windows box :

Python2.4 :
===========

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python2.4
/usr/bin/python2.4

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python
/cygdrive/c/Python24/python

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ python2.4 testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 3.655000 seconds

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ python testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 3.077764 seconds

Cygwin version is slower, but not too much.

C++, compiled with VS2005, release, no other configuration :
================================================== ==========

print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_addr ess
What do you know?
chicken crosses road
fool
so long...
strings : 16.718 # memory allocation is quite bad
unique strings : 1.188
compared by address : 0.453

C++, with gcc 3.3/Cygwin
========================

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ g++ -O3 -o testcpp testcpp.cpp

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ ./testcpp
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_addr ess
What do you know?
chicken crosses road
fool
so long...
strings : 17.266 # still bad
unique strings : 1.547
compared by address : 0.375

Hum, with my old gcc I get equal or better performances than with VS2005.

Finally, the equivalent code is still about 10x faster in c++ than in
Python, as it was on my Linux box.

Mc Osten a écrit :
...
However, I would have written the code using a proper compare function
rather than using two sets. In this particular case the number of
elements of the first set is negligible in respect of the initial vector
size, thus copying it again does not take a lot of time.
But such code is optimized for the problem itself: in the real world I
Of course, it's a quick hack to get the same behavior.
suppose we would have passed set a proper comparison function that
checks address and then string equality.
Yes, furthermore, that is *exactly* what the python code is doing (see
my other post to Fredrik).
--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Aug 23 '06 #52
This thread can be useful for ShedSkin (the Python =C++ translator),
because often it manages strings slower than CPython still, some
suggestions from a C++ expert can surely improve things a lot. C++ is
fast, but you have to use and know it well, otherwise you don't obtain
much speed.

Maybe this can be useful to speed up C++ hashing (now used in ShedSkin
too):
http://www.azillionmonkeys.com/qed/hash.html

I suggest to test this code with the D language too, it has built-in
dicts too (associative arrays, impleented with trees and not hashes),
so the source code needed is pretty short.

Bye,
bearophile

Aug 23 '06 #53
Ray <ra********@yahoo.comwrote:
Yeah, my guess would be either he used the Debug configuration or he
actually created a Managed executable instead of a pure Win32
application. Sigh, now I can't wait to get home and try it out :)
Can be. But I suppose a Managed should not get *that* slow.
IronPython on Tim's machine is still faster than C++ (even though not as
fast as CPython).

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.
Aug 23 '06 #54

Mc Osten wrote:
Ray <ra********@yahoo.comwrote:
Yeah, my guess would be either he used the Debug configuration or he
actually created a Managed executable instead of a pure Win32
application. Sigh, now I can't wait to get home and try it out :)

Can be. But I suppose a Managed should not get *that* slow.
IronPython on Tim's machine is still faster than C++ (even though not as
fast as CPython).

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.
I have to admit to a stupid mistake, for which I feel quite ashamed - I
got the loop-size wrong in the Python code. So all Python results
posted by me were off by a factor of 10 :-(
I feel quite bad about that!

With the nr of loops corrected, Python on my laptop performs worse than
C++ under all circumstances, by a factor of about 2:

============ Python 2.4 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python24/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.239721 seconds
Slow - Elapsed: 11.883234 seconds

============ Python 2.5 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.031873 seconds
Slow - Elapsed: 11.314742 seconds
============ GCC 3.4.5, MinGW, -O6 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 2.088 seconds
Slow - Elapsed: 7.033 seconds

============ VC++ 6, 'release' build =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 4.585 seconds
Slow - Elapsed: 5.024 seconds

========== GCC 3.4.5, MinGW, -O6, with most optimized C++ code
==========
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_addr ess
What do you know?
chicken crosses road
fool
so long...
strings : 2.338
unique strings : 1.109
compared by address : 0.23

LeeuwT@nlshl-leeuwt ~/My Documents/Python

============ IronPython 1.0rc2 =============

IronPython had a hard time coping with it; creating 4 million string
objects is a bit too much and the CLR was eating nearly a gigabyte of
memory near the end.
Here are the numbers:

IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>>import sys
sys.path.append('C:/Documents and Settings/LeeuwT/My Documents/Python')
import SpeedTest
SpeedTest.run_test()
Begin Test
Number of unique string objects: 4
What do you know
so long...
chicken crosses road
fool
Number of unique string objects: 4000000
What do you know
so long...
chicken crosses road
fool
Fast - Elapsed: 10.501273 seconds
Slow - Elapsed: 371.047343 seconds
>>>
============ Java 1.6.0 b2 =============
Set size: 4
chicken crosses road
What do you know
fool
so long...
Set size: 4
chicken crosses road
What do you know
fool
so long...
Fast - Elapsed 1.003 seconds
Slow - Elapsed 3.96 seconds

============ Java 1.5.0 =============
Set size: 4
fool
What do you know
so long...
chicken crosses road
Set size: 4
fool
What do you know
so long...
chicken crosses road
Fast - Elapsed 1.754 seconds
Slow - Elapsed 5.044 seconds
=========================

Note that the Python code creates a set of all unique id's of all
objects in list a, and prints the length of this set, to verify that
all strings are really unique instances or duplicate instances. The C++
versions don't do that (at least not for 4 million strings); so Python
is at a slight disadvantage here. Printing the number of strings still
didn't help me catch the off-by-ten errors though.

I included a Java version of the program, and it looks like it performs
quite well compared to C++ both with jdk1.5 and jdk1.6.
I humbly apologize for my misinformation yesterday.

Regards,

--Tim

Aug 23 '06 #55
Ray

Tim N. van der Leeuw wrote:
With the nr of loops corrected, Python on my laptop performs worse than
C++ under all circumstances, by a factor of about 2:
*Phew*

Great to know that my model of how the world works is still correct!
(at least in relation to Python and C++!) :)

Thanks,
Ray
>
============ Python 2.4 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python24/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.239721 seconds
Slow - Elapsed: 11.883234 seconds

============ Python 2.5 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.031873 seconds
Slow - Elapsed: 11.314742 seconds
============ GCC 3.4.5, MinGW, -O6 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 2.088 seconds
Slow - Elapsed: 7.033 seconds

============ VC++ 6, 'release' build =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 4.585 seconds
Slow - Elapsed: 5.024 seconds

========== GCC 3.4.5, MinGW, -O6, with most optimized C++ code
==========
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_addr ess
What do you know?
chicken crosses road
fool
so long...
strings : 2.338
unique strings : 1.109
compared by address : 0.23

LeeuwT@nlshl-leeuwt ~/My Documents/Python

============ IronPython 1.0rc2 =============

IronPython had a hard time coping with it; creating 4 million string
objects is a bit too much and the CLR was eating nearly a gigabyte of
memory near the end.
Here are the numbers:

IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>import sys
sys.path.append('C:/Documents and Settings/LeeuwT/My Documents/Python')
import SpeedTest
SpeedTest.run_test()
Begin Test
Number of unique string objects: 4
What do you know
so long...
chicken crosses road
fool
Number of unique string objects: 4000000
What do you know
so long...
chicken crosses road
fool
Fast - Elapsed: 10.501273 seconds
Slow - Elapsed: 371.047343 seconds
>>

============ Java 1.6.0 b2 =============
Set size: 4
chicken crosses road
What do you know
fool
so long...
Set size: 4
chicken crosses road
What do you know
fool
so long...
Fast - Elapsed 1.003 seconds
Slow - Elapsed 3.96 seconds

============ Java 1.5.0 =============
Set size: 4
fool
What do you know
so long...
chicken crosses road
Set size: 4
fool
What do you know
so long...
chicken crosses road
Fast - Elapsed 1.754 seconds
Slow - Elapsed 5.044 seconds
=========================

Note that the Python code creates a set of all unique id's of all
objects in list a, and prints the length of this set, to verify that
all strings are really unique instances or duplicate instances. The C++
versions don't do that (at least not for 4 million strings); so Python
is at a slight disadvantage here. Printing the number of strings still
didn't help me catch the off-by-ten errors though.

I included a Java version of the program, and it looks like it performs
quite well compared to C++ both with jdk1.5 and jdk1.6.
I humbly apologize for my misinformation yesterday.

Regards,

--Tim
Aug 23 '06 #56

RaySame here, although that said Python's implementation of those data
Raystructure must already be as optimal as mortals can do it.

Perhaps more optimal. We've had (tim)bots working on the problem for years.

Skip
Aug 23 '06 #57
>Yes it is. But of course you can't sat that "Python is faster than
C++".
HaraldOf course not. Python is faster then assembler. Proofed @
HaraldEuroPython 2006 in CERN, near the LHC Beta, in the same room
Haraldmany Nobel laurates gave their presentations before.

Harald, do you have a reference to a talk abstract or a paper?

Thx,

Skip
Aug 23 '06 #58
sk**@pobox.com wrote:
Yes it is. But of course you can't sat that "Python is faster than
C++".

HaraldOf course not. Python is faster then assembler. Proofed @
HaraldEuroPython 2006 in CERN, near the LHC Beta, in the same room
Haraldmany Nobel laurates gave their presentations before.

Harald, do you have a reference to a talk abstract or a paper?
Via the lightning talks page:

http://wiki.python.org/moin/EuroPy2006LightningTalks

Here's a direct link:

http://wiki.python.org/moin/EuroPy20...rLightning.odp

Of course, Harald's interpretation of the "faster than" qualification
is broader than a pure assessment of raw performance, so I wouldn't
expect too many technical revelations. And while footage exists of at
least one talk in the CERN Auditorium which led to a Nobel prize, I
don't think that any footage of the EuroPython lightning talks (or of
any of the other talks) has been released just yet.

Paul

Aug 23 '06 #59
I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
- This code runs amortized 100ms on my machien (vc.net 2003 pro,
dinkumware stl, p4m 2.2GHz thinkpad, windows 2003 server), (10 loops in
1000ms)
- with STLPort 5.2, this code runs in 39-43ms (10 loops in 390ms ->
430ms).

a. did you compile in release mode? and if yes, vc.net _standard_
edition has no optimizing compiler in release mode, you need vc.net pro
or enterprise. or did you use vc.net 2005 ?

you should also include <algorithmfor ostream_operator.

Aug 23 '06 #60
Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.
Hi, you are benching heap allocations and of course heap fragmentation.
this is what devpartner c++ profiler had to say:

Method %in % with Called Average Real Name Method Children
-----------------------------------------------------------------
RtlFreeHeap 52,8 52,8 81.034 7,6 7,7
RtlAllocateHeap 19,8 19,8 81.089 2,9 2,9
main 16,9 89,7 1 198083,2 1052376,0
ExitProcess 10,3 10,3 1 120616,8 120641,3
...

So on linux its a lot better than that, because - I think - ptmalloc is
used which is based on dmalloc which is efficient for those small size
allocs (which malloc() and free() were never designed for).

on win32, activate the low fragmenting heap or write a custom stl
allocator.

_set_sbh_threshold(128); // method 1

ULONG HeapFragValue = 2;
HeapSetInformation((HANDLE)_get_heap_handle(), // method 2
HeapCompatibilityInformation,
&HeapFragValue,
sizeof(HeapFragValue));

last non-python message from me :)

Aug 23 '06 #61
Tim N. van der Leeuw <ti*************@nl.unisys.comwrote:
I have to admit to a stupid mistake, for which I feel quite ashamed - I
got the loop-size wrong in the Python code. So all Python results
posted by me were off by a factor of 10 :-(
I feel quite bad about that!
Well, this makes *my* results quite surprising.
I checked it threetimes. I loop 1000000 times in both Python and C++,
and Python here is faster.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.
Aug 23 '06 #62
Ray <ra********@yahoo.comwrote:
Great to know that my model of how the world works is still correct!
(at least in relation to Python and C++!) :)
So please explain my results. I loop the same number of times.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.
Aug 23 '06 #63
On 2006-08-23, Amanjit Gill <am****************@googlemail.comwrote:
you should also include <algorithmfor ostream_operator.
<ostream>, actually.

--
Neil Cerutti
Aug 23 '06 #64
On 2006-08-23, Mc Osten <ri**@despammed.comwrote:
Ray <ra********@yahoo.comwrote:
>Great to know that my model of how the world works is still
correct! (at least in relation to Python and C++!) :)

So please explain my results. I loop the same number of times.
Those of you experiencing a temporary obsession with this topic
are encouraged to study The Great Language Shootout, until the
obsession goes away. ;)

Your time might not be totally wasted; an opportunity to improve
the Python solutions may present itself.

http://shootout.alioth.debian.org/

--
Neil Cerutti
Strangely, in slow motion replay, the ball seemed to hang in the
air for even longer. --David Acfield
Aug 24 '06 #65

AmanjitSo on linux its a lot better than that, because - I think -
Amanjitptmalloc is used which is based on dmalloc which is efficient
Amanjitfor those small size allocs (which malloc() and free() were
Amanjitnever designed for).

And which for Python has been further optimized in obmalloc. Again, Python
wins. ;-)

Skip
Aug 24 '06 #66
Neil Cerutti <ho*****@yahoo.comwrote:
Those of you experiencing a temporary obsession with this topic
are encouraged to study The Great Language Shootout, until the
obsession goes away. ;)
I think that everybody knows GLS. However, when I have results different
from what I expected, I try to understand where I did the wrong
assumption.

However, a recent post kind of explains what the problem is.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.
Aug 24 '06 #67
Neil Cerutti <ho*****@yahoo.comwrote:
Those of you experiencing a temporary obsession with this topic
are encouraged to study The Great Language Shootout, until the
obsession goes away. ;)
I think that everybody knows GLS. However, when I have results different
from what I expected, I try to understand where I did the wrong
assumption.

But a recent post kind of explains what the problem is.

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.
Aug 24 '06 #68

Licheng Fang wrote:
Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<stringa;
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<stringb(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}
I think you probably want this C++ code instead:

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<stringa;
a.reserve( 40000 ); //<------------------ note this change
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<stringb(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
}

It will run a lot faster if it doesn't have to keep resizing the array.

Aug 24 '06 #69
On 2006-08-24, JS*******@gmail.com <JS*******@gmail.comwrote:
>
Licheng Fang wrote:
>Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<stringa;
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<stringb(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

I think you probably want this C++ code instead:

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<stringa;
a.reserve( 40000 ); //<------------------ note this change
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<stringb(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
}

It will run a lot faster if it doesn't have to keep resizing
the array.
I don't see why it should run a lot faster that way.

Appending elements to a vector with push_back takes amortized
constant time. In the example above, preallocating 40000 strings
saves (probably) math.log(40000, 2) reallocations of the vector's
storage along with the consequent copy-construction and
destruction of some fixed number of strings. Most of those
reallocations take place while the vector is still proportionally
quite small.

--
Neil Cerutti
Aug 24 '06 #70
Neil Cerutti wrote:
On 2006-08-24, JS*******@gmail.com <JS*******@gmail.comwrote:
It will run a lot faster if it doesn't have to keep resizing
the array.

I don't see why it should run a lot faster that way.

Appending elements to a vector with push_back takes amortized
constant time. In the example above, preallocating 40000 strings
saves (probably) math.log(40000, 2) reallocations of the vector's
storage along with the consequent copy-construction and
destruction of some fixed number of strings. Most of those
reallocations take place while the vector is still proportionally
quite small.
Math.log(40000, 2) is not a small number when talking about a
relatively expensive operation such as memory allocation and
deallocation. And the superfluous copying amounts to probably an extra
2^16 copies in this case - not exactly trivial.

--
Ben Sizer

Aug 25 '06 #71
Ray

Neil Cerutti wrote:
I don't see why it should run a lot faster that way.

Appending elements to a vector with push_back takes amortized
constant time. In the example above, preallocating 40000 strings
saves (probably) math.log(40000, 2) reallocations of the vector's
storage along with the consequent copy-construction and
destruction of some fixed number of strings. Most of those
reallocations take place while the vector is still proportionally
quite small.
I see your logic, however it does make it run a lot faster. On my
machine, with 1000000 repetition, what used to take 5+ seconds now
takes only about 1+++ to 2 secs.
>
--
Neil Cerutti
Aug 25 '06 #72
On 2006-08-25, Ben Sizer <ky*****@gmail.comwrote:
Neil Cerutti wrote:
>On 2006-08-24, JS*******@gmail.com <JS*******@gmail.comwrote:
It will run a lot faster if it doesn't have to keep resizing
the array.

I don't see why it should run a lot faster that way.

Appending elements to a vector with push_back takes amortized
constant time. In the example above, preallocating 40000
strings saves (probably) math.log(40000, 2) reallocations of
the vector's storage along with the consequent
copy-construction and destruction of some fixed number of
strings. Most of those reallocations take place while the
vector is still proportionally quite small.

Math.log(40000, 2) is not a small number when talking about a
relatively expensive operation such as memory allocation and
deallocation. And the superfluous copying amounts to probably
an extra 2^16 copies in this case - not exactly trivial.
Actually, I misread the suggestion.

I thought I saw:

vector<stringv;
v.reserve(40000);

instead of:

vector<stringv(40000);

The latter actually *slows* the program, according to my tests. I
think the poster meant to suggest the former, and that's what I
was discussing.

I increased the number of iterations from 10000 to 100000 for the
tests I'm reporting.

The best of five, with reserve, was 2363 clock ticks. Without
reserve, it was 3244. The suggestion to use an initial size
resulted in 3655 clocks.

So it appears you were more right (using reserve sped up the
program by 35%, just about as the numbers predict), although we
were both wrong. ;)

--
Neil Cerutti
8 new choir robes are currently needed, due to the addition of
several new members and to the deterioration of some of the
older ones. --Church Bulletin Blooper
Aug 25 '06 #73
I tested in Common Lisp and compared the result with python.

My PC is: 3.6GH Pentium4
My OS is: Ubuntu 6.06 i386
My lisp implementation is SBCL 0.9.8 for i386
My python's version is: V2.4.3
Both implementations were installed via apt-get install.

Here's my Lisp program:

++++++++++++++++++++++++++++++++++++++++++++++++++
;;; from slow to fast implementation

(proclaim '(optimize (speed 3) (debug 0) (safety 0) (space 0)))

(defun test1 ()
(let ((a (make-array 0 :adjustable t :fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push-extend "What do you know" a)
(vector-push-extend "so long ..." a)
(vector-push-extend "chicken crosses road" a)
(vector-push-extend "fool" a)))
(setf b (remove-duplicates a :test #'string-equal))
(map 'vector #'print b)))

(defun test2 ()
(let ((a (make-array 0 :adjustable t :fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push-extend "What do you know" a)
(vector-push-extend "so long ..." a)
(vector-push-extend "chicken crosses road" a)
(vector-push-extend "fool" a)))
(setf b (remove-duplicates a :test #'eq))
(map 'vector #'print b)))

(defun test3 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil
:fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push "What do you know" a)
(vector-push "so long ..." a)
(vector-push "chicken crosses road" a)
(vector-push "fool" a)))
(setf b (remove-duplicates a))
(map 'vector #'print b)))

(defun test4 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil))
(b nil))
(dotimes (i 1000000)
(progn
(let ((j (1- (* 4 i))))
(setf (aref a (incf j)) "What do you know")
(setf (aref a (incf j)) "so long ...")
(setf (aref a (incf j)) "chicken crosses road")
(setf (aref a (incf j)) "fool"))))
(setf b (remove-duplicates a))
(map 'vector #'print b)))

++++++++++++++++++++++++++++++++++++++++++++++++++ +++++

1. When using string equal comparator (test #1), the speed slows down
dramatically. 1.93 -9.35
2. It seems that append(vector-push) in lisp costs much more than
direct access via index.

Here's my benchmark result:

++++++++++++++++++++++++++++++++++++++++++++++++++ +++++
#1
(time (test1))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
9.346 seconds of real time
9.020564 seconds of user run time
0.328021 seconds of system run time
0 page faults and
457,565,688 bytes consed.
#2
(time (test2))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
1.931 seconds of real time
1.884118 seconds of user run time
0.048003 seconds of system run time
0 page faults and
49,561,312 bytes consed.
#3
(time (test3))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
1.721 seconds of real time
1.704107 seconds of user run time
0.016001 seconds of system run time
0 page faults and
32,012,040 bytes consed.
#4
(time (test4))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
0.843 seconds of real time
0.824051 seconds of user run time
0.020001 seconds of system run time
0 page faults and
32,012,040 bytes consed.

++++++++++++++++++++++++++++++++++++++++++++++++++ +

Here's my python's code:

ef f():
a = []
for i in range(1000000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

import time
from time import clock

f_start = clock()
f()
f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)

And benchmark result:

python -O test.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.260000 seconds
Oops, Python beat native-compiled lisp code.

++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++

Any way,

MY CONCLUSION is: C++ is not the right model for symbolic computation.

Aug 25 '06 #74
Pebblestone:
(defun test4 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil))
(b nil))
(dotimes (i 1000000)
(progn
(let ((j (1- (* 4 i))))
(setf (aref a (incf j)) "What do you know")
(setf (aref a (incf j)) "so long ...")
(setf (aref a (incf j)) "chicken crosses road")
(setf (aref a (incf j)) "fool"))))
(setf b (remove-duplicates a))
(map 'vector #'print b)))

That test4 function can be compared to this one, with explicit
preallocation (and xrange instead of range!):

def f2():
n = 1000000
a = [None] * n * 4
for i in xrange(0, n*4, 4):
a[i] = 'What do you know'
a[i+1] = 'so long...'
a[i+2] = 'chicken crosses road'
a[i+3] = 'fool'
for s in set(a):
print s

But note this a list (that is an array, a list is a different data
structure) of python becomes filled with pointers. I don't know what
your CL does exactly.

I can also suggest you to use Psyco too here
(http://psyco.sourceforge.net/):

import psyco
psyco.bind(f2)

It makes that f2 more than twice faster here.

Bye,
bearophile

Aug 25 '06 #75
But note this a list (that is an array, a list is a different data
structure) of python becomes filled with pointers. I don't know what
your CL does exactly.
I heard that python's list is implemented as adjustable array.

Here's my lisp implementation:

++++++++++++++++++
(defun test-list ()
(let ((a nil)
(b nil))
(dotimes (i 1000000)
(progn
(push "What do you know" a)
(push "so long ..." a)
(push "chicken crosses road" a)
(push "fool" a)))
(setf b (remove-duplicates a))
(map 'list #'print b)))
+++++++++++++++++++++++++

And the benchmark result:

(time (test-list))

"fool"
"chicken crosses road"
"so long ..."
"What do you know"
Evaluation took:
2.88 seconds of real time
2.744172 seconds of user run time
0.136009 seconds of system run time
0 page faults and
74,540,392 bytes consed.

++++++++++++++++++++++++++++++

BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to
me thousands of lines of errors and warnings.


be************@lycos.com wrote:
Pebblestone:
(defun test4 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil))
(b nil))
(dotimes (i 1000000)
(progn
(let ((j (1- (* 4 i))))
(setf (aref a (incf j)) "What do you know")
(setf (aref a (incf j)) "so long ...")
(setf (aref a (incf j)) "chicken crosses road")
(setf (aref a (incf j)) "fool"))))
(setf b (remove-duplicates a))
(map 'vector #'print b)))


That test4 function can be compared to this one, with explicit
preallocation (and xrange instead of range!):

def f2():
n = 1000000
a = [None] * n * 4
for i in xrange(0, n*4, 4):
a[i] = 'What do you know'
a[i+1] = 'so long...'
a[i+2] = 'chicken crosses road'
a[i+3] = 'fool'
for s in set(a):
print s

But note this a list (that is an array, a list is a different data
structure) of python becomes filled with pointers. I don't know what
your CL does exactly.

I can also suggest you to use Psyco too here
(http://psyco.sourceforge.net/):

import psyco
psyco.bind(f2)

It makes that f2 more than twice faster here.

Bye,
bearophile
Aug 25 '06 #76
Oh, I forgot.

Your python's example (use direct index array index) of my
corresponding lisp code works slower than the version which use
'append'.

This let me think how python's list is implemented.
Anyway, python's list is surprisingly efficient.
be************@lycos.com wrote:
Pebblestone:
(defun test4 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil))
(b nil))
(dotimes (i 1000000)
(progn
(let ((j (1- (* 4 i))))
(setf (aref a (incf j)) "What do you know")
(setf (aref a (incf j)) "so long ...")
(setf (aref a (incf j)) "chicken crosses road")
(setf (aref a (incf j)) "fool"))))
(setf b (remove-duplicates a))
(map 'vector #'print b)))


That test4 function can be compared to this one, with explicit
preallocation (and xrange instead of range!):

def f2():
n = 1000000
a = [None] * n * 4
for i in xrange(0, n*4, 4):
a[i] = 'What do you know'
a[i+1] = 'so long...'
a[i+2] = 'chicken crosses road'
a[i+3] = 'fool'
for s in set(a):
print s

But note this a list (that is an array, a list is a different data
structure) of python becomes filled with pointers. I don't know what
your CL does exactly.

I can also suggest you to use Psyco too here
(http://psyco.sourceforge.net/):

import psyco
psyco.bind(f2)

It makes that f2 more than twice faster here.

Bye,
bearophile
Aug 25 '06 #77
Here's the result:

What do you know
fool
chicken crosses road
f elapsed: 1.260000 seconds
f2 elapsed 2.110000 seconds

be************@lycos.com wrote:
Pebblestone:
(defun test4 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil))
(b nil))
(dotimes (i 1000000)
(progn
(let ((j (1- (* 4 i))))
(setf (aref a (incf j)) "What do you know")
(setf (aref a (incf j)) "so long ...")
(setf (aref a (incf j)) "chicken crosses road")
(setf (aref a (incf j)) "fool"))))
(setf b (remove-duplicates a))
(map 'vector #'print b)))


That test4 function can be compared to this one, with explicit
preallocation (and xrange instead of range!):

def f2():
n = 1000000
a = [None] * n * 4
for i in xrange(0, n*4, 4):
a[i] = 'What do you know'
a[i+1] = 'so long...'
a[i+2] = 'chicken crosses road'
a[i+3] = 'fool'
for s in set(a):
print s

But note this a list (that is an array, a list is a different data
structure) of python becomes filled with pointers. I don't know what
your CL does exactly.

I can also suggest you to use Psyco too here
(http://psyco.sourceforge.net/):

import psyco
psyco.bind(f2)

It makes that f2 more than twice faster here.

Bye,
bearophile
Aug 25 '06 #78
Pebblestone:
>I heard that python's list is implemented as adjustable array.
Correct, an array geometrically adjustable on the right.

>Here's my lisp implementation:<
What's the memory size of a before computing b? You can compare it with
Python, that may need less memory (because the array contains
pointers).

>BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to me thousands of lines of errors and warnings.<
Find a Win box ;-) It's already compiled for it (for Py 2.3, 2.4).

>Your python's example (use direct index array index) of my corresponding lisp code works slower than the version which use 'append'.<
For me (a slow PC) it's almost twice faster, computer life is usually
complex.
For me using the esplicit allocation + Psyco makes that program about 4
times faster (from 8 to 2 seconds).

>This let me think how python's list is implemented.<
You also have to think how the * allocation is implemented and many
other things :-)
The list implementation is rather readable, Python sources are online
too.

>Anyway, python's list is surprisingly efficient.<
But its access isn't that fast :-) Psyco helps.

Bye,
bearophile

Aug 25 '06 #79
What's the memory size of a before computing b? You can compare it with
Python, that may need less memory (because the array contains
pointers).


Here's the memory usage:

1) before the loop ( fully garbage collected)
29,052,560 bytes, 757,774 objects.

2) after the loop
103,631,952 bytes, 8,760,495 objects.

It seems A has consumed 74M bytes, 8bytes each cell. That make sense
because a cell in list consists of 2 pointers, (car cdr), and an mem
address is 32 bit.

be************@lycos.com wrote:
Pebblestone:
I heard that python's list is implemented as adjustable array.

Correct, an array geometrically adjustable on the right.

Here's my lisp implementation:<

What's the memory size of a before computing b? You can compare it with
Python, that may need less memory (because the array contains
pointers).

BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to me thousands of lines of errors and warnings.<

Find a Win box ;-) It's already compiled for it (for Py 2.3, 2.4).

Your python's example (use direct index array index) of my corresponding lisp code works slower than the version which use 'append'.<

For me (a slow PC) it's almost twice faster, computer life is usually
complex.
For me using the esplicit allocation + Psyco makes that program about 4
times faster (from 8 to 2 seconds).

This let me think how python's list is implemented.<

You also have to think how the * allocation is implemented and many
other things :-)
The list implementation is rather readable, Python sources are online
too.

Anyway, python's list is surprisingly efficient.<

But its access isn't that fast :-) Psyco helps.

Bye,
bearophile
Aug 25 '06 #80
Sorry, I did some miscalculation.... what a shame.....


be************@lycos.com wrote:
Pebblestone:
I heard that python's list is implemented as adjustable array.

Correct, an array geometrically adjustable on the right.

Here's my lisp implementation:<

What's the memory size of a before computing b? You can compare it with
Python, that may need less memory (because the array contains
pointers).

BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to me thousands of lines of errors and warnings.<

Find a Win box ;-) It's already compiled for it (for Py 2.3, 2.4).

Your python's example (use direct index array index) of my corresponding lisp code works slower than the version which use 'append'.<

For me (a slow PC) it's almost twice faster, computer life is usually
complex.
For me using the esplicit allocation + Psyco makes that program about 4
times faster (from 8 to 2 seconds).

This let me think how python's list is implemented.<

You also have to think how the * allocation is implemented and many
other things :-)
The list implementation is rather readable, Python sources are online
too.

Anyway, python's list is surprisingly efficient.<

But its access isn't that fast :-) Psyco helps.

Bye,
bearophile
Aug 25 '06 #81
Pebblestone:
Sorry, I did some miscalculation.... what a shame.....
Don't worry.
For me using Py 2.4.3 those memory values are 4536 before and 20184 kb
after, it means a difference of 15648 kb, that equals to about 16023552
bytes, that equals to about 1000000 * 4 * 4. That means 4 bytes for
each string reference into the array. If you don't use Psyco the
starting memory value is quite lower.

If you don't use Psyco but you use the append, the values are 1384 and
18880, this means about 4.47 bytes / value, because python lists grow
geometrically, so there is some wasted space.

I find the memory used with PsList called from the Python script:
http://www.sysinternals.com/Utilities/PsList.html

Bye,
bearophile

Aug 25 '06 #82
Ruby is also not far away :-)

Here's my code:

++++++++++++++++++++++++++++++++++++++++
require 'time'

def f
a = []
1000000.times do
a.push "What do you know"
a.push "so long ..."
a.push "chicken crosses road"
a.push "fool"
end
b = a.uniq
b.each do |x|
puts x
end
end

def f2
a = Array.new(4000000)
1000000.times do |i|
a[i] = "What do you know"
a[i+1] = "so long ..."
a[i+2] = "chicken crosses road"
a[i+3] = "fool"
end
b = a.uniq
b.each do |x|
puts x
end
end
f_start = Time.now
f
f_end = Time.now

f2_start = Time.now
f2
f2_end = Time.now

puts "f: Elapsed time: #{f_end - f_start} sec."
puts "f2: Elapsed time: #{f2_end - f_start} sec."

++++++++++++++++++++++++++++++++++++++++++

And the benchmark result:

What do you know
so long ...
chicken crosses road
fool
What do you know
so long ...
chicken crosses road
fool
nil
f: Elapsed time: 3.600294 sec.
f2: Elapsed time: 11.182927 sec.

++++++++++++++++++++++++++++++++++++++++++

I like ruby because its purity. :p

Licheng Fang wrote:
Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<stringa;
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<stringb(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

#python
def f():
a = []
for i in range(10000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?
Aug 25 '06 #83
---------------------------- C++ --------------------------
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <algorithm>
#include <windows.h>

using namespace std;

int main()
{
DWORD ticks = ::GetTickCount();

const string s1("What do you know");
const string s2("So long...");
const string s3("chicken crosses road");
const string s4("fool");

typedef vector<const string*str_vector_t;
typedef str_vector_t::iterator vec_iter;
typedef set<const string*str_set_t;
typedef str_set_t::const_iterator set_iter;
const int size = 1000000;
str_vector_t vec;
vec.reserve(size*4);

for(int i = 0; i < size; ++i){
vec.push_back(&s1);
vec.push_back(&s2);
vec.push_back(&s3);
vec.push_back(&s4);
}

vec_iter new_end = unique(vec.begin(), vec.end());
str_set_t set_(vec.begin(), new_end);

for(set_iter it = set_.begin(); it != set_.end(); ++it){
cout<<*it<<endl;
}
cout<<::GetTickCount()-ticks<<endl;

return 0;
}

In MS VC+ 2005 in release configuration it gets the work done in 187
ms.

---------------- Python + Psyco----------------
def f():
a = []
for i in range(1000000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

import time
from time import clock

import psyco
psyco.full()
f_start = clock()
f()
f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)

In Python in manages to do the same in 1.8 secs (psyco boosts it to
0.7; see below)
so long...
What do you know
fool
chicken crosses road
Elapsed: 0.772899 seconds
Well, that's how it is in my book.

Regards,
Andrei

Aug 27 '06 #84

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

220 posts views Thread by Brandon J. Van Every | last post: by
42 posts views Thread by Bicho Verde | last post: by
4 posts views Thread by Wolfgang Keller | last post: by
267 posts views Thread by Xah Lee | last post: by
29 posts views Thread by 63q2o4i02 | last post: by
1 post views Thread by krishnakant Mane | last post: by
24 posts views Thread by Mark | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.