Problem with STL vector performance, benchmarks included

I found that old post:
http://groups.google.com/group/comp....519204726d01e8

I just erased the #include <kubux.....lines.

****** old post for your convenience ********
You are right:

#include <vector>
#include <iostream>
#include <ctime>
#include <memory>
#include <algorithm> // std::fill_n
#include <iterator> // std::reverse_iterator

#include <kubux/bits/allocator.cc>
#include <kubux/bits/new_delete_allocator.cc>
#include <kubux/bits/malloc_free_allocator.cc>

template < typename T, typename Alloc = std::allocator<T> >
class stupid {
public:

typedef Alloc allocator;
typedef typename allocator::value_type value_type;
typedef typename allocator::size_type size_type;
typedef typename allocator::difference_type difference_type;
typedef typename allocator::pointer pointer;
typedef typename allocator::const_pointer const_pointer;
typedef typename allocator::reference reference;
typedef typename allocator::const_reference const_reference;

typedef pointer iterator;
typedef const_pointer
const_iterator;
typedef typename std::reverse_iterator< iterator >
reverse_iterator;
typedef typename std::reverse_iterator< const_iterator >
const_reverse_iterator;

private:

pointer ptr;
size_type the_size;

public:

stupid ( size_type length ) :
ptr ( new T [ length ] ),
the_size ( length )
{
for ( iterator iter = this->ptr;
iter != this->ptr + the_size;
++ iter ) {
::new( static_cast<void*>(iter) ) T();
}
}

~stupid ( void ) {
iterator iter = ptr + the_size;
while ( iter > ptr ) {
-- iter;
iter->~T();
}
{
allocator alloc;
alloc.deallocate( ptr, the_size );
}
the_size = 0;
}

reference operator[] ( size_type index ) {
return( this->ptr[ index ] );
}

const_reference operator[] ( size_type index ) const {
return( this->ptr[ index ] );
}

}; // stupid

int main ( void ) {
const unsigned long l = 50000000;
{
std::vector< int > v ( l );
std::clock_t loop_start = std::clock();
for ( unsigned long i = 0; i < l; ++i ) {
v[i] = 5;
}
std::clock_t loop_end = std::clock();
std::cout << "vector: " << loop_end - loop_start << std::endl;
}
{
int* v = new int [ l ];
std::fill_n(v, l, 0);
std::clock_t loop_start = std::clock();
for ( unsigned long i = 0; i < l; ++i ) {
v[i] = 5;
}
std::clock_t loop_end = std::clock();
std::cout << "array: " << loop_end - loop_start << std::endl;
}
{
stupid< int, std::allocator<int> > v ( l );
std::clock_t loop_start = std::clock();
for ( unsigned long i = 0; i < l; ++i ) {
v[i] = 5;
}
std::clock_t loop_end = std::clock();
std::cout << "stupid: " << loop_end - loop_start << std::endl;
}
{
std::vector<int> v ( l );
std::clock_t loop_start = std::clock();
for ( std::vector<int>::iterator i = v.begin();
i != v.end(); ++i ) {
*i = 5;
}
std::clock_t loop_end = std::clock();
std::cout << "ptr: " << loop_end - loop_start << std::endl;
}
{
int* v = new int [ l ];
std::fill_n(v, l, 0);
std::clock_t loop_start = std::clock();
for ( int* i = v; i < v+l; ++i ) {
*i = 5;
}
std::clock_t loop_end = std::clock();
std::cout << "ptr: " << loop_end - loop_start << std::endl;
}

}
a.out
vector: 320000
array: 320000
stupid: 350000
iterator: 340000
ptr: 340000

No surprises anymore.

Thanks

Kai-Uwe Bux
************************************************** *

I ran the test above on Visual Studio Professional 2005 with its
standard STL implementation, which should be supplied by Dinkumware.
My CPU is a dual-core T2500 with 2 GB of DDR2.

I tried both the Intel 9.1 compiler and the Microsoft one.
In both cases I used the O3 optimizations in release mode, and with the
Intel one I also tried the /Qansi_alias and /Qipo options.

Results:

Microsoft:
vector: 141
array: 94
stupid: 93
ptr: 172
ptr: 78

Intel:
vector: 312
array: 156 // becomes 45 if I require P4 extensions; other values remain nearly the same
stupid: 157
ptr: 1047
ptr: 156
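
Note that the numbers printed by the program are raw differences of std::clock() values, so their unit depends on the platform's CLOCKS_PER_SEC. That is presumably why the old post shows values like 320000 while the runs here show values like 141. A minimal sketch of a conversion to milliseconds, which makes such figures comparable (the helper name ticks_to_ms is made up for illustration and is not part of the posted program):

```cpp
#include <ctime>

// Hypothetical helper: convert a difference of two std::clock() readings
// into milliseconds, independent of the platform's tick resolution.
double ticks_to_ms(std::clock_t ticks) {
    return 1000.0 * static_cast<double>(ticks) / CLOCKS_PER_SEC;
}
```

In the benchmark, each output line could then print ticks_to_ms(loop_end - loop_start) instead of the raw difference.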

I admit I'm quite disappointed with the results obtained with the Intel
compiler.
Is there any fault in the way the test was conducted or in the
source code I posted?
If everything is correct, how could I investigate where the
problem is?

Cheers
StephQ

Apr 30 '07 #1


On 30 Apr 2007 05:48:31 -0700, StephQ wrote:
> I ran the reported test on visual studio professional 2005 with its
> standard STL implementation, which should be supplied by Dinkumware.
> My cpu is a dual core t2500 with 2gb ddr2.
> I tried both the intel 9.1 compiler and the Microsoft one.
> In both cases I used the O3 optimizations, release mode, and with the
> Intel one I also tried the /Qansi_alias /Qipo options.

Have you turned off checked iterators? (see:
http://www.codeproject.com/vcpp/stl/...diterators.asp)
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
Apr 30 '07 #2

> Have you turned off checked iterators? (see:http://www.codeproject.com/vcpp/stl/...diterators.asp)

Thank you for the very useful suggestion. I didn't know that checked
iterators were turned on even in release mode in VC8 by default.
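
For readers who have not met this setting: in VC8 the checked-iterator machinery is controlled by the _SECURE_SCL macro, which defaults to 1 even in release builds. A minimal sketch of switching it off, assuming the definition appears before any standard header is included and is the same in every translation unit (the function name fill_with_iterators is made up for illustration; other compilers simply ignore the macro):

```cpp
// Must precede every standard library include to take effect (VC8).
#define _SECURE_SCL 0

#include <cstddef>
#include <vector>

// Hypothetical helper: fills a vector through iterators, the loop that was
// slow with checked iterators enabled, and returns the sum of the elements.
long fill_with_iterators(std::size_t n, int value) {
    std::vector<int> v(n);
    for (std::vector<int>::iterator i = v.begin(); i != v.end(); ++i) {
        *i = value;
    }
    long sum = 0;
    for (std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i) {
        sum += *i;
    }
    return sum;
}
```

The same effect can be had by defining the macro on the compiler command line (/D_SECURE_SCL=0) rather than in the source.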

The new results (with checked iterators turned off) are:

Microsoft:
vector: 94
array: 94
stupid: 94
ptr: 141
ptr: 96

Intel:
vector: 141
array: 141 // 62 if I enable SSE2
stupid: 141 // 62 if I enable SSE2 and disable exception handling
ptr: 141
ptr: 140

The situation is now much better.
However, it seems that the Microsoft compiler is still doing 35% better
in all the situations except the "vector iterator" one.

Do you have any other suggestions to try?
I know nothing about low-level instructions, but if I post the
"assembler-like" code here, would it be of any help to you?

Thank you

Cheers
StephQ

Apr 30 '07 #3

On Apr 30, 4:32 pm, StephQ <askmeo...@mailinator.com> wrote:
> [snip]

I'm replying to myself just to say that I won't bother investigating
these issues any further.
I ran the test using doubles instead of ints and the results are very
similar, with the Microsoft compiler performing something like 3%
better.

However the Stepanov Abstraction test favours the intel compiler by a
large margin.
Abstraction penalty with Intel:
0.85
0.68 with sse2

With Microsoft:
1.11

A curiosity: how is it possible to get an abstraction penalty
below 1?

Cheers
StephQ
Apr 30 '07 #4

On Apr 30, 11:35 am, StephQ <askmeo...@mailinator.com> wrote:
> [snip]
> However the Stepanov Abstraction test favours the intel compiler by a
> large margin.

Could you clue me in on what a "Stepanov Abstraction" test is?
Socks

Apr 30 '07 #5

On 30 Apr., 17:41, Puppet_Sock <puppet_s...@hotmail.com> wrote:
> On Apr 30, 11:35 am, StephQ <askmeo...@mailinator.com> wrote:
> [snip]
>> However the Stepanov Abstraction test favours the intel compiler by a
>> large margin.
>
> Could you clue me in on what a "Stepanov Abstraction" test is?
> Socks

The Stepanov abstraction penalty is a benchmark made by Alexander
Stepanov, the man behind the STL. Google it if you want
to know more.

/Peter

Apr 30 '07 #6

On 30 Apr., 17:35, StephQ <askmeo...@mailinator.com> wrote:
> [snip]
> A curiosity..... how is it possible to get an abstraction penalty
> below 1 ?

Perhaps because you had a bad test? Rerun the benchmarks more than one
time and remember that caching has a huge effect on results (I believe
a factor of ten is quite normal). So you should know how to e.g. clear
(or fill) the cache as appropriate.
Writing a good benchmark is not easy.
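
To make the question concrete: as commonly described, the benchmark times the same computation written at increasing levels of abstraction and reports a mean of the time ratios against the plain baseline version. A rough sketch of such a figure, assuming a geometric mean of the ratios is used (the function name abstraction_penalty is made up for illustration, and this is not the benchmark's own code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: geometric mean of times[i] / times[0], where times[0]
// is the baseline (least abstract) version. Requires positive timings.
double abstraction_penalty(const std::vector<double>& times) {
    if (times.size() < 2) return 1.0;  // nothing to compare against
    double log_sum = 0.0;
    for (std::size_t i = 1; i < times.size(); ++i) {
        log_sum += std::log(times[i] / times[0]);
    }
    return std::exp(log_sum / static_cast<double>(times.size() - 1));
}
```

Under that assumption, a value below 1 only says that the abstracted versions were measured as faster than the baseline on average, which can come from the compiler optimizing the variants differently or from measurement noise.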

/Peter

Apr 30 '07 #7

On Apr 30, 7:58 pm, peter koch <peter.koch.lar...@gmail.com> wrote:
> [snip]
> Perhaps because you had a bad test? Rerun the benchmarks more than one
> time and remember that caching has a huge effect on results (I believe
> a factor of ten is quite normal). So you should know how to e.g. clear
> (or fill) the cache as appropriate.
> Writing a good benchmark is not easy.
>
> /Peter

I'm quite a newbie....
Do you suggest that the initial run is the "right" one, while
subsequent runs get distorted by caching, or the opposite?
By caching you mean that the objects of interest are loaded in the L1/
L2 cache, right?
But I obtained these results in a stable way with different runs....

I remember that caching influences the results of subsequent runs of
benchmarks, but I don't understand why. Isn't cache/memory freed after
the program exits?

Anyway, I increased the number of calculations in the test because it
was taking too little time to run.

StephQ

Apr 30 '07 #8

On 30 Apr 2007 12:53:47 -0700, StephQ wrote:
> On Apr 30, 7:58 pm, peter koch <peter.koch.lar...@gmail.com> wrote:
>> Perhaps because you had a bad test? Rerun the benchmarks more than one
>> time and remember that caching has a huge effect on results
>
> I'm quite a newbie....
> Do you suggest that the initial run is the "right" one, while
> subsequent runs get distorted by caching or the opposite thing?
> By caching you mean that the objects of interests are loaded in the L1/
> L2 cache right?
> But I obtained these results in a stable way with different runs....

It may be any 'position effect'. Divide the test into functions (one
for each test) and call the functions several (many) times in
randomized order. Include a 'warm up' at the beginning.
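
A minimal sketch of that advice, assuming C++11 is available for the random shuffle (the names fill_by_index, fill_by_iterator and run_randomized are made up for illustration, and only two of the five loops from the original program are shown):

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <random>
#include <vector>

// One function per test, each returning the clock ticks it took.
std::clock_t fill_by_index(std::vector<int>& v) {
    std::clock_t start = std::clock();
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = 5;
    return std::clock() - start;
}

std::clock_t fill_by_iterator(std::vector<int>& v) {
    std::clock_t start = std::clock();
    for (std::vector<int>::iterator i = v.begin(); i != v.end(); ++i) *i = 5;
    return std::clock() - start;
}

// Runs every test `reps` times in shuffled order after one untimed warm-up
// pass, and returns the total ticks per test (index 0: index loop,
// index 1: iterator loop).
std::vector<std::clock_t> run_randomized(std::size_t n, int reps, unsigned seed) {
    typedef std::clock_t (*test_fn)(std::vector<int>&);
    const test_fn tests[] = { fill_by_index, fill_by_iterator };
    const std::size_t count = sizeof(tests) / sizeof(tests[0]);

    std::vector<int> data(n);
    for (std::size_t t = 0; t < count; ++t) tests[t](data);  // warm-up, discarded

    // Build the schedule: each test appears `reps` times, then shuffle it.
    std::vector<std::size_t> order;
    for (int r = 0; r < reps; ++r)
        for (std::size_t t = 0; t < count; ++t) order.push_back(t);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<std::clock_t> totals(count, 0);
    for (std::size_t k = 0; k < order.size(); ++k)
        totals[order[k]] += tests[order[k]](data);
    return totals;
}
```

Calling run_randomized(50000000, 10, 42) would mirror the array size used in this thread; comparing the totals across several seeds helps show whether the order of the tests matters.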
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
Apr 30 '07 #9

On 30 Apr., 21:53, StephQ <askmeo...@mailinator.com> wrote:
> [snip]

> I'm quite a newbie....
> Do you suggest that the initial run is the "right" one, while
> subsequent runs get distorted by caching or the opposite thing?
> By caching you mean that the objects of interests are loaded in the L1/
> L2 cache right?

Yes.

> But I obtained these results in a stable way with different runs....
>
> I remember that caching influences the results of subsequent runs of
> benchmarks, but I don't understand why. Isn't cache/memory freed after
> the software exit?

No. Caching takes place at the hardware level, so no freeing takes
place, just as freeing memory does not remove physical memory.

> Anyway I increased the number of calculations in the test because it
> was taking too little time to run.

Right. But try to follow Roland Pibinger's advice and see if that
explains anything.

/Peter

Apr 30 '07 #10

peter koch wrote:
> On 30 Apr., 21:53, StephQ <askmeo...@mailinator.com> wrote:
>> [snip]

>> I'm quite a newbie....
>> Do you suggest that the initial run is the "right" one, while
>> subsequent runs get distorted by caching or the opposite thing?
>> By caching you mean that the objects of interests are loaded in the
>> L1/ L2 cache right?
>
> Yes.
>
>> But I obtained these results in a stable way with different runs....
>>
>> I remember that caching influences the results of subsequent runs of
>> benchmarks, but I don't understand why. Isn't cache/memory freed
>> after the software exit?
>
> No. Caching takes place at the hardware level, so no freeing takes
> place just as freeing memory does not remove physical memory.

Every sane operating system will clear the memory handed out to a new
process; otherwise you could accidentally read what another process,
maybe run by another user, stored in memory.

I know that Linux does this and I am pretty sure that Windows does it
too nowadays. So memory caching between program runs should never
occur.

--
Markus

Apr 30 '07 #11
