Problem with STL vector performance, benchmarks included

StephQ

I found this old post:
http://groups.google.com/group/comp....519204726d01e8

I just erased the #include <kubux/...> lines.

****** old post for your convenience ********
You are right:

#include <vector>
#include <iostream>
#include <ctime>
#include <memory>
#include <algorithm> // std::fill_n
#include <iterator>  // std::reverse_iterator

// Kubux-specific headers; these lines were erased for the test reported in this post:
// #include <kubux/bits/allocator.cc>
// #include <kubux/bits/new_delete_allocator.cc>
// #include <kubux/bits/malloc_free_allocator.cc>

// Note: std::allocator lost its pointer/reference typedefs in C++20,
// so build this as C++17 or earlier.
template < typename T, typename Alloc = std::allocator<T> >
class stupid {
public:

  typedef Alloc allocator;
  typedef typename allocator::value_type value_type;
  typedef typename allocator::size_type size_type;
  typedef typename allocator::difference_type difference_type;
  typedef typename allocator::pointer pointer;
  typedef typename allocator::const_pointer const_pointer;
  typedef typename allocator::reference reference;
  typedef typename allocator::const_reference const_reference;

  typedef pointer iterator;
  typedef const_pointer const_iterator;
  typedef typename std::reverse_iterator< iterator > reverse_iterator;
  typedef typename std::reverse_iterator< const_iterator > const_reverse_iterator;

private:

  pointer ptr;
  size_type the_size;

public:

  stupid ( size_type length ) :
    // Raw storage from the allocator, so that it matches deallocate() below
    // and the elements are constructed only once (by the placement new).
    ptr ( allocator().allocate( length ) ),
    the_size ( length )
  {
    for ( iterator iter = this->ptr;
          iter != this->ptr + the_size;
          ++ iter ) {
      ::new( static_cast<void*>(iter) ) T();
    }
  }

  ~stupid ( void ) {
    // Destroy the elements in reverse order, then release the storage.
    iterator iter = ptr + the_size;
    while ( iter > ptr ) {
      -- iter;
      iter->~T();
    }
    {
      allocator alloc;
      alloc.deallocate( ptr, the_size );
    }
    the_size = 0;
  }

  reference operator[] ( size_type index ) {
    return( this->ptr[ index ] );
  }

  const_reference operator[] ( size_type index ) const {
    return( this->ptr[ index ] );
  }

}; // stupid

int main ( void ) {
  const unsigned long l = 50000000;
  {
    // std::vector, indexed access
    std::vector< int > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "vector: " << loop_end - loop_start << std::endl;
  }
  {
    // plain array, indexed access
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "array: " << loop_end - loop_start << std::endl;
    delete [] v;
  }
  {
    // hand-rolled container, indexed access
    stupid< int, std::allocator<int> > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "stupid: " << loop_end - loop_start << std::endl;
  }
  {
    // std::vector, iterator access (this is the first of the two "ptr" lines)
    std::vector<int> v ( l );
    std::clock_t loop_start = std::clock();
    for ( std::vector<int>::iterator i = v.begin();
          i != v.end(); ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
  }
  {
    // plain array, raw pointer access (the second "ptr" line)
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( int* i = v; i < v+l; ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
    delete [] v;
  }

}

a.out

vector: 320000
array: 320000
stupid: 350000
iterator: 340000
ptr: 340000

No surprises anymore.

Thanks

Kai-Uwe Bux
***************************************************

I ran the reported test on Visual Studio Professional 2005 with its
standard STL implementation, which should be supplied by Dinkumware.
My CPU is a dual-core T2500 with 2 GB of DDR2.

I tried both the Intel 9.1 compiler and the Microsoft one.
In both cases I used the O3 optimizations in release mode, and with the
Intel one I also tried the /Qansi_alias and /Qipo options.

Results:

Microsoft:
vector: 141
array: 94
stupid: 93
ptr: 172
ptr: 78

Intel:
vector: 312
array: 156 // becomes 45 if I require P4 extensions; the other values remain nearly the same
stupid: 157
ptr: 1047
ptr: 156

I admit I'm quite disappointed with the results obtained with the Intel
compiler.
Is there any fault in the way the test was conducted or in the
source code I posted?
If everything is correct, how could I investigate where the problem
is?

Cheers
StephQ

Apr 30 '07 #1

Roland Pibinger

On 30 Apr 2007 05:48:31 -0700, StephQ wrote:

> I ran the reported test on Visual Studio Professional 2005 with its
> standard STL implementation, which should be supplied by Dinkumware.
> My CPU is a dual-core T2500 with 2 GB of DDR2.
> I tried both the Intel 9.1 compiler and the Microsoft one.
> In both cases I used the O3 optimizations in release mode, and with the
> Intel one I also tried the /Qansi_alias and /Qipo options.

Have you turned off checked iterators? (see:
http://www.codeproject.com/vcpp/stl/...diterators.asp)
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch

Apr 30 '07 #2

StephQ

> Have you turned off checked iterators? (see: http://www.codeproject.com/vcpp/stl/...diterators.asp)

Thank you for the very useful suggestion. I didn't know that checked
iterators were turned on by default even in release mode in VC8.

The new results (with checked iterators turned off) are:

Microsoft:
vector: 94
array: 94
stupid: 94
ptr: 141
ptr: 96

Intel:
vector: 141
array: 141 // 62 if I enable SSE2
stupid: 141 // 62 if I enable SSE2 and disable exception handling
ptr: 141
ptr: 140

The situation is now much better.
However, it seems that the Microsoft compiler is still doing 35% better
in all the situations except the "vector iterator" one.

Do you have any other suggestions to try?
I know nothing about low-level instructions, but if I posted the
assembly-like code here, would it be of any help to you?

Thank you

Cheers
StephQ

Apr 30 '07 #3

StephQ

I'm replying to myself just to say that I'm no longer concerned with
investigating these issues further.
I ran the test using doubles instead of ints and the results are very
similar, with the Microsoft compiler having something like 3% more
performance.

However, the Stepanov Abstraction test favours the Intel compiler by a
large margin.
Abstraction penalty with Intel:
0.85
0.68 with SSE2

With Microsoft:
1.11

A curiosity: how is it possible to get an abstraction penalty
below 1?

Cheers
StephQ

Apr 30 '07 #4

Puppet_Sock

On Apr 30, 11:35 am, StephQ <askmeo...@mailinator.com> wrote:
[snip]

> However, the Stepanov Abstraction test favours the Intel compiler by a
> large margin.

Could you clue me in on what a "Stepanov Abstraction" test is?
Socks

Apr 30 '07 #5

peter koch

On 30 Apr., 17:41, Puppet_Sock <puppet_s...@hotmail.com> wrote:

> On Apr 30, 11:35 am, StephQ <askmeo...@mailinator.com> wrote:
> [snip]
>
>> However, the Stepanov Abstraction test favours the Intel compiler by a
>> large margin.
>
> Could you clue me in on what a "Stepanov Abstraction" test is?
> Socks

The Stepanov abstraction penalty is a benchmark made by Alexander
Stepanov, the man behind the STL and generic programming in C++. Google
it if you want to know more.

/Peter

Apr 30 '07 #6

peter koch

On 30 Apr., 17:35, StephQ <askmeo...@mailinator.com> wrote:

> A curiosity: how is it possible to get an abstraction penalty
> below 1?

Perhaps because you had a bad test? Rerun the benchmarks several
times and remember that caching has a huge effect on results (I believe
a factor of ten is quite normal). So you should know how to, e.g., clear
(or fill) the cache as appropriate.
Writing a good benchmark is not easy.

/Peter

Apr 30 '07 #7

StephQ

On Apr 30, 7:58 pm, peter koch <peter.koch.lar...@gmail.com> wrote:

> Perhaps because you had a bad test? Rerun the benchmarks several
> times and remember that caching has a huge effect on results (I believe
> a factor of ten is quite normal). So you should know how to, e.g., clear
> (or fill) the cache as appropriate.
> Writing a good benchmark is not easy.
>
> /Peter

I'm quite a newbie....
Are you suggesting that the initial run is the "right" one, while
subsequent runs get distorted by caching, or the opposite?
By caching, you mean that the objects of interest are loaded into the L1/L2
cache, right?
But I obtained these results consistently across different runs....

I remember that caching influences the results of subsequent runs of
benchmarks, but I don't understand why. Isn't the cache/memory freed after
the program exits?

Anyway, I increased the number of calculations in the test because it
was taking too little time to run.

StephQ

Apr 30 '07 #8

Roland Pibinger

On 30 Apr 2007 12:53:47 -0700, StephQ wrote:

> On Apr 30, 7:58 pm, peter koch <peter.koch.lar...@gmail.com> wrote:
>> Perhaps because you had a bad test? Rerun the benchmarks several
>> times and remember that caching has a huge effect on results
>
> I'm quite a newbie....
> Are you suggesting that the initial run is the "right" one, while
> subsequent runs get distorted by caching, or the opposite?
> By caching, you mean that the objects of interest are loaded into the L1/L2
> cache, right?
> But I obtained these results consistently across different runs....

It may be any 'position effect'. Divide the test into functions (one
for each test) and call the functions several (many) times in
randomized order. Include a 'warm up' at the beginning.
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch

Apr 30 '07 #9

peter koch

On 30 Apr., 21:53, StephQ <askmeo...@mailinator.com> wrote:

> I'm quite a newbie....
> Are you suggesting that the initial run is the "right" one, while
> subsequent runs get distorted by caching, or the opposite?
> By caching, you mean that the objects of interest are loaded into the L1/L2
> cache, right?

Yes.

> But I obtained these results consistently across different runs....
>
> I remember that caching influences the results of subsequent runs of
> benchmarks, but I don't understand why. Isn't the cache/memory freed after
> the program exits?

No. Caching takes place at the hardware level, so no freeing takes
place, just as freeing memory does not remove physical memory.

> Anyway, I increased the number of calculations in the test because it
> was taking too little time to run.

Right. But try to follow Roland Pibinger's advice and see if that
explains anything.

/Peter

Apr 30 '07 #10

Markus Schoder

peter koch wrote:

>> I'm quite a newbie....
>> Are you suggesting that the initial run is the "right" one, while
>> subsequent runs get distorted by caching, or the opposite?
>> By caching, you mean that the objects of interest are loaded into the
>> L1/L2 cache, right?
>
> Yes.
>
>> But I obtained these results consistently across different runs....
>>
>> I remember that caching influences the results of subsequent runs of
>> benchmarks, but I don't understand why. Isn't the cache/memory freed
>> after the program exits?
>
> No. Caching takes place at the hardware level, so no freeing takes
> place, just as freeing memory does not remove physical memory.

Every sane operating system will clear the memory handed out to a new
process; otherwise you could accidentally read what another process,
possibly run by another user, had stored in memory.

I know that Linux does this, and I am pretty sure that Windows does it
too nowadays. So memory caching between program runs should never
occur.

--
Markus

Apr 30 '07 #11
