overhead of using std::sort

Ireneusz SZCZESNIAK

I want to sort a vector with the std::sort function. There are two
functions: one with two arguments, the other with three arguments. I
am using the one with three arguments. I noticed that there is a huge
overhead of using this function, which comes from copying the function
object, i.e., the object that compares the elements of the vector,
which is passed to the function not by reference but by value.

Now, at the end of this mail there is a source code. It's not the
actual code of the problem I have, but only a distilled version of it.
Here the SO class has no fields, so copying an object of this class
is not a big deal. However, in my actual problem the function object
has a very large vector inside, and this function object should not be
copied even once. If you run the program, you'll see the function
object is copied seven times for sorting a vector with three
elements!

So, the problem is that when I use the std::sort function the function
object is copied many times! The larger the sequence to sort, the
greater the number of copies of the function object. It's very bad
for me because I sort small number of elements (up to four), but I do
the sorting many times (millions).

I want to use the sort function in such a way that the function object
is not coppied. I tried to call the sort function this way:

sort<vector<irek *>::iterator, SO &>(v.begin(), v.end(), so);

but it reduced the number of copies by one only.

I am convinced that I am doing something wrong. I would appreciate
your advice on how to solve my problem.
Thanks & best,
Irek
****************************************
Ireneusz SZCZESNIAK - research assistant
http://www.iitis.gliwice.pl/~iszczesniak
****************************************

************************************************** *******************

#include <algorithm>
#include <vector>
#include <iostream>
#include <functional>

using namespace std;

class irek
{
int i;

public:
irek(int _i) {i = _i;}
int get_i() const {return i;}
};

struct SO : binary_function<irek, irek, bool>
{
public:
SO() {}
SO(const SO&)
{
cout << "coppied!" << endl;
}

result_type operator()(const first_argument_type *a,
const second_argument_type *b)
{
return a->get_i() < b->get_i();
}
};
main()
{
// this is our function object
SO so;

irek a(2), b(3), c(1);

vector<irek *> v;
v.push_back(&b);
v.push_back(&c);
v.push_back(&a);

sort(v.begin(), v.end(), so);

for(int i = 0; i < v.size(); i++)
cout << v[i]->get_i() << endl;
}

Jul 22 '05 #1

Subscribe Reply

3520

Karl Heinz Buchegger

Ireneusz SZCZESNIAK, UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR. wrote:

Now, at the end of this mail there is a source code. It's not the
actual code of the problem I have, but only a distilled version of it.
Here the SO class has no fields, so copying an object of this class
is not a big deal. However, in my actual problem the function object
has a very large vector inside,
What for?
The purpose of the functor is to compare 2 objects. Nothing more,
nothing less.
and this function object should not be
copied even once. If you run the program, you'll see the function
object is copied seven times for sorting a vector with three
elements!

If you really need that vector inside the functor, well, you can
always have the vector outside the class and just give a pointer
to it to the functor object. This way copying the functor object
stays cheap.

--
Karl Heinz Buchegger
kb******@gascad.at

Jul 22 '05 #2

tom_usenet

On Thu, 29 Jul 2004 13:42:24 +0200, Ireneusz SZCZESNIAK
<no****@hotmail.com>, UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR.
wrote:

I want to sort a vector with the std::sort function. There are two
functions: one with two arguments, the other with three arguments. I
am using the one with three arguments. I noticed that there is a huge
overhead of using this function, which comes from copying the function
object, i.e., the object that compares the elements of the vector,
which is passed to the function not by reference but by value.
"huge overhead"!? Most function objects are stateless, and the
overhead of copying them is precisely zero.

Obviously taking the function object by reference is not a good
option:

sort(v.begin(), v.end(), functor()); //error!

since you can't bind a temporary to a non-const refence. Equally,
passing by const reference requires your operator() to be a const
member function, which might be annoying (requiring mutable for any
members you want to change).

So passing by value is the best option.

However, I don't see any reason why the internals of std::sort need to
copy the functor - they could certainly pass it around by non-const
reference; that's just a quality of implementation issue though - the
standard allows the function object to be copied an arbitrary number
of times by any algorithm.
Now, at the end of this mail there is a source code. It's not the
actual code of the problem I have, but only a distilled version of it.
Here the SO class has no fields, so copying an object of this class
is not a big deal. However, in my actual problem the function object
has a very large vector inside, and this function object should not be
copied even once. If you run the program, you'll see the function
object is copied seven times for sorting a vector with three
elements!
The solution is to use reference semantics. For example, your function
object should contain a reference to the vector, not the vector
itself. Then the compiler generated copy constructor will be efficient
and fast, and with inlining, the overhead may well be zero.
So, the problem is that when I use the std::sort function the function
object is copied many times! The larger the sequence to sort, the
greater the number of copies of the function object. It's very bad
for me because I sort small number of elements (up to four), but I do
the sorting many times (millions).
If you are only sorting up to 4 elements, you could come up with a
faster algorithm than std::sort in any case. An explicit if ladder
with no loop will be fast - in effect unrolling the entire sorting
operation.
I want to use the sort function in such a way that the function object
is not coppied. I tried to call the sort function this way:

sort<vector<irek *>::iterator, SO &>(v.begin(), v.end(), so);

but it reduced the number of copies by one only.
Yes, that is not a solution.
I am convinced that I am doing something wrong. I would appreciate
your advice on how to solve my problem.

If performance is important, make sure your function objects are cheap
to copy.

Tom

Jul 22 '05 #3

Ireneusz SZCZESNIAK

On Thu, 29 Jul 2004, John Harrison wrote:

Recode your function object so that it holds a pointer to the vector
data instead of holding the vector itself.
This is the way I have it now implemented.
This will make it cheap to copy.
But it still costs to copy the function object.
You could use an ordinary pointer for this, but a better choice
might be the shared array class from boost. See
http://www.boost.org/libs/smart_ptr/smart_ptr.htm

Thanks, I will have a look.
Irek

****************************************
http://www.iitis.gliwice.pl/~iszczesniak
****************************************

Jul 22 '05 #4

John Harrison

"Ireneusz SZCZESNIAK" <no****@hotmail.com> wrote in message
news:Pi*******************************@irek.iitis. gliwice.pl...

On Thu, 29 Jul 2004, John Harrison wrote:
Recode your function object so that it holds a pointer to the vector
data instead of holding the vector itself.

This is the way I have it now implemented.
This will make it cheap to copy.

But it still costs to copy the function object.

IIRC the standard allows the std::sort to make copies of the supplied
function object. Function objects must be effectively stateless, so that if
the sort algorithm uses one copy half the time and another copy the other
half of the time it should not matter.

I would imagine that your function object is being copied to other helper
algorithms internal to std::sort. You are just going to have to live with
this behaviour.

john

Jul 22 '05 #5

Ireneusz SZCZESNIAK

> "huge overhead"!? Most function objects are stateless, and the

overhead of copying them is precisely zero.
My functor is not stateless, because it needs some extra data. It
used to have a vector inside, but now it has just a pointer to this
vector. Coping my functor requires at least coping a pointer.

I believe it's huge. For sorting 3 elements the functor was copied 7
times, for 1000 elements - 1328 times, and for 1000000 elements -
1312948 times. These numbers slightly change with different element
arrangements in the vector.
However, I don't see any reason why the internals of std::sort need
to copy the functor - they could certainly pass it around by
non-const reference; (...)
When you implement templated std::sort this way, then you can pass
both a functor and a pointer to a function. If std::sort took a
functor by reference, then you would not be able to pass a pointer to
a function. All the std::sort function want to do with this that you
pass is apply to it the "()" operator.
If you are only sorting up to 4 elements, you could come up with a
faster algorithm than std::sort in any case. An explicit if ladder
with no loop will be fast - in effect unrolling the entire sorting
operation.

Now it's up to four elements, but soon I will sort 16, 64, 128 and
more elements. I need to stick to std::sort.

I didn't realize that the C++ standard imposes no restrictions on
copying functors. This is why implementations are free to make those
copies of my functor.
Thanks,
Irek

Jul 22 '05 #6

Ireneusz SZCZESNIAK

On Thu, 29 Jul 2004, John Harrison wrote:

Function objects must be effectively stateless, so that if the sort
algorithm uses one copy half the time and another copy the other
half of the time it should not matter.
When I create my functor I give it some state, and this state will not
change later. But different functors have different states. Well, a
state for me is just the value of a pointer field in my functor. When
I create a functor I pass a some pointer to its constructor.
I would imagine that your function object is being copied to other
helper algorithms internal to std::sort.
Yes, this is what happens.
You are just going to have to live with this behaviour.

I guess so.
Thanks,
Irek

****************************************
http://www.iitis.gliwice.pl/~iszczesniak
****************************************

Jul 22 '05 #7

tom_usenet

On Thu, 29 Jul 2004 15:54:39 +0200, Ireneusz SZCZESNIAK
<no****@hotmail.com> wrote:

"huge overhead"!? Most function objects are stateless, and the
overhead of copying them is precisely zero.

My functor is not stateless, because it needs some extra data. It
used to have a vector inside, but now it has just a pointer to this
vector. Coping my functor requires at least coping a pointer.

I believe it's huge. For sorting 3 elements the functor was copied 7
times, for 1000 elements - 1328 times, and for 1000000 elements -
1312948 times. These numbers slightly change with different element
arrangements in the vector.

That is a surprisingly large number of copies (O(n) by the look of
things). Clearly, if copying the functor is significantly more
expensive than, say, calling the comparator, it will become the
performance limiting factor until n is really huge.

However, I don't see any reason why the internals of std::sort need
to copy the functor - they could certainly pass it around by
non-const reference; (...)

When you implement templated std::sort this way, then you can pass
both a functor and a pointer to a function. If std::sort took a
functor by reference, then you would not be able to pass a pointer to
a function.

You would, just not an rvalue one. If you pass just the function name,
it would work fine, since you can form a reference to a function.

But I wasn't suggesting that sort took its parameter by reference (I
posted details of why not), but that the internals might. e.g.

template...
void sort_aux(It begin, It end, Comp& comp);

template...
void sort(It begin, It end, Comp comp)
{
sort_aux(begin, end, comp);
}

That would work fine for function pointers, since inside sort, comp is
an lvalue. However, it would break if you explicitly made Comp a
reference type (due to trying to form a reference to reference).
Obviously an implementation could get around this, using
boost::add_reference or similar.

If you are only sorting up to 4 elements, you could come up with a
faster algorithm than std::sort in any case. An explicit if ladder
with no loop will be fast - in effect unrolling the entire sorting
operation.

Now it's up to four elements, but soon I will sort 16, 64, 128 and
more elements. I need to stick to std::sort.

I didn't realize that the C++ standard imposes no restrictions on
copying functors. This is why implementations are free to make those
copies of my functor.

Personally I'm quite surprised at the huge number of copies - I
imagined there would be O(log n) copies at most (one for each divide
and conquer, if you see what I mean), but I'm sure if I think about it
the reason that it's O(n) copies may become clear. But, as you say
(and I said), it is conforming in any case.

Tom

Jul 22 '05 #8

Similar topics

3483

problem on creating functor for std::sort

by: lok | last post by:

i have a class: template <class T1, class T2> class CPairMapping { public: typedef std::pair<T1, T2> ValuePair_t; typedef std::vector<ValuePair_t> ValueList_t; typedef std::binary_function<...

C / C++

4218

Sorting records using sort()

by: Elijah Bailey | last post by:

I want to sort a set of records using STL's sort() function, but dont see an easy way to do it. I have a char *data; which has size mn bytes where m is size of the record and n is the...

C / C++

2311

using std::sort;

by: alexhong2001 | last post by:

Does "std::sort" work only with sequence containers, not associative containers at all? Among sequential containers, can it be used with "list", "queue" and other sequence containers besides...

C / C++

13075

std::sort

by: JasBascom | last post by:

say i had 97456 and 23456 is there already a sort function to check which one begins with the smaller number rearrange it. in this case the bottom number should clearly be rearranged to the...

C / C++

3308

trouble when overloading function used as predicate in std::sort

by: hall | last post by:

I accidently overloaded a static member function that I use as predicate in the std::sort() for a vector and ended up with a compiler error. Is this kind of overload not allowed for predicates and...

C / C++

4208

[STL] Segmentation fault on using std::sort

by: Manfred | last post by:

Hello I am new to template programming, so i tried the 'example' from http://www.sgi.com/tech/stl/functors.html. I can compile the code but when i want to run the program I get a segmentation...

C / C++

3878

Need help on std::sort comparison function

by: fade | last post by:

Good afternoon, I need some advice on the following: I've got a class that has a member std::vector<CStringm_vFileName and a member CString m_path; The vector contains a bunch of filenames with...

C / C++

5149

std::sort std::vector predicate problem

by: mscava | last post by:

How can I make this construction valid? It still gives me error about no matching function std::sort(...). I made a little search and wrong thing is probably the predicate... template <typename...

C / C++

1775

why std::sort take no template parameter to swap?

by: panzhiyong | last post by:

Hi there! I wonder why there is no such a overload version of sort function in STL: template<class RandomAccessIterator, class Pr, class IterSwap> void sort( RandomAccessIterator _First,...

C / C++

7041

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

6908

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

6737

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

6921

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

4776

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

2995

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...

Networking - Hardware / Configuration

2984

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

1300

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

C# / C Sharp

563

php

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP