P: n/a

Sometimes I write code using std::for_each. Sometimes I use it on
several container classes that are the same size.
for_each(v.begin(), v.end(), f1());
for_each(u.begin(), u.end(), f2());
This seems inefficient so I rewrite it as:
int n = v.size();
for (int i = 0; i < n; ++i)
{
f1(v[i]);
f2(u[i]);
}
Is there some mechanism that lets me say the following:
for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));
Thanks

sashan http://www.cs.auckland.ac.nz/~sgov008/  
Share this Question
P: n/a

"sashan" <ma***@operamail.com> wrote in message
news:bl**********@lust.ihug.co.nz... Sometimes I write code using std::for_each. Sometimes I use it on several container classes that are the same size.
for_each(v.begin(), v.end(), f1()); for_each(u.begin(), u.end(), f2());
This seems inefficient so I rewrite it as:
int n = v.size(); for (int i = 0; i < n; ++i) { f1(v[i]); f2(u[i]); }
What makes you think this is any more efficient? If I had to guess, I would
bet that your second version runs *slower* due to the fact you are reducing
memory locality jumping back and forth between the different arrays and
functions. Also, by indexing into the array each time instead of using the
iterators which can simply be incremented, you are making it more difficult
for the compiler to optimize your sequential access (if u and v are less
trivial than a vector, like perhaps a deque, you are making it significantly
more difficult). Current compilers should be able to optimize away the
indexing, but you certainly aren't doing the compiler any favors.
Ken  
P: n/a

"seems inefficient"...do you have proof that it actually *is* inefficient
for your particular case?
Not only have you turned two lines of code into 4+, but you've replaced the
wellknown std::for_each idiom with a custom for() loop that everybody
looking at this code will have to take the time to inspect.
Dan
"sashan" <ma***@operamail.com> wrote in message
news:bl**********@lust.ihug.co.nz... Sometimes I write code using std::for_each. Sometimes I use it on several container classes that are the same size.
for_each(v.begin(), v.end(), f1()); for_each(u.begin(), u.end(), f2());
This seems inefficient so I rewrite it as:
int n = v.size(); for (int i = 0; i < n; ++i) { f1(v[i]); f2(u[i]); }
Is there some mechanism that lets me say the following:
for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));
Thanks
 sashan http://www.cs.auckland.ac.nz/~sgov008/  
P: n/a

Ken Alverson wrote: "sashan" <ma***@operamail.com> wrote in message news:bl**********@lust.ihug.co.nz...
Sometimes I write code using std::for_each. Sometimes I use it on several container classes that are the same size.
for_each(v.begin(), v.end(), f1()); for_each(u.begin(), u.end(), f2());
This seems inefficient so I rewrite it as:
int n = v.size(); for (int i = 0; i < n; ++i) { f1(v[i]); f2(u[i]); }
What makes you think this is any more efficient?
Less comparisons and less increments, I guess.
I've attached a program that times the operations using the above
methods. In the program method 1 uses the for_each algorithm, method 2
is a for loop like above and method 3 is a for loop using iterators.
Method 3 is the sort of thing I was aiming for with my initial question.
The program tests method 1 and 2 and 3 on std::vector and method 1 and 3
on std::list (std::list doesn't support random access so method 2
doesn't work).
Please check it and tell me if anything's wrong (maybe I'm using
QueryPerformance counter wrong or something like that)
In summary, when in release build, method 1 is the slowest, method 2 and
3 are the fastest for std::vector. For std::list method 3 is the
fastest. Compiled using vc7.1.
#include <iostream>
//#include <iomanip.h>
#include <math.h>
#include <windows.h>
#include <vector>
#include <functional>
#include <algorithm>
using namespace std;
struct Sqr :
unary_function<int,int>
{
void operator()(argument_type& i){i *= i;}
};
int vector_timing()
{
cout << "vector timing" << endl;
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return 1;
//method 1
vector<int> u0(1000,3),v0(1000,3),w0(1000,3),x0(1000,3);
QueryPerformanceCounter(&Start);
for_each(u0.begin(), u0.end(), Sqr());
for_each(v0.begin(), v0.end(), Sqr());
for_each(w0.begin(), w0.end(), Sqr());
for_each(x0.begin(), x0.end(), Sqr());
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart  Start.QuadPart << endl;
//method 2
vector<int> u1(1000,3),v1(1000,3),w1(1000,3),x1(1000,3);
QueryPerformanceCounter(&Start);
vector<int>::size_type n = u1.size();
for (int i = 0; i < n; ++i)
{
Sqr()(u1[i]);
Sqr()(v1[i]);
Sqr()(w1[i]);
Sqr()(x1[i]);
}
QueryPerformanceCounter(&End);
std::cout << "Method 2 " << End.QuadPart  Start.QuadPart << endl;
//method 3
vector<int> u2(1000,3),v2(1000,3),w2(1000,3),x2(1000,3);
vector<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
Sqr()(*itu2);
Sqr()(*itv2);
Sqr()(*itw2);
Sqr()(*itx2);
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart  Start.QuadPart << endl;
}
int list_timing()
{
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return 1;
cout << "list timing" << endl;
//method 1
vector<int> u0(1000,3),v0(1000,3),w0(1000,3),x0(1000,3);
QueryPerformanceCounter(&Start);
for_each(u0.begin(), u0.end(), Sqr());
for_each(v0.begin(), v0.end(), Sqr());
for_each(w0.begin(), w0.end(), Sqr());
for_each(x0.begin(), x0.end(), Sqr());
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart  Start.QuadPart << endl;
//method 3
vector<int> u2(1000,3),v2(1000,3),w2(1000,3),x2(1000,3);
vector<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
Sqr()(*itu2);
Sqr()(*itv2);
Sqr()(*itw2);
Sqr()(*itx2);
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart  Start.QuadPart << endl;
}
int main()
{
vector_timing();
list_timing();
return 0;
}  
P: n/a

"sashan" <ma***@operamail.com> skrev i meddelandet
news:bl*********@lust.ihug.co.nz... Ken Alverson wrote:
"sashan" <ma***@operamail.com> wrote in message news:bl**********@lust.ihug.co.nz...
Sometimes I write code using std::for_each. Sometimes I use it on several container classes that are the same size.
for_each(v.begin(), v.end(), f1()); for_each(u.begin(), u.end(), f2());
This seems inefficient so I rewrite it as:
int n = v.size(); for (int i = 0; i < n; ++i) { f1(v[i]); f2(u[i]); }
What makes you think this is any more efficient? Less comparisons and less increments, I guess.
I've attached a program that times the operations using the above methods. In the program method 1 uses the for_each algorithm, method
2 is a for loop like above and method 3 is a for loop using iterators. Method 3 is the sort of thing I was aiming for with my initial
question. The program tests method 1 and 2 and 3 on std::vector and method 1
and 3 on std::list (std::list doesn't support random access so method 2 doesn't work).
Please check it and tell me if anything's wrong (maybe I'm using QueryPerformance counter wrong or something like that)
What's wrong is that with only 1000 iterations nothing really matters,
because just about anything is fast enough. In summary, when in release build, method 1 is the slowest, method 2
and 3 are the fastest for std::vector. For std::list method 3 is the fastest. Compiled using vc7.1.
When I increase it to a million iterations, I get this result:
vector timing
Method 1 70041
Method 2 382012
Method 3 450109
list timing
Method 1 69126
Method 3 399203
Here for_each is *much* faster.
Bo Persson  
P: n/a

There was a silly mistake in the previous post  the list_timing function used vector as it's container class. I've posted an ammended version that corrects the mistake. It also allows you to change the size of the container and the number of times the various methods are performed. I also change the function object to add one to the given argument instead of squaring.
Here are some results (these are all based on release builds):
For a container size of 1000000 repeating each method 1 time:
vector timing
Method 1 166452
Method 2 431866
Method 3 372398
list timing
Method 1 756946
Method 3 674451
For a container size of 1000000 repeating each method 100 time:
vector timing
Method 1 12679666
Method 2 36049916
Method 3 36056914
list timing
Method 1 81043026
Method 3 70404243
For a container size of 1000 repeating each method 100 time:
vector timing
Method 1 2284
Method 2 1095
Method 3 1308
list timing
Method 1 9081
Method 3 7118
For a container size of 1000 repeating each method 1 time:
vector timing
Method 1 25
Method 2 17
Method 3 18
list timing
Method 1 96
Method 3 62
For the case of the std::list Method 3 is fastest. In the case of std::vector method 1 (the for_each method) is faster when the container size is sufficiently large (in this case 1 000 000). How does one know when a container is sufficiently large? Dunno.
#include <iostream>
//#include <iomanip.h>
#include <math.h>
#include <windows.h>
#include <vector>
#include <functional>
#include <algorithm>
#include <list>
using namespace std;
struct AddOne :
unary_function<int,void>
{
void operator()(argument_type& i);
};
void AddOne::operator() (argument_type& i)
{
i += 1;
}
int vector_timing(long Size, long RepCount)
{
cout << "vector timing" << endl;
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return 1;
//method 1
vector<int> u0(Size,3),v0(Size,3),w0(Size,3),x0(Size,3);
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for_each(u0.begin(), u0.end(), AddOne());
for_each(v0.begin(), v0.end(), AddOne());
for_each(w0.begin(), w0.end(), AddOne());
for_each(x0.begin(), x0.end(), AddOne());
}
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart  Start.QuadPart << endl;
//method 2
vector<int> u1(Size,3),v1(Size,3),w1(Size,3),x1(Size,3);
QueryPerformanceCounter(&Start);
vector<int>::size_type n = u1.size();
for (int i = 0; i < RepCount; ++i)
{
for (int i = 0; i < n; ++i)
{
AddOne()(u1[i]);
AddOne()(v1[i]);
AddOne()(w1[i]);
AddOne()(x1[i]);
}
}
QueryPerformanceCounter(&End);
std::cout << "Method 2 " << End.QuadPart  Start.QuadPart << endl;
//method 3
vector<int> u2(Size,3),v2(Size,3),w2(Size,3),x2(Size,3);
vector<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
AddOne()(*itu2);
AddOne()(*itv2);
AddOne()(*itw2);
AddOne()(*itx2);
}
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart  Start.QuadPart << endl;
return 0;
}
int list_timing(long Size, long RepCount)
{
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return 1;
cout << "list timing" << endl;
//method 1
list<int> u0(Size,3),v0(Size,3),w0(Size,3),x0(Size,3);
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for_each(u0.begin(), u0.end(), AddOne());
for_each(v0.begin(), v0.end(), AddOne());
for_each(w0.begin(), w0.end(), AddOne());
for_each(x0.begin(), x0.end(), AddOne());
}
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart  Start.QuadPart << endl;
//method 3
list<int> u2(Size,3),v2(Size,3),w2(Size,3),x2(Size,3);
list<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
AddOne()(*itu2);
AddOne()(*itv2);
AddOne()(*itw2);
AddOne()(*itx2);
}
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart  Start.QuadPart << endl;
return 0;
}
int main()
{
vector_timing(1000,1);
list_timing(1000,1);
return 0;
}  
P: n/a

>
In summary, when in release build, method 1 is the slowest, method 2
and
3 are the fastest for std::vector. For std::list method 3 is the fastest. Compiled using vc7.1.
When I increase it to a million iterations, I get this result: vector timing Method 1 70041 Method 2 382012 Method 3 450109 list timing Method 1 69126 Method 3 399203 Here for_each is *much* faster.
Sorry there was a mistake in the code I originally posted. The function
list_timing used std::vector as its container when it was meant to use
std::list. I've corrected the mistake and posted it as a reply to my
original code post.  
P: n/a

Dan Smith wrote: "seems inefficient"...do you have proof that it actually *is* inefficient for your particular case?
It seemed inefficient because more comparisons and increments happen in
this case
for_each(v.begin(), v.end(), f1());
for_each(u.begin(), u.end(), f2());  
P: n/a

"sashan" <ma***@operamail.com> wrote in message
news:bl**********@lust.ihug.co.nz... For the case of the std::list Method 3 is fastest. In the case of std::vector method 1 (the for_each method) is faster when the container size is sufficiently large (in this case 1 000 000). How does one know when a container is sufficiently large? Dunno.
I suspect that the reason method 3 is faster than method 1 for list containers
is because you are only checking for the end of one container, not all four.
This is a valid optimization if you know that the lists are all the same
length. The difference is pretty minimal though, especially if the operation
is less trivial than adding one, so unless it was highly speed critical I
would still favor the for_each method.
I suspect the reason method 1 is slower with small containers is because it is
method 1. That is, method 1 is suffering all the cache misses, while the
other two methods benefit from the data being already cached from method 1's
operation. With larger data sets, the early part of the data set is pushed
out of the cache by the later data before the second method runs, so it is has
to deal with the same cache misses that the first one did.
I'll bet if you run "method 1" second, it will perform better with the small
containers.
Ken  
P: n/a

> I suspect that the reason method 3 is faster than method 1 for list containers is because you are only checking for the end of one container, not all four. This is a valid optimization if you know that the lists are all the same length. The difference is pretty minimal though, especially if the operation is less trivial than adding one, so unless it was highly speed critical I would still favor the for_each method.
Would you use the following if it were available?
for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2())); I suspect the reason method 1 is slower with small containers is because it is method 1. That is, method 1 is suffering all the cache misses, while the other two methods benefit from the data being already cached from method 1's operation. With larger data sets, the early part of the data set is pushed out of the cache by the later data before the second method runs, so it is has to deal with the same cache misses that the first one did.
I'll bet if you run "method 1" second, it will perform better with the small containers.
That sounded like a reasonable sort of argument so I tried it and got
these results for 1000 vectors
vector timing
Method 2 16
Method 3 17
Method 1 28
list timing
Method 3 67
Method 1 96
Method 1 (the for_each method) is still slower.

sashan http://www.cs.auckland.ac.nz/~sgov008/  
P: n/a

"sashan" <ma***@operamail.com> wrote in message
news:bm**********@lust.ihug.co.nz... The difference is pretty minimal though, especially if the operation is less trivial than adding one, so unless it was highly speed critical I would still favor the for_each method.
Would you use the following if it were available? for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));
I wouldn't be averse to it, though the syntax as you write it couldn't work
without changes to the language. Perhaps if there was a form of for_each that
took pair<Iterator,Iterator>'s.
You can sort of fake it out with:
mismatch(v.begin(),v.end(),u.begin(),function_runn er());
where function_runner does:
bool operator()(LHS_t lhs, RHS_t rhs) {
f1(lhs);
f2(rhs);
return true;
}
Of course, I'm not sure that's any clearer than writing it out or creating
your own for_each2 function. I'll bet if you run "method 1" second, it will perform better with the
smallcontainers.
That sounded like a reasonable sort of argument so I tried it and got these results for 1000 vectors
vector timing Method 2 16 Method 3 17 Method 1 28
Very odd. If it wasn't cache misses, I'd be very curious to know why the
smaller arrays perform so differently than the larger ones.
Ken  
P: n/a

"sashan" <ma***@operamail.com> wrote in message
news:bm**********@lust.ihug.co.nz... I suspect that the reason method 3 is faster than method 1 for list
containersis because you are only checking for the end of one container, not all
four.This is a valid optimization if you know that the lists are all the same length. The difference is pretty minimal though, especially if the
operationis less trivial than adding one, so unless it was highly speed critical I would still favor the for_each method.
Would you use the following if it were available? for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));
BTW, if you want to suggest language/library features like this, a good place
to bring it up would be comp.std.c++.
Ken  
P: n/a

I think might be possible to get pretty close to this already, say something along the lines of:
typedef std::vector<int> IntVector_t;
#include <iterator>
struct TwoIntVectorsAsOne
{
struct function_wrapper
{
function_wrapper(Sqr& uf) : m_uf(uf) {}
void operator()(const TwoIntVectorsAsOne& tao);
private:
Sqr& m_uf;
};
struct iterator : public std::iterator<std::forward_iterator_tag, TwoIntVectorsAsOne>
{
iterator() {}
TwoIntVectorsAsOne& operator*() const;
iterator& operator++();
friend bool operator==(const iterator& x, const iterator& y);
friend bool operator!=(const iterator& x, const iterator& y) {
return (!(x == y));
}
};
iterator begin();
iterator end();
TwoIntVectorsAsOne(const IntVector_t& iv1, const IntVector_t& iv2) : m_iv1(iv1), m_iv2(iv2) {}
private:
const IntVector_t& m_iv1;
const IntVector_t& m_iv2;
};
vector<int> u3(10,3),v3(10,3);
TwoIntVectorsAsOne two_as_one(u3, v3);
Sqr sqr;
TwoIntVectorsAsOne::function_wrapper fw(sqr);
for_each(two_as_one.begin(), two_as_one.end(), fw);
Obviously, this is FAR from complete, but it does illustrate one possible approach using the existing language.
Dan
"Ken Alverson" <Ke*@Alverson.net> wrote in message news:uP**************@TK2MSFTNGP10.phx.gbl... "sashan" <ma***@operamail.com> wrote in message news:bm**********@lust.ihug.co.nz... I suspect that the reason method 3 is faster than method 1 for list containersis because you are only checking for the end of one container, not all four.This is a valid optimization if you know that the lists are all the same length. The difference is pretty minimal though, especially if the operationis less trivial than adding one, so unless it was highly speed critical I would still favor the for_each method.
Would you use the following if it were available? for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));
BTW, if you want to suggest language/library features like this, a good place to bring it up would be comp.std.c++. Ken   This discussion thread is closed Replies have been disabled for this discussion.   Question stats  viewed: 2347
 replies: 12
 date asked: Nov 16 '05
