Bytes | Developer Community
much optimizing = bad code ? (array/vector)

Hi,

I have a question that you probably shouldn't worry about since the
compiler takes care of it, but anyway:

When you run your compiler with optimization turned on (e.g. g++ with an
-Ox flag) and your program gets significantly faster than without, did
you write bad code / have a bad design?

Because what happens in those optimization steps is, I think, mostly
rearranging of orders. If you had programmed without creating
unnecessary objects and the like, there wouldn't be much left for the
compiler to optimize.

And what is best to think about when trying to improve your code: which
optimization steps cannot be done by the compiler? For example, I had a
lot of one-line functions. When I inlined those, the resulting
executable got smaller, but with no difference in running time, or
running time even got worse.

Can you improve your code much by hand if you use STL objects? For this
I have some concrete data:
I have a program that used arrays; for flexibility I changed the arrays
to vectors. The vector version was unbelievably slow, so I stopped
accessing the elements via [] and changed to iterators. This was still
way too slow, and gprof showed that iterator functions consumed most of
the time, so I changed to pointers. Here are the numbers for the
different versions compiled via
g++ -o target source.cpp:

array: 48,5 seconds
veciter: 3,5 minutes
vecpoint: 3 minutes

Since someone wrote some time ago that most compilers do a good job
optimizing vector routines, I compiled with
g++ -O3 -o target source.cpp:

array: 15 seconds
veciter: 38 seconds
vecpoint: 36 seconds

So did I initially write such bad code that the array version could be
improved more than three times by the compiler? The very bad performance
of the vector versions can't be blamed (fully) on the code, since the
only change was array to vector. But how can I optimize this code to get
the runtime down to that of the array version? For this I guess I have
to depend on compiler optimization (otherwise I should stop programming,
because 2+ minutes is a long way to go), but what does the compiler do
anyway that wouldn't have an effect if I changed it by hand?

Thanks for any answer; I kind of got carried away from the initial question.

Best wishes, Hagen.
Jul 23 '05 #1
Hagen wrote:
I have a question that you probably shouldn't worry about since the
compiler cares for it, but anyways:

When you run your compiler with optimization turned on (eg. g++ with -Ox
flag) and your program gets significantly faster than without, did you
write bad code/ have a bad design?
[...]


No.

The explanation is simple: design, and often code, are governed not only
and not so much by the performance of the target application/program as
by the requirements of maintainability and extensibility (which do
imply readability and understandability).

Optimisers, OTOH, exist solely for the purpose of catching what the
programmer didn't pay attention to because he/she was concerned with
other, more important things.

I would say: pay attention to how you code if, after you use the
compiler's optimisation switch, you still don't get good enough
performance. Then use a performance analysis tool and improve your code
in such a way that it doesn't drive you farther away from the other
goals, if they are more or even equally important.

HTH

V
Jul 23 '05 #2
Hagen wrote:
When you run your compiler with optimization turned on (eg. g++ with -Ox flag) and your program gets significantly faster than without, did you write bad code/ have a bad design?
No, not at all! When optimizing, the compiler may detect common
subexpressions, propagate constants (i.e. compute things at compile
time because it realizes they do not change), really inline functions
rather than calling local versions, etc. There is a host of stuff a
compiler can do but typically does not do, to avoid unnecessary
analysis during development where compilation speed is crucial.
Cause what happens in those optimization steps is, I think, mostly
rearranging of orders.
I think, you think wrongly. Optimization is about lots of things.
Ordering processor operations such that the processor is optimally
used is one of them, but definitely not the only thing.
array: 48,5 seconds
veciter: 3,5 minutes
vecpoint: 3 minutes


These results look odd: the vector version should actually be no
slower or at most only slightly slower than the array version.
Are you sure you don't compare apples to oranges? For example,
the following two excerpts will have grossly different performance
characteristics:

// excerpt 1:
int const repeat = 1000;
int const size = 100;
for (int r = 0; r < repeat; ++r)
{
    int a[size];
    for (int i = 0; i < size; ++i)
        a[i] = i;
}

// excerpt 2:
int const repeat = 1000;
int const size = 100;
for (int r = 0; r < repeat; ++r)
{
    std::vector<int> a(size);
    for (int i = 0; i < size; ++i)
        a[i] = i;
}

The second code fragment should be substantially slower than the
first one! This is, however, not directly related to the use of
built-in arrays vs. vector, but to the fact that the second fragment
uses a lot of dynamic memory allocation and initializes/destroys its
elements while the first one does not (for arrays of PODs the
compiler just finds that neither the constructor nor the
destructor has to be called).

There are a few alternatives for addressing this:
- The container could be hoisted out of the loop, resulting in
  just one pair of memory operations and one unnecessary
  initialization.
- You could use a different container, e.g. 'std::tr1::array',
  which behaves much more like a built-in array than 'std::vector'
  does.

Since I don't know what your code looks like, I cannot tell for
sure that it is a problem like the one above. However, generally
there should be no huge difference between using built-in arrays
and 'std::vector' unless you use a fixed-size array. In that case
the 'std::vector' replacement is probably inappropriate and you
should consider a fixed-size array class like 'std::tr1::array'
instead (since the latter is not yet widely available, you can get
a similar class from <http://www.boost.org/>).
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Jul 23 '05 #3
Hagen wrote:
I have a question that you probably shouldn't worry about
since the compiler [takes care of] it, but anyways:

When you run your compiler with optimization turned on
(e.g. g++ with -Ox [option])
and your program gets significantly faster than without,
did you write bad code and/or have a bad design?
Probably not.
If you write bad code, you are likely to frustrate the optimizer.
Cause what happens in those optimization steps is, I think,
mostly rearranging of orders.
Obviously, you don't know anything about optimization.
If you had programmed without creating unnecessary objects
If they are really unnecessary objects and the like,
a good optimizer will get rid of them.
[there] wouldn't be much left for the compiler to optimize.

And what is best to think about [optimization]
when trying to improve your code? Which optimization steps cannot be done by the compiler?
A C++ compiler does not choose algorithms.
You must select the best algorithms and implement them.
For example, I had a lot of one line functions.
When I inline'd [them], the resulting executable got smaller
but with no difference in running time or even running time got worse.
You probably screwed up.
But we'd need to see your code to tell you what you did wrong.
Can you improve your code much by hand if you use STL objects?
For this I have some concrete data:
I have a program that used arrays.
For flexibility, I changed array to vector.
The vector version was unbelievably slow,
so I didn't access the elements via [] but changed to iterators.
This still was way too slow and gprof showed that
iterator functions consumed most of the time and I changed to pointers.
So here are the numbers for the different versions compiled via

g++ -o target source.cpp:

array: 48,5 seconds
veciter: 3,5 minutes
vecpoint: 3 minutes

Since someone wrote some time ago that
most compilers do a good job optimizing vector routines,
I compiled with g++ -O3 -o target source.cpp:

array: 15 seconds
veciter: 38 seconds
vecpoint: 36 seconds

So did I write initially bad code
so that array could be improved more than 3 times by a compiler?
Probably.
The very bad performance of the vector
versions you can't blame [entirely] on the code
since the only change was array to vector.
But how can I try to optimize this code
to get the runtime down to the array version?
For this, I have to depend on the compiler optimization I guess.
(Otherwise, I should stop programming cause 2+ minutes is a long way to go.)
But what does the compiler do anyway?
And wouldn't it have an effect if I changed it by hand?

Thanks for any answer; I kind of got carried away from the initial question.


It's pretty hard to figure out what you did wrong
unless you can show us your code.

Jul 23 '05 #4
"Victor Bazarov" <v.********@comAcast.net> wrote in message
news:uR******************@newsread1.mlpsca01.us.to.verio.net...
Optimisers, OTOH, exist solely for the purpose of catching what the
programmer didn't pay attention to because he/she was concerned with
other, more important things.


This claim may have been true at one time, but it doesn't apply to modern
computers.

On many modern machines, it is possible to make programs run much faster by
reordering code at the machine-instruction level so that the machine can
execute several operations in parallel. There is typically no way of
expressing such reorderings at the source-code level at all. Optimizers
also often perform other tasks, such as register allocation, over which the
language does not offer fine enough control to meet the needs of the
hardware.

I have seen a number of cases in which turning on optimization gains a
factor of three or more in execution time, regardless of how the program
might have been coded originally at the source level.
Jul 23 '05 #5
> Probably not.
If you write bad code, you are likely to frustrate the optimizer.
That's something I would like to see. ;-)
Cause what happens in those optimization steps is, I think, mostly
rearranging of orders.


Obviously, you don't know anything about optimization.


Changing the order to increase parallelism is OK, as long as invariance
rules are not neglected and the resulting precision is the same. There
is good reason for this: it would be intolerable if the same expression
evaluated to a different value at random. This applies especially to
floating point and the like; when runtime performance is more critical
than precision, some compilers naturally offer a choice between the two.
Hmmm, maybe an example is appropriate..

float computeX(float a, float b, float c)
{
    return a + b * c;
}

Is not guaranteed to evaluate to the same values as:

float computeY(float a, float b, float c)
{
    float v = b * c;
    return a + v;
}

It *does* matter how you write the code. The optimizer can do a lot of
things, but what it can do is closely tied to how the platform you are
working on actually works and how the language works. That gives the
possibilities the compiler has; now the trick is to know which
transformations the compiler actually is programmed to perform.
If they are really unnecessary objects and the like,
a good optimizer will get rid of them.


Or it might not, depending on the possible side effects the compiler
might or might not be aware of. Usually, doing instruction selection and
register allocation at link time increases the chances of what you
describe taking place more frequently, but it is not guaranteed.
Jul 23 '05 #6
Dietmar Kuehl wrote:
Hagen wrote:
When you run your compiler with optimization turned on (e.g. g++ with
-Ox flag) and your program gets significantly faster than without,
did you write bad code/ have a bad design?

No, not at all! When optimizing, the compiler may detect common
subexpressions, propagate constants (i.e. compute things at compile
time because it realizes they do not change), really inline functions
rather than calling local versions, etc. There is a host of stuff a
compiler can do but typically does not do, to avoid unnecessary
analysis during development where compilation speed is crucial.

Cause what happens in those optimization steps is, I think, mostly
rearranging of orders.

I think, you think wrongly. Optimization is about lots of things.
Ordering processor operations such that the processor is optimally
used is one of them, but definitely not the only thing.

array: 48,5 seconds
veciter: 3,5 minutes
vecpoint: 3 minutes

These results look odd: the vector version should actually be no
slower or at most only slightly slower than the array version.
Are you sure you don't compare apples to oranges? For example,
the following two excerpts will have grossly different performance
characteristics:

// excerpt 1:
int const repeat = 1000;
int const size = 100;
for (int r = 0; r < repeat; ++r)
{
    int a[size];
    for (int i = 0; i < size; ++i)
        a[i] = i;
}

// excerpt 2:
int const repeat = 1000;
int const size = 100;
for (int r = 0; r < repeat; ++r)
{
    std::vector<int> a(size);
    for (int i = 0; i < size; ++i)
        a[i] = i;
}

The second code fragment should be substantially slower
than the first one! This is, however, not directly related
to the use of built-in arrays vs. vector
but because the second code uses a lot of dynamic memory allocation
and initializes/destructs its elements
while the first one does not
(for arrays of PODs the compiler just finds that
neither the constructor nor the destructor has to be called).


We can test your hypothesis:
$ cat excerpt.cc
#include <vector>

int main(int argc, char* argv[]) {
    int const repeat = 1024*1024;
    int const size = 1024;
    for (int r = 0; r < repeat; ++r) {
#ifdef UMMY
        int* a = new int[size];
#else
        std::vector<int> a(size);
#endif
        for (int i = 0; i < size; ++i)
            a[i] = i;
#ifdef UMMY
        delete [] a;
#endif
    }

    return 0;
}
$ g++ -DUMMY -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
$ time ./excerpt
4.203u 0.007s 0:04.21 99.7% 0+0k 0+0io 0pf+0w
$ g++ -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
$ time ./excerpt
11.492u 0.006s 0:11.50 99.9% 0+0k 0+0io 0pf+0w
There are a few alternatives how to address this:
- The container could be hoisted out of the loop resulting in
just one pair of memory operations and one unnecessary
initialization.
We can test that too:
$ cat excerpt.cc
#include <vector>

int main(int argc, char* argv[]) {
    int const repeat = 1024*1024;
    int const size = 1024;
#ifdef UMMY
    int* a = new int[size];
#else
    std::vector<int> a(size);
#endif
    for (int r = 0; r < repeat; ++r) {
        for (int i = 0; i < size; ++i)
            a[i] = i;
    }
#ifdef UMMY
    delete [] a;
#endif

    return 0;
}
$ g++ -DUMMY -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
$ time ./excerpt
3.648u 0.006s 0:03.65 99.7% 0+0k 0+0io 0pf+0w
$ g++ -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
$ time ./excerpt
3.670u 0.002s 0:03.67 100.0% 0+0k 0+0io 0pf+0w
- You could use a different container, e.g. 'std::tr1::array'
which behaves much more like a built-in array
than 'std::vector' does.

Since I don't know what your code looks like,
I cannot tell for sure that it is a problem like the one above.
However, generally, there should be no huge difference
between using built-in arrays or 'std::vector'
unless you use a fixed sized array.
In this case, the 'std::vector' replacement
is probably inappropriate and you should consider
a fixed-size array class like 'std::tr1::array' instead
(since the latter is not yet widely available,
you can get a similar class from <http://www.boost.org/>).

Jul 23 '05 #7
assaarpa wrote:
Hmmm, maybe an example is appropriate..

float computeX(float a, float b, float c)
{
    return a + b * c;
}

Is not guaranteed to evaluate to the same values as:

float computeY(float a, float b, float c)
{
    float v = b * c;
    return a + v;
}

It *does* matter how you write the code, optimizer can do a lot of things
but what it can do is closely tied to knowing how the platform
[that] you are working on actually works and how the language works..
that gives the possibilities the compiler has,
now the trick is to know
what transformations the compiler actually is programmed to perform..


You need to elaborate.
Differences in the results returned by functions computeX and computeY
should *not* matter,
or you have serious problems with your numerical analysis.
Jul 23 '05 #8
"Andrew Koenig" <ar*@acm.org> wrote in message news:<ja*******************@bgtnsc04-news.ops.worldnet.att.net>...

I have seen a number of cases in which turning on optimization gains a
factor of three or more in execution time, regardless of how the program
might have been coded originally at the source level.


With MSVC (6.0) and STL code (or just heavily templated code in
general), if I don't get 5 or 6 times the speed after optimization I'm
usually worried.

I'm not sure how you can ever say "regardless of how the program might
have been coded originally": there are always multiple ways of
implementing the same algorithm, and out of those methods, some will
benefit more from compiler optimization than others. Of course,
choosing a sensible algorithm in the first place is still the
programmer's job: no compiler is ever going to make a bubble sort run
as fast as an insertion sort.
Jul 23 '05 #9

This discussion thread is closed

Replies have been disabled for this discussion.
