Bytes | Software Development & Data Engineering Community

much optimizing = bad code ? (array/vector)

Hi,

I have a question that you probably shouldn't worry about, since the
compiler takes care of it, but anyway:

When you run your compiler with optimization turned on (e.g. g++ with an
-Ox flag) and your program gets significantly faster than without it,
did you write bad code or have a bad design?

Because what happens in those optimization steps is, I think, mostly
reordering of instructions. If you had programmed without creating
unnecessary objects and the like, there wouldn't be much left for the
compiler to optimize.

And what is best to think about when trying to improve your code: which
optimization steps cannot be done by the compiler? For example, I had a
lot of one-line functions. When I inlined those, the resulting
executable got smaller, but with no difference in running time, or the
running time even got worse.

Can you improve your code much by hand if you use STL objects? For this
I have some concrete data:
I have a program that used arrays; for flexibility I changed the arrays
to vectors. The vector version was unbelievably slow, so I stopped
accessing the elements via [] and changed to iterators. This was still
way too slow, and gprof showed that the iterator functions consumed most
of the time, so I changed to pointers. Here are the numbers for the
different versions compiled via
g++ -o target source.cpp:

array: 48.5 seconds
veciter: 3.5 minutes
vecpoint: 3 minutes

Since someone wrote some time ago that most compilers do a good job
optimizing vector routines, I compiled with
g++ -O3 -o target source.cpp:

array: 15 seconds
veciter: 38 seconds
vecpoint: 36 seconds

So did I initially write bad code, such that the array version could be
sped up more than 3 times by the compiler? You can't blame the very bad
performance of the vector versions (fully) on the code, since the only
change was from array to vector, but how can I optimize this code to get
the runtime down to that of the array version? For this I have to depend
on the compiler optimization, I guess (otherwise I should stop
programming, because 2+ minutes is a long way to go), but what does the
compiler do anyway that wouldn't have an effect if I changed it by hand?

Thanks for any answer; I kind of got carried away from the initial question.

Best wishes, Hagen.
Jul 23 '05 #1
Hagen wrote:
I have a question that you probably shouldn't worry about since the
compiler takes care of it, but anyway:

When you run your compiler with optimization turned on (e.g. g++ with an
-Ox flag) and your program gets significantly faster than without it,
did you write bad code or have a bad design?
[...]


No.

The explanation is simple: design, and often code, are governed not only
and not as much by the performance of the target application/program as
by the requirements for maintainability and extensibility (which do
imply readability and understandability).

Optimisers, OTOH, exist solely for the purpose of catching what the
programmer didn't pay attention to because he/she was concerned with
other, more important things.

I would say: pay attention to how you code if, after you use the
compiler's optimisation switch, you still don't get good enough
performance. Then use a performance analysis tool and improve your code
in such a way that it doesn't drive you farther away from other goals,
if they are more or even equally important.

HTH

V
Jul 23 '05 #2
Hagen wrote:
When you run your compiler with optimization turned on (e.g. g++ with an -Ox flag) and your program gets significantly faster than without it, did you write bad code or have a bad design?
No, not at all! When optimizing, the compiler may detect common
subexpressions, propagate constants (i.e. compute things at
compile time because it realizes that they do not change), really
inline functions rather than calling local versions, etc. There
is a host of stuff a compiler can do but typically does not do, to
avoid unnecessary analysis during development, where compilation
speed is crucial.
Because what happens in those optimization steps is, I think, mostly
reordering of instructions.
I think you think wrongly. Optimization is about lots of things.
Ordering processor operations such that the processor is optimally
used is one of them, but definitely not the only one.
array: 48.5 seconds
veciter: 3.5 minutes
vecpoint: 3 minutes


These results look odd: the vector version should actually be no
slower, or at most only slightly slower, than the array version.
Are you sure you aren't comparing apples to oranges? For example,
the following two excerpts will have grossly different performance
characteristics:

// excerpt 1:
int const repeat = 1000;
int const size = 100;
for (int r = 0; r < repeat; ++r)
{
    int a[size];
    for (int i = 0; i < size; ++i)
        a[i] = i;
}

// excerpt 2:
int const repeat = 1000;
int const size = 100;
for (int r = 0; r < repeat; ++r)
{
    std::vector<int> a(size);
    for (int i = 0; i < size; ++i)
        a[i] = i;
}

The second code fragment should be substantially slower than the
first one! This is, however, not directly due to the use of
built-in arrays vs. vector, but to the fact that the second fragment
does a lot of dynamic memory allocation and initializes/destroys its
elements while the first one does not (for arrays of PODs the
compiler just finds that neither the constructor nor the
destructor has to be called).

There are a few alternatives for addressing this:
- The container could be hoisted out of the loop resulting in
just one pair of memory operations and one unnecessary
initialization.
- You could use a different container, e.g. 'std::tr1::array'
which behaves much more like a built-in array than 'std::vector'
does.

Since I don't know what your code looks like, I cannot tell for
sure that it is a problem like the one above. However, generally
there should be no huge difference between using built-in arrays
or 'std::vector' unless you use a fixed-size array. In that case
the 'std::vector' replacement is probably inappropriate and you
should consider a fixed-size array class like 'std::tr1::array'
instead (since the latter is not yet widely available, you can get
a similar class from <http://www.boost.org/>).
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Jul 23 '05 #3
Hagen wrote:
I have a question that you probably shouldn´t worry about
since the compiler [takes care of] it, but anyways:

When you run your compiler with optimization turned on
(e.g. g++ with -Ox [option])
and your program gets significantly faster than without,
did you write bad code and/or have a bad design?
Probably not.
If you write bad code, you are likely to frustrate the optimizer.
Because what happens in those optimization steps is, I think,
mostly reordering of instructions.
Obviously, you don't know anything about optimization.
If you had programmed without creating unnecessary objects
If they are really unnecessary objects and the like,
a good optimizer will get rid of them.
[there] wouldn't be much left for the compiler to optimize.

And what is best to think about [optimization]
when trying to improve your code? Which optimization steps cannot be done by the compiler?
A C++ compiler does not choose algorithms.
You must select the best algorithms and implement them.
For example, I had a lot of one-line functions.
When I inlined [them], the resulting executable got smaller
but with no difference in running time, or the running time even got worse.
You probably screwed up.
But we'd need to see your code to tell you what you did wrong.
Can you improve your code much by hand if you use STL objects?
For this I have some concrete data:
I have a program that used arrays.
For flexibility, I changed array to vector.
The vector version was unbelievably slow,
so I didn't access the elements via [] but changed to iterators.
This was still way too slow, and gprof showed that
iterator functions consumed most of the time, so I changed to pointers.
So here are the numbers for the different versions compiled via

g++ -o target source.cpp:

array: 48.5 seconds
veciter: 3.5 minutes
vecpoint: 3 minutes

Since someone wrote some time ago that
most compilers do a good job optimizing vector routines,
I compiled with g++ -O3 -o target source.cpp:

array: 15 seconds
veciter: 38 seconds
vecpoint: 36 seconds

So did I write initially bad code
so that array could be improved more than 3 times by a compiler?
Probably.
The very bad performance of the vector
versions you can't blame [entirely] on the code
since the only change was array to vector.
But how can I try to optimize this code
to get the runtime down to the array version?
For this, I have to depend on the compiler optimization, I guess.
(Otherwise, I should stop programming because 2+ minutes is a long way to go.)
But what does the compiler do anyway
that wouldn't have an effect if I changed it by hand?

Thanks for any answer.
I kind of got carried away from the initial question.


It's pretty hard to figure out what you did wrong
unless you can show us your code.

Jul 23 '05 #4
"Victor Bazarov" <v.********@comAcast.net> wrote in message
news:uR******************@newsread1.mlpsca01.us.to.verio.net...
Optimisers, OTOH, exist solely for the purpose of catching what the
programmer didn't pay attention to because he/she was concerned with
other, more important things.


This claim may have been true at one time, but it doesn't apply to modern
computers.

On many modern machines, it is possible to make programs run much faster by
reordering code at the machine-instruction level so that the machine can
execute several operations in parallel. There is typically no way of
expressing such reorderings at the source-code level at all. Optimizers
also often perform other tasks, such as register allocation, over which the
language does not offer fine enough control to meet the needs of the
hardware.

I have seen a number of cases in which turning on optimization gains a
factor of three or more in execution time, regardless of how the program
might have been coded originally at the source level.
Jul 23 '05 #5
> Probably not.
If you write bad code, you are likely to frustrate the optimizer.
That's something I would like to see. ;-)
Because what happens in those optimization steps is, I think, mostly
reordering of instructions.


Obviously, you don't know anything about optimization.


Changing the order to increase parallelism is OK, as long as invariance
rules are not neglected and the resulting precision is the same. There
is good reason for this: it would be intolerable for the same expression
to evaluate to a different value at random. This applies especially to
floating point and the like; when runtime performance is more critical
than precision, some compilers naturally offer the choice between the two.

Hmmm, maybe an example is appropriate..

float computeX(float a, float b, float c)
{
return a + b * c;
}

Is not guaranteed to evaluate to the same values as:

float computeY(float a, float b, float c)
{
float v = b * c;
return a + v;
}

It *does* matter how you write the code. The optimizer can do a lot of
things, but what it can do is closely tied to how the platform you are
working on actually works and how the language works. That determines
the possibilities the compiler has; the trick is then to know which
transformations the compiler is actually programmed to perform.
If they are really unnecessary objects and the like,
a good optimizer will get rid of them.


Or it might not, depending on possible side effects the compiler
might or might not be aware of. Usually, doing instruction selection and
register allocation at link time increases the chances of what you
describe taking place, but it is not guaranteed.
Jul 23 '05 #6
Dietmar Kuehl wrote:
The second code fragment should be substantially slower
than the first one! This is, however, not directly related
to the use of built-in arrays vs. vector
but because the second code uses a lot of dynamic memory allocation
and initializes/destructs its elements
while the first one does not
(for arrays of PODs the compiler just finds that
neither the constructor nor the destructor has to be called).


We can test your hypothesis:
cat excerpt.cc
#include <vector>

int main(int argc, char* argv[]) {
    int const repeat = 1024*1024;
    int const size = 1024;
    for (int r = 0; r < repeat; ++r) {
#ifdef UMMY
        int* a = new int[size];
#else
        std::vector<int> a(size);
#endif
        for (int i = 0; i < size; ++i)
            a[i] = i;
#ifdef UMMY
        delete [] a;
#endif
    }

    return 0;
}
g++ -DUMMY -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
time ./excerpt
4.203u 0.007s 0:04.21 99.7% 0+0k 0+0io 0pf+0w
g++ -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
time ./excerpt
11.492u 0.006s 0:11.50 99.9% 0+0k 0+0io 0pf+0w
There are a few alternatives how to address this:
- The container could be hoisted out of the loop resulting in
just one pair of memory operations and one unnecessary
initialization.
We can test that too:
cat excerpt.cc
#include <vector>

int main(int argc, char* argv[]) {
    int const repeat = 1024*1024;
    int const size = 1024;
#ifdef UMMY
    int* a = new int[size];
#else
    std::vector<int> a(size);
#endif
    for (int r = 0; r < repeat; ++r) {
        for (int i = 0; i < size; ++i)
            a[i] = i;
    }
#ifdef UMMY
    delete [] a;
#endif

    return 0;
}
g++ -DUMMY -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
time ./excerpt
3.648u 0.006s 0:03.65 99.7% 0+0k 0+0io 0pf+0w
g++ -Wall -ansi -pedantic -O2 -o excerpt excerpt.cc
time ./excerpt
3.670u 0.002s 0:03.67 100.0% 0+0k 0+0io 0pf+0w

Jul 23 '05 #7
assaarpa wrote:
Hmmm, maybe an example is appropriate..

float computeX(float a, float b, float c) {
return a + b * c;
}

Is not guaranteed to evaluate to the same values as:

float computeY(float a, float b, float c) {
float v = b * c;
return a + v;
}

It *does* matter how you write the code. The optimizer can do a lot of
things, but what it can do is closely tied to how the platform
[that] you are working on actually works and how the language works.
That determines the possibilities the compiler has;
the trick is then to know
which transformations the compiler is actually programmed to perform.


You need to elaborate.
Differences in the results returned by the functions computeX and computeY
should *not* matter;
if they do, you have serious problems with your numerical analysis.
Jul 23 '05 #8
"Andrew Koenig" <ar*@acm.org> wrote in message news:<ja*******************@bgtnsc04-news.ops.worldnet.att.net>...

I have seen a number of cases in which turning on optimization gains a
factor of three or more in execution time, regardless of how the program
might have been coded originally at the source level.


With MSVC (6.0) and STL code (or just heavily templated code in
general), if I don't get 5 or 6 times the speed after optimization I'm
usually worried.

I'm not sure how you can ever say "regardless of how the program might
have been coded originally": there are always multiple ways of
implementing the same algorithm, and out of those, some will
benefit more from compiler optimization than others. Of course,
choosing a sensible algorithm in the first place is still the
programmer's job: no compiler is ever going to make a bubble sort run
as fast as an insertion sort.
Jul 23 '05 #9
