Optimiser question

Dave S

Hi All,
I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed. With this in mind I have been
looking through it to see if I can help it out a bit. I have limited
knowledge of how compilers work, but I understand that some
constructs optimise more easily or better. We are using a variant of gcc
(3.4.1 I think) on the Nios II platform, if that makes a difference.
The question I have is: are these 2 snippets functionally the same (I
think they are), and would the optimiser prefer the second one?
First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

There are numerous bits a bit like this, so this would probably be a
big help overall if I am correct.

cheers

Dave

Dec 13 '07 #1


Johannes Bauer

Dave S wrote:

The question I have is are these 2 snippets functionally the same( I
think they are), and would the optimiser prefer the second one:

Seriously, why don't you just try it out? Compile both pieces of code,
do an objdump -d on them and take a look at the assembly output. Nobody
but you will be able to give you a precise answer considering that
you're using unusual-platform-3.4.1-i-think-gcc as a compiler.
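
As a sketch of that experiment, the two variants could be put in one file as separate functions and compiled to an object file. The function names, the int types and the nios2-elf- tool prefix are assumptions for illustration; the thread does not give the real declarations or toolchain names.

```c
/* fill.c: the two loop variants from the question, as separate functions
 * so that their generated code can be compared side by side.
 *
 * Build and disassemble (host gcc shown; a cross toolchain would use its
 * own prefixed tools, e.g. nios2-elf-gcc and nios2-elf-objdump, which are
 * assumed names here):
 *
 *     gcc -O2 -c fill.c
 *     objdump -d fill.o
 */

/* Variant 1: post-decrement in the condition, post-increment in the store. */
void fill_v1(int *pDest, int nPattern, int count)
{
    while (count-- > 0) {
        *pDest++ = nPattern;
    }
}

/* Variant 2: the proposed rewrite with separate statements. */
void fill_v2(int *pDest, int nPattern, int count)
{
    while (count > 0) {
        *pDest = nPattern;
        ++pDest;
        --count;
    }
}
```

If the two disassembled function bodies come out the same at the optimisation level actually used, the rewrite buys nothing on that compiler.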

Greetings,
Johannes

--
"Many of the mathematicians' theories are false and clearly
blasphemous. I suspect that these false theories are so beloved for
exactly that reason." -- Prophet and visionary Hans Joss aka
HJP in de.sci.mathematik <47**********************@news.sunrise.ch>

Dec 13 '07 #2

Mark Bluemel

Dave S wrote:

Hi All,
I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed. With this in mind I have been
looking through it to see if I can help it out a bit. I have limited
knowledge of how compilers work, but I understand that some
constructs optimise more easily or better. We are using a variant of gcc
(3.4.1 I think) on the Nios II platform, if that makes a difference.
The question I have is: are these 2 snippets functionally the same (I
think they are), and would the optimiser prefer the second one?
First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

The two ways you could investigate this are
a) measuring
or
b) examining the generated code

I'm no expert, but I doubt that you will find that such simple changes
would reap significant benefits.

In general, I would suspect you'd be better off looking at
a) using a higher optimisation level on the compiler
b) seeing if a recent compiler build has better optimisation
c) looking at your algorithms (rather than your code)

Dec 13 '07 #3

Chris Dollin

Dave S wrote:

Hi All,
I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed. With this in mind I have been
looking through it to see if I can help it out a bit. I have limited
knowledge of how compilers work, but I understand that some
constructs optimise more easily or better. We are using a variant of gcc
(3.4.1 I think) on the Nios II platform, if that makes a difference.
The question I have is: are these 2 snippets functionally the same (I
think they are), and would the optimiser prefer the second one?
First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

The first snippet is more idiomatically C and so no less likely to
be optimised than the second.

If the program isn't running fast enough, and you wish to make it
go faster, the first thing you /must/ do is /find out where the
time goes/. For the gods' sake don't just trawl through the program
looking for bits you think you can help with: /find out/ which bits
are eating the time using whatever profiling tools the platform
has available. Finding out which are the slow bits and doing an
/algorithmic/ improvement will get much better results. Even just
finding the slow bits and improving the code will help.

You don't say what the program does, so it's hard to guess what
might be cycle-sinks, but if it does any string-hacking, remember
that `strlen` and `strcat` (as examples) are not constant-time
operations ...
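
To make the point about `strcat` concrete, here is a minimal sketch, with made-up function names, of building a string from n pieces in two ways. It assumes the destination buffer is large enough; nothing here comes from the original program.

```c
#include <string.h>

/* Appends n copies of piece using strcat. Every call rescans dst from the
 * start to find its end, so the total work grows with the square of n. */
void join_strcat(char *dst, const char *piece, int n)
{
    int i;
    dst[0] = '\0';
    for (i = 0; i < n; i++) {
        strcat(dst, piece);
    }
}

/* Same result, but the end of the string is remembered between steps,
 * so nothing already written is scanned again. */
void join_tracked(char *dst, const char *piece, int n)
{
    size_t len = strlen(piece);   /* measured once, outside the loop */
    char *end = dst;
    int i;
    for (i = 0; i < n; i++) {
        memcpy(end, piece, len);
        end += len;
    }
    *end = '\0';
}
```

Both produce the same string; only a profile would show whether the difference matters in a given program.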

--
Chris "who knows where the time goes?" Dollin

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Dec 13 '07 #4

Ben Bacarisse

Dave S <da*************@bem.fki-et.com> writes:

I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed. With this in mind I have been
looking through it to see if I can help it out a bit. I have limited
knowledge of how compilers work, but I understand that some
constructs optimise more easily or better. We are using a variant of gcc
(3.4.1 I think) on the Nios II platform, if that makes a difference.
The question I have is: are these 2 snippets functionally the same (I
think they are),

No. See below.

and would the optimiser prefer the second one:

I can't see why, but there are real compiler experts here who may know
better.

First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

The original decrements 'count' even when the condition is false so they
terminate with different values in 'count'.
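
A small sketch of that difference, assuming count is a signed int (the thread does not say) and using made-up function names. Each function returns whatever is left in count when its loop finishes.

```c
/* Original form: the condition decrements count even on the final,
 * failing test. */
int final_count_v1(int *pDest, int nPattern, int count)
{
    while (count-- > 0) {
        *pDest++ = nPattern;
    }
    return count;
}

/* Proposed form: count is only decremented inside the body. */
int final_count_v2(int *pDest, int nPattern, int count)
{
    while (count > 0) {
        *pDest = nPattern;
        ++pDest;
        --count;
    }
    return count;
}
```

Starting from count = 3, the first returns -1 and the second returns 0. If count were an unsigned type, the extra decrement would wrap around to a large value instead.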

If 'sizeof *pDest' == 1 (i.e. if 'pDest' is some form of 'char *')
consider using memset.

--
Ben.

Dec 13 '07 #5

Dave S

On Dec 13, 3:01 pm, Chris Dollin <chris.dol...@hp.com> wrote:

Dave S wrote:
Hi All,
I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed. With this in mind I have been
looking through it to see if I can help it out a bit. I have limited
knowledge of how compilers work, but I understand that some
constructs optimise more easily or better. We are using a variant of gcc
(3.4.1 I think) on the Nios II platform, if that makes a difference.
The question I have is: are these 2 snippets functionally the same (I
think they are), and would the optimiser prefer the second one?
First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

The first snippet is more idiomatically C and so no less likely to
be optimised than the second.

If the program isn't running fast enough, and you wish to make it
go faster, the first thing you /must/ do is /find out where the
time goes/. For the gods' sake don't just trawl through the program
looking for bits you think you can help with: /find out/ which bits
are eating the time using whatever profiling tools the platform
has available. Finding out which are the slow bits and doing an
/algorithmic/ improvement will get much better results. Even just
finding the slow bits and improving the code will help.

You don't say what the program does, so it's hard to guess what
might be cycle-sinks, but if it does any string-hacking, remember
that `strlen` and `strcat` (as examples) are not constant-time
operations ...

--
Chris "who knows where the time goes?" Dollin

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Thanks all,
The code is used for embedded real time control. It runs fast enough,
but only with the compiler optimisations turned up to maximum and no
debug info (although I don't think debug info and optimisations mix
anyway?). The algorithms have already been tweaked, and the
hardware (FPGA) is also similarly optimised. I'm at a slight loose end
whilst I wait for the finalised target hardware (actual circuit
board), so I thought I would go and 'tidy up' and see if I could help
the compiler out any. The tidy up was provoked by an article in the
IET Electronics magazine, which said ++i is easier to optimise than i++ due
to the need to store i and then increment it. (think I got that
correct). As I don't really know how optimisers work I thought I'd ask
the question.

cheers
Dave

Dec 13 '07 #6

Dave S

On Dec 13, 3:11 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:

Dave S <david.sander...@bem.fki-et.com> writes:
I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed. With this in mind I have been
looking through it to see if I can help it out a bit. I have limited
knowledge of how compilers work, but I understand that some
constructs optimise more easily or better. We are using a variant of gcc
(3.4.1 I think) on the Nios II platform, if that makes a difference.
The question I have is: are these 2 snippets functionally the same (I
think they are),

No. See below.

and would the optimiser prefer the second one:

I can't see why, but there are real compiler experts here who may know
better.

First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

The original decrements 'count' even when the condition is false so they
terminate with different values in 'count'.

Yes, in this instance it makes no odds, but in another it could.

If 'sizeof *pDest' == 1 (i.e. if 'pDest' is some form of 'char *')
consider using memset.

I might normally use a bzero or similar, but this is a set of
structures and pointers to structures.

Dave

Dec 13 '07 #7

Chris Dollin

Dave S wrote:

Thanks all,
The code is used for embedded real time control. It runs fast enough,
but only with the compiler optimisations turned up to maximum and no
debug info (although I don't think debug info and optimisations mix
anyway?). The algorithms have already been tweaked, and the
hardware (FPGA) is also similarly optimised.

You need measurements (and, perhaps, well-grounded estimates) to find
out what's taking the time. Really. Otherwise you're just whistling
in the dark. If performance is important to you, some investment in
measurement seems worthwhile.

If you can't instrument the embedded application, is it /possible/ to
compile on say an x86 desktop and use that platform's profiling tools
to get an idea of where to look? Clearly you'd have to fake out the
FPGA stuff, and that might make a nonsense of the results; I don't do
embedded, so I don't have a feel for it, but some here do and might
be able to advise.

The tidy up was provoked by an article in the
IET Electronics magazine, which said ++i is easier to optimise than i++ due
to the need to store i and then increment it.

It may be "easier" to optimise, but that doesn't matter; C compilers
that can't generate the "best" output just because you've used `i++`
rather than `++i` have learned nothing from the past twenty years.

[In C++, where `i` might be some big class instance, I understand that
it can make rather more difference. But in C? Unlikely. Not impossible;
just unlikely, given that you're using a gcc variant.]

--
Chris "i += 1" Dollin

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Dec 13 '07 #8

Philip Potter

Dave S wrote:

The tidy up was provoked by an article in the
IET Electronics magazine, which said ++i is easier to optimise than i++ due
to the need to store i and then increment it. (think I got that
correct).

This is often said, but mainly by people who are talking about C++, and
even then only when i is a class type rather than a POD (plain old
data). If i is an int, then i++; and ++i; are equivalent statements, and
any compiler worth its salt knows this.

Your compiler may not know this optimization, but unless you can prove
that it doesn't, I will assume that it does. Similarly for any other
common peephole optimization trick.

Dec 13 '07 #9

Ben Bacarisse

Dave S <da*************@bem.fki-et.com> writes:

On Dec 13, 3:11 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>Dave S <david.sander...@bem.fki-et.com> writes:
I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed.

<snip>

First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

<snip>

I might normally use a bzero or similar, but this is a set of
structures and pointers to structures.

Then you might consider Duff's device:

http://www.lysator.liu.se/c/duffs-device.html

(view on an empty stomach if it is new to you) but, as always, measure
before investing any effort in tricksy code changes.
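
For readers who have not seen it, here is a sketch of Duff's device adapted to this fill loop. The int element type and the name fill_duff are assumptions for illustration; whether it beats what the compiler already generates would need measuring.

```c
/* Duff's device applied to a fill: the switch jumps into the middle of an
 * unrolled do-while so that the first pass handles the remainder and every
 * later pass does eight stores. */
void fill_duff(int *pDest, int nPattern, int count)
{
    int n;

    if (count <= 0) {
        return;                 /* the device itself requires count > 0 */
    }
    n = (count + 7) / 8;        /* number of passes through the do-while */
    switch (count % 8) {
    case 0: do { *pDest++ = nPattern;
    case 7:      *pDest++ = nPattern;
    case 6:      *pDest++ = nPattern;
    case 5:      *pDest++ = nPattern;
    case 4:      *pDest++ = nPattern;
    case 3:      *pDest++ = nPattern;
    case 2:      *pDest++ = nPattern;
    case 1:      *pDest++ = nPattern;
            } while (--n > 0);
    }
}
```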

--
Ben.

Dec 13 '07 #10

Flash Gordon

Dave S wrote, On 13/12/07 15:53:

<snip>

>If 'sizeof *pDest' == 1 (i.e. if 'pDest' is some form of 'char *')
consider using memset.

I might normally use a bzero or similar, but this is a set of
structures and pointers to structures.

I would strongly suggest using memset rather than bzero for the simple
reason that memset is part of standard C and so available almost
everywhere (non-hosted implementations are not required to provide it,
but are likely to anyway).
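
A minimal sketch of the memset route for structures, using a hypothetical struct item since the real types are not shown in the thread. Zeroing works byte-wise; a non-zero pattern still needs an element-by-element loop.

```c
#include <string.h>

struct item {                /* hypothetical structure, for illustration */
    int id;
    char tag;
};

/* Zeroes n items byte by byte using standard memset. */
void clear_items(struct item *items, size_t n)
{
    memset(items, 0, n * sizeof *items);
}

/* A general pattern is not a single repeated byte, so it is copied
 * one element at a time. */
void fill_items(struct item *items, size_t n, struct item pattern)
{
    size_t i;
    for (i = 0; i < n; i++) {
        items[i] = pattern;
    }
}
```

One caveat: the C standard does not promise that all-bytes-zero is a null pointer or a floating-point zero, although it is on most platforms, so structures holding pointers may deserve a second look.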
--
Flash Gordon

Dec 13 '07 #11

Dave S

On 13 Dec, 16:55, Philip Potter <p...@doc.ic.ac.uk> wrote:

Dave S wrote:
The tidy up was provoked by an article in the
IET Electronics magazine, which said ++i is easier to optimise than i++ due
to the need to store i and then increment it. (think I got that
correct).

This is often said, but mainly by people who are talking about C++, and
even then only when i is a class type rather than a POD (plain old
data). If i is an int, then i++; and ++i; are equivalent statements, and
any compiler worth its salt knows this.

Your compiler may not know this optimization, but unless you can prove
that it doesn't, I will assume that it does. Similarly for any other
common peephole optimization trick.

Evening all,
I guess as most of this is POD I'll find something else to do
tomorrow...
I usually assume the compiler writer knows better than I do about
compilers.
If I have to 'hand optimise' things they are usually in assembler
anyway <shrug>

cheers
Dave

Dec 13 '07 #12

Mark F. Haigh

On Dec 13, 8:06 am, Chris Dollin <chris.dol...@hp.com> wrote:

Dave S wrote:
Thanks all,
The code is used for embedded real time control. It runs fast enough,
but only with the compiler optimisations turned up to maximum and no
debug info (although I don't think debug info and optimisations mix
anyway?). The algorithms have already been tweaked, and the
hardware (FPGA) is also similarly optimised.

You need measurements (and, perhaps, well-grounded estimates) to find
out what's taking the time. Really. Otherwise you're just whistling
in the dark. If performance is important to you, some investment in
measurement seems worthwhile.

If you can't instrument the embedded application, is it /possible/ to
compile on say an x86 desktop and use that platform's profiling tools
to get an idea of where to look? Clearly you'd have to fake out the
FPGA stuff, and that might make a nonsense of the results; I don't do
embedded, so I don't have a feel for it, but some here do and might
be able to advise.

<snip>

Typically, something like the following is implemented when there's no
native profiling support:

1. An inline function is written that returns a high-resolution
timestamp value. Typically this will be a CPU timestamp counter or
cycle counter.

2. An array of arrays is created for holding recorded timestamp
values. The first dimension is the index of the particular
measurement, the other is the current trial number.

3. Temporary code is inserted into the codebase that looks like this:

start_profile(MAIN, trial);

start_profile(FOO, trial);
foo();
stop_profile(FOO, trial);

start_profile(BAR, trial);
bar();
stop_profile(BAR, trial);

stop_profile(MAIN, trial);

4. When the desired number of trials is done, you print out a
delimited list, which you can analyze by hand or with a spreadsheet.
The start_profile function simply stores the current timestamp value
to the appropriate array element. The stop_profile function
overwrites the value in the array with the difference between
"current" and the previous value.

It'll take a couple of passes to figure out where the bottlenecks
are.
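
The scheme above can be sketched roughly as follows. The array sizes, the read_timestamp name and the fake counter behind it are all assumptions for illustration; a real port would read the platform's cycle counter or timer register in read_timestamp instead.

```c
#define NUM_POINTS 3
#define NUM_TRIALS 4

enum { MAIN, FOO, BAR };     /* indices of the measurement points */

/* Stand-in timestamp source (hypothetical): it simply advances by 10 on
 * every read so that the sketch runs anywhere. */
unsigned long fake_cycles;

unsigned long read_timestamp(void)
{
    fake_cycles += 10;
    return fake_cycles;
}

/* First dimension: which measurement. Second dimension: which trial. */
unsigned long samples[NUM_POINTS][NUM_TRIALS];

/* Stores the current timestamp in the slot for this point and trial. */
void start_profile(int point, int trial)
{
    samples[point][trial] = read_timestamp();
}

/* Overwrites the stored start time with the elapsed time. */
void stop_profile(int point, int trial)
{
    samples[point][trial] = read_timestamp() - samples[point][trial];
}
```

After the trials have run, the samples array can be printed as a delimited list for a spreadsheet, as described in step 4.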

Mark F. Haigh
mf*****@sbcglobal.net

Dec 13 '07 #13

Jack Klein

On Thu, 13 Dec 2007 07:51:17 -0800 (PST), Dave S
<da*************@bem.fki-et.com> wrote in comp.lang.c:

[snip]

Thanks all,
The code is used for embedded real time control. It runs fast enough,
but only with the compiler optimisations turned up to maximum and no
debug info (although I don't think debug info and optimisations mix
anyway?). The algorithms have already been tweaked, and the

Whether or not optimizations and debug info mix is up to the tool set,
not the language.

hardware (FPGA) is also similarly optimised. I'm at a slight loose end
whilst I wait for the finalised target hardware (actual circuit
board), so I thought I would go and 'tidy up' and see if I could help
the compiler out any. The tidy up was provoked by an article in the
IET Electronics magazine, which said ++i is easier to optimise than i++ due
to the need to store i and then increment it. (think I got that
correct). As I don't really know how optimisers work I thought I'd ask
the question.

In the first place, if you want to talk about debugging embedded
systems, and also working with a soft core processor architecture that
only exists on one vendor's brand of FPGA, you really ought to be
discussing it in news:comp.arch.embedded. There will be people there
who have actually used NIOS.

And they can also tell you all about ways to time code tests for
optimization, like toggling an i/o pin and measuring execution time
with an oscilloscope.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html

Dec 13 '07 #14

christian.bau

Telling us the "platform" is rather pointless. Telling us which
processor you are using would be a bit more useful.

There is nowhere near enough information. What types are involved? How
many iterations of the loop?

Get a disassembly of the code. Does the compiler do any loop
unrolling? If not, consider doing that yourself.

How many operations do you perform in each iteration? My count is one
store, one pointer increment, one integer decrement, one comparison. A
good compiler will get that down to three operations instead of four.
If not, you should get it down to three in your source code. Loop
unrolling with a factor eight would get it down to 1.25 operations.
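
A sketch of what unrolling by eight might look like by hand, assuming int elements and a made-up function name. In the main loop, eight stores share one pointer update and one comparison, which is 10 operations per 8 elements, i.e. the 1.25 per element mentioned above.

```c
/* Fill with the loop body unrolled by a factor of eight. */
void fill_unrolled8(int *pDest, int nPattern, int count)
{
    int *pEnd;
    int *pEnd8;

    if (count <= 0) {
        return;
    }
    pEnd  = pDest + count;
    pEnd8 = pDest + (count - count % 8);   /* end of the complete groups of 8 */

    /* Main loop: eight stores per pointer update and comparison. */
    while (pDest < pEnd8) {
        pDest[0] = nPattern;
        pDest[1] = nPattern;
        pDest[2] = nPattern;
        pDest[3] = nPattern;
        pDest[4] = nPattern;
        pDest[5] = nPattern;
        pDest[6] = nPattern;
        pDest[7] = nPattern;
        pDest += 8;
    }

    /* Remainder: between 0 and 7 elements, one at a time. */
    while (pDest < pEnd) {
        *pDest++ = nPattern;
    }
}
```

Checking the disassembly first is still worthwhile, since the compiler may already unroll the simple loop at the optimisation level in use.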

Before you start changing source code everywhere, try designing a
macro that can perform this kind of loop operation and can be modified
in one place so you can try out what gives best results.

Dec 13 '07 #15

Stephen Sprunk

"Dave S" <da*************@bem.fki-et.com> wrote in message
news:cc**********************************@e25g2000prg.googlegroups.com...

The tidy up was provoked by an article in the IET Electronics
magazine, which said ++i is easier to optimise than i++ due to the need
to store i and then increment it.

Unless you're taking the value of the expression (in which case they're not
interchangeable), any modern compiler will reduce them to the same thing.

This sort of advice was frequently given in the 70s and 80s, back when
compilers barely optimized (if at all), but it's worse than useless today
and you should be suspicious of anything else from the same source.

(Note that the answer may be different for C++, Java, etc., but you asked in
a C group.)

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Dec 14 '07 #16

Stephen Sprunk

"Dave S" <da*************@bem.fki-et.com> wrote in message
news:62**********************************@d21g2000prf.googlegroups.com...

I have been given some code to work on. It relies heavily on the
optimiser to run at the correct speed. With this in mind I have been
looking through it to see if I can help it out a bit. I have limited
knowledge of how compilers work, but I understand that some
constructs optimise more easily or better. We are using a variant of gcc
(3.4.1 I think) on the Nios II platform, if that makes a difference.
The question I have is: are these 2 snippets functionally the same (I
think they are), and would the optimiser prefer the second one?
First snippet, as it is now:

while(count-->0)
{
*pDest++=nPattern;
}

how I propose to 'improve' the code:

while(count>0)
{
*pDest=nPattern;
++pDest;
--count;
}

Other than ending with a different value in "count", these two sequences
should produce identical code with any decent compiler. If your compiler
isn't decent, spend your time figuring out how to replace it, not accommodate
its failings.

I don't know NIOS, but your GCC appears to be rather old. Assuming support
for that platform hasn't been dropped in more recent versions, you should
upgrade to get the benefit of improved optimizations in later releases.

There are numerous bits a bit like this, so this would probably be a
big help overall if I am correct.

Not really; micro-optimizations like this are generally useless on modern
compilers and your attempt to "help" may subtly change the meaning in a way
that causes the code to be slower or even behave incorrectly. Write the
clearest, most maintainable code you can and let the compiler worry about
making it fast.

What you need to focus on are macro-optimizations, where you consider things
like data structures, algorithms, etc. This should be guided by actually
measuring where the performance problems are and taking a high-level view on
what would reduce the total amount of work to be done -- not how to make the
same amount of work go faster.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Dec 14 '07 #17

Robert Latest

Chris Dollin wrote:

You don't say what the program does, so it's hard to guess what
might be cycle-sinks, but if it does any string-hacking, remember
that `strlen` and `strcat` (as examples) are not constant-time

Depending on the implementation they may be. Nothing stops a compiler from
generating additional code that keeps track of string lengths.

robert

Dec 14 '07 #18

Robert Latest

Dave S wrote:

The tidy up was provoked by an article in the
IET Electronics magazine, which said ++i is easier to optimise than i++ due
to the need to store i and then increment it. (think I got that
correct).

That's an old myth. If the result of the "++i" or "i++" isn't needed,
any compiler will recognize this and generate identical code. If you do need
the side-effect of post-increment, the above "disadvantage" of i++ vs ++i
vanishes anyway. Think about it: "n[i++]=0" and "n[i]=0;++i" are exactly
equivalent. As long as the C standard doesn't say: "pre-increment is faster
than post-increment" we can't universally answer your question.
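
The equivalence of "n[i++]=0" and "n[i]=0;++i" can be written out as two tiny functions. The names are made up for illustration; each returns the updated index so the results can be compared.

```c
/* Store through a post-incremented index. */
int store_post(int *n, int i)
{
    n[i++] = 0;
    return i;
}

/* Store first, then increment as a separate statement. */
int store_split(int *n, int i)
{
    n[i] = 0;
    ++i;
    return i;
}
```

Both clear the same element and return the same index, so a compiler is free to emit the same instructions for either.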

As I don't really know how optimisers work I thought I'd ask
the question.

As far as the C language itself is concerned, nobody knows what an optimizer
is. Your problem is an algorithm/implementation problem, not a language
problem.

robert

Dec 14 '07 #19

Chris Dollin

Robert Latest wrote:

Chris Dollin wrote:

>You don't say what the program does, so it's hard to guess what
might be cycle-sinks, but if it does any string-hacking, remember
that `strlen` and `strcat` (as examples) are not constant-time

Depending on the implementation they may be. Nothing stops a compiler from
generating additional code that keeps track of string lengths.

Nothing except that if the char*s being tested might be widely accessible
and there are any functions called between tests the compiler can't assume
that the result is unchanged.

I agree that compilers /can/, in principle, do this. What's not so clear
is (a) whether any of them do and (b) how widely applicable the optimisation
is.

/Assuming/ that `strlen` (say) is a constant-time operation is ... sub-optimal.
(I'm not saying you're making that assumption.)

--
Chris "constant time my railway-timetable <redacted>" Dollin

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Dec 14 '07 #20

CBFalconer

Philip Potter wrote:

Dave S wrote:

>The tidy up was provoked by an article in the IET Electronics
magazine, which said ++i is easier to optimise than i++ due to the need
to store i and then increment it. (think I got that correct).

This is often said, but mainly by people who are talking about C++,
and even then only when i is a class type rather than a POD (plain
old data). If i is an int, then i++; and ++i; are equivalent
statements, and any compiler worth its salt knows this.

Your compiler may not know this optimization, but unless you can
prove that it doesn't, I will assume that it does. Similarly for
any other common peephole optimization trick.

No, they are not equivalent. They differ in the value to preserve
for the complete expression. If that value is discarded, then it
doesn't matter, and the optimizer will usually reduce the
statements to the same code. Since the prefix ++ (or --) operator
yields the same value as the final operand value, it may be more
efficient when the expression value is used.

--
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>
Try the download section.

--
Posted via a free Usenet account from http://www.teranews.com

Dec 14 '07 #21

CBFalconer

Dave S wrote:

>

.... snip ...

>
First snippet, as it is now:

while (count-- > 0) {
*pDest++ = nPattern;
}

how I propose to 'improve' the code:

while (count > 0) {
*pDest = nPattern;
++pDest;
--count;
}

There are numerous bits a bit like this, so this would probably
be a big help overall if I am correct.

Use spaces, for clarity and accuracy, as I have altered your quote.

Your modification, if not removed by the optimizer, would be less
efficient. It requires loading the count variable twice. You
probably won't be able to beat the compiler.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee.
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Dec 14 '07 #22

Chris Dollin

CBFalconer wrote:

Philip Potter wrote:

>This is often said, but mainly by people who are talking about C++,
and even then only when i is a class type rather than a POD (plain
old data). If i is an int, then i++; and ++i; are equivalent
statements, and any compiler worth its salt knows this.

No, they are not equivalent. They differ in the value to preserve
for the complete expression.

Which is why Philip specifically said that `i++;` and `++i;` were
equivalent; they're statements that contain /complete/ expressions.

--
Chris "you never gets nowhere if you're too hasty" Dollin

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Dec 14 '07 #23

Stephen Sprunk

"Chris Dollin" <ch**********@hp.com> wrote in message
news:fj**********@tadcaster.hpl.hp.com...

Robert Latest wrote:
>Chris Dollin wrote:
>>You don't say what the program does, so it's hard to guess what
might be cycle-sinks, but if it does any string-hacking, remember
that `strlen` and `strcat` (as examples) are not constant-time

Depending on the implementation they may be. Nothing stops a
compiler from generating additional code that keeps track of
string lengths.

Nothing except that if the char*s being tested might be widely
accessible and there are any functions called between tests the
compiler can't assume that the result is unchanged.

.... unless the compiler has interprocedural (aka whole-program) optimization,
where it _can_ determine if the string won't be changed. Of course, if the
compiler is capable of that, it should also be smart enough to remove the
second (and later) calls to strlen() via CSE without having to track the
length explicitly.

I agree that compilers /can/, in principle, do this. What's not so
clear is (a) whether any of them do and (b) how widely applicable
the optimisation is.

A few do, or at least attempt to. I'm not sure this specific example is all
that helpful, but IPO and WPO can provide significant gains.

/Assuming/ that `strlen` (say) is a constant-time operation is ... sub-
optimal. (I'm not saying you're making that assumption.)

C90/99 don't say it's constant-time, so it's a flawed assumption from the
get-go. Of course, the ISO standard offers very little in the way of
performance guarantees or even guidance, so the lack of a statement isn't
all that meaningful.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Dec 14 '07 #24

CBFalconer

Chris Dollin wrote:

CBFalconer wrote:
>Philip Potter wrote:

>>This is often said, but mainly by people who are talking about C++,
and even then only when i is a class type rather than a POD (plain
old data). If i is an int, then i++; and ++i; are equivalent
statements, and any compiler worth its salt knows this.

>No, they are not equivalent. They differ in the value to preserve
for the complete expression.

Which is why Philip specifically said that `i++;` and `++i;` were
equivalent; they're statements that contain /complete/ expressions.

They have the equivalent effects on i, but return different
values. That affects the if statement action.

--
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>
Try the download section.

--
Posted via a free Usenet account from http://www.teranews.com

Dec 14 '07 #25

Willem

CBFalconer wrote:
) Chris Dollin wrote:
)Which is why Philip specifically said that `i++;` and `++i;` were
)equivalent; they're statements that contain /complete/ expressions.
)
) They have the equivalent effects on i, but return different
) values. That affects the if statement action.

Statements don't return values. Expressions do.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Dec 15 '07 #26

Philip Potter

CBFalconer wrote:

Philip Potter wrote:
>Dave S wrote:

>>The tidy up was provoked by an article in the IET Electronics
magazine, which said ++i is easier to optimise than i++ due to the need
to store i and then increment it. (think I got that correct).
This is often said, but mainly by people who are talking about C++,
and even then only when i is a class type rather than a POD (plain
old data). If i is an int, then i++; and ++i; are equivalent
statements, and any compiler worth its salt knows this.

Your compiler may not know this optimization, but unless you can
prove that it doesn't, I will assume that it does. Similarly for
any other common peephole optimization trick.

No, they are not equivalent. They differ in the value to preserve
for the complete expression. If that value is discarded, then it
doesn't matter, and the optimizer will usually reduce the
statements to the same code. Since the prefix ++ (or --) operator
yields the same value as the final operand value, it may be more
efficient when the expression value is used.

Although Chris Dollin has said why I feel my original statement was
correct, you've shown that it has the capacity to confuse. What I meant
to say was that, in a standalone statement, i++ is the same as ++i.
CBFalconer is right that if i++ or ++i appear as part of a larger
expression in which the return value is used, they are *not*
equivalent[1] - but then they're not complete statements.

Philip

[1] Actually, they can be equivalent even when their return value is used:
i = i++;
is equivalent to
i = ++i;
....because both have undefined behaviour ;)

Dec 15 '07 #27

lawrence.jones

Willem <wi****@stack.nl> wrote:

>
Statements don't return values. Expressions do.

What about the "return" statement? ;-)

-Larry Jones

Nobody knows how to pamper like a Mom. -- Calvin

Dec 15 '07 #28

Willem

la************@siemens.com wrote:
) Willem <wi****@stack.nl> wrote:
)>
)Statements don't return values. Expressions do.
)
) What about the "return" statement? ;-)

The return statement never even returns (to the function). ;-)
It causes an exit (out of the function).
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Dec 16 '07 #29

Nick Keighley

On 15 Dec, 09:34, Philip Potter <p...@doc.ic.ac.uk> wrote:

CBFalconer wrote:
Philip Potter wrote:
Dave S wrote:

[1] Actually, they can be equivalent even when their return value is used:
i = i++;
is equivalent to
i = ++i;
...because both have undefined behaviour ;)

but that's on a par with comparing NaNs.
Both statements may exhibit UB, but it may be *different*
UB.

--
Nick Keighley

Dec 17 '07 #30

Philip Potter

Nick Keighley wrote:

On 15 Dec, 09:34, Philip Potter <p...@doc.ic.ac.uk> wrote:
>CBFalconer wrote:
>>Philip Potter wrote:
Dave S wrote:

>[1] Actually, they can be equivalent even when their return value is used:
i = i++;
is equivalent to
i = ++i;
...because both have undefined behaviour ;)

but that's on a par with comparing NaNs.
Both statements may exhibit UB, but it may be *different*
UB.

I thought of this (even the comparison with NaN) after posting. You are
quite correct. The equivalence argument loses all credibility when one
considers that a particular implementation is free to define each
expression to mean different things.

Dec 17 '07 #31

Kevin D. Quitt

On Thu, 13 Dec 2007 06:41:46 -0800 (PST), Dave S
<da*************@bem.fki-et.com> wrote:

>We are using a variant of gcc
(3.4.1 I think) on Nios II platform, if that makes a difference.

All the difference in the world. I just finished a product using the Nios
II. One of my problems was that several of my code snippets were *too
fast* and interfered with the operation of DMA. (>>It *says* OT in the
subject!<<)

Feel free to carry this on in email, though, as micro-optimization really
doesn't belong here.

The fastest loop you can do what you're doing (with no further information)
is:

pEnd = pDest + count;

while ( pDest < pEnd )
*pDest++ = nPattern;

With more information, this loop can be sped up, possibly by almost four
times. As I said, feel free to contact me by email.
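
For comparison with the earlier versions, the end-pointer loop can be wrapped in a complete function. The int element type, the function name and the guard against a non-positive count are assumptions added for the sketch. Each iteration then needs only a store, a pointer increment and a comparison, with no separate counter to update.

```c
/* Fill using an end pointer instead of a running count. */
void fill_to_end(int *pDest, int nPattern, int count)
{
    int *pEnd;

    if (count <= 0) {
        return;             /* added guard; not part of the posted loop */
    }
    pEnd = pDest + count;

    while (pDest < pEnd)
        *pDest++ = nPattern;
}
```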
--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up

Dec 20 '07 #32
