optimizers are overrated

copx

Optimizers are overrated

I started learning ASM not long ago to improve my understanding of the
hardware architecture and my ability to optimize C code. The results of my
first experiment were surprising to say at least. After reading the chapter
on loops in my ASM book I wanted to test whether modern C compilers are
actually as smart as commonly claimed. I chose a most simple loop: calling
putchar 100 times. The first function (foo) uses a typical C style loop to
test the assumption that "the compiler will optimize that better than any
human could". The second function (bar) is based my newly gained knowledge,
the loop is basically ASM written in C. Now I am certain if I asked here
which one is more efficient all you guys would reply "the compiler will most
likely generate the same code in both cases" (I have read such claims
countless times here). Well, look at the ASM output below to see how wrong
your assumption is.

/* C Code */

void foo(void)
{
int i;

for (i = 0; i < 100; i++) putchar('a');
}
void bar(void)
{
int i = 100;

do {
putchar('a');
} while (--i);
}

As I said, damn simple. No nasty side effects, no access to global
variables, etc. The optimizer has no excuses. It should generate optimial
code in both cases. But see the result:
/* GCC 4.3.0 on x86/Windows, -O2 */

/* foo */
L7:
subl $12, %esp
pushl $97
call _putchar

incl %ebx
addl $16, %esp
cmpl $100, %ebx
jne L7
/* bar */
L2:
subl $12, %esp
pushl $97
call _putchar
addl $16, %esp

decl %ebx
jne L2
Comment: See, even the most recent version of the probably most widely used
compiler can not correctly optimize a most simple loop! At least GCC
understood the bar loop, so my "write C like ASM" optimization worked.

At this point you might wonder what horrible things an average C compiler
will do when GCC already fails so badly. Here is the gruesome result:
/* lccwin32, optimize on */

/* foo */
_$4:
pushl $97
call _putchar
popl %ecx

incl %edi
cmpl $100,%edi
jl _$4
/* bar */
_$10:
pushl $97
call _putchar
popl %ecx

movl %edi,%eax
decl %eax
movl %eax,%edi
or %eax,%eax
jne _$10
Comment: lcc is unable to optimize the loop just like GCC, but it adds
insults to injury by actually generating worse code for the ASM-style loop!
So you cannot even optimize the loop yourself!
/* MS Visual C++ 6 /O2 */

For this compiler I had to replace the putchar call with a call to a custom
my_putchar function otherwise the compiler replaces the putchar calls with
direct OS API stuff. While this is a good optimization it is not the
subject of this test, and only makes the resulting asm harder to read, so I
supressed that.
/* foo */

jmp SHORT $L833
$L834:
mov eax, DWORD PTR _i$[ebp]
add eax, 1
mov DWORD PTR _i$[ebp], eax
$L833:
cmp DWORD PTR _i$[ebp], 100
jge SHORT $L835

push 97
call _my_putchar
add esp, 4

jmp SHORT $L834
$L835:
/* bar */

$L840:
push 97
call _my_putchar
add esp, 4

mov eax, DWORD PTR _i$[ebp]
sub eax, 1
mov DWORD PTR _i$[ebp], eax
cmp DWORD PTR _i$[ebp], 0
jne SHORT $L840
Comment: Amazingly enough, this compiler has found yet another way to screw
up. Would you have thought that each compiler generates different code for
such a simple construct?
I hope you agree that the compiler of the beast deserves the award "Worst of
Show" for this mess. Are MS compilers still this bad?

Jun 27 '08

Subscribe Post Reply

2350

Richard Tobin

In article <48***************@bytecraft.com>,
Walter Banks <wa****@bytecraft.comwrote:

>The for and while implementations have different meaning.

No, they have the same meaning. Used in other contexts - in
particular, where the loop variable is used afterwards - they
would have different meanings.

-- Richard
--
:wq

Jun 27 '08 #51

Eric Sosman

Charlton Wilbur wrote:

>>>>>"ES" == Eric Sosman <Er*********@sun.comwrites:

ES<off-topic>

ESA quartet of kazooists, my teacher among
ESthem, then gave the first "original instruments" performance
ESof this long-misunderstood masterpiece of, er, casuistry.

Did they seriously perform something from Der Kunst der Fuge on kazoo?

They did, indeed. "Daaaah, daaaah, daaaah, daaaah, daaaah,
daah daah daaaaaah, dah dah dah daaaaaah," and so on.

That takes an incredible amount of skill to get reasonably right.

"Reasonably right?" Where did *that* notion come from?
They were about as right as (violent wrench towards topicality)
`void main()' -- recognizable, but not of the highest Standard.

--
Er*********@sun.com

Jun 27 '08 #52

Walter Banks

Willem wrote:

Walter wrote:
) I understand about i and the results are the same. I also understand
) that it could be optimized to be the same and many compilers would.
)
) The for and while implementations have different meaning. It is a suttle
) point that I am making essentially about the care that is needed in
) creating benchmarks like this.

I don't get your point. Could you elaborate ?

The C standard allows that a conforming compiler creates
exactly the same machine code for the two code snippets.
To me, that implies that the two have exactly the same meaning.

I would have been a lot more comfortable comparing

for (i =100 ; i 0; i--) putchar('a');

int i = 100;
do {
putchar('a');
} while (--i);

My point is this thread is on the line between
optimization and algorithm rewriting and was a simple
example encompassing both. for and while loops have
different meanings and in this specific example the
for can be rewritten as a while and the count sequence
can be changed from an up count to a down count
and produce the same results.

The change I made is an optimization test of
for loop implementation.

w..

Jun 27 '08 #53

Bartc

"santosh" <sa*********@gmail.comwrote in message
news:fu**********@registered.motzarella.org...

Bartc wrote:

>>>Just yesterday I finished porting to C, the key module of a
byte-code interpreter. Testing this on only one input program, the
results on two compilers I tried (x86 target) were:

lccwin32 3000ms with -O
gcc 3.4.5 1600ms with no optimisation (any -O switches made much
slower)
original 800ms, using tight mix of asm and compiled code (not
C).

>>>Have some avenues to explore yet, but so far disappointing.

>I've tried extra options, with little effect.

I've since looked in more detail at gcc code output. And decided to
write my main dispatcher in ASM! Which will also take over some of the
simplest operations (this is a byte-code interpreter).

If you want heavy optimisations the choice is between gcc and intel (or
perhaps another compiler like DigitalMars).

I think Intel would know their own CPU pretty well.

Can't use DigitalMars (DMC) yet because my project is still in hybrid stage
and it doesn't like my NASM/OBJ files. But it picked up a few errors missed
by gcc and lccwin, eg. typing:

p; instead of p();

was missed by these two but picked up by DMC.

jacob navia will be (I'm
sure) the first to admit the lcc-win32 doesn't do all the optimisations
under the Sun. He has admitted in the past to do peephole optimisation
though.

I'm curious enough to do more specific tests so that I can see what the
problem is. On another input file, lccwin was 40% slower than gcc. My own
compiler (for a language similar to C) performed pretty much as well as gcc,
and my code generator is simplistic (it doesn't even convert a*16 to a<<4
for example). So something strange going on I think, in my program or in
lccwin.

>... It's still surprising to hear that
the C version is about twice as slow as your assembler one. Is the
speed gain of the assembler code spread out uniformly over the source
code or is it confined to one or more specific constructs/routines?

I've concentrated on integer code. On a more general input program there
would be more non-asm code being executed and the differences will be
smaller. But fast integer code is important so that the input language can
be used more widely.

But, the gcc-compiled version is still 2-3 times faster than the latest
Python, when executing the same input program.

-- Bartc

Jun 27 '08 #54

jacob navia

Bartc wrote:

"santosh" <sa*********@gmail.comwrote in message
news:fu**********@registered.motzarella.org...
>Bartc wrote:

>>>>Just yesterday I finished porting to C, the key module of a
byte-code interpreter. Testing this on only one input program, the
results on two compilers I tried (x86 target) were:
>
lccwin32 3000ms with -O
gcc 3.4.5 1600ms with no optimisation (any -O switches made much
slower)
original 800ms, using tight mix of asm and compiled code (not
C).

>>>>Have some avenues to explore yet, but so far disappointing.

>>I've tried extra options, with little effect.

I've since looked in more detail at gcc code output. And decided to
write my main dispatcher in ASM! Which will also take over some of the
simplest operations (this is a byte-code interpreter).

>If you want heavy optimisations the choice is between gcc and intel (or
perhaps another compiler like DigitalMars).

I think Intel would know their own CPU pretty well.

Can't use DigitalMars (DMC) yet because my project is still in hybrid stage
and it doesn't like my NASM/OBJ files. But it picked up a few errors missed
by gcc and lccwin, eg. typing:

p; instead of p();

was missed by these two but picked up by DMC.

>jacob navia will be (I'm
sure) the first to admit the lcc-win32 doesn't do all the optimisations
under the Sun. He has admitted in the past to do peephole optimisation
though.

I'm curious enough to do more specific tests so that I can see what the
problem is. On another input file, lccwin was 40% slower than gcc. My own
compiler (for a language similar to C) performed pretty much as well as gcc,
and my code generator is simplistic (it doesn't even convert a*16 to a<<4
for example). So something strange going on I think, in my program or in
lccwin.

If you send me the assembler output I will take a look at it.
Or you can send me the object files. Please tell me where are the
important points so that I can see where the optimizer goes wrong.

Thanks

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32

Jun 27 '08 #55

Ben Bacarisse

"Bartc" <bc@freeuk.comwrites:
<snip>

Can't use DigitalMars (DMC) yet because my project is still in hybrid stage
and it doesn't like my NASM/OBJ files. But it picked up a few errors missed
by gcc and lccwin, eg. typing:

p; instead of p();

My gcc warns me about a "statement with no effect" with only the most
modest warning settings (-W).

--
Ben.

Jun 27 '08 #56

Flash Gordon

Nick Keighley wrote, On 24/04/08 12:53:

On 23 Apr, 20:15, Flash Gordon <s...@flash-gordon.me.ukwrote:

>The larger the function/module/program the more likely it is for the
compiler to beet hand coded assembler

club it to death with a root vegetable?

Yes, just as soon as the program dereferences a null pointer at run
time. You know how dangerous undefined behaviour can be.
--
Flash Gordon

Jun 27 '08 #57

Similar topics