473,432 Members | 1,696 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,432 software developers and data experts.

optimizers are overrated

Optimizers are overrated

I started learning ASM not long ago to improve my understanding of the
hardware architecture and my ability to optimize C code. The results of my
first experiment were surprising to say at least. After reading the chapter
on loops in my ASM book I wanted to test whether modern C compilers are
actually as smart as commonly claimed. I chose a most simple loop: calling
putchar 100 times. The first function (foo) uses a typical C style loop to
test the assumption that "the compiler will optimize that better than any
human could". The second function (bar) is based my newly gained knowledge,
the loop is basically ASM written in C. Now I am certain if I asked here
which one is more efficient all you guys would reply "the compiler will most
likely generate the same code in both cases" (I have read such claims
countless times here). Well, look at the ASM output below to see how wrong
your assumption is.

/* C Code */

void foo(void)
{
int i;

for (i = 0; i < 100; i++) putchar('a');
}
void bar(void)
{
int i = 100;

do {
putchar('a');
} while (--i);
}

As I said, damn simple. No nasty side effects, no access to global
variables, etc. The optimizer has no excuses. It should generate optimial
code in both cases. But see the result:
/* GCC 4.3.0 on x86/Windows, -O2 */

/* foo */
L7:
subl $12, %esp
pushl $97
call _putchar

incl %ebx
addl $16, %esp
cmpl $100, %ebx
jne L7
/* bar */
L2:
subl $12, %esp
pushl $97
call _putchar
addl $16, %esp

decl %ebx
jne L2
Comment: See, even the most recent version of the probably most widely used
compiler can not correctly optimize a most simple loop! At least GCC
understood the bar loop, so my "write C like ASM" optimization worked.

At this point you might wonder what horrible things an average C compiler
will do when GCC already fails so badly. Here is the gruesome result:
/* lccwin32, optimize on */

/* foo */
_$4:
pushl $97
call _putchar
popl %ecx

incl %edi
cmpl $100,%edi
jl _$4
/* bar */
_$10:
pushl $97
call _putchar
popl %ecx

movl %edi,%eax
decl %eax
movl %eax,%edi
or %eax,%eax
jne _$10
Comment: lcc is unable to optimize the loop just like GCC, but it adds
insults to injury by actually generating worse code for the ASM-style loop!
So you cannot even optimize the loop yourself!
/* MS Visual C++ 6 /O2 */

For this compiler I had to replace the putchar call with a call to a custom
my_putchar function otherwise the compiler replaces the putchar calls with
direct OS API stuff. While this is a good optimization it is not the
subject of this test, and only makes the resulting asm harder to read, so I
supressed that.
/* foo */

jmp SHORT $L833
$L834:
mov eax, DWORD PTR _i$[ebp]
add eax, 1
mov DWORD PTR _i$[ebp], eax
$L833:
cmp DWORD PTR _i$[ebp], 100
jge SHORT $L835

push 97
call _my_putchar
add esp, 4

jmp SHORT $L834
$L835:
/* bar */

$L840:
push 97
call _my_putchar
add esp, 4

mov eax, DWORD PTR _i$[ebp]
sub eax, 1
mov DWORD PTR _i$[ebp], eax
cmp DWORD PTR _i$[ebp], 0
jne SHORT $L840
Comment: Amazingly enough, this compiler has found yet another way to screw
up. Would you have thought that each compiler generates different code for
such a simple construct?
I hope you agree that the compiler of the beast deserves the award "Worst of
Show" for this mess. Are MS compilers still this bad?


Jun 27 '08
56 2350
In article <48***************@bytecraft.com>,
Walter Banks <wa****@bytecraft.comwrote:
>The for and while implementations have different meaning.
No, they have the same meaning. Used in other contexts - in
particular, where the loop variable is used afterwards - they
would have different meanings.

-- Richard
--
:wq
Jun 27 '08 #51
Charlton Wilbur wrote:
>>>>>"ES" == Eric Sosman <Er*********@sun.comwrites:

ES<off-topic>

ESA quartet of kazooists, my teacher among
ESthem, then gave the first "original instruments" performance
ESof this long-misunderstood masterpiece of, er, casuistry.

Did they seriously perform something from Der Kunst der Fuge on kazoo?
They did, indeed. "Daaaah, daaaah, daaaah, daaaah, daaaah,
daah daah daaaaaah, dah dah dah daaaaaah," and so on.
That takes an incredible amount of skill to get reasonably right.
"Reasonably right?" Where did *that* notion come from?
They were about as right as (violent wrench towards topicality)
`void main()' -- recognizable, but not of the highest Standard.

--
Er*********@sun.com
Jun 27 '08 #52


Willem wrote:
Walter wrote:
) I understand about i and the results are the same. I also understand
) that it could be optimized to be the same and many compilers would.
)
) The for and while implementations have different meaning. It is a suttle
) point that I am making essentially about the care that is needed in
) creating benchmarks like this.

I don't get your point. Could you elaborate ?

The C standard allows that a conforming compiler creates
exactly the same machine code for the two code snippets.
To me, that implies that the two have exactly the same meaning.
I would have been a lot more comfortable comparing

for (i =100 ; i 0; i--) putchar('a');

int i = 100;
do {
putchar('a');
} while (--i);

My point is this thread is on the line between
optimization and algorithm rewriting and was a simple
example encompassing both. for and while loops have
different meanings and in this specific example the
for can be rewritten as a while and the count sequence
can be changed from an up count to a down count
and produce the same results.

The change I made is an optimization test of
for loop implementation.

w..

Jun 27 '08 #53

"santosh" <sa*********@gmail.comwrote in message
news:fu**********@registered.motzarella.org...
Bartc wrote:
>>>Just yesterday I finished porting to C, the key module of a
byte-code interpreter. Testing this on only one input program, the
results on two compilers I tried (x86 target) were:

lccwin32 3000ms with -O
gcc 3.4.5 1600ms with no optimisation (any -O switches made much
slower)
original 800ms, using tight mix of asm and compiled code (not
C).
>>>Have some avenues to explore yet, but so far disappointing.
>I've tried extra options, with little effect.

I've since looked in more detail at gcc code output. And decided to
write my main dispatcher in ASM! Which will also take over some of the
simplest operations (this is a byte-code interpreter).
If you want heavy optimisations the choice is between gcc and intel (or
perhaps another compiler like DigitalMars).
I think Intel would know their own CPU pretty well.

Can't use DigitalMars (DMC) yet because my project is still in hybrid stage
and it doesn't like my NASM/OBJ files. But it picked up a few errors missed
by gcc and lccwin, eg. typing:

p; instead of p();

was missed by these two but picked up by DMC.
jacob navia will be (I'm
sure) the first to admit the lcc-win32 doesn't do all the optimisations
under the Sun. He has admitted in the past to do peephole optimisation
though.
I'm curious enough to do more specific tests so that I can see what the
problem is. On another input file, lccwin was 40% slower than gcc. My own
compiler (for a language similar to C) performed pretty much as well as gcc,
and my code generator is simplistic (it doesn't even convert a*16 to a<<4
for example). So something strange going on I think, in my program or in
lccwin.
>... It's still surprising to hear that
the C version is about twice as slow as your assembler one. Is the
speed gain of the assembler code spread out uniformly over the source
code or is it confined to one or more specific constructs/routines?
I've concentrated on integer code. On a more general input program there
would be more non-asm code being executed and the differences will be
smaller. But fast integer code is important so that the input language can
be used more widely.

But, the gcc-compiled version is still 2-3 times faster than the latest
Python, when executing the same input program.

-- Bartc


Jun 27 '08 #54
Bartc wrote:
"santosh" <sa*********@gmail.comwrote in message
news:fu**********@registered.motzarella.org...
>Bartc wrote:
>>>>Just yesterday I finished porting to C, the key module of a
byte-code interpreter. Testing this on only one input program, the
results on two compilers I tried (x86 target) were:
>
lccwin32 3000ms with -O
gcc 3.4.5 1600ms with no optimisation (any -O switches made much
slower)
original 800ms, using tight mix of asm and compiled code (not
C).
>>>>Have some avenues to explore yet, but so far disappointing.
>>I've tried extra options, with little effect.

I've since looked in more detail at gcc code output. And decided to
write my main dispatcher in ASM! Which will also take over some of the
simplest operations (this is a byte-code interpreter).
>If you want heavy optimisations the choice is between gcc and intel (or
perhaps another compiler like DigitalMars).

I think Intel would know their own CPU pretty well.

Can't use DigitalMars (DMC) yet because my project is still in hybrid stage
and it doesn't like my NASM/OBJ files. But it picked up a few errors missed
by gcc and lccwin, eg. typing:

p; instead of p();

was missed by these two but picked up by DMC.
>jacob navia will be (I'm
sure) the first to admit the lcc-win32 doesn't do all the optimisations
under the Sun. He has admitted in the past to do peephole optimisation
though.

I'm curious enough to do more specific tests so that I can see what the
problem is. On another input file, lccwin was 40% slower than gcc. My own
compiler (for a language similar to C) performed pretty much as well as gcc,
and my code generator is simplistic (it doesn't even convert a*16 to a<<4
for example). So something strange going on I think, in my program or in
lccwin.
If you send me the assembler output I will take a look at it.
Or you can send me the object files. Please tell me where are the
important points so that I can see where the optimizer goes wrong.

Thanks

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Jun 27 '08 #55
"Bartc" <bc@freeuk.comwrites:
<snip>
Can't use DigitalMars (DMC) yet because my project is still in hybrid stage
and it doesn't like my NASM/OBJ files. But it picked up a few errors missed
by gcc and lccwin, eg. typing:

p; instead of p();
My gcc warns me about a "statement with no effect" with only the most
modest warning settings (-W).

--
Ben.
Jun 27 '08 #56
Nick Keighley wrote, On 24/04/08 12:53:
On 23 Apr, 20:15, Flash Gordon <s...@flash-gordon.me.ukwrote:
>The larger the function/module/program the more likely it is for the
compiler to beet hand coded assembler

club it to death with a root vegetable?
Yes, just as soon as the program dereferences a null pointer at run
time. You know how dangerous undefined behaviour can be.
--
Flash Gordon
Jun 27 '08 #57

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
1
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.