
Curious about loop optimization C++ - assembly

Hi,

Out of curiosity, I sometimes look at the assembly produced after compilation in release mode.

What you often see is that the C++ compiler always addresses registers explicitly to copy values from a to b, whereas STOSB/STOSW/STOSD (and likewise MOVS[x]) are single instructions that implicitly use the ESI and EDI registers (source and destination) to copy data.

This seems (imho) more efficient, yet the C++ compiler never uses this construct; it always emits many more instructions.

Imagine this loop (I simplified the idea; of course memcpy would normally be used):

DWORD anArray[10000];
DWORD somesource[10000];
int element = 0;   // start at the first element

// copy the array while skipping the odd element positions
for (int mycounter = 5000; mycounter != 0; mycounter--, element += 2)
    anArray[element] = somesource[element];
This could be optimized to:

// set up source, destination and counter
MOV EDI, [anArray]
MOV ESI, [somesource]
MOV ECX, mycounter
DEC ECX
CLD // forward copy

mylabel:
MOVSD // the actual copy instruction
LOOP mylabel // decrement ECX and loop until ECX == 0
Q: Is the construct mentioned above simply not that efficient, or is there a reason the C++ compiler team decided not to optimize to this level?
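
For reference: if I'm not mistaken, MSVC also exposes the string-move form as a compiler intrinsic, so it can be requested without hand-written assembly. A minimal sketch, assuming <intrin.h> and the __movsd intrinsic are available (the function name copy_block is made up):

#include <intrin.h>
#include <cstddef>

// rep movsd via the compiler intrinsic: copies `count` DWORDs from src to dst.
// Note this is a plain contiguous copy, so it does not reproduce the
// skip-every-other-element behaviour of the loop above.
void copy_block(unsigned long* dst, const unsigned long* src, std::size_t count)
{
    __movsd(dst, src, count);
}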

Mar 20 '06 #1
3 Replies


Egbert Nierop (MVP for IIS) wrote:
[...]
Q: Is the construct I mentioned simply not that efficient, or is there a reason the C++ compiler team decided not to optimize to this level?


The LOOP and MOVS instructions are horribly slow on modern CPUs because they
don't make effective use of the deep pipeline in the CPU. The longer
instruction sequence actually executes many times faster.
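
To illustrate (a rough sketch, not actual compiler output): a plain counted loop such as the one below compiles to simple register moves plus an add/compare/branch, possibly unrolled, and those instructions flow through a pipelined CPU far better than LOOP/MOVSD do.

#include <cstddef>

// Illustrative only: a straightforward DWORD copy loop. In release mode the
// compiler typically emits MOV reg,[mem] / MOV [mem],reg pairs plus a simple
// counter update and branch, and may unroll the loop on its own.
void copy_dwords(unsigned long* dst, const unsigned long* src, std::size_t count)
{
    for (std::size_t i = 0; i != count; ++i)
        dst[i] = src[i];
}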

IIRC, VC++ did generate LOOP/MOVS years ago (VC1-4 maybe?), but has gone
away from using those constructs since maybe the Pentium.

-cd
Mar 20 '06 #2


"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:el**************@TK2MSFTNGP11.phx.gbl...
[...]

The LOOP and MOVS instructions are horribly slow on modern CPUs because
they don't make effective use of the deep pipeline in the CPU. The longer
instruction sequence actually executes many times faster.


Interesting!

This seems to confirm the remark of some C++/ASM programmer somewhere on the web, who stated that he could no longer optimize code better than the C++ compiler did. Normally I'd think "ok, so he was not up to the task", but it seems I was wrong (again :-) ).
Mar 20 '06 #3


"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:el**************@TK2MSFTNGP11.phx.gbl...
[...]


Nope,
I've beaten the C++ optimization by 25% (testing on 100 MB!), though this might only hold on an Athlon 64 and not on other CPUs...

Anyway, you were right that one cannot state "the fewer ASM instructions, the faster"!

PS: The function below is not meant to 'decode' for real (it skips proper Unicode conversion). Just for fun...

void __stdcall AnsiToBstr(PCSTR ansi, BSTR bstr, int writtenLen)
{
//#ifdef _M_IX86
DWORD ticks = GetTickCount();

__asm XOR AH, AH              // clear the high byte of AX (the upper half of our 2-byte unicode char)
__asm MOV ECX, writtenLen     // initialize our loop counter
__asm DEC ECX
__asm MOV EDI, [bstr]         // destination index
__asm MOV ESI, [ansi]         // source index
__asm labell:
__asm MOV AL, BYTE PTR [ESI]  // load one string byte
__asm MOV [EDI], AX           // store it as a 2-byte wide char (high byte is zero)
__asm INC EDI
__asm INC EDI
__asm INC ESI
__asm DEC ECX
__asm JNZ labell

//#else
wprintf(L"%d\n", GetTickCount() - ticks);   // time taken by the asm loop
ticks = GetTickCount();

for (int loopit = writtenLen - 1; loopit != 0; loopit--, bstr++, ansi++)
    bstr[0] = ansi[0];                      // plain C++ version of the same copy

wprintf(L"%d\n", GetTickCount() - ticks);   // time taken by the C++ loop
//#endif
}
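
For completeness, a rough sketch of how the function above might be exercised; the test string and buffer handling are made up for illustration (link against oleaut32.lib for the BSTR helpers):

#include <windows.h>
#include <oleauto.h>
#include <cstdio>
#include <cstring>

int main()
{
    const char* ansi = "hello, world";
    int len = (int)strlen(ansi) + 1;            // writtenLen, including the terminator
    BSTR wide = SysAllocStringLen(NULL, len);   // uninitialized BSTR with room for `len` wide chars

    AnsiToBstr(ansi, wide, len);                // copies len-1 characters and prints both timings
    wide[len - 1] = L'\0';                      // terminate explicitly before printing

    wprintf(L"%s\n", wide);
    SysFreeString(wide);
    return 0;
}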
Mar 20 '06 #4
