Curious about loop optimization C++ - assembly

Hi,

Out of curiosity, I sometimes look at the assembly the compiler produces in
release mode.

What you often see is that the C++ compiler always spells out explicit
register loads and stores to copy values from a to b...

Meanwhile STOSB, STOSW, STOSD etc. and likewise MOVS[x] are single
instructions that implicitly use the registers ESI and EDI (source,
destination) to copy data (STOS only needs EDI; it stores AL/AX/EAX).

This seems (IMHO) more efficient, yet the C++ compiler never uses this
construct... it always uses a lot more instructions.

Imagine this loop (I simplified the idea; of course, memcpy would normally
be used):
DWORD anArray[10000];

// copy array while skipping uneven element positions
int element = 0;
for (int mycounter = 5000; mycounter != 0; mycounter--, element += 2)
    anArray[element] = somesource[element];

could be optimized to

// set up source and destination
LEA EDI, anArray        // destination: address of the array
MOV ESI, somesource     // source: assumes somesource is a pointer variable
MOV ECX, 5000           // loop count (mycounter)
CLD                     // forward copy

mylabel:
MOVSD                   // the actual copy; advances ESI and EDI by 4
ADD ESI, 4              // skip the uneven element
ADD EDI, 4
LOOP mylabel            // decrement ECX, repeat until ECX == 0

Q: Is the mentioned construct simply not that efficient, or is there a reason
the C++ compiler team decided not to optimize to this level?

Mar 20 '06 #1
Egbert Nierop (MVP for IIS) wrote:
> Q: is the mentioned construct simply not so efficient, or is there a
> reason the C++ compiler team decided not to try to optimize to this
> level?

The LOOP and MOVS instructions are horribly slow on modern CPUs because they
don't make effective use of the deep pipeline in the CPU. The longer
instruction sequence actually executes many times faster.

IIRC, VC++ did generate LOOP/MOVS years ago (VC1-4 maybe?), but it has moved
away from those constructs since around the Pentium.

-cd
Mar 20 '06 #2
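
For reference, the strided copy from the question can be written as a plain indexed loop, leaving the choice of instructions to the optimizer. This is only a minimal sketch: the function name `copy_even_elements` and the use of `std::vector` are assumptions made for illustration and do not come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the thread): copy every even-indexed element
// of `src` into the same position of `dst`, leaving odd positions untouched.
void copy_even_elements(const std::vector<std::uint32_t>& src,
                        std::vector<std::uint32_t>& dst)
{
    assert(dst.size() >= src.size());   // destination must be large enough
    for (std::size_t i = 0; i < src.size(); i += 2)
        dst[i] = src[i];                // explicit load + store per element
}
```

Which instructions the compiler emits for this loop depends on the compiler, its version, and the target CPU, so inspecting the generated assembly remains the only way to know.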

"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:el**************@TK2MSFTNGP11.phx.gbl...
> The LOOP and MOVS instructions are horribly slow on modern CPUs because
> they don't make effective use of the deep pipeline in the CPU. The longer
> instruction sequence actually executes many times faster.

Interesting!

This seems to confirm a remark by some C++ / ASM programmer somewhere on the
web. He stated that he could no longer optimize code any better than the C++
compiler did. Normally I tend to think 'OK, so he was not up to the task',
but it seems I was wrong (again :-) ).
Mar 20 '06 #3

Nope,
I've beaten the C++ optimizer by 25% (tested on 100 MB!), but this might
only hold on an Athlon 64 and possibly not on other CPUs...

Anyway, you were right that one cannot say: the fewer ASM instructions,
the faster!
PS: the function below is not meant to do a real conversion (it skips the
Unicode encoding step). Just for fun...

void __stdcall AnsiToBstr(PCSTR ansi, BSTR bstr, int writtenLen)
{
    //#ifdef _M_IX86
    DWORD ticks = GetTickCount();
    __asm XOR AH, AH               // clear the high byte of our Unicode char (= 2 bytes)
    __asm MOV ECX, writtenLen      // initialize our loop
    __asm DEC ECX                  // our loop counter
    __asm MOV EDI, [bstr]          // destination index
    __asm MOV ESI, [ansi]          // source index
    __asm labell:
    __asm MOV AL, BYTE PTR [ESI]   // copy a string byte
    __asm MOV [EDI], AX            // store it as a 16-bit char
    __asm INC EDI
    __asm INC EDI
    __asm INC ESI
    __asm DEC ECX
    __asm JNZ labell

    //#else
    wprintf(L"%lu\n", GetTickCount() - ticks);   // DWORD is an unsigned long
    ticks = GetTickCount();

    for (int loopit = writtenLen - 1; loopit != 0; loopit--, bstr++, ansi++)
        bstr[0] = (unsigned char)ansi[0];        // zero-extend, like the asm version
    wprintf(L"%lu\n", GetTickCount() - ticks);

    //#endif
}
Mar 20 '06 #4
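
A note on measuring: GetTickCount typically has a resolution of only about 10-16 ms, and a single run is easily disturbed by other activity on the machine. A portable way to time such a comparison might look like the sketch below. The names `widen_ansi` and `best_of_ms` are hypothetical and not from the thread, and the widening loop is only a rough C++ counterpart of the function above (plain zero extension, no real code-page conversion).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical portable counterpart of the loop above: widen each byte to a
// 16-bit code unit by zero extension (no real code-page conversion).
std::u16string widen_ansi(const std::string& ansi)
{
    std::u16string out(ansi.size(), u'\0');
    for (std::size_t i = 0; i < ansi.size(); ++i)
        out[i] = static_cast<char16_t>(static_cast<unsigned char>(ansi[i]));
    return out;
}

// Run `fn` `runs` times against a monotonic clock and return the fastest
// run in milliseconds; taking the minimum filters out one-off disturbances.
template <typename Fn>
double best_of_ms(Fn&& fn, int runs)
{
    assert(runs > 0);
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}

// Possible use (input size is an arbitrary choice):
//   std::string input(100 * 1024 * 1024, 'x');
//   std::u16string sink;
//   double ms = best_of_ms([&] { sink = widen_ansi(input); }, 5);
```

Keeping the result in a variable that is used afterwards (`sink` in the comment above) makes it less likely that the optimizer removes the measured work altogether.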
