Hi,
Out of curiosity, I sometimes look at the assembly produced by a
release-mode build.
What you often see is that the C++ compiler always fully addresses registers
to copy values from a to b...
Meanwhile stosb, stosw, stosd etc., and likewise movs[x], are single
instructions that internally use registers ESI and EDI (source, destination)
to copy data.
This seems (imho) more efficient; however, the compiler never uses this
construct... it always uses a lot more instructions.
Imagine this loop (I simplified the idea; of course, memcpy would normally
be used):
DWORD anArray [10000];
// copy array while skipping odd element positions
for (int mycounter = 5000, element = 0; mycounter != 0; mycounter--, element += 2)
    anArray[element] = somesource[element];
could be optimized to
// set up source and destination
MOV EDI, [anArray]
MOV ESI, [somesource]
MOV ECX, myCounter
DEC ECX
CLD // forward copy
mylabel:
MOVSD // actual loop and copy instruction
LOOP mylabel // decrement ECX until ECX == 0
Q: Is the mentioned construct simply not that efficient, or is there a reason
the C++ compiler team decided not to try to optimize to this level?
Egbert Nierop (MVP for IIS) wrote:
Q: is the mentioned construct simply not so efficient, or is there a reason the C++ compiler team decided not to try to optimize to this level?
The LOOP and MOVS instructions are horribly slow on modern CPUs because they
don't make effective use of the deep pipeline in the CPU. The longer
instruction sequence actually executes many times faster.
IIRC, VC++ did generate LOOP/MOVS years ago (VC1-4 maybe?), but has gone
away from using those constructs since maybe the Pentium.
-cd
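For reference, the loop from the question can be written in portable C++ roughly as in the sketch below. The name copy_even_elements and its signature are made up for illustration and are not from the original post. For a loop like this an optimizing compiler typically emits ordinary MOV load/store pairs with an index update rather than MOVSD/LOOP, which matches the explanation above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): copy the elements at even
// indices (0, 2, 4, ...) from src to dst and leave the odd positions of
// dst untouched. count is the total number of elements in each array.
void copy_even_elements(std::uint32_t* dst, const std::uint32_t* src,
                        std::size_t count)
{
    for (std::size_t i = 0; i < count; i += 2)
        dst[i] = src[i];
}
```

Looking at the release-mode disassembly of a function like this is an easy way to check what a particular compiler version actually generates.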
"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:el**************@TK2MSFTNGP11.phx.gbl...
The LOOP and MOVS instructions are horribly slow on modern CPUs because they don't make effective use of the deep pipeline in the CPU. The longer instruction sequence actually executes many times faster.
Interesting!
This seems to confirm the remark of some C++ / ASM programmer somewhere on the
web. He stated that he could no longer optimize code any better than the C++
compiler did.
Normally, I tend to think 'ok, so he was not up to the task', but it seems I
was wrong (again :-) ).
"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:el**************@TK2MSFTNGP11.phx.gbl...
Nope,
I've beaten the C++ optimizer by 25% (testing on 100 MB!), but this might
only hold on an Athlon 64 and possibly not on other CPUs...
Anyway, you were right that one cannot say "the fewer ASM instructions,
the faster"!
PS: The function below is not meant to be a real 'decode' (it skips Unicode
encoding). Just for fun...
void __stdcall AnsiToBstr(PCSTR ansi, BSTR bstr, int writtenLen)
{
//#ifdef _M_IX86
    // Time the hand-written x86 loop (MSVC inline assembly, 32-bit only).
    DWORD ticks = GetTickCount();
    __asm XOR AH, AH             // clear the high byte of our Unicode char (= 2 bytes)
    __asm MOV ECX, writtenLen    // initialize our loop
    __asm DEC ECX                // our loop counter (writtenLen - 1 iterations)
    __asm MOV EDI, [bstr]        // destination index
    __asm MOV ESI, [ansi]        // source index
    __asm labell:
    __asm MOV AL, BYTE PTR [ESI] // copy a string byte
    __asm MOV [EDI], AX          // store it widened to 16 bits
    __asm INC EDI
    __asm INC EDI
    __asm INC ESI
    __asm DEC ECX
    __asm JNZ labell
//#else
    wprintf(L"%lu\n", GetTickCount() - ticks);
    // Time the equivalent plain C++ loop.
    ticks = GetTickCount();
    for (int loopit = writtenLen - 1; loopit != 0; loopit--, bstr++, ansi++)
        bstr[0] = ansi[0];
    wprintf(L"%lu\n", GetTickCount() - ticks);
//#endif
}
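If anyone wants to repeat the measurement without inline assembly or GetTickCount, something along these lines might do as a starting point. This is only a sketch: widen_chars and time_ms are made-up names, not part of the function above, and unlike the original loop it converts all len characters rather than writtenLen - 1.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: widen each 8-bit character to a 16-bit code unit
// by zero-extension, like the MOV AL / MOV [EDI], AX pair does.
void widen_chars(const char* src, char16_t* dst, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        dst[i] = static_cast<char16_t>(static_cast<unsigned char>(src[i]));
}

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double time_ms(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms with a lambda that runs widen_chars over a large buffer gives a number comparable to the tick counts printed above. Results will vary by CPU and compiler settings, so a difference like the 25% mentioned earlier should be re-measured on each target machine.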