Bytes | Software Development & Data Engineering Community

Performance Question ...

Hi,

I've a list of objects. I iterate the list and read the value of each
object for many operations, like:

x = myList[i].value1 + myList[i].value2
etc.

My question:
Is it efficient to always use myList[i], or should I get a pointer to
the current element and use that instead, like

currentElement->value1 ...

Is it a performance problem to always have the list return the i-th element?

Thank you
Konrad
Jun 27 '08 #1
Konrad Muhler wrote:
Hi,

I've a list of objects. I iterate the list and read the value of each
object for many operation like:

x = myList[i].value1 + myList[i].value2
etc.
So you don't have a std::list of objects as std::list has no op[] (BECAUSE
such an op[] would be too expensive). What does your list::op[] do? What is
the algorithmic complexity of it?
My question:
Is it efficient to always use myList[i] or should I get the pointer to
the current element and use this instead like
Depends on your implementation of list. If your list is actually something
like a std::vector then op[] is fast enough. If your data set is much
larger than your data cache, then that will be a performance bottleneck no
matter whether you use references/pointers/iterators to elements or a fast op[].
currentElement->value1 ...

Is it a performance problem to always have the list return the i-th element?
Have you benchmarked/profiled it? What results does it have on your
particular combination of platform, implementation, compiler (version),
optimization flags?

Make your decision according to that.

--
Dizzy

Jun 27 '08 #2
Konrad Mühler wrote:
I've a list of objects. I iterate the list and read the value of each
object for many operation like:
x = myList[i].value1 + myList[i].value2
etc.
My question:
Is it efficient to always use myList[i] or should I get the pointer to
the current element and use this instead like
currentElement->value1 ...
Is it a performance problem to always have the list return the i-th element?
That depends on the compiler. Newer versions of gcc will
optimize the difference away, whereas older ones
(and maybe Visual Studio) will not.

try it:

==>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>

struct listitem { int value1, value2, value3; };

int test_array(listitem *list, int nitems)
{
    int x = 0;
    for (int j = 0; j < nitems; ++j)
        for (int i = 0; i < nitems; ++i)
            x += (list[i].value1 - list[j].value1) + (list[i].value2 - list[j].value2);
    return x;
}

int test_pointer(listitem *list, int nitems)
{
    int x = 0;
    for (listitem *pj = list; pj < list + nitems; ++pj)
        for (listitem *pi = list; pi < list + nitems; ++pi)
            x += (pi->value1 - pj->value1) + (pi->value2 - pj->value2);
    return x;
}

double timethis(const char *s, int (*p)(listitem*, int), listitem *list, int nitems)
{
    std::cout << s << std::endl;
    clock_t t = clock();
    int x = p(list, nitems); // fool optimizer! (x is always zero)
    return double(clock() - t + x) / CLOCKS_PER_SEC;
}

int main(int argc, char *argv[])
{
    int nitems = argc > 1 ? atoi(argv[1]) : 20000;
    listitem *list = new listitem[nitems];
    memset(list, 0x1A, sizeof(listitem) * nitems);

    double ta = 0, tp = 0;
    tp += timethis("pointer", test_pointer, list, nitems);
    ta += timethis("array", test_array, list, nitems);
    tp += timethis("pointer", test_pointer, list, nitems);
    ta += timethis("array", test_array, list, nitems);

    std::cout << "length: " << nitems << std::endl
              << "array: " << ta << " sec" << std::endl
              << "pointer: " << tp << " sec" << std::endl;

    delete [] list;
    return 0;
}
<==

Regards

Mirco

Jun 27 '08 #3
In article <g2**********@nserver.hrz.tu-freiberg.de>, wa***@chemie.uni-
halle.de says...
Konrad Mühler wrote:
I've a list of objects. I iterate the list and read the value of each
[ ... ]
Thats depends on the compiler. The newer gcc's will
optimize the differences away, whereas older ones
(and maybe Visual Studio) will not.
This depends heavily upon interpretation. Some people only use "list" to
mean "linked list". In that case, I find it difficult to believe that a
compiler is capable of optimizing out the differences between subscript-
style notation and an explicit iterator (a pointer, in this case).

Other people use "list" to mean almost any sort of linear data
structure. In this case, (e.g. using an array) optimizing away the
difference between array-style and pointer-style access has been routine
among nearly all compilers for years -- all versions of "Visual Studio"
(and versions of Microsoft C++ and Microsoft C since long before they
started using the "Visual" brand) have known how to do this kind of
optimization.

Likewise, while there may have been _some_ early version of gcc that
couldn't do it, I doubt that version was ever in wide use -- I'm pretty
sure by the time it was labeled "version 1.0", it could do this.
try it:
[ code using dynamically allocated array elided ... ]

Just keep in mind that this may not respond to the question being asked
by the OP -- it all depends on what he means by "list".

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #4
Jerry Coffin wrote:
In article <g2**********@nserver.hrz.tu-freiberg.de>, wa***@chemie.uni-
halle.de says...
>Thats depends on the compiler. The newer gcc's will
optimize the differences away, whereas older ones
(and maybe Visual Studio) will not.

This depends heavily upon interpretation. Some people only use "list" to
mean "linked list". In that case, I find it difficult to believe that a
compiler is capable of optimizing out the differences between subscript-
style notation and an explicit iterator (a pointer, in this case).
Right. And this was my interpretation of the OP's list.
How does this:
>>x = myList[i].value1 + myList[i].value2
look to you?
Other people use "list" to mean almost any sort of linear data
structure. In this case, (e.g. using an array) optimizing away the
difference between array-style and pointer-style access has been routine
among nearly all compilers for years -- all versions of "Visual Studio"
(and versions of Microsoft C++ and Microsoft C since long before they
started using the "Visual" brand) have known how to do this kind of
optimization.
Not entirely true. The code posted by me *will* show significant
differences for pointer vs. array formulation between gcc 3.x and
4.x *and* between VC++6 and VC++9. I tested it ;-)
[ code using dynamically allocated array elided ... ]
Just keep in mind that this may not respond to the question being asked
by the OP -- it all depends on what he means by "list".
OK, I'm not absolutely sure (but neither are you) ;-)

From interpreting the OP's code, this seems (at least
to me) a valid interpretation.

Regards & Thanks

Mirco
Jun 27 '08 #5
In article <g2**********@nserver.hrz.tu-freiberg.de>, wa***@chemie.uni-
halle.de says...

[ ... ]
Right. And this was my interpretation of the OP's list.
How does this:
>x = myList[i].value1 + myList[i].value2

look to you?
This looks like it requires operator[] to be valid for whatever sort of
things myList is -- but that doesn't tell us a thing. Overloading
operator[] for a linked list is trivial.
Other people use "list" to mean almost any sort of linear data
structure. In this case, (e.g. using an array) optimizing away the
difference between array-style and pointer-style access has been routine
among nearly all compilers for years -- all versions of "Visual Studio"
(and versions of Microsoft C++ and Microsoft C since long before they
started using the "Visual" brand) have known how to do this kind of
optimization.

Not entirely true. The code posted by me *will* show significant
differences for pointer vs. array formulation between gcc 3.x and
4.x *and* between VC++6 and VC++9. I tested it ;-)
I glanced through your code, and frankly I wasn't much impressed. First
of all, what you timed was substantially different from what the OP
asked about. Second, you called the functions through a pointer
(repeatedly) and included the time to call the function via a pointer in
the overall time for the function. Third, you included the first run
through the data in the time for executing the code -- on the first run,
the data will frequently be in main memory, but on the second and
subsequent runs, it's likely to be in the cache, so the first run will
often be substantially slower than the second and subsequent runs.

Here's a bit of code that gets rid of at least those specific problems:

#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
#include <vector>
#include <algorithm>

struct listitem { int value1, value2; };

listitem gen_rand()
{
    listitem ret;
    ret.value1 = rand();
    ret.value2 = rand();
    return ret;
}

int test_array(listitem const *list, size_t nitems)
{
    int x = 0;
    for (size_t i = 0; i < nitems; ++i)
        x += list[i].value1 + list[i].value2;
    return x;
}

int test_pointer(listitem const *list, size_t nitems)
{
    int x = 0;
    for (listitem const *pi = list; pi < list + nitems; ++pi)
        x += pi->value1 + pi->value2;
    return x;
}

int main(int argc, char *argv[])
{
    const int rep_count = 20;
    const int size = 2000000;

    static listitem list[size];

    size_t i;
    for (i = 0; i < size; i++)
        list[i] = gen_rand();

    int val1 = test_array(list, size); // prime the cache.
    int val2 = test_pointer(list, size);

    clock_t time1 = 0;

    // do the real tests
    for (i = 0; i < rep_count; i++) {
        clock_t start1 = clock();
        val1 = test_array(list, size);
        clock_t stop1 = clock();
        time1 += stop1 - start1;
    }

    clock_t time2 = 0;

    for (i = 0; i < rep_count; i++) {
        clock_t start2 = clock();
        val2 = test_pointer(list, size);
        clock_t stop2 = clock();
        time2 += stop2 - start2;
    }

    std::cout << "Array value: " << val1;
    std::cout << "\nPointer value: " << val2;

    std::cout << "\nArray Time: " << time1;
    std::cout << "\nPointer Time : " << time2;

    return 0;
}

Compiled with VC++ 6.0, using the default "Release" settings for a
console application, the results I get from a few runs of this are:

C:\C\source\Release>test_ptr
Array value: 1141859694
Pointer value: 1141859694
Array Time: 110
Pointer Time : 93
C:\C\source\Release>test_ptr
Array value: 1141859694
Pointer value: 1141859694
Array Time: 94
Pointer Time : 109
C:\C\source\Release>test_ptr
Array value: 1141859694
Pointer value: 1141859694
Array Time: 109
Pointer Time : 94
C:\C\source\Release>test_ptr
Array value: 1141859694
Pointer value: 1141859694
Array Time: 94
Pointer Time : 109

The only differences we're seeing are of a single clock tick, and
there's no apparent bias in either direction. For all practical
purposes the two are identical in speed. Looking at the assembly
language produced for each shows that it's not just for all practical
purposes:

test_array:
; _list$ = ecx
; _nitems$ = edx
; Line 18
push esi
; Line 19
xor eax, eax
; Line 21
xor esi, esi
test edx, edx
jbe SHORT $L10248
push edi
npad 6
$L10246:
; Line 22
mov edi, DWORD PTR [ecx+esi*8+4]
add edi, DWORD PTR [ecx+esi*8]
add esi, 1
add eax, edi
cmp esi, edx
jb SHORT $L10246
pop edi
$L10248:
pop esi
; Line 24
ret 0

test_pointer:
; _list$ = ecx
; _nitems$ = edx
; Line 29
lea edx, DWORD PTR [ecx+edx*8]
xor eax, eax
cmp ecx, edx
jae SHORT $L10257
push esi
npad 6
$L10255:
; Line 30
mov esi, DWORD PTR [ecx+4]
add esi, DWORD PTR [ecx]
add ecx, 8
add eax, esi
cmp ecx, edx
jb SHORT $L10255
pop esi
$L10257:
; Line 32
ret 0

I also did a quick test by stripping the source code down to the parts
we care about:

struct listitem { int value1, value2; };

int test_array(listitem const *list, int nitems)
{
    int x = 0;
    int i;
    for (i = 0; i < nitems; ++i)
        x += list[i].value1 + list[i].value2;
    return x;
}

int test_pointer(listitem const *list, int nitems)
{
    int x = 0;
    listitem const *pi;
    for (pi = list; pi < list + nitems; ++pi)
        x += pi->value1 + pi->value2;
    return x;
}

and then compiled the result with MS C/C++ 7.0 (the predecessor of MS
Visual C++ 1.0). Here's the assembly language it produces for the two
inner loops:

test_array:
$F344:
; Line 8
mov ax,WORD PTR [bx-2]
add ax,WORD PTR [bx]
add cx,ax
add bx,4
dec dx
jne $F344

test_pointer:
$F355:
; Line 17
mov ax,WORD PTR [si+2]
add ax,WORD PTR [si]
add dx,ax
add si,4
cmp WORD PTR [bp-4],si ;list
ja $F355

In this case, there may be a small difference between the two -- but
it's the array version that's faster. Even on a machine from that time
frame, the difference would be too small to care much about though --
given the memory constraints of MS-DOS, the time difference for the
largest array you could create would still have been less than a
millisecond.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #6
Hi Jerry,

thanks for your meaningful response.

Jerry Coffin wrote:
In article <g2**********@nserver.hrz.tu-freiberg.de>, wa***@chemie.uni-
halle.de says...
>How does this:
>>>>x = myList[i].value1 + myList[i].value2
look like in your eyes?

This looks like it requires operator[] to be valid for whatever sort of
things myList is -- but that doesn't tell us a thing. Overloading
operator[] for a linked list is trivial.
OK
>>[...] optimizing away the
difference between array-style and pointer-style access has been routine
among nearly all compilers for years -- all versions of "Visual Studio"
(and versions of Microsoft C++ and Microsoft C since long before they
started using the "Visual" brand) have known how to do this kind of
optimization.
Not entirely true. The code posted by me *will* show significant
differences for pointer vs. array formulation between gcc 3.x and
4.x *and* between VC++6 and VC++9. I tested it ;-)

I glanced through your code, and frankly I wasn't much impressed.
What the ... ;-)
First of all, what you timed was substantially different from what the OP
asked about. Second, you called the functions through a pointer
(repeatedly) and included the time to call the function via a pointer in
the overall time for the function. Third, you included the first run
through the data in the time for executing the code -- on the first run,
the data will frequently be in main memory, but on the second and
subsequent runs, it's likely to be in the cache, so the first run will
often be substantially slower than the second and subsequent runs.
All three points are somewhat unfounded, imho.
1) What was timed was simply interleaved sequential access to
a block of data that is not too trivial regarding L2
access of the underlying architecture. Taking out one
"dimension", as you did, also makes things too simple
for any optimizing compiler and wouldn't support my view ... ,
2) Calling through a pointer was done 2x per test scenario, which
has to be compared with the NxN loop calculation in the function.
That leaves us with an error ratio of 2 / (20,000 x 20,000) (only) if
we assume the indirect call takes as many cycles as the inner-loop
calculation (which it obviously doesn't),
3) This doesn't really count because the data block initialization is
done immediately before the first test function invocation. BTW,
due to the "indirect call dispatcher design", this was easy to
add (and I added it - it didn't change any results, see enclosed source)
Here's a bit of code that gets rid of at least those specific problems:
#include <cstdlib>
[snipped]
...
struct listitem { int value1, value2; };
...
Why not include some other data item in between?

struct listitem { int value1, dummy, value2; };
int test_array(listitem const *list, size_t nitems)
{
int x=0;

for(size_t i=0; i<nitems; ++i)
x += list[i].value1 + list[i].value2;
return x;
}
This is just a sequential block access which can
be translated to assembler using only one index
register, one counter, and one accumulator.
[snipped]
std::cout << "Array value: " << val1;
std::cout << "\nPointer value: " << val2;

std::cout << "\nArray Time: " << time1;
std::cout << "\nPointer Time : " << time2;
On any of my machines, your code (using your
parameters) runs right through within a few clock
cycles. But I see your point. On such simple
constructs, almost any compiler can deduce
the optimal access sequence and can easily
avoid repeated index offset multiplications.
Compiled with VC++ 6.0, using the default "Release" settings for a
console application, the results I get from a few runs of this are:
...
[snipped]
ohh, this is nice:
test_array:
... $L10246:
mov edi, DWORD PTR [ecx+esi*8+4]
add edi, DWORD PTR [ecx+esi*8]
add esi, 1
add eax, edi
cmp esi, edx
jb SHORT $L10246
...
and
test_pointer:
... $L10255:
mov esi, DWORD PTR [ecx+4]
add esi, DWORD PTR [ecx]
add ecx, 8
add eax, esi
cmp ecx, edx
jb SHORT $L10255
...
This makes my point absolutely clear: the
array access is *not* optimized away by
your compiler (maybe this is the asm of the
"debug" settings?). If the above is really
faster on any machine, that's a problem of
the instruction decoding/reordering or
whatever.
I also did a quick test by stripping the source code down to the parts
we care about:
[simplified code snipped]
...
and then compiled the result with MS C/C++ 7.0 (the predecessor of MS
Visual C++ 1.0). Here's the assembly language it produces for the two
inner loops:
OK
test_array:
$F344:
; Line 8
mov ax,WORD PTR [bx-2]
add ax,WORD PTR [bx]
add cx,ax
add bx,4
dec dx
jne $F344

test_pointer:
$F355:
; Line 17
mov ax,WORD PTR [si+2]
add ax,WORD PTR [si]
add dx,ax
add si,4
cmp WORD PTR [bp-4],si ;list
ja $F355
Yes, that's simply using the adjacent positions
of struct members:

mov ax, WORD PTR [bx-2]
add ax, WORD PTR [bx]

and advancing the pointer into the data block
by the struct size (16-bit ints?)

add bx,4

The compiler obviously did convert the
"array access" into a "pointer access".
In this case, there may be a small difference between the two -- but
it's the array version that's faster. Even on a machine from that time
frame, the difference would be too small to care much about though --
given the memory constraints of MS-DOS, the time difference for the
largest array you could create would still have been less than a
millisecond.
Let's see what the new gcc 4.3.1 thinks about the inner loop
if full optimization is set (g++-4.3 -O3 -dA -c stuff.cxx -S):

[array]
..L6:
# basic block 3
addl 4(%ecx,%edx,8), %eax # displacement(base, index, scale)
addl (%ecx,%edx,8), %eax
addl $1, %edx
cmpl %edx, %ebx
jg .L6

[pointer]
..L14:
# basic block 3
addl 4(%edx), %eax
addl (%edx), %eax
addl $8, %edx
cmpl %ecx, %edx
jb .L14

What do we see here? The "array-loop" uses a different
addressing scheme (indirect register w/base and displacement)
compared to the "pointer-loop". This proves my point again,
the pointer loop will be optimized into simpler and
shorter machine instructions. Due to modern processor
architectures and instruction translation/reordering,
one can't deduce from the above code which is faster -
but this might *possibly* pop up on some architectures.

I modified my code and did rerun systematical comparisons
on some boxes:

--- 8< -----------------------------------------------------------

[C2Q-9300/3.33GHz gcc-4.3.1 -O3 -march=pentiumpro]
array: 1.1 sec pointer: 0.91 sec arr/ptr: 120.879%

[C2Q-9300/3.33GHz icc-10.1 -fast]
array: 1.1 sec pointer: 1.09 sec arr/ptr: 100.917%

[C2Q-6600/3.15GHz gcc-4.3.1 -O3 -march=pentiumpro]
array: 1.09 sec pointer: 0.91 sec arr/ptr: 119.78%

[C2Q-6600/3.15GHz icc-10.1 -fast]
array: 1.1 sec pointer: 1.11 sec arr/ptr: 99.0991%

[P4/2.6GHz gcc-4.3.1 -O3 -march=pentiumpro]
array: 1.61 sec pointer: 1.74 sec arr/ptr: 92.5287%

[P4/2.6GHz icc 10.1 -O3 -xN]
array: 1.97 sec pointer: 1.97 sec arr/ptr: 100%

[Athlon64/2.3GHz VC++6 /O2 /Zp4]
array: 1.875 sec pointer: 1.922 sec arr/ptr: 97.5546%

[Athlon64/2.3GHz VC++9 /O2 /Oy /Ot /Zp4]
array: 2.376 sec pointer: 1.89 sec arr/ptr: 125.714%

[Athlon64/2.3GHz gcc-mingw-3.4.2 -O3 -march=pentiumpro]
array: 3.656 sec pointer: 3.297 sec arr/ptr: 110.889%

--- 8< -----------------------------------------------------------

(using the code in the appendix). The *same* program shows
different behaviour, e.g. on P4 vs. Core2Q, obviously
due to the different chip internals ...

Regards

Mirco

code, modified: ==>

#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>

struct listitem { int value1, dummy, value2; };

double timethis(const char *s, int (*p)(listitem*, int), listitem *list, int n)
{
    std::cout << s << ' ' << std::flush; // print some progress message ...
    clock_t t = clock(); // the indirect call does not contribute to run-time,
    int x = p(list, n);  // it's just a dispatcher (and prevents inlining)
    return double(clock() - t + x) / CLOCKS_PER_SEC; // fool optimizer! (x == zero)
}

int test_array(listitem *list, int nitems)
{
    int x = 0;
    for (int j = 0; j < nitems; ++j)
        for (int i = 0; i < nitems; ++i)
            x += (list[i].value1 - list[j].value1) + (list[i].value2 - list[j].value2);
    return x;
}

int test_pointer(listitem *list, int nitems)
{
    int x = 0;
    for (listitem *pj = list; pj < list + nitems; ++pj)
        for (listitem *pi = list; pi < list + nitems; ++pi)
            x += (pi->value1 - pj->value1) + (pi->value2 - pj->value2);
    return x;
}

int main(int argc, char *argv[])
{
    int nitems = argc > 1 ? atoi(argv[1]) : 22222;
    listitem *list = new listitem[nitems];
    memset(list, 0x1A, sizeof(listitem) * nitems);
    timethis("warmup", test_array, list, nitems);

    double ta = 0, tp = 0;
    tp += timethis("pointer", test_pointer, list, nitems);
    ta += timethis("array", test_array, list, nitems);
    tp += timethis("pointer", test_pointer, list, nitems);
    ta += timethis("array", test_array, list, nitems);

    std::cout << "length: " << nitems << std::endl
              << "array: " << ta << " sec" << '\t'
              << "pointer: " << tp << " sec" << '\t'
              << "arr/ptr: " << (ta/tp)*100. << "%" << std::endl;

    delete [] list;
    return 0;
}
<==
Jun 27 '08 #7
In article <g2***********@nserver.hrz.tu-freiberg.de>, wa***@chemie.uni-
halle.de says...

[ ... ]
All three points are somewhat unfounded, imho.
1) What was timed was simply interleaved sequential access to
a block of data that is not too trivial regarding L2
access of the underlying architecture. Taking out one
"dimension", as you did, also makes things too simple
for any optimizing compiler and wouldn't support my view ... ,
The OP asked a fairly specific question, including a definition of the
structure of interest. Adding more to that structure simply because the
one he cares about doesn't "support your view" seems disingenuous at
best.
2) Calling through a pointer was done 2x per test scenario, which
has to be compared with the NxN loop calculation in the function.
That leaves us with an error ratio of 2 / (20,000 x 20,000) (only) if
we assume the indirect call takes as many cycles as the inner-loop
calculation (which it obviously doesn't),
The problem isn't so much with the indirection itself as with the fact
that when making an indirect call, the compiler often can't (or at least
doesn't) carry out optimizations that it would carry out otherwise. If
the function would normally be called indirectly, that's fine and
wonderful -- but there's no indication that would be the case here. As
such, the testing method may be having a substantial effect on the
result.
3) This doesn't really count because the data block initialization is
done immediately before the first test function invocation. BTW,
due to the "indirect call dispatcher design", this was easy to
add (and I added it - it didn't change any results, see enclosed source)
Here's a bit of code that gets rid of at least those specific problems:
#include <cstdlib>
[snipped]
...
struct listitem { int value1, value2; };
...
Why not include some other data item in between?
Because the OP specified exactly this structure as being what he cares
about.
struct listitem { int value1, dummy, value2; };
int test_array(listitem const *list, size_t nitems)
{
int x=0;

for(size_t i=0; i<nitems; ++i)
x += list[i].value1 + list[i].value2;
return x;
}

This is just a sequential block access which can
be translated to assembler using only one index
register, one counter, and one accumulator.
Sure -- and it corresponds directly to what the OP said he cared about.

[ ... ]
This makes my point absolutely clear: the
array access is *not* optimized away by
your compiler (maybe this is the asm of the
"debug" settings?). If the above is really
faster on any machine, that's a problem of
the instruction decoding/reordering or
whatever.
Yes and no -- I decided it got too long, but probably should have left
in a more detailed analysis of the code. The fact is that yes, there is
a difference in the generated code. The other fact is that there's not
even a theoretical difference in speed. The parts that are different are
between adding one each iteration and then multiplying by 8, or adding 8
each iteration. The addition is the same speed either way. The
multiplication by 8 doesn't change the amount of time either. On the
original 8088/8086, there would have been a difference -- calculating an
effective address took a variable amount of time, and the more complex a
calculation used, the longer it took. On those processors, you really
would have gained something by transforming one form to the other -- and
compilers at the time did.

On every processor since (at least) the 286, there has been an effective address
unit that can carry out the calculation in constant time. That being the
case, there's no difference in speed based on the complexity of the
effective address calculation -- addressing like [esi], [esi+ebx],
[esi+ebx*8] all take _exactly_ the same amount of time to generate on
every reasonably modern processor.

[ ... ]
mov ax, WORD PTR [bx-2]
add ax, WORD PTR [bx]

and advancing the pointer into the data block
by the struct size (16 bit ints?)
Yes -- I said it was old! It generated code for MS-DOS and 16-bit
Windows.
>
add bx,4

The compiler obviously did convert the
"array access" into a "pointer access".
Quite true -- back when processors were sufficiently primitive that it
made a difference, they did the optimization. Nowadays, they don't,
because it's a waste of time.

[ ... ]
What do we see here? The "array-loop" uses a different
addressing scheme (indirect register w/base and displacement)
compared to the "pointer-loop". This proves my point again,
the pointer loop will be optimized into simpler and
shorter machine instructions. Due to modern processor
architectures and instruction translation/reordering,
one can't deduce from the above code which is faster -
but this might *possibly* pop up on some architectures.
Pardon my being blunt, but "Nonsense". People who write compilers have
known about how to do this transformation for _years_. The reason they
don't do it (at least on the x86) is simple: it's no longer effective.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #8
Jerry Coffin wrote:
It's certainly true that under some circumstance or other, you might see
some difference in speed between two loops that do the same thing -- but
this remains equally true whether one uses pointer-style notation and
the other subscripting notation, or both use the same notation. So far
you've provided precisely zero real support for the claim that one will
consistently execute more efficiently than the other. Quite the
contrary, I've proven that in the one case where there really was a
provable difference, it favored the one you said should be slower.
This doesn't make much sense if you don't read others' results and/or
conclusions thoroughly, and continue to repeat your assumptions
while throwing around accusations of 'prejudice' and using improper
wording.

Bye bye & thanks for your time spent

Mirco
Jun 27 '08 #9

This thread has been closed and replies have been disabled. Please start a new discussion.
