A.E lover wrote:
I don't understand what you mean.
When I am programming, I am not sure whether to put the array's
elements directly into a formula or to use temporary variables
instead. I read in some C tips that temporaries are faster because the
arrays don't need to be accessed as often.
On May 4, 6:00 pm, Gianni Mariani <gi3nos...@mariani.ws> wrote:
>A.E lover wrote:
>>Dear all,
>>In C, I am wondering which code will run faster:
>Is this a trick question ?
OK. Now I understand your question better (I think).
Well, yes and no. As pointed out by another poster, the only way to
know is to profile it on the platform(s) of choice and see.
The yes: if you minimize the accesses to an array element yourself, the
compiler has less to optimize.
And no: optimizers are getting very good. Your code above may be
vectorizable, and the temporaries you create could make the pattern
harder for the vectorizer to recognize.
A modern compiler would more than likely generate exactly the same
assembly for both versions of the code you posted.
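For the "profile it and see" part, a minimal timing sketch could look like the following. It is only an illustration: the helper names fill, time_variant and same_result are made up here, the repetition count is arbitrary, and clock() measures CPU time coarsely, so treat the numbers as a rough comparison rather than a proper benchmark.

```c
#include <string.h>
#include <time.h>

#define N 1000

typedef struct A {
    double a[N];
    double b[N];
} A;

/* Variant 1: array elements used directly in the formula. */
void foo(A *s)
{
    for (int i = 1; i < N; i++)
        s->a[i] = 100 * s->a[i] + 300 * s->b[i];
}

/* Variant 2: elements copied into temporaries first. */
void foox(A *s)
{
    for (int i = 1; i < N; i++) {
        double atemp = s->a[i];
        double btemp = s->b[i];
        s->a[i] = 100 * atemp + 300 * btemp;
    }
}

/* Hypothetical helper: fill both arrays with small, exactly representable values. */
void fill(A *s)
{
    for (int i = 0; i < N; i++) {
        s->a[i] = i;
        s->b[i] = 2.0 * i;
    }
}

/* Written after the timed loop so the work is less likely to be optimized away. */
static volatile double sink;

/* Hypothetical helper: CPU seconds spent on `reps` calls of fn.
 * The data is reset before every call so the values never overflow;
 * the reset cost is included, but it is the same for both variants. */
double time_variant(void (*fn)(A *), int reps)
{
    static A init, work;
    fill(&init);

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        memcpy(&work, &init, sizeof work);
        fn(&work);
    }
    clock_t end = clock();

    sink = work.a[N - 1];
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Returns 1 if both variants leave identical data behind, else 0. */
int same_result(void)
{
    static A x, y;
    fill(&x);
    fill(&y);
    foo(&x);
    foox(&y);
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Call time_variant(foo, reps) and time_variant(foox, reps) with the same reps, print both values, and build with the same flags you use for the real code (for example gcc -O3). Run it several times, since a single run can be noisy.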
So let's see what the compiler generates when you send it this:
typedef struct A
{
    double a[1000];
    double b[1000];
} A;
void foo( A * s )
{
    int i;
    for (i=1;i<1000;i++)
    {
        s->a[i]=100*s->a[i]+300*s->b[i];
    }
}
void foox( A * s )
{
    int i;
    double atemp;
    double btemp;
    double rtemp;
    for (i=1;i<1000;i++)
    {
        atemp = s->a[i];
        btemp = s->b[i];
        rtemp=100*atemp+300*btemp;
        s->a[i] = rtemp;
    }
}
int main()
{
    A s[1];
    foo( s );
}
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv vvvvvvvv
gcc -O3 -S xxx_optimizer.c
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1) Target: x86_64-redhat-linux
.file "xxx_optimizer.c"
.section .rodata.cst8,"aM",@progbits,8
.align 8
.LC0:
.long 0
.long 1079574528
.align 8
.LC1:
.long 0
.long 1081262080
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB2:
movsd .LC0(%rip), %xmm3
xorl %eax, %eax
movsd .LC1(%rip), %xmm2
.p2align 4,,7
.L2:
movsd 8(%rdi,%rax,8), %xmm0
movsd 8008(%rdi,%rax,8), %xmm1
mulsd %xmm3, %xmm0
mulsd %xmm2, %xmm1
addsd %xmm1, %xmm0
movsd %xmm0, 8(%rdi,%rax,8)
addq $1, %rax
cmpq $999, %rax
jne .L2
rep ; ret
.LFE2:
.size foo, .-foo
.section .rodata.cst8
.align 8
.LC2:
.long 0
.long 1079574528
.align 8
.LC3:
.long 0
.long 1081262080
.text
.p2align 4,,15
.globl foox
.type foox, @function
foox:
.LFB3:
movsd .LC2(%rip), %xmm3
xorl %eax, %eax
movsd .LC3(%rip), %xmm2
.p2align 4,,7
.L9:
movsd 8(%rdi,%rax,8), %xmm0
movsd 8008(%rdi,%rax,8), %xmm1
mulsd %xmm3, %xmm0
mulsd %xmm2, %xmm1
addsd %xmm1, %xmm0
movsd %xmm0, 8(%rdi,%rax,8)
addq $1, %rax
cmpq $999, %rax
jne .L9
rep ; ret
.... snipped ....
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^
OK, so it uses the SSE registers and the scalar SSE2 instructions
(movsd, mulsd, addsd), but each of those handles a single double per
loop iteration, so it's not actually vectorizing the code. As you can
see, however, the code is identical for both functions.
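To make "not actually vectorizing" concrete: a vectorized loop would use the packed forms of these instructions and handle two doubles at a time. The following is a hand-written sketch of that idea using SSE2 intrinsics, not something the compiler emitted here. The name foo_packed is made up for illustration, and the code assumes an x86 target with SSE2 (it falls back to the plain scalar loop otherwise).

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

typedef struct A {
    double a[1000];
    double b[1000];
} A;

/* Same computation as foo(), but two doubles per step where SSE2 is available. */
void foo_packed(A *s)
{
    int i = 1;   /* the original loop also starts at element 1 */

#if defined(__SSE2__)
    const __m128d k100 = _mm_set1_pd(100.0);
    const __m128d k300 = _mm_set1_pd(300.0);

    /* Packed part: elements i and i+1 in one go (unaligned loads and stores). */
    for (; i + 1 < 1000; i += 2) {
        __m128d va = _mm_loadu_pd(&s->a[i]);
        __m128d vb = _mm_loadu_pd(&s->b[i]);
        __m128d r  = _mm_add_pd(_mm_mul_pd(k100, va), _mm_mul_pd(k300, vb));
        _mm_storeu_pd(&s->a[i], r);
    }
#endif

    /* Scalar tail: the leftover element, or the whole loop without SSE2. */
    for (; i < 1000; i++)
        s->a[i] = 100 * s->a[i] + 300 * s->b[i];
}
```

Whether this is any faster than the scalar loop is, again, something only a measurement on your platform will tell you.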