Bytes | Software Development & Data Engineering Community
Loop Optimization, Array Alignment

Hello,

I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations
turned on.

For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
    R+=A[i]*B[i]*K; // all variables are float=4 bytes
}

Q.1. Is there any advantage to having the arrays A and B aligned to 16 bytes?

Q.1b. If yes, I can make them aligned (non-trivial since A[0]..A[N-1] is part
of a much bigger array, but I can do it), but I don't know how to tell
the compiler that I have aligned these arrays. How do I do that?

Q.2. Is there an advantage to using array indexing or pointers, e.g.
float *pA=A, *pB=B;
for (i=0;i<N;i++){
    R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}

Q.3. Will gcc take *K out of the loop, i.e. turn it into the following?
(That may change the single-precision result, e.g. if R starts off much
bigger than the [i] contribution.)

float RL=0;
for (i=0;i<N;i++){
    RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);

Thanks in advance for any help,

-rajeev-
Nov 14 '05 #1
rr*@ieee.org (Rajeev) wrote in
news:c0**************************@posting.google.com:
> I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed
> optimizations turned on.
>
> For the following loop
> R=(evaluate here); // float
> N=(evaluate here); // N min=1 max=100 median=66
> for (i=0;i<N;i++){
>     R+=A[i]*B[i]*K; // all variables are float=4 bytes
> }

> Q.1. Is there any advantage to having the arrays A and B aligned to 16
> bytes?
It might be, but that's not a C issue; it's platform-specific and off-topic
in comp.lang.c.
> Q.1b. If yes, I can make them aligned (non-trivial since A[0]..A[N-1] is
> part of a much bigger array, but I can do it), but I don't know how to
> tell the compiler that I have aligned these arrays. How do I do that?
>
> Q.2. Is there an advantage to using array indexing or pointers, e.g.
> float *pA=A, *pB=B;
> for (i=0;i<N;i++){
>     R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
> }
There shouldn't be, but that's not a C issue; it's platform-specific and
off-topic in comp.lang.c.
> Q.3. Will gcc take *K out of the loop, i.e. turn it into the following?
> (That may change the single-precision result, e.g. if R starts off much
> bigger than the [i] contribution.)
>
> float RL=0;
> for (i=0;i<N;i++){
>     RL+=A[i]*B[i]; // all variables are float=4 bytes
> }
> R+=(RL*K);


This is a gcc question and off-topic in comp.lang.c.

--
- Mark ->
--
Nov 14 '05 #2
In <c0**************************@posting.google.com> rr*@ieee.org (Rajeev) writes:
> I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations
> turned on.


If these details are relevant to your questions, cross-posting to
comp.lang.c was a gross mistake.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #3
rr*@ieee.org (Rajeev) wrote:
> I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations
> turned on.
>
> For the following loop
> R=(evaluate here); // float
> N=(evaluate here); // N min=1 max=100 median=66
> for (i=0;i<N;i++){
>     R+=A[i]*B[i]*K; // all variables are float=4 bytes
> }

> Q.1. Is there any advantage to having the arrays A and B aligned to 16 bytes?
The Intel compiler might be assisted by such an alignment, because it
can use the packed SSE vector instructions to implement this
operation. I am not aware of any other x86 based compiler that can
automatically vectorize like this.
> Q.1b. If yes, I can make them aligned (non-trivial since A[0]..A[N-1] is part
> of a much bigger array, but I can do it), but I don't know how to tell
> the compiler that I have aligned these arrays. How do I do that?
You're probably right, you can't. Even the Intel compiler relies on
deduction to know that an array or pointer is aligned. It will not be
able to deduce it from attempts to hack the array offset to fit the
alignment.
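A minimal sketch of both halves of this, under stated assumptions. An array you declare yourself can be given 16-byte alignment with C11 _Alignas(16) (GCC also accepts __attribute__((aligned(16)))), but a pointer into the middle of that array carries no such guarantee, so code typically checks the address at run time and processes a few elements with scalar code until the pointer reaches a boundary. The helper floats_until_aligned and the array big_array are hypothetical names, and inspecting a pointer's low bits through uintptr_t is implementation-defined, though it behaves as expected on common x86 compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading floats to handle one at a time
   before p reaches a 16-byte boundary. Assumes p is at least float-aligned,
   so the misalignment is 0, 4, 8 or 12 bytes and the result is 0 to 3. */
size_t floats_until_aligned(const float *p)
{
    uintptr_t misalign = (uintptr_t)p % 16;
    return (size_t)((16 - misalign) % 16) / sizeof(float);
}

/* An array you define yourself can be forced onto a 16-byte boundary (C11).
   GCC also accepts: float big_array[1024] __attribute__((aligned(16))); */
_Alignas(16) float big_array[1024];
```

With k = floats_until_aligned(A), a loop could handle A[0]..A[k-1] one at a time and the remainder with aligned loads. If A and B are misaligned by different amounts, no single k aligns both, so B would still need unaligned loads.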
> Q.2. Is there an advantage to using array indexing or pointers, e.g.
> float *pA=A, *pB=B;
> for (i=0;i<N;i++){
>     R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
> }
No. If there is an advantage to doing it one way or another, the
compiler should be good enough to do the transformation from one form
to the other internally.
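A minimal sketch of the two forms side by side (accum_index and accum_pointer are hypothetical names). Each pointer needs its own `*` in the declaration; `float *pA=A,pB=B;` would declare pB as a plain float. Both functions perform the same operations in the same order, so they should give identical results; comparing the output of `gcc -O2 -S` for each is a practical way to see whether a particular compiler generates different code for them.

```c
#include <stddef.h>

/* Array-index form of the loop. */
float accum_index(float R, const float *A, const float *B, float K, size_t N)
{
    for (size_t i = 0; i < N; i++)
        R += A[i] * B[i] * K;
    return R;
}

/* Pointer form: walks pA and pB forward instead of indexing. */
float accum_pointer(float R, const float *A, const float *B, float K, size_t N)
{
    const float *pA = A, *pB = B;   /* one '*' per pointer declarator */
    for (size_t i = 0; i < N; i++)
        R += *pA++ * *pB++ * K;
    return R;
}
```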
> Q.3. Will gcc take *K out of the loop, i.e. turn it into the following?
> (That may change the single-precision result, e.g. if R starts off much
> bigger than the [i] contribution.)
>
> float RL=0;
> for (i=0;i<N;i++){
>     RL+=A[i]*B[i]; // all variables are float=4 bytes
> }
> R+=(RL*K);


No. The compiler (regardless of which one) can't do this. This is
actually numerically different from your original loop. You need to
do this manually as shown here in order to leverage the operation
count reduction optimization. If the variables were integers, then in
theory a compiler could perform the optimization as you have done it.
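A small sketch of why the two loops are not interchangeable, assuming IEEE 754 single precision with float expressions evaluated in float (FLT_EVAL_METHOD == 0, as with SSE on x86-64; x87 extended-precision registers could hide the effect). The function names per_term and hoisted are hypothetical. Near 2^24 = 16777216 adjacent floats are 2 apart, so adding a term of exactly 1.0f to R rounds back to R every time, whereas adding the accumulated sum once does not.

```c
#include <stddef.h>

/* Original form: K applied to every term, each term added straight into R. */
float per_term(float R, const float *A, const float *B, float K, size_t N)
{
    for (size_t i = 0; i < N; i++)
        R += A[i] * B[i] * K;
    return R;
}

/* Hand-hoisted form from Q.3: sum the products, scale once, add once. */
float hoisted(float R, const float *A, const float *B, float K, size_t N)
{
    float RL = 0.0f;
    for (size_t i = 0; i < N; i++)
        RL += A[i] * B[i];
    return R + RL * K;
}
```

With R = 16777216.0f, A = {2, 2, 2, 2}, B = {1, 1, 1, 1} and K = 0.5f, every term is exactly 1.0f; per_term returns 16777216.0f while hoisted returns 16777220.0f. This is the case described in Q.3, where R starts off much bigger than the individual contributions.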

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #4
Rajeev wrote:
> Q.2. Is there an advantage to using array indexing or pointers, e.g.
> float *pA=A, *pB=B;
> for (i=0;i<N;i++){
>     R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
> }


You can simplify the loop counting.

i = N;
while (i-- != 0) {
    R += *pA++ * *pB++ * K;
}
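pete's loop wrapped into a self-contained function for illustration (accum_countdown is a hypothetical name). The sketch assumes an unsigned counter such as size_t: `i-- != 0` tests the old value, so the body runs exactly N times, and the final decrement wraps i around only after the last test, when it is no longer used. A signed int counter works as well for N >= 0.

```c
#include <stddef.h>

/* Count-down form: no separate comparison against N inside the loop. */
float accum_countdown(float R, const float *pA, const float *pB, float K, size_t N)
{
    size_t i = N;
    while (i-- != 0) {
        R += *pA++ * *pB++ * K;
    }
    return R;
}
```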

--
pete
Nov 14 '05 #5
qe*@pobox.com (Paul Hsieh) wrote in message news:<79**************************@posting.google.com>...
<...>
> > Q.3. Will gcc take *K out of the loop, i.e. turn it into the following?
> > (That may change the single-precision result, e.g. if R starts off much
> > bigger than the [i] contribution.)
> >
> > float RL=0;
> > for (i=0;i<N;i++){
> >     RL+=A[i]*B[i]; // all variables are float=4 bytes
> > }
> > R+=(RL*K);
>
> No. The compiler (regardless of which one) can't do this. This is
> actually numerically different from your original loop. You need to
> do this manually as shown here in order to leverage the operation
> count reduction optimization. If the variables were integers, then in
> theory a compiler could perform the optimization as you have done it.


Paul and Pete,

Thank you both for your informative responses. When trying to optimize,
there are so many things one can play with and try that it really helps a
non-expert like myself to get clarity on even a few issues, so I can focus
on others.

Regards,
-rajeev-
Nov 14 '05 #6
kal
pete <pf*****@mindspring.com> wrote in message news:<41***********@mindspring.com>...
> i = N;
> while (i-- != 0) {
>     R += *pA++ * *pB++ * K;
> }


Why not the following?

T = 0;
i = N;
while (i-- != 0) {
    T += *pA++ * *pB++;
}
R += T * K;
Nov 14 '05 #7
kal wrote:

> pete <pf*****@mindspring.com> wrote in message news:<41***********@mindspring.com>...
> > i = N;
> > while (i-- != 0) {
> >     R += *pA++ * *pB++ * K;
> > }
>
> Why not the following?
>
> T = 0;
> i = N;
> while (i-- != 0) {
>     T += *pA++ * *pB++;
> }
> R += T * K;


That seems fine to me.
I'll restate the original conditions:
> For the following loop
> R=(evaluate here); // float
> N=(evaluate here); // N min=1 max=100 median=66
> for (i=0;i<N;i++){
>     R+=A[i]*B[i]*K; // all variables are float=4 bytes
> }

--
pete
Nov 14 '05 #8
