Hello,
I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations
turned on.
For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}
Q.1. Is there any advantage to having the arrays A,B,C aligned to 16 bytes ?
Q.1b. If yes, I can make them aligned (non-trivial since A[1]:A[N] is part
of a much bigger array, but I can do it), but I don't know how to tell
the compiler that I have aligned these arrays. How do I do that ?
Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}
Q.3. Will gcc take *K out of the loop ? (It may change the single precision
computed result, eg if R starts off much bigger than the [i] contribution.)
float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);
Thanks in advance for any help,
-rajeev-
rr*@ieee.org (Rajeev) wrote in
news:c0**************************@posting.google.com:
I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations turned on.
For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}
Q.1. Is there any advantage to having the arrays A,B,C aligned to 16 bytes ?
Might be but that's not a C issue, it's platform-specific and off-topic in
comp.lang.c.
Q.1b. If yes, I can make them aligned (non-trivial since A[1]:A[N] is part of a much bigger array, but I can do it), but I don't know how to tell the compiler that I have aligned these arrays. How do I do that ?
Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}
Shouldn't be but that's not a C issue, it's platform-specific and
off-topic in comp.lang.c.
Q.3. Will gcc take *K out of the loop ? (It may change the single precision computed result, eg if R starts off much bigger than the [i] contribution.)
float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);
This is a gcc question and off-topic in comp.lang.c
--
- Mark ->
--
In <c0**************************@posting.google.com> rr*@ieee.org (Rajeev) writes:
I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations turned on.
If these details are relevant to your questions, cross-posting to
comp.lang.c was a gross mistake.
Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
rr*@ieee.org (Rajeev) wrote:
I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations turned on.
For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}
Q.1. Is there any advantage to having the arrays A,B,C aligned to 16 bytes ?
The Intel compiler might be assisted by such an alignment, because it
can use the packed SSE vector instructions to implement this
operation. I am not aware of any other x86 based compiler that can
automatically vectorize like this.
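For readers who have not seen packed SSE code, here is a minimal hand-written sketch of the product-sum using the intrinsics from <xmmintrin.h>. The function name dot_sse is made up for illustration and is not from the thread. It uses unaligned loads (_mm_loadu_ps) so it works for any pointers; if the data is known to be 16-byte aligned, the aligned load _mm_load_ps could be substituted. The four partial sums are combined in a different order than in the scalar loop, so the rounding of the result can differ.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics, x86/x86-64 only */

/* Hypothetical helper (not from the thread): sum of a[i]*b[i] for i < n. */
float dot_sse(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();       /* four partial sums, all 0.0f */
    float lanes[4];
    float sum;
    size_t i = 0;

    /* Main loop: multiply and accumulate four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned load, any address */
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    /* Reduce the four partial sums to one scalar. */
    _mm_storeu_ps(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    /* Tail loop: the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With the multiply by K moved out of the loop, the call would look like R += dot_sse(A, B, N) * K;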
Q.1b. If yes, I can make them aligned (non-trivial since A[1]:A[N] is part of a much bigger array, but I can do it), but I don't know how to tell the compiler that I have aligned these arrays. How do I do that ?
You're probably right, you can't. Even the Intel compiler relies on
deduction to know that an array or pointer is aligned. It will not be
able to deduce it from attempts to hack the array offset to fit the
alignment.
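A hedged footnote on the alignment question: for arrays that the program declares itself, GCC has an extension, __attribute__((aligned(16))), sketched below. It does not cover the harder case in the question, a pointer into the middle of a bigger array. Later GCC releases (4.7 and up, so not 3.4.2) added __builtin_assume_aligned for that case. The names samples and is_aligned16 are made up for illustration.

```c
#include <stdint.h>

/* GCC extension: ask for 16-byte alignment of a statically allocated array.
   "samples" is a hypothetical name, not from the thread. */
static float samples[128] __attribute__((aligned(16)));

/* Hypothetical helper: returns 1 if p lies on a 16-byte boundary, else 0. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}
```

Since a float is 4 bytes, samples[0], samples[4], samples[8] and so on sit on 16-byte boundaries, while samples[1] does not. A sub-array that starts at an arbitrary index therefore is aligned only if that index is a multiple of 4.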
Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}
No. If there is an advantage to doing it one way or another, the
compiler should be good enough to do the transformation from one form
to the other internally.
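One way to check this for a particular compiler is to write both forms as small functions and compare what gets generated, for example with gcc -O2 -S. A minimal sketch, with hypothetical function names that are not from the thread:

```c
#include <stddef.h>

/* Indexed form of the loop from the question. */
float accumulate_indexed(float r, const float *a, const float *b,
                         float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        r += a[i] * b[i] * k;
    }
    return r;
}

/* Pointer form of the same loop: same operations in the same order. */
float accumulate_pointer(float r, const float *pa, const float *pb,
                         float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        r += (*pa++) * (*pb++) * k;
    }
    return r;
}
```

Both functions perform the same floating-point operations in the same order, so they should return identical results; any difference would be in the generated code, not in the arithmetic.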
Q.3. Will gcc take *K out of the loop ? (It may change the single precision computed result, eg if R starts off much bigger than the [i] contribution.)
float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);
No. The compiler (regardless of which one) can't do this. This is
actually numerically different from your original loop. You need to
do this manually as shown here in order to leverage the operation
count reduction optimization. If the variables were integers, then in
theory a compiler could perform the optimization as you have done it.
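To make the numerical difference concrete, here is a small sketch of the case mentioned in the question, where R starts off much bigger than each contribution. The function names are hypothetical. It assumes float arithmetic is actually carried out in single precision, as with SSE on x86-64; with x87 excess precision the intermediate values may be kept wider and the effect can disappear.

```c
#include <stddef.h>

/* Original form: K is applied inside the loop and every term is added
   directly to the (possibly large) running value r. */
float scale_inside(float r, const float *a, const float *b,
                   float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        r += a[i] * b[i] * k;
    }
    return r;
}

/* Hand-optimized form: accumulate the products in a separate sum,
   multiply by K once, then add to r once. */
float scale_outside(float r, const float *a, const float *b,
                    float k, size_t n)
{
    float rl = 0.0f;
    size_t i;
    for (i = 0; i < n; i++) {
        rl += a[i] * b[i];
    }
    return r + rl * k;
}
```

Take r = 16777216.0f (2^24), K = 1 and four terms that are each 1.0f. In single precision, 2^24 + 1 is not representable and rounds back to 2^24, so scale_inside returns 16777216. scale_outside first forms the small sum 4 and then adds it, giving 16777220. Neither version is wrong as such, but they are different computations, which is why a compiler following strict floating-point rules will not turn one into the other.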
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
Rajeev wrote:
Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}
You can simplify the loop counting.
i = N;
while (i-- != 0) {
R += *pA++ * *pB++ * K;
}
--
pete
qe*@pobox.com (Paul Hsieh) wrote in message news:<79**************************@posting.google.com>...
<...> Q.3. Will gcc take *K out of the loop ? (It may change the single precision computed result, eg if R starts off much bigger than the [i] contribution.)
float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);
No. The compiler (regardless of which one) can't do this. This is actually numerically different from your original loop. You need to do this manually as shown here in order to leverage the operation count reduction optimization. If the variables were integers, then in theory a compiler could perform the optimization as you have done it.
Paul and Pete,
Thank you both for your informative responses. When trying to optimize,
there are just so many things one can play with and try; it really helps a
non-expert like myself to get clarity on even a few issues, so I can focus
on others.
Regards,
-rajeev-
pete <pf*****@mindspring.com> wrote in message news:<41***********@mindspring.com>...
i = N;
while (i-- != 0) {
R += *pA++ * *pB++ * K;
}
Why not the following?
T = 0;
i = N;
while (i-- != 0) {
T += *pA++ * *pB++;
}
R += T * K;
kal wrote:
pete <pf*****@mindspring.com> wrote in message news:<41***********@mindspring.com>...
i = N;
while (i-- != 0) {
R += *pA++ * *pB++ * K;
}
Why not the following?
T = 0;
i = N;
while (i-- != 0) {
T += *pA++ * *pB++;
}
R += T * K;
That seems fine to me.
I'll restate the original conditions:
For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}
--
pete