dvum...@hotmail.com wrote:
I have C code which computes the row sums of a matrix, divide each
element of each row with the row sum and then compute the column sum
of the resulting matrix. Is there a way I can speed up the code in C:
/* Here is the code */
// Table is "wij"
int i, j;
for(i = 0; i < N; ++i)
{
for(j = 0; j < N; ++j)
{
sum_over_j_wij[i] += wij[i,j];
}
for(j = 0; j < N; ++j)
{
sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
}
}
As others have pointed out, you are using Pascal-style subscripts
rather than C or C++ style. And you have a strange syntax error in
there too. I'm going to assume all these arrays are "doubles".
Remember, division, module and square root are all roughly the same
speed are and very slow (as a rule of thumb you should think of them as
about 10 times slower than addition.) So reducing their quantity in
your inner loops is of prime importance in this case. So, if you can
live with slight accuracy issues, the key "speeding up" consideration
is conversion of division to reciprocal multiplication:
for (i=0; i < N; i++) {
for (s=0.0,j=0; j < N; j++) s += wij[i][j];
s = 1.0 / s; /* if s == 0, you are SOL. */
for (j=0; j < N; j++) sum_over_i[j] += s*wij[i][j];
}
Some compilers are not able to hoist out the wij[i] calculation, so it
might be useful to precalculate this as double *wptr = wij[i]; and
replace the instances of wij[i] with wptr.
Certainly, using a good vectorizing compiler, such as Intel's compiler
will likely make a *huge* difference on code like this. Microsoft
claims that their latest compilers have vectorization capabilities, but
I have not verified this myself. In any event, the benefits of using
the SIMD instruction set basically goes straight to the bottom line,
especially in cases like this.
If you are on a processor which has a "multiply accumulate" (PowerPC,
Itanium, PA-RISC) instead of SIMD, you can invert the loops:
for (i=0; i < N; i++) {
for (s=0.0,j=0; j < N; j++) s += wij[i][j];
recip_sum_over_j[i] = 1.0 / s; /* if s == 0, you are SOL. */
}
for (j=0; j < N; j++) {
for (s=0.0,i=0; i < N; i++) s += recip_sum_over_j[i]*wij[i][j];
sum_over_i[j] = s;
}
So you can try each possibility, and check your compiler settings
(w.r.t SIMD and "Multiply-Accumulate") to see which one works better
for your platform.
---
Paul Hsieh
http://www.pobox.com/~qed/ http://bstring.sf.net/