I have found that deployed "RELEASE" builds of applications that perform
matrix multiplication run much faster (1.5 to 2x) when the inner loop of
the multiplication accesses the data row by row.
For example:
For k = 1 to m : For i = 1 to n : For j = 1 to p
C(i, j) = A(i, k)*B(k, j) + C(i, j)
Next : Next : Next
This version has an inner loop that accesses whole rows of B and C
sequentially, and it executes twice as fast as a variant whose inner loop
accesses whole columns of A and C sequentially:
For j = 1 to p : For k = 1 to m : For i = 1 to n
C(i, j) = A(i, k)*B(k, j) + C(i, j)
Next : Next : Next
Evidently, my arrays are stored in row-major order.
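For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two loop orders could be timed. It is not my actual application code: the module name, the routine names MultiplyRowWise and MultiplyColumnWise, the matrix size of 400, and the fill values are all made up for illustration, and it assumes square matrices and System.Diagnostics.Stopwatch for timing.

```vb
Imports System
Imports System.Diagnostics

Module LoopOrderTiming

    ' Inner loop runs over j, so B(k, j) and C(i, j) are accessed along a row.
    Sub MultiplyRowWise(ByVal A(,) As Double, ByVal B(,) As Double, ByVal C(,) As Double,
                        ByVal n As Integer, ByVal m As Integer, ByVal p As Integer)
        For k As Integer = 1 To m
            For i As Integer = 1 To n
                For j As Integer = 1 To p
                    C(i, j) = A(i, k) * B(k, j) + C(i, j)
                Next
            Next
        Next
    End Sub

    ' Inner loop runs over i, so A(i, k) and C(i, j) are accessed down a column.
    Sub MultiplyColumnWise(ByVal A(,) As Double, ByVal B(,) As Double, ByVal C(,) As Double,
                           ByVal n As Integer, ByVal m As Integer, ByVal p As Integer)
        For j As Integer = 1 To p
            For k As Integer = 1 To m
                For i As Integer = 1 To n
                    C(i, j) = A(i, k) * B(k, j) + C(i, j)
                Next
            Next
        Next
    End Sub

    Sub Main()
        ' Hypothetical size chosen for illustration only.
        Const matrixSize As Integer = 400

        ' Dim X(s, s) allocates indices 0..s; only 1..s are used here.
        Dim A(matrixSize, matrixSize) As Double
        Dim B(matrixSize, matrixSize) As Double
        Dim C1(matrixSize, matrixSize) As Double
        Dim C2(matrixSize, matrixSize) As Double

        ' Arbitrary fill values so the products are not all zero.
        For i As Integer = 1 To matrixSize
            For j As Integer = 1 To matrixSize
                A(i, j) = i + j
                B(i, j) = i - j
            Next
        Next

        Dim sw As Stopwatch = Stopwatch.StartNew()
        MultiplyRowWise(A, B, C1, matrixSize, matrixSize, matrixSize)
        sw.Stop()
        Console.WriteLine("Row-wise inner loop:    {0} ms", sw.ElapsedMilliseconds)

        sw = Stopwatch.StartNew()
        MultiplyColumnWise(A, B, C2, matrixSize, matrixSize, matrixSize)
        sw.Stop()
        Console.WriteLine("Column-wise inner loop: {0} ms", sw.ElapsedMilliseconds)

        ' Sanity check: both loop orders should produce the same product.
        Console.WriteLine("Results match at (1, 1): {0}", C1(1, 1) = C2(1, 1))
    End Sub

End Module
```

The timings will depend on the machine, the matrix size, and whether the build is Debug or Release, so the sketch prints whatever it measures rather than assuming a particular ratio.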
Finally, my question is: is this an artifact of
A. Windows
B. .NET
C. Processor/Memory Configuration
D. VB
E. Other Ideas?
-- mark b