On Mar 27, 8:41Â*am, p...@see.signature.invalid (Pierre Asselin) wrote:
Noma...@gmail.com <Noma...@gmail.comwrote:
I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??
Any advice??
I think you can quickly learn enough Fortran at a superficial level
to do some number crunching. Â*Just be good and use "implicit none".
Since you already know C, I would say: write your first project in C
but learn enough Fortran to translate it. Â*Make sure the answers
are the same, then race the two and post the results here :-)
Seriously. Â*Fortran compilers can optimize more aggressively than
C because the language semantics are different. Â*C99 plugs this
gap (mostly) with the "restrict" qualifier, but I don't know how
that plays out in practice and I would love to see data.
Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
Via:
dcorbit@DCORBIT64 /f/tmp
$ g95 -O3 -Wall test_fpu.f90
In file test_fpu.f90:83
91 FORMAT (A,I4,2('/',I2.2))
1
Warning (110): Label 91 at (1) defined but not used
In file test_fpu.f90:2293
INTEGER :: i , info , j , l , ncola , nrowa , nrowb
1
Warning (112): Variable 'ncola' at (1) is set but never used
test_fpu.f90: In function 'dtrmv_':
test_fpu.f90:3611: warning: 'kx' may be used uninitialized in this
function
dcorbit@DCORBIT64 /f/tmp
$ ./a
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 15.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 7.4 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 13.8 sec Err= 0.000000000000005
Test4 - Lapack 2 (1001x1001) inverts 10.0 sec Err= 0.000000000000417
total = 46.3 sec
dcorbit@DCORBIT64 /f/tmp
$ g95 --version
G95 (GCC 4.1.2 (g95 0.91!) Mar 21 2007)
Copyright (C) 2002-2005 Free Software Foundation, Inc.
G95 comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of G95
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
Using:
/*
* --------------------------------------------------------------
* TEST_FPU A number-crunching benchmark using matrix inversion.
* Implemented by: David Frank
Da********@aol.com
* Gauss routine by: Tim Prince
N8**@aol.com
* Crout routine by: James Van Buskirk
to****@ix.netcom.com
* F90->C source by: Sergey N. Voronkov
se**@ggd.nsu.ru
* Pointer exchange version by: Dieter Buerssner
bu***@gmx.de
* --------------------------------------------------------------
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
/*
* Compiling with NI = 1001 (default) generates pool(51,51,1000) =
20mb.
* If test system pages due to insufficient memory (disk i/o activity
* occurs), abort run and compile with NI = 200, benchmark will
adjust
* time x 5.
*/
#define NI 1001
#define NN 51
#define RK8 double
/* below are additional C routines supplied by translator */
void memflt()
{
fputs("Memory allocation error\n", stderr);
exit(EXIT_FAILURE);
}
void alloc_arrays(RK8 ** p[NI], RK8 *** a, RK8 *** b)
{
int i,
j;
for (i = 0; i < NI; i++) {
if ((p[i] = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL)
memflt();
for (j = 0; j < NN; j++)
if ((p[i][j] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL)
memflt();
}
if ((*a = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL ||
(*b = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL)
memflt();
for (i = 0; i < NN; i++)
if (((*a)[i] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL ||
((*b)[i] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL)
memflt();
}
void random_number(RK8 ** pool[NI])
{
int i,
j,
k;
for (i = 0; i < NI; i++)
for (j = 0; j < NN; j++)
for (k = 0; k < NN; k++)
pool[i][j][k] = (RK8) (rand()) / RAND_MAX;
}
RK8
timesec()
{
return (RK8) (clock()) / CLOCKS_PER_SEC;
}
/* prototype the invert functions that follow exec source */
void Gauss(RK8 ** a, RK8 ** b, int n);
void Crout(RK8 ** a, RK8 ** b, int n);
int rgaussi(RK8 ** a, RK8 ** b, int n);
int main()
{
RK8 **pool[NI]; /* pool of matrices to invert */
RK8 **a,
**ai; /* working matrices use < 256k */
RK8 avg_err,
total_time,
time1;
int i,
j,
n;
char *revision = "01/10/98"; /* Gauss speedup mod */
char invert_id[3][8] =
{
"Gauss", "Crout", "Dieter"};
struct tm *ptm;
time_t crtime;
FILE *fp;
/* Begin by allocating matrix arrays */
alloc_arrays(pool, &a, &ai);
puts("Benchmark running, hopefully as only ACTIVE task");
if ((fp = fopen("test_fpc.dat", "w")) == NULL) {
fprintf(stderr, "Can't open output file!\n");
return EXIT_FAILURE;
}
crtime = time(NULL);
ptm = gmtime(&crtime);
fprintf(fp, "Date run = %2d/%2d/%2d\n",
ptm->tm_mon + 1, ptm->tm_mday, ptm->tm_year);
fputs("Pls supply info below, send to
Da********@aol.com\n"
"Tester name/ net address = \n"
"CPU mfg/id/Mhz = \n"
"MEM/CACHE = \n"
"O.S. / COMPILER = \n"
"Additional comments = \n\n\n\n\n", fp);
fprintf(fp, "Results for %s revision using TEST_FPU.C \n",
revision);
srand(time(NULL)); /* set seed to random number based on time */
random_number(pool); /* fill pool with random data ( 0. -1. ) */
for (n = 0; n < 3; n++) { /* for Gauss,Crout algorithms */
time1 = timesec(); /* start benchmark n time */
for (i = 0; i < NI; i++) {
/* get next matrix to invert */
for (j = 0; j < NN; j++)
memcpy(a[j], pool[i][j], sizeof(RK8) * NN);
switch (n) {
case 0:
Gauss(a, ai, NN); /* invert a -ai ; destructs a */
Gauss(ai, a, NN); /* invert ai -a */
break;
case 1:
Crout(a, ai, NN); /* invert a -ai ; nondestructs a
*/
Crout(ai, a, NN); /* invert ai -a */
break;
case 2:
rgaussi(a, ai, NN); /* invert a -ai ; nondestructs a
*/
rgaussi(ai, a, NN); /* invert ai -a */
break;
}
}
total_time = timesec() - time1; /* = elapsed time sec. */
/* check accuracy last matrix invert. */
avg_err = 0;
for (i = 0; i < NN; i++)
for (j = 0; j < NN; j++)
avg_err += fabs(a[i][j] - pool[NI - 1][i][j]);
if (NI == 200)
fprintf(fp, "\n%s 5 x 200 x 2 inverts = %6.1f sec.\n",
invert_id[n], 5 * total_time);
else
fprintf(fp, "\n%s 1000 x 2 inverts = %6.1f sec.\n",
invert_id[n], total_time);
fputs("Accuracy of 2 computed numbers\n", fp);
fprintf(fp, "Original = %18.15f %18.15f\n",
pool[NI - 1][0][0], pool[NI - 1][NN - 1][NN - 1]);
fprintf(fp, "Computed = %18.15f %18.15f\n",
a[0][0], a[NN - 1][NN - 1]);
fprintf(fp, "Avg Err. = %18.15f\n", avg_err / (NN * NN));
} /* for Gauss,Crout algorithms */
puts("Results written to: TEST_FPC.DAT");
return EXIT_SUCCESS;
}
/*
* --------------------------------------
* Invert matrix a -b by Gauss method
* --------------------------------------
*/
void Gauss(RK8 ** a, RK8 ** b, int n)
{
RK8 d,
temp = 0,
c;
int i,
j,
k,
m,
nn,
*ipvt;
if ((ipvt = (int *) malloc(n * sizeof(int))) == NULL)
memflt();
nn = n;
for (i = 0; i < nn; i++)
ipvt[i] = i;
for (k = 0; k < nn; k++) {
temp = 0.;
m = k;
for (i = k; i < nn; i++) {
d = a[k][i];
if (fabs(d) temp) {
temp = fabs(d);
m = i;
}
}
if (m != k) {
j = ipvt[k];
ipvt[k] = ipvt[m];
ipvt[m] = j;
for (j = 0; j < nn; j++) {
temp = a[j][k];
a[j][k] = a[j][m];
a[j][m] = temp;
}
}
d = 1 / a[k][k];
for (j = 0; j < k; j++) {
c = a[j][k] * d;
for (i = 0; i < nn; i++)
a[j][i] -= a[k][i] * c;
a[j][k] = c;
}
for (j = k + 1; j < nn; j++) {
c = a[j][k] * d;
for (i = 0; i < nn; i++)
a[j][i] -= a[k][i] * c;
a[j][k] = c;
}
for (i = 0; i < nn; i++)
a[k][i] = -a[k][i] * d;
a[k][k] = d;
}
for (i = 0; i < nn; i++)
memcpy(b[ipvt[i]], a[i], sizeof(RK8) * nn);
free(ipvt);
}
/*
* --------------------------------------
* Invert matrix a -b by Crout method
* --------------------------------------
*/
void Crout(RK8 ** a, RK8 ** b, int n)
{
int i,
j; /* Current row & column */
int maxlok; /* Location of maximum pivot */
int *index; /* Partial pivot record */
RK8 *temp = 0,
the_max;
RK8 tmp,
*ptr;
RK8 *matr = 0;
int k,
ind,
ind2;
if ((index = (int *) malloc(n * sizeof(int))) == NULL ||
(temp = (RK8 *) malloc(n * sizeof(RK8))) == NULL ||
(matr = (RK8 *) malloc(n * n * sizeof(RK8))) == NULL)
memflt();
/* Initialize everything */
for (i = 0; i < n; i++)
index[i] = i;
/* Shuffle matrix */
for (j = 0; j < n; j++) {
for (i = 0; i < j; i++)
b[j][i] = a[j][i];
for (i = j; i < n; i++)
b[j][i] = a[i - j][n - j - 1];
}
/* LU decomposition; reciprocals of diagonal elements in L matrix */
for (j = 0; j < n; j++) {
/* Get current column of L matrix */
for (i = j; i < n; i++) {
tmp = 0;
ind = n - i - 1;
for (k = 0; k < j; k++)
tmp += b[ind][ind + k] * b[j][k];
b[ind][ind + j] -= tmp;
}
maxlok = 0;
the_max = fabs(b[0][j]);
for (i = 1; i < n - j; i++)
if (fabs(b[i][j + i]) >= the_max) {
the_max = fabs(b[i][j + i]);
maxlok = i;
}
/* det = det*b(j+maxlok-1,maxlok) */
b[maxlok][j + maxlok] = 1 / b[maxlok][j + maxlok];
/* Swap biggest element to current pivot position */
if (maxlok + 1 != n - j) {
ind = n - maxlok - 1;
ind2 = index[j];
index[j] = index[ind];
index[ind] = ind2;
for (k = n - maxlok; k < n; k++) {
tmp = b[k][j];
b[k][j] = b[k][ind];
b[k][ind] = tmp;
}
memcpy(temp, &(b[maxlok][maxlok]), sizeof(RK8) * (n -
maxlok));
ptr = &(b[n - j - 1][n - j - 1]);
memmove(&(b[maxlok][maxlok]), ptr, sizeof(RK8) * (j + 1));
for (k = j + 1; k < n - maxlok; k++)
b[maxlok][maxlok + k] = b[k][j];
memcpy(ptr, temp, (j + 1) * sizeof(RK8));
for (k = j + 1; k < n - maxlok; k++)
b[k][j] = temp[k];
}
/* Get current row of U matrix */
ind = n - j - 1;
for (i = j + 1; i < n; i++) {
tmp = 0.;
for (k = 0; k < j; k++)
tmp += b[i][k] * b[ind][ind + k];
b[i][j] = b[ind][n - 1] * (b[i][j] - tmp);
}
} /* END DO LU_outer */
/* Invert L matrix */
for (j = 0; j < n - 1; j++) {
temp[0] = b[n - j - 1][n - 1];
for (i = j + 1; i < n; i++) {
ind = n - i - 1;
tmp = 0.;
for (k = 0; k < i - j; k++)
tmp += temp[k] * b[ind][ind + j + k];
b[ind][ind + j] = -tmp * b[ind][n - 1];
temp[i - j] = b[ind][ind + j];
}
}
/* Reshuffle matrix */
for (i = 0; i < (n + 1) / 3; i++) {
memcpy(temp, &(b[i][2 * (i + 1) - 1]), sizeof(RK8) * (n + 2 -
3 * (i +1)));
for (j = 2 * (i + 1) - 1; j < n - i; j++)
b[i][j] = b[n - j - 1][n - j + i - 1];
ind = n - i - 1;
for (j = i; j < n + 1 - 2 * (i + 1); j++)
b[j][i + j] = b[n - i - j - 1][ind];
for (k = 0; k < n + 2 - 3 * (i + 1); k++)
b[i + 1 + k][ind] = temp[k];
}
/* Invert U matrix */
for (i = 0; i < n - 1; i++) {
for (j = i + 1; j < n; j++) {
tmp = 0.;
for (k = 0; k < j - i - 1; k++)
tmp += temp[k] * b[j][i + 1 + k];
b[j][i] = -b[j][i] - tmp;
temp[j - i - 1] = b[j][i];
}
}
/* Multiply inverses in reverse order */
for (i = 0; i < n - 1; i++) {
for (k = 0; k < n - i - 1; k++)
temp[k] = b[i + 1 + k][i];
for (j = 0; j <= i; j++) {
tmp = 0.;
for (k = 0; k < n - i - 1; k++)
tmp += temp[k] * b[j][i + 1 + k];
b[j][i] += tmp;
}
for (j = i + 1; j < n; j++) {
tmp = 0.;
for (k = j; k < n; k++)
tmp += temp[k - i - 1] * b[j][k];
b[j][i] = tmp;
}
}
/* Straighten out the columns of the result */
for (i = 0; i < n; i++)
memcpy(matr + n * i, b[i], sizeof(RK8) * n);
for (i = 0; i < n; i++)
memcpy(b[index[i]], matr + n * i, sizeof(RK8) * n);
free(index);
free(temp);
free(matr);
}
/*
** This routine is due to
bu***@gmx.de (Dieter Buerssner)
** Destroys a, return 0: success, 1: zero pivot, 2: out of mem.
*/
int rgaussi(RK8 ** a, RK8 ** b, int n)
{
int i,
j,
k,
maxj,
t;
RK8 maxel,
pivinv,
tmaxel,
aji;
RK8 *tp,
*ai,
*aj;
/* C99: int ipiv[n]; */
int *ipiv = malloc(n * sizeof *ipiv);
if (ipiv == NULL)
return 2;
for (i = 0; i < n; i++)
ipiv[i] = i;
for (i = 0; i < n; i++) {
maxj = -1;
maxel = 0.0;
/* find pivot element */
for (j = i; j < n; j++)
if ((tmaxel = fabs(a[j][i])) maxel) {
maxj = j;
maxel = tmaxel;
}
if (maxj < 0) {
free(ipiv);
return 1;
}
/* exchange rows */
if (maxj != i) {
/* just exchange pointers for a */
tp = a[maxj];
a[maxj] = a[i];
a[i] = tp;
t = ipiv[maxj];
ipiv[maxj] = ipiv[i];
ipiv[i] = t;
}
ai = a[i];
pivinv = 1.0 / ai[i];
ai[i] = 1.0;
for (k = 0; k < n; k++)
ai[k] *= pivinv;
for (j = 0; j < n; j++) {
if (j != i) {
aj = a[j];
aji = aj[i];
aj[i] = 0.0;
for (k = 0; k < n; k++)
aj[k] -= aji * ai[k];
}
}
}
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
b[i][ipiv[j]] = a[i][j];
free(ipiv);
return 0;
}
via:
CL /Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /
D "_UNICODE" /D "UNICODE" /FD /MD /Zp16 /fp:fast /Fo"Release\\" /
Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64 /Zi /Gr /TP /wd4996 /
errorReport:prompt
Date run = 3/27/107
Pls supply info below, send to
Da********@aol.com
Tester name/ net address = {Dann Corbit/dc*****@connx.com}
CPU mfg/id/Mhz =
CPU Identification utility v1.11 (c) 1997-2005 Jan
Steunebrink
────────────────┠€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â ”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€ ────────────────┠€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€
CPU Vendor and Model: AMD Athlon 64 2800+-3700+
Internal CPU speed : 2199.4 MHz (using internal Time Stamp Counter)
Clock Multiplier : Available only in Real Mode!
CPU-ID Vendor string: AuthenticAMD
CPU-ID Name string : AMD Athlon(tm) 64 Processor 3400+
CPU-ID Signature : 0F4A
│││└─ Stepping or sub-model no.
││└─ Model: Indicates CPU Model and 486 L1
cache mode
│└─ Family: 4=486, Am5x86,Cx5x86
│ 5=Pentium, Nx586, Cx6x86, K5/K6,
C6, mP6
│ 6=PentiumPro/II/III, CxMII/III,
Athlon, C3
│ F=Pentium4, Athlon64
└─ Type: 0=Standard, 1=Overdrive, 2=2nd Dual
Pentium
Current CPU mode : Protected
Internal (L1) cache : Enabled in Write-Back mode
MEM/CACHE = 2GB physical RAM
O.S. / COMPILER = Windows 2003 / Microsoft (R) 32-bit C/C++ Optimizing
Compiler Version 14.00.50727.42 for 80x86
Additional comments =
Results for 01/10/98 revision using TEST_FPU.C
Gauss 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200264
Avg Err. = 0.000000000000001
Crout 1000 x 2 inverts = 0.7 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200266
Avg Err. = 0.000000000000003
Dieter 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528703 0.364574114200264
Avg Err. = 0.000000000000002
We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.