Is C faster than fortran?

On Mar 26, 6:15 am, "Noma...@gmail.com" <Noma...@gmail.comwrote:

Hi
I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??

The two main competitors here are C++ and Fortran. Fortran has been
discussed elsethread.
See also (for instance):
http://portal.acm.org/citation.cfm?id=615718
http://www.oonumerics.org/oon/

Of course, you can do high performance work in other languages and C
is not an exception.

Mar 26 '07 #6

Lane Straatman

"user923005" <dc*****@connx.comwrote in message
news:11**********************@p77g2000hsh.googlegr oups.com...

On Mar 26, 6:15 am, "Noma...@gmail.com" <Noma...@gmail.comwrote:
>Hi
I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??

The two main competitors here are C++ and Fortran. Fortran has been
discussed elsethread.
See also (for instance):
http://portal.acm.org/citation.cfm?id=615718
http://www.oonumerics.org/oon/

Of course, you can do high performance work in other languages and C
is not an exception.

Those are both c++ links. The time you lose there is not during execution,
but development. Their standard is significantly thornier than is C's.
--
LS

Mar 26 '07 #7

Malcolm McLean

<No*****@gmail.comwrote in message

I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??

It is a lot easier to build an optimising Fortran 77 compiler than an
optimising C compiler, largely because of the aliasing problem, but also
because call by reference eliminates a lot of overhead.

I have spent far too much of my time translating Fortan 77 programs to C,
because although the Fortran will crunch the numbers faster, it is much
harder to make algorithmic improvements. Then of course I have nearly twenty
years' experince with C as opposed to one and a half with Fortran.

However if you are going to do serious physics-based programming you will
have to learn Fortran some time. You can't always choose the language to
program in, for lots of reasons, technical and social.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Mar 26 '07 #8

Lane Straatman

"Richard Heathfield" <rj*@see.sig.invalidwrote in message
news:A7******************************@bt.com...

Chris Torek said:

<snip>

>As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

The question would seem to be regional. Chris Torek, IIRC, lives in
cream-based Valley, with Cream of Weber and Aggie Ice Cream.

And of course the answer depends on the quality of the chocolate and
whether real strawberries are used. It's actually getting quite
difficult to find strawberry-*flavoured* ice cream in UK supermarkets,
let alone ice cream made with actual strawberries.

You can't overlook the portability of Italian ice cream in Europe. It is
not cream-based and, in many persons' informed opinion, tastier. As a
consequence, these people ingest fewer calories and are therefore quicker to
the street vendors than their American counterparts.

Using this reasoning, I am forced to the inevitable conclusion that C is
faster, except when it isn't.

How would I put "ice cream" in an array, run through the letters as chars,
and have an integer tab to represent their total?
--
LS

Mar 26 '07 #9

Richard Tobin

In article <eu*********@news1.newsguy.com>,
Chris Torek <no****@torek.netwrote:

>As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

But the answer to that is obviously "yes", so I don't see how it
applies.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

Mar 26 '07 #10

SM Ryan

"No*****@gmail.com" <No*****@gmail.comwrote:
# Hi
# I've been thinking of learning Fortran as number crunching kinda
# language for my Physics degree......but then looking around the
# internet, people are saying that the libraries/ Algorithms once used
# for number crunching is now slowly converting into C, so do you think
# I should stick with C, since I know C already, or should I proceed
# learning fortran??

Fortran is designed to simplify optimisers; it has more potential
for running faster. What the compiler actually does is another matter.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
Title does not dictate behaviour.

Mar 26 '07 #11

On Mar 26, 12:27 pm, "Lane Straatman" <inva...@invalid.netwrote:

"user923005" <dcor...@connx.comwrote in message

news:11**********************@p77g2000hsh.googlegr oups.com...

On Mar 26, 6:15 am, "Noma...@gmail.com" <Noma...@gmail.comwrote:
Hi
I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??

The two main competitors here are C++ and Fortran. Fortran has been
discussed elsethread.
See also (for instance):
http://portal.acm.org/citation.cfm?id=615718
http://www.oonumerics.org/oon/

Of course, you can do high performance work in other languages and C
is not an exception.

Those are both c++ links. The time you lose there is not during execution,
but development. Their standard is significantly thornier than is C's.

That is true, of course. However, I am not aware of any recent C work
for high performance numeric computation. Hence my comment "The two
main competitors here are C++ and Fortran."

A bugaboo when porting from Fortran to C (and vice versa):
The row/column access order that is optimal in one language is not in
the other.
So if you port a program from one language to another, there are
always lots of manual steps if you want it to run fast when you are
done.

Mar 26 '07 #12

Tim Prince

user923005 wrote:

On Mar 26, 12:27 pm, "Lane Straatman" <inva...@invalid.netwrote:

That is true, of course. However, I am not aware of any recent C work
for high performance numeric computation.

Did you ever hear of C99? Most of the performance features are
implemented in current compilers. Much as I hate to admit it, this puts
C at least even with Fortran in respect of availability of compilers
supporting latest standard features.

>
A bugaboo when porting from Fortran to C (and vice versa):
The row/column access order that is optimal in one language is not in
the other.
So if you port a program from one language to another, there are
always lots of manual steps if you want it to run fast when you are
done.

Back in the early days of C, there were plenty of automatic tools, both
commercial and open source. f2c may be as antiquated as C89, but it
avoids performance snafus on translation of Fortran array syntax.

Mar 27 '07 #13

Richard Tobin wrote:

Chris Torek <no****@torek.netwrote:

>As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

But the answer to that is obviously "yes", so I don't see how it
applies.

You forgot to bang your yes.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #14

Nelu wrote:

Richard Heathfield <r...@see.sig.invalidwrote:
>Chris Torek said:

<snip>

>>As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

And of course the answer depends on the quality of the chocolate
and whether real strawberries are used. It's actually getting
quite difficult to find strawberry-*flavoured* ice cream in UK
supermarkets, let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll
have their annual Free Cone Day in April or May. Care to taste
Strawberry Cheesecake Icecream? You can also visit Canada, it's
two hours away :-)).

Do it soon. Bush's inane passport moves are going to virtually cut
off international travel unless somebody rapidly gains sanity.
This will probably also reduce the viability of the Burlington
economic area. It's going to play hell with Calais ME.

(Note for other nationalities - the areas involved are Burlington
VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #15

Stephen Sprunk

"Malcolm McLean" <re*******@btinternet.comwrote in message
news:FL*********************@bt.com...

<No*****@gmail.comwrote in message
>I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??

....

However if you are going to do serious physics-based programming
you will have to learn Fortran some time. You can't always choose the
language to program in, for lots of reasons, technical and social.

There's the rub. If you're in the math or physics communities, you're not
going to get very far without at least _learning_ FORTAN. That's what a lot
of the existing code (which you'll want to modify or reuse) is going to be
in, and it may be all many of your associates know. The academic world
thrives on sharing with your peers, and if you aren't using the same
language as the people you need to share with, you're going to have a harder
time solving your problems.

When in Rome, do as the Romans do.

(Note that the above has absolutely nothing to do with technical reasons to
use one or the other or their relative speeds. You're probably better off
asking in comp.lang.fortran for that kind of stuff.)

S

--
Stephen Sprunk "Those people who think they know everything
CCIE #3723 are a great annoyance to those of us who do."
K5SSS --Isaac Asimov
--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #16

Lane Straatman

"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...

Nelu wrote:
>Richard Heathfield <r...@see.sig.invalidwrote:
>>Chris Torek said:

<snip>

As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

And of course the answer depends on the quality of the chocolate
and whether real strawberries are used. It's actually getting
quite difficult to find strawberry-*flavoured* ice cream in UK
supermarkets, let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll
have their annual Free Cone Day in April or May. Care to taste
Strawberry Cheesecake Icecream? You can also visit Canada, it's
two hours away :-)).

Do it soon. Bush's inane passport moves are going to virtually cut
off international travel unless somebody rapidly gains sanity.
This will probably also reduce the viability of the Burlington
economic area. It's going to play hell with Calais ME.

(Note for other nationalities - the areas involved are Burlington
VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

The Bush policy is solid in this respect. What I wouldn't give to see our
secretary of state affirm this instead of prattle on with her Pollyanna Iraq
crappola.
--
LS

Mar 27 '07 #17

Nelu

On Mar 26, 8:26 pm, CBFalconer <cbfalco...@yahoo.comwrote:

Nelu wrote:
Richard Heathfield <r...@see.sig.invalidwrote:
Chris Torek said:

<snip>

>As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

And of course the answer depends on the quality of the chocolate
and whether real strawberries are used. It's actually getting
quite difficult to find strawberry-*flavoured* ice cream in UK
supermarkets, let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll
have their annual Free Cone Day in April or May. Care to taste
Strawberry Cheesecake Icecream? You can also visit Canada, it's
two hours away :-)).

Do it soon. Bush's inane passport moves are going to virtually cut
off international travel unless somebody rapidly gains sanity.
This will probably also reduce the viability of the Burlington
economic area. It's going to play hell with Calais ME.

(Note for other nationalities - the areas involved are Burlington
VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

I am not a US resident so I need a Canadian VISA on my passport
anyway... and it's up from $75 to $100 right now, I believe. But let
me know if you happen to visit the Stowe - Waterbury area.

.... and stop with the politics. Everyone is a liberal in VT and I'm
growing tired of it :-)) (Oh, it's also OT).
--
Ioan - Ciprian Tandau
tandau _at_ freeshell _dot_ org (hope it's not too late)
(... and that it still works...)

Mar 27 '07 #18

Nelu

On Mar 26, 8:26 pm, CBFalconer <cbfalco...@yahoo.comwrote:

Nelu wrote:
Richard Heathfield <r...@see.sig.invalidwrote:
Chris Torek said:

<snip>

>As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

And of course the answer depends on the quality of the chocolate
and whether real strawberries are used. It's actually getting
quite difficult to find strawberry-*flavoured* ice cream in UK
supermarkets, let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll
have their annual Free Cone Day in April or May. Care to taste
Strawberry Cheesecake Icecream? You can also visit Canada, it's
two hours away :-)).

Do it soon. Bush's inane passport moves are going to virtually cut
off international travel unless somebody rapidly gains sanity.
This will probably also reduce the viability of the Burlington
economic area. It's going to play hell with Calais ME.

(Note for other nationalities - the areas involved are Burlington
VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

Mar 27 '07 #19

Al Balmer

On Mon, 26 Mar 2007 19:26:38 -0500, CBFalconer <cb********@yahoo.com>
wrote:

>Nelu wrote:
>Richard Heathfield <r...@see.sig.invalidwrote:
>>Chris Torek said:

<snip>

As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".

And of course the answer depends on the quality of the chocolate
and whether real strawberries are used. It's actually getting
quite difficult to find strawberry-*flavoured* ice cream in UK
supermarkets, let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll
have their annual Free Cone Day in April or May. Care to taste
Strawberry Cheesecake Icecream? You can also visit Canada, it's
two hours away :-)).

Do it soon. Bush's inane passport moves are going to virtually cut
off international travel unless somebody rapidly gains sanity.

[OT}
Is there some reason you aren't eligible for a passport? IMO, this
should have been done years ago. BTW, Bush is also lobbying to add 22
more countries to the visa waiver program, which certainly won't hurt
international travel.
[OT]

>This will probably also reduce the viability of the Burlington
economic area. It's going to play hell with Calais ME.

(Note for other nationalities - the areas involved are Burlington
VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Al Balmer
Sun City, AZ

Mar 27 '07 #20

No*****@gmail.com <No*****@gmail.comwrote:

I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??

Any advice??

I think you can quickly learn enough Fortran at a superficial level
to do some number crunching. Just be good and use "implicit none".

Since you already know C, I would say: write your first project in C
but learn enough Fortran to translate it. Make sure the answers
are the same, then race the two and post the results here :-)

Seriously. Fortran compilers can optimize more aggressively than
C because the language semantics are different. C99 plugs this
gap (mostly) with the "restrict" qualifier, but I don't know how
that plays out in practice and I would love to see data.
--
pa at panix dot com

Mar 27 '07 #21

Nelu

On Mar 27, 11:01 am, "Nelu" <tandaui...@gmail.comwrote:

On Mar 26, 8:26 pm, CBFalconer <cbfalco...@yahoo.comwrote:

Nelu wrote:
Richard Heathfield <r...@see.sig.invalidwrote:
>Chris Torek said:

><snip>

>>As for the question in the "subject" line, "is C faster than
>>fortran?": the answer is the same as the answer to the question,
>>"is chocolate ice cream tastier than strawberry?".

>And of course the answer depends on the quality of the chocolate
>and whether real strawberries are used. It's actually getting
>quite difficult to find strawberry-*flavoured* ice cream in UK
>supermarkets, let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll
have their annual Free Cone Day in April or May. Care to taste
Strawberry Cheesecake Icecream? You can also visit Canada, it's
two hours away :-)).

Do it soon. Bush's inane passport moves are going to virtually cut
off international travel unless somebody rapidly gains sanity.
This will probably also reduce the viability of the Burlington
economic area. It's going to play hell with Calais ME.

(Note for other nationalities - the areas involved are Burlington
VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

I am not a US resident so I need a Canadian VISA on my passport
anyway... and it's up from $75 to $100 right now, I believe. But let
me know if you happen to visit the Stowe - Waterbury area.

... and stop with the politics. Everyone is a liberal in VT and I'm
growing tired of it :-)) (Oh, it's also OT).

Sorry for the double post, I blame it on Google :-)

--
Ioan - Ciprian Tandau
tandau _at_ freeshell _dot_ org (hope it's not too late)
(... and that it still works...)

Mar 27 '07 #22

Kenneth Brody

Nelu wrote:
[... snip unsnipped quote ...]

>
Sorry for the double post, I blame it on Google :-)

Well, triple, now that you posted it again with the apology. But
who's counting? :-)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h|
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Mar 27 '07 #23

Richard Heathfield

Kenneth Brody said:

Nelu wrote:
[... snip unsnipped quote ...]
>>
Sorry for the double post, I blame it on Google :-)

Well, triple, now that you posted it again with the apology. But
who's counting? :-)

Apparently you are. :-)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Mar 27 '07 #24

On Mar 27, 8:41Â*am, p...@see.signature.invalid (Pierre Asselin) wrote:

Noma...@gmail.com <Noma...@gmail.comwrote:
I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??
Any advice??

I think you can quickly learn enough Fortran at a superficial level
to do some number crunching. Â*Just be good and use "implicit none".

Since you already know C, I would say: write your first project in C
but learn enough Fortran to translate it. Â*Make sure the answers
are the same, then race the two and post the results here :-)

Seriously. Â*Fortran compilers can optimize more aggressively than
C because the language semantics are different. Â*C99 plugs this
gap (mostly) with the "restrict" qualifier, but I don't know how
that plays out in practice and I would love to see data.

Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
Via:
dcorbit@DCORBIT64 /f/tmp
$ g95 -O3 -Wall test_fpu.f90
In file test_fpu.f90:83

91 FORMAT (A,I4,2('/',I2.2))
1
Warning (110): Label 91 at (1) defined but not used
In file test_fpu.f90:2293

INTEGER :: i , info , j , l , ncola , nrowa , nrowb
1
Warning (112): Variable 'ncola' at (1) is set but never used
test_fpu.f90: In function 'dtrmv_':
test_fpu.f90:3611: warning: 'kx' may be used uninitialized in this
function

dcorbit@DCORBIT64 /f/tmp
$ ./a
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 15.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 7.4 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 13.8 sec Err= 0.000000000000005
Test4 - Lapack 2 (1001x1001) inverts 10.0 sec Err= 0.000000000000417
total = 46.3 sec
dcorbit@DCORBIT64 /f/tmp
$ g95 --version
G95 (GCC 4.1.2 (g95 0.91!) Mar 21 2007)
Copyright (C) 2002-2005 Free Software Foundation, Inc.

G95 comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of G95
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
Using:
/*
* --------------------------------------------------------------
* TEST_FPU A number-crunching benchmark using matrix inversion.
* Implemented by: David Frank Da********@aol.com
* Gauss routine by: Tim Prince N8**@aol.com
* Crout routine by: James Van Buskirk to****@ix.netcom.com
* F90->C source by: Sergey N. Voronkov se**@ggd.nsu.ru
* Pointer exchange version by: Dieter Buerssner bu***@gmx.de
* --------------------------------------------------------------
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>

/*
* Compiling with NI = 1001 (default) generates pool(51,51,1000) =
20mb.
* If test system pages due to insufficient memory (disk i/o activity
* occurs), abort run and compile with NI = 200, benchmark will
adjust
* time x 5.
*/

#define NI 1001
#define NN 51
#define RK8 double
/* below are additional C routines supplied by translator */

void memflt()
{
fputs("Memory allocation error\n", stderr);
exit(EXIT_FAILURE);
}
void alloc_arrays(RK8 ** p[NI], RK8 *** a, RK8 *** b)
{
int i,
j;

for (i = 0; i < NI; i++) {
if ((p[i] = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL)
memflt();
for (j = 0; j < NN; j++)
if ((p[i][j] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL)
memflt();
}
if ((*a = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL ||
(*b = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL)
memflt();
for (i = 0; i < NN; i++)
if (((*a)[i] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL ||
((*b)[i] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL)
memflt();
}
void random_number(RK8 ** pool[NI])
{
int i,
j,
k;

for (i = 0; i < NI; i++)
for (j = 0; j < NN; j++)
for (k = 0; k < NN; k++)
pool[i][j][k] = (RK8) (rand()) / RAND_MAX;
}
RK8
timesec()
{
return (RK8) (clock()) / CLOCKS_PER_SEC;
}
/* prototype the invert functions that follow exec source */
void Gauss(RK8 ** a, RK8 ** b, int n);
void Crout(RK8 ** a, RK8 ** b, int n);
int rgaussi(RK8 ** a, RK8 ** b, int n);

int main()
{

RK8 **pool[NI]; /* pool of matrices to invert */
RK8 **a,
**ai; /* working matrices use < 256k */
RK8 avg_err,
total_time,
time1;
int i,
j,
n;

char *revision = "01/10/98"; /* Gauss speedup mod */
char invert_id[3][8] =
{
"Gauss", "Crout", "Dieter"};

struct tm *ptm;
time_t crtime;
FILE *fp;

/* Begin by allocating matrix arrays */
alloc_arrays(pool, &a, &ai);

puts("Benchmark running, hopefully as only ACTIVE task");

if ((fp = fopen("test_fpc.dat", "w")) == NULL) {
fprintf(stderr, "Can't open output file!\n");
return EXIT_FAILURE;
}
crtime = time(NULL);
ptm = gmtime(&crtime);

fprintf(fp, "Date run = %2d/%2d/%2d\n",
ptm->tm_mon + 1, ptm->tm_mday, ptm->tm_year);

fputs("Pls supply info below, send to Da********@aol.com\n"
"Tester name/ net address = \n"
"CPU mfg/id/Mhz = \n"
"MEM/CACHE = \n"
"O.S. / COMPILER = \n"
"Additional comments = \n\n\n\n\n", fp);

fprintf(fp, "Results for %s revision using TEST_FPU.C \n",
revision);

srand(time(NULL)); /* set seed to random number based on time */
random_number(pool); /* fill pool with random data ( 0. -1. ) */

for (n = 0; n < 3; n++) { /* for Gauss,Crout algorithms */
time1 = timesec(); /* start benchmark n time */

for (i = 0; i < NI; i++) {
/* get next matrix to invert */
for (j = 0; j < NN; j++)
memcpy(a[j], pool[i][j], sizeof(RK8) * NN);

switch (n) {
case 0:
Gauss(a, ai, NN); /* invert a -ai ; destructs a */
Gauss(ai, a, NN); /* invert ai -a */
break;
case 1:
Crout(a, ai, NN); /* invert a -ai ; nondestructs a
*/
Crout(ai, a, NN); /* invert ai -a */
break;
case 2:
rgaussi(a, ai, NN); /* invert a -ai ; nondestructs a
*/
rgaussi(ai, a, NN); /* invert ai -a */
break;
}
}

total_time = timesec() - time1; /* = elapsed time sec. */

/* check accuracy last matrix invert. */
avg_err = 0;
for (i = 0; i < NN; i++)
for (j = 0; j < NN; j++)
avg_err += fabs(a[i][j] - pool[NI - 1][i][j]);

if (NI == 200)
fprintf(fp, "\n%s 5 x 200 x 2 inverts = %6.1f sec.\n",
invert_id[n], 5 * total_time);
else
fprintf(fp, "\n%s 1000 x 2 inverts = %6.1f sec.\n",
invert_id[n], total_time);

fputs("Accuracy of 2 computed numbers\n", fp);
fprintf(fp, "Original = %18.15f %18.15f\n",
pool[NI - 1][0][0], pool[NI - 1][NN - 1][NN - 1]);
fprintf(fp, "Computed = %18.15f %18.15f\n",
a[0][0], a[NN - 1][NN - 1]);
fprintf(fp, "Avg Err. = %18.15f\n", avg_err / (NN * NN));

} /* for Gauss,Crout algorithms */

puts("Results written to: TEST_FPC.DAT");

return EXIT_SUCCESS;
}
/*
* --------------------------------------
* Invert matrix a -b by Gauss method
* --------------------------------------
*/
void Gauss(RK8 ** a, RK8 ** b, int n)
{
RK8 d,
temp = 0,
c;
int i,
j,
k,
m,
nn,
*ipvt;

if ((ipvt = (int *) malloc(n * sizeof(int))) == NULL)
memflt();

nn = n;
for (i = 0; i < nn; i++)
ipvt[i] = i;

for (k = 0; k < nn; k++) {
temp = 0.;
m = k;
for (i = k; i < nn; i++) {
d = a[k][i];
if (fabs(d) temp) {
temp = fabs(d);
m = i;
}
}
if (m != k) {
j = ipvt[k];
ipvt[k] = ipvt[m];
ipvt[m] = j;
for (j = 0; j < nn; j++) {
temp = a[j][k];
a[j][k] = a[j][m];
a[j][m] = temp;
}
}
d = 1 / a[k][k];
for (j = 0; j < k; j++) {
c = a[j][k] * d;
for (i = 0; i < nn; i++)
a[j][i] -= a[k][i] * c;
a[j][k] = c;
}
for (j = k + 1; j < nn; j++) {
c = a[j][k] * d;
for (i = 0; i < nn; i++)
a[j][i] -= a[k][i] * c;
a[j][k] = c;
}
for (i = 0; i < nn; i++)
a[k][i] = -a[k][i] * d;
a[k][k] = d;
}

for (i = 0; i < nn; i++)
memcpy(b[ipvt[i]], a[i], sizeof(RK8) * nn);

free(ipvt);
}
/*
* --------------------------------------
* Invert matrix a -b by Crout method
* --------------------------------------
*/
void Crout(RK8 ** a, RK8 ** b, int n)
{
int i,
j; /* Current row & column */
int maxlok; /* Location of maximum pivot */
int *index; /* Partial pivot record */
RK8 *temp = 0,
the_max;
RK8 tmp,
*ptr;
RK8 *matr = 0;
int k,
ind,
ind2;

if ((index = (int *) malloc(n * sizeof(int))) == NULL ||
(temp = (RK8 *) malloc(n * sizeof(RK8))) == NULL ||
(matr = (RK8 *) malloc(n * n * sizeof(RK8))) == NULL)
memflt();

/* Initialize everything */

for (i = 0; i < n; i++)
index[i] = i;

/* Shuffle matrix */
for (j = 0; j < n; j++) {
for (i = 0; i < j; i++)
b[j][i] = a[j][i];
for (i = j; i < n; i++)
b[j][i] = a[i - j][n - j - 1];
}

/* LU decomposition; reciprocals of diagonal elements in L matrix */
for (j = 0; j < n; j++) {
/* Get current column of L matrix */
for (i = j; i < n; i++) {
tmp = 0;
ind = n - i - 1;
for (k = 0; k < j; k++)
tmp += b[ind][ind + k] * b[j][k];
b[ind][ind + j] -= tmp;
}
maxlok = 0;
the_max = fabs(b[0][j]);
for (i = 1; i < n - j; i++)
if (fabs(b[i][j + i]) >= the_max) {
the_max = fabs(b[i][j + i]);
maxlok = i;
}
/* det = det*b(j+maxlok-1,maxlok) */
b[maxlok][j + maxlok] = 1 / b[maxlok][j + maxlok];

/* Swap biggest element to current pivot position */
if (maxlok + 1 != n - j) {
ind = n - maxlok - 1;
ind2 = index[j];
index[j] = index[ind];
index[ind] = ind2;
for (k = n - maxlok; k < n; k++) {
tmp = b[k][j];
b[k][j] = b[k][ind];
b[k][ind] = tmp;
}
memcpy(temp, &(b[maxlok][maxlok]), sizeof(RK8) * (n -
maxlok));
ptr = &(b[n - j - 1][n - j - 1]);
memmove(&(b[maxlok][maxlok]), ptr, sizeof(RK8) * (j + 1));
for (k = j + 1; k < n - maxlok; k++)
b[maxlok][maxlok + k] = b[k][j];
memcpy(ptr, temp, (j + 1) * sizeof(RK8));
for (k = j + 1; k < n - maxlok; k++)
b[k][j] = temp[k];
}
/* Get current row of U matrix */
ind = n - j - 1;
for (i = j + 1; i < n; i++) {
tmp = 0.;
for (k = 0; k < j; k++)
tmp += b[i][k] * b[ind][ind + k];
b[i][j] = b[ind][n - 1] * (b[i][j] - tmp);
}
} /* END DO LU_outer */

/* Invert L matrix */
for (j = 0; j < n - 1; j++) {
temp[0] = b[n - j - 1][n - 1];
for (i = j + 1; i < n; i++) {
ind = n - i - 1;
tmp = 0.;
for (k = 0; k < i - j; k++)
tmp += temp[k] * b[ind][ind + j + k];
b[ind][ind + j] = -tmp * b[ind][n - 1];
temp[i - j] = b[ind][ind + j];
}
}

/* Reshuffle matrix */
for (i = 0; i < (n + 1) / 3; i++) {
memcpy(temp, &(b[i][2 * (i + 1) - 1]), sizeof(RK8) * (n + 2 -
3 * (i +1)));
for (j = 2 * (i + 1) - 1; j < n - i; j++)
b[i][j] = b[n - j - 1][n - j + i - 1];
ind = n - i - 1;
for (j = i; j < n + 1 - 2 * (i + 1); j++)
b[j][i + j] = b[n - i - j - 1][ind];
for (k = 0; k < n + 2 - 3 * (i + 1); k++)
b[i + 1 + k][ind] = temp[k];
}

/* Invert U matrix */
for (i = 0; i < n - 1; i++) {
for (j = i + 1; j < n; j++) {
tmp = 0.;
for (k = 0; k < j - i - 1; k++)
tmp += temp[k] * b[j][i + 1 + k];
b[j][i] = -b[j][i] - tmp;
temp[j - i - 1] = b[j][i];
}
}

/* Multiply inverses in reverse order */
for (i = 0; i < n - 1; i++) {
for (k = 0; k < n - i - 1; k++)
temp[k] = b[i + 1 + k][i];
for (j = 0; j <= i; j++) {
tmp = 0.;
for (k = 0; k < n - i - 1; k++)
tmp += temp[k] * b[j][i + 1 + k];
b[j][i] += tmp;
}
for (j = i + 1; j < n; j++) {
tmp = 0.;
for (k = j; k < n; k++)
tmp += temp[k - i - 1] * b[j][k];
b[j][i] = tmp;
}
}

/* Straighten out the columns of the result */
for (i = 0; i < n; i++)
memcpy(matr + n * i, b[i], sizeof(RK8) * n);
for (i = 0; i < n; i++)
memcpy(b[index[i]], matr + n * i, sizeof(RK8) * n);

free(index);
free(temp);
free(matr);
}

/*
** This routine is due to bu***@gmx.de (Dieter Buerssner)
** Destroys a, return 0: success, 1: zero pivot, 2: out of mem.
*/
int rgaussi(RK8 ** a, RK8 ** b, int n)
{
int i,
j,
k,
maxj,
t;
RK8 maxel,
pivinv,
tmaxel,
aji;
RK8 *tp,
*ai,
*aj;
/* C99: int ipiv[n]; */
int *ipiv = malloc(n * sizeof *ipiv);

if (ipiv == NULL)
return 2;
for (i = 0; i < n; i++)
ipiv[i] = i;
for (i = 0; i < n; i++) {
maxj = -1;
maxel = 0.0;
/* find pivot element */
for (j = i; j < n; j++)
if ((tmaxel = fabs(a[j][i])) maxel) {
maxj = j;
maxel = tmaxel;
}
if (maxj < 0) {
free(ipiv);
return 1;
}
/* exchange rows */
if (maxj != i) {
/* just exchange pointers for a */
tp = a[maxj];
a[maxj] = a[i];
a[i] = tp;
t = ipiv[maxj];
ipiv[maxj] = ipiv[i];
ipiv[i] = t;
}
ai = a[i];
pivinv = 1.0 / ai[i];
ai[i] = 1.0;
for (k = 0; k < n; k++)
ai[k] *= pivinv;
for (j = 0; j < n; j++) {
if (j != i) {
aj = a[j];
aji = aj[i];
aj[i] = 0.0;
for (k = 0; k < n; k++)
aj[k] -= aji * ai[k];
}
}
}
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
b[i][ipiv[j]] = a[i][j];
free(ipiv);
return 0;
}

via:
CL /Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /
D "_UNICODE" /D "UNICODE" /FD /MD /Zp16 /fp:fast /Fo"Release\\" /
Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64 /Zi /Gr /TP /wd4996 /
errorReport:prompt

Date run = 3/27/107
Pls supply info below, send to Da********@aol.com
Tester name/ net address = {Dann Corbit/dc*****@connx.com}
CPU mfg/id/Mhz =
CPU Identification utility v1.11 (c) 1997-2005 Jan
Steunebrink

â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â” €â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â ”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€ â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â” €â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€
CPU Vendor and Model: AMD Athlon 64 2800+-3700+
Internal CPU speed : 2199.4 MHz (using internal Time Stamp Counter)
Clock Multiplier : Available only in Real Mode!
CPU-ID Vendor string: AuthenticAMD
CPU-ID Name string : AMD Athlon(tm) 64 Processor 3400+
CPU-ID Signature : 0F4A
â”‚â”‚â”‚â””â”€ Stepping or sub-model no.
â”‚â”‚â””â”€ Model: Indicates CPU Model and 486 L1
cache mode
â”‚â””â”€ Family: 4=486, Am5x86,Cx5x86
â”‚ 5=Pentium, Nx586, Cx6x86, K5/K6,
C6, mP6
â”‚ 6=PentiumPro/II/III, CxMII/III,
Athlon, C3
â”‚ F=Pentium4, Athlon64
â””â”€ Type: 0=Standard, 1=Overdrive, 2=2nd Dual
Pentium
Current CPU mode : Protected
Internal (L1) cache : Enabled in Write-Back mode
MEM/CACHE = 2GB physical RAM
O.S. / COMPILER = Windows 2003 / Microsoft (R) 32-bit C/C++ Optimizing
Compiler Version 14.00.50727.42 for 80x86
Additional comments =

Results for 01/10/98 revision using TEST_FPU.C

Gauss 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200264
Avg Err. = 0.000000000000001

Crout 1000 x 2 inverts = 0.7 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200266
Avg Err. = 0.000000000000003

Dieter 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528703 0.364574114200264
Avg Err. = 0.000000000000002

We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.

Mar 27 '07 #25

Al Balmer wrote:

CBFalconer <cb********@yahoo.comwrote:

.... snip ...

>
>Do it soon. Bush's inane passport moves are going to virtually
cut off international travel unless somebody rapidly gains sanity.
[OT}
Is there some reason you aren't eligible for a passport? IMO, this
should have been done years ago. BTW, Bush is also lobbying to add
22 more countries to the visa waiver program, which certainly won't
hurt international travel.
[OT]

It's just another means of increasing the bureaucracy size and
raising taxes. No practical benefits to anyone or anything. Many
detriments.

During WWII we derided the practice of carrying papers in
dictatorial realms.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #26

glen herrmannsfeldt

user923005 wrote:

On Mar 27, 8:41 am, p...@see.signature.invalid (Pierre Asselin) wrote:

(snip)

>>Seriously. Fortran compilers can optimize more aggressively than
C because the language semantics are different. C99 plugs this
gap (mostly) with the "restrict" qualifier, but I don't know how
that plays out in practice and I would love to see data.

(snip)

We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.

That is what they say, and maybe on the average it is true.

There are too many variables to make it true in general.
First, the programs should be as similar as possible, and also
the data. In the (snipped) programs, the C version uses arrays of
pointers to arrays of pointers to arrays of pointers, which is not
likely true for Fortran, and might make a difference either way.

The programs are doing matrix inversion of random numbers, which
are generated using system dependent generators. There may be some
data dependence to the timing of matrix inversion, especially
if the hardware timing is different for different data.

Optimization is probably more sensitive to the number of registers
available than to the ability to rearrange expressions.
Do both Fortran and C programs have the array subscripts in the
optimal order? Most compilers won't change that.

-- glen

Mar 28 '07 #27

highegg

We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.

Perhaps. I see you specified a whole bunch of options for the MSVSC++
compiler (out of those I only recall that the /G are for code
generation, /O are presumably optimizers), while you just specify -O3
for g95. Additional options for g95 (as well as other compilers) may
include -ffast-math (equivalent to /fp:fast for VC++ ?), -fomit-frame-
pointer, -march=athlon (or something similar), -funroll-loops.
g95 has almost all optimizations turned off by default.
Even if you do so, do not expect miracles. Two things need to be
noted:
the Fortran code may not be written with speed in mind. Note that
array arithmetic has to create a temporary array if necessary. It is
up to the compiler to determine whether or not it is necessary.
Especially tough might be tricks like this (snipped from TEST_FPU.f90)
b((/m,k/),:) = b((/k,m/),:)

although the swapping here can be done with one temp variable, this is
probably going to create a 2xsize(b,2) temporary array, copy the
righthand side to it, the copy it to the lefthand side. A more speed-
friendly coding could look like this:

elemental subroutine swap(a,b)
real,intent(inout)::a,b
real:: tmp
tmp = a; a = b; b = tmp
end subroutine
....
call swap(b(m,:),b(k,:))

or the good old DO loop.

Moreover, it is important to note that g95 isn't the top optimizing
compiler out there; to be honest, it may well be the bottom. It was
not created with aggresive optimizations in mind - and especially
optimizing of complicated array expressions is a tough task. I think
that gfortran (the newer versions) can perform better, but I would
consider more fair a comparison with a commercial Windows compiler,
like Intel Visual Fortran.
Still, it is hard to tell whether a C code is equivalent to some
Fortran code. I did not closely review your C code to see how close it
gets.

regards,
Jaroslav (not remotely a Fortran guru)

Mar 28 '07 #28

On Mar 27, 10:49 pm, "highegg" <high...@gmail.comwrote:

We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.

Perhaps. I see you specified a whole bunch of options for the MSVSC++
compiler (out of those I only recall that the /G are for code
generation, /O are presumably optimizers), while you just specify -O3
for g95. Additional options for g95 (as well as other compilers) may
include -ffast-math (equivalent to /fp:fast for VC++ ?), -fomit-frame-
pointer, -march=athlon (or something similar), -funroll-loops.
g95 has almost all optimizations turned off by default.
Even if you do so, do not expect miracles. Two things need to be
noted:

I have an AMD machine, so I did like you suggest:
dcorbit@DCORBIT64 /f/tmp
$ g95 -O3 -Wall -ffast-math -fomit-frame-pointer -march=athlon -
funroll-loops
test_fpu.f90
In file test_fpu.f90:83

91 FORMAT (A,I4,2('/',I2.2))
1
Warning (110): Label 91 at (1) defined but not used
In file test_fpu.f90:2293

INTEGER :: i , info , j , l , ncola , nrowa , nrowb
1
Warning (112): Variable 'ncola' at (1) is set but never used
test_fpu.f90: In function 'dtrmv_':
test_fpu.f90:3611: warning: 'kx' may be used uninitialized in this
function

dcorbit@DCORBIT64 /f/tmp
$ ./a
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 12.7 sec Err= 0.000000000000003
Test2 - Crout 2000 (101x101) inverts 7.0 sec Err= 0.000000000000001
Test3 - Crout 2 (1001x1001) inverts 11.6 sec Err= 0.000000000000002
Test4 - Lapack 2 (1001x1001) inverts 8.0 sec Err= 0.000000000000643
total = 39.3 sec

the Fortran code may not be written with speed in mind. Note that
array arithmetic has to create a temporary array if necessary. It is
up to the compiler to determine whether or not it is necessary.
Especially tough might be tricks like this (snipped from TEST_FPU.f90)
b((/m,k/),:) = b((/k,m/),:)

although the swapping here can be done with one temp variable, this is
probably going to create a 2xsize(b,2) temporary array, copy the
righthand side to it, the copy it to the lefthand side. A more speed-
friendly coding could look like this:

elemental subroutine swap(a,b)
real,intent(inout)::a,b
real:: tmp
tmp = a; a = b; b = tmp
end subroutine
...
call swap(b(m,:),b(k,:))

or the good old DO loop.

Moreover, it is important to note that g95 isn't the top optimizing
compiler out there; to be honest, it may well be the bottom. It was
not created with aggresive optimizations in mind - and especially
optimizing of complicated array expressions is a tough task. I think
that gfortran (the newer versions) can perform better, but I would
consider more fair a comparison with a commercial Windows compiler,
like Intel Visual Fortran. Don't have that one.

gfortran barfed:
dcorbit@DCORBIT64 /f/tmp
$ gfortran -O3 -Wall test_fpu.f90
In file test_fpu.f90:83

91 FORMAT (A,I4,2('/',I2.2))
1
Warning: Label 91 at (1) defined but not used
test_fpu.f90: In function 'gauss':
test_fpu.f90:106: internal compiler error: in gfc_conv_ss_descriptor,
at fortran/trans-array.c:1235
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.htmlfor instructions.

dcorbit@DCORBIT64 /f/tmp
$ gfortran --version
GNU Fortran 95 (GCC 4.0.1 20050608 (prerelease))
Copyright (C) 2005 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

dcorbit@DCORBIT64 /f/tmp

Still, it is hard to tell whether a C code is equivalent to some
Fortran code. I did not closely review your C code to see how close it
gets.

regards,
Jaroslav (not remotely a Fortran guru)

Mar 28 '07 #29

Richard Bos

"Nelu" <ta********@gmail.comwrote:

On Mar 26, 2:06 pm, Richard Heathfield <r...@see.sig.invalidwrote:
Chris Torek said:

As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".
And of course the answer depends on the quality of the chocolate and
whether real strawberries are used. It's actually getting quite
difficult to find strawberry-*flavoured* ice cream in UK supermarkets,
let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll have
their annual Free Cone Day in April or May. Care to taste some
Strawberry Cheesecake Icecream? You can also visit Canada, it's two
hours away :-)).

I prefer real C and real ice cream. Luckily I have a real Italian
gelatero living in town, and a real C Standard printed by Wiley on my
desk.

Richard

Mar 28 '07 #30

Michael Prager

"highegg" <hi*****@gmail.comwrote:

Moreover, it is important to note that g95 isn't the top optimizing
compiler out there; to be honest, it may well be the bottom.

That's been true for me -- g95 execution time is twice that of
Lahey LF95 on my codes, and Lahey is not the fastest ediest
Windows Fortran compiler.

(Need I say, I still consider g95 a wonderful tool and
outstanding accomplishment.)

--
Mike Prager, NOAA, Beaufort, NC
Address spam-trapped; remove color to reply.
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

Mar 28 '07 #31

In comp.lang.c user923005 <dc*****@connx.comwrote:

Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
Via:
dcorbit@DCORBIT64 /f/tmp
$ g95 -O3 -Wall test_fpu.f90

[ ... ]

dcorbit@DCORBIT64 /f/tmp
$ ./a
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 15.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 7.4 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 13.8 sec Err= 0.000000000000005
Test4 - Lapack 2 (1001x1001) inverts 10.0 sec Err= 0.000000000000417
total = 46.3 sec

These numbers are quite reasonable. At O(n^3) operations per matrix
inversion, they correspond to 136, 278, 145 and 201 megaflops. I
didn't do an accurate flop count, just 2000*101^3 ops for the first
two and 2*1001^3 for the last two.

Using:
[ similar C program omitted, except: ]
[ #define NI 1001 ] --number of repeats
[ #define NN 51 ] --matrix size
via:
CL /Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /
D "_UNICODE" /D "UNICODE" /FD /MD /Zp16 /fp:fast /Fo"Release\\" /
Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64 /Zi /Gr /TP /wd4996 /
errorReport:prompt

Results for 01/10/98 revision using TEST_FPU.C

Gauss 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200264
Avg Err. = 0.000000000000001

Crout 1000 x 2 inverts = 0.7 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200266
Avg Err. = 0.000000000000003

Dieter 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528703 0.364574114200264
Avg Err. = 0.000000000000002

We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.

From a cursory inspection of the C, these are smaller systems: 2002
inversions of 51x51 matrices compared to the 2000 101x101 and 2
1001x1001 of the Fortran example. The numbers correspond to ~332,
190 and 332 megaflops, but the accuracy may not be so good because
of the short times.

On the Fortran code, I get
gfortran -o fortran -O3 -funroll-loops TEST_FPU.f90
../fortran
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 6.6 sec Err= 0.000000000000003
Test2 - Crout 2000 (101x101) inverts 7.1 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 22.9 sec Err= 0.000000000000001
Test4 - Lapack 2 (1001x1001) inverts 20.8 sec Err= 0.000000000000273
total = 57.3 sec

Approx 312, 290, 87.6 and 96.4 megaflops (on a little laptop, must
be spilling out of cache).

I translated Gauss() to C by hand but not the other two (especially
not Lapack() !). I'll append my code below. I basically transliterated
the algorithm, using C99 VLA's instead of arrays of pointers. I
also transposed all the matrices so the memory access patterns
would be similar in C and in Fortran; this is ok mathematically,
matrix inversion commutes with transposition. I added a unit test,
not shown, to check that multiplying a matrix by the alleged inverse
returned by Gauss() yields the identity matrix.

I get:

cc -o C -std=c99 -O3 -funroll-loops -Wall TEST_FPU.c
../C
Test1 - Gauss 2000 (101x101) inverts 6.415025 sec Err=0.000000
--rest not implemented--

so comparable to Fortran. Maybe a smidge faster, in fact.

Here is the C version of Gauss(), followed by the original
Fortran. Oh, and the "restrict" keyword doesn't make one
bit of difference.

************************************************** ********************

static inline int isamax(int n, double v[n])
{
int i, imax;
double tmp, max;
for(i= 1, imax= 0, max= fabs(v[0]); i<n; i++) {
tmp= fabs(v[i]);
if(tmp>max) { max= tmp; imax= i; }
}
return imax;
}

static void Gauss(int n, double a[n][n])
{
double (* restrict b)[n]= malloc(n*sizeof(b[0]));
double * restrict temp= malloc(n*sizeof(temp[0]));
int * restrict piv= malloc(n*sizeof(piv[0]));
double c, d;
int i, j, k, m, jmax;

memcpy(b, a, n*sizeof(b[0]));
for(i= 0; i<n; i++) piv[i]= i;

for(k= 0; k<n; k++) {
/* largest element in b[k][k:n] */
jmax= k+isamax(n-k, &b[k][k]);

if(jmax != k) {
/* swap columns k and jmax */
m= piv[k]; piv[k]= piv[jmax]; piv[jmax]= m;
for(i= 0; i<n; i++) {
c= b[i][k];
b[i][k]= b[i][jmax];
b[i][jmax]= c;
}
}

d= 1./b[k][k];

memcpy(temp, b[k], sizeof(b[k]));
for(i= 0; i<n; i++) {
c= b[i][k]*d;
for(j= 0; j<n; j++) {
b[i][j]-= temp[j]*c;
}
b[i][k]= c;
}
for(j= 0; j<n; j++) b[k][j]= -d*temp[j];
b[k][k]= d;
}

for(i= 0; i<n; i++) for(j= 0; j<n; j++) {
a[piv[i]][j]= b[i][j];
}

free(piv); free(temp); free(b);
}

************************************************** ********************

MODULE kinds
INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
END MODULE kinds

SUBROUTINE Gauss (a,n) ! Invert matrix by Gauss method
! --------------------------------------------------------------------
USE kinds
IMPLICIT NONE

INTEGER :: n
REAL(RK8) :: a(n,n)

! - - - Local Variables - - -
REAL(RK8) :: b(n,n), c, d, temp(n)
INTEGER :: i, j, k, m, imax(1), ipvt(n)
! - - - - - - - - - - - - - -
b = a
ipvt = (/ (i, i = 1, n) /)

DO k = 1,n
imax = MAXLOC(ABS(b(k:n,k)))
m = k-1+imax(1)

IF (m /= k) THEN
ipvt( (/m,k/) ) = ipvt( (/k,m/) )
b((/m,k/),:) = b((/k,m/),:)
END IF
d = 1/b(k,k)

temp = b(:,k)
DO j = 1, n
c = b(k,j)*d
b(:,j) = b(:,j)-temp*c
b(k,j) = c
END DO
b(:,k) = temp*(-d)
b(k,k) = d
END DO
a(:,ipvt) = b

END SUBROUTINE Gauss

************************************************** ********************
--
pa at panix dot com

Mar 30 '07 #32

Arjen Markus

On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:

In comp.lang.c user923005 <dcor...@connx.comwrote:

Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90

I used the above program with two different compilers on one Linux
machine and got the following results:

% ifort -o test_fpu test_fpu.f90
% test_fpu
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 4.9 sec Err= 0.000000000000003
Test3 - Crout 2 (1001x1001) inverts 9.6 sec Err= 0.000000000000073
Test4 - Lapack 2 (1001x1001) inverts 6.6 sec Err= 0.000000000000537
total = 24.2 sec

and:
% g95 -o test_fpu test_fpu.f90
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 56.5 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 21.1 sec Err= 0.000000000000001
Test3 - Crout 2 (1001x1001) inverts 13.1 sec Err= 0.000000000000001
Test4 - Lapack 2 (1001x1001) inverts 34.2 sec Err= 0.000000000000329
total =124.9 sec

That is a factor of 6 in the total time!

When I tried the same with optimisation for speed:

ifort: -O2 -ipo -static (derived from -fast, conservative on
optimisation level)
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000030
Test2 - Crout 2000 (101x101) inverts 4.7 sec Err= 0.000000000000069
Test3 - Crout 2 (1001x1001) inverts 9.9 sec Err= 0.000000000000381
Test4 - Lapack 2 (1001x1001) inverts 16.6 sec Err= 0.000000000002275
total = 34.4 sec

(Slower, surprisingly, a second run showed 24.9 total)

g95: -O3 (I did not want to examine all the details of the
optimisation)
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 36.8 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 19.6 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 23.2 sec Err= 0.000000000000009
Test4 - Lapack 2 (1001x1001) inverts 13.1 sec Err= 0.000000000000822
total = 92.7 sec

In this case we gain about 25% speed - not bad.

So, quite apart from the question whether C is faster or not than
Fortran
there is a very wide spread in what compilers are able to do!

That simply adds another factor to consider.

Regards,

Arjen

Mar 30 '07 #33

Arjen Markus

On 30 mrt, 09:51, "Arjen Markus" <arjen.mar...@wldelft.nlwrote:

On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:

In comp.lang.c user923005 <dcor...@connx.comwrote:

Using:
>http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90

I used the above program with two different compilers on one Linux
machine and got the following results:

% ifort -o test_fpu test_fpu.f90

I tried this on a Windows XP machine too, but the Intel Fortran
compiler version terminated in case 3 with a stack overflow.
I did specify a stack of 16 million bytes (via the /F option),
but that did not work. Should I use an even larger stack?

Regards,

Arjen

Mar 30 '07 #34

On Mar 30, 6:23 am, "Arjen Markus" <arjen.mar...@wldelft.nlwrote:

On 30 mrt, 09:51, "Arjen Markus" <arjen.mar...@wldelft.nlwrote:

On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:

In comp.lang.c user923005 <dcor...@connx.comwrote:

Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90

I used the above program with two different compilers on one Linux
machine and got the following results:

% ifort -o test_fpu test_fpu.f90

I tried this on a Windows XP machine too, but the Intel Fortran
compiler version terminated in case 3 with a stack overflow.
I did specify a stack of 16 million bytes (via the /F option),
but that did not work. Should I use an even larger stack?

If I recall correctly, the arrays are about 20 MB.
I am kind of surprised that they are formulated as automatic
variables.
In the C version, they are allocated using malloc(), so they do not
exhaust automatic storage.
Hence the C version does not need excessive stack requirements.

Mar 30 '07 #35

highegg

If I recall correctly, the arrays are about 20 MB.

I am kind of surprised that they are formulated as automatic
variables.

That's very common with current Fortran compilers. Automatic arrays
need to be allocated on subroutine entry and deallocated on subroutine
exit, thus they qualify for stack allocation, which is faster than
heap allocation. Compiler-generated temporaries usually also live on
the stack. Allocatable arrays normally go on heap. Some compilers also
offer options to specify the largest array that can be stack-
allocated.

In the C version, they are allocated using malloc(), so they do not
exhaust automatic storage.
Hence the C version does not need excessive stack requirements.

Neither would the Fortran version, if allocatable arrays were used
instead. However, it is a good general idea to increase stack for
Fortran programs - some compilers do that automatically.

Mar 30 '07 #36

In comp.lang.fortran Arjen Markus <ar**********@wldelft.nlwrote:

I used the above program with two different compilers on one Linux
machine and got the following results:

[ifort] total = 24.2 sec
[g95] total =124.9 sec
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec

So, quite apart from the question whether C is faster or not than
Fortran
there is a very wide spread in what compilers are able to do!

That simply adds another factor to consider.

Indeed. Good showing for Intel. I want to try ifort when
I have a chance.

For the piece that I translated I was surprized to see gcc edge
gfortran, since they use the same back-end. I suspect gfortran
had trouble optimizing the array operations; when I wrote the C
I had no choice but to expand them into loops. I want to try the
same thing on the Fortran side --but it will take a while before
I get to it.

I don't have ifc so I won't be able to do an Intel-Intel
comparison.

--
pa at panix dot com

Mar 31 '07 #37

Ian Collins

Pierre Asselin wrote:

In comp.lang.fortran Arjen Markus <ar**********@wldelft.nlwrote:

>>I used the above program with two different compilers on one Linux
machine and got the following results:

>>[ifort] total = 24.2 sec
[g95] total =124.9 sec
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec

>>So, quite apart from the question whether C is faster or not than
Fortran
there is a very wide spread in what compilers are able to do!

>>That simply adds another factor to consider.

Indeed. Good showing for Intel. I want to try ifort when
I have a chance.

You might like to try the latest Sun Studio compiler on Linux. On am
AMD64 Solaris box, the performance is significantly faster than f95.

--
Ian Collins.

Mar 31 '07 #38

Pierre Asselin wrote:

In comp.lang.fortran Arjen Markus <ar**********@wldelft.nlwrote:

>I used the above program with two different compilers on one Linux
machine and got the following results:

>[ifort] total = 24.2 sec
[g95] total =124.9 sec
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec

>So, quite apart from the question whether C is faster or not than
Fortran
there is a very wide spread in what compilers are able to do!

>That simply adds another factor to consider.

Indeed. Good showing for Intel. I want to try ifort when
I have a chance.

For the piece that I translated I was surprized to see gcc edge
gfortran, since they use the same back-end. I suspect gfortran
had trouble optimizing the array operations; when I wrote the C
I had no choice but to expand them into loops. I want to try the
same thing on the Fortran side --but it will take a while before
I get to it.

I don't have ifc so I won't be able to do an Intel-Intel
comparison.

I suspect the difference in array addressing and its influence on
the memory cache might explain most of the differences.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Mar 31 '07 #39

Tobias Burnus

On Mar 31, 4:43 pm, p...@see.signature.invalid (Pierre Asselin) wrote:

In comp.lang.fortran Arjen Markus <arjen.mar...@wldelft.nlwrote:
I used the above program with two different compilers on one Linux
machine and got the following results:
[ifort] total = 24.2 sec
[g95] total =124.9 sec

This comparison does not make sense: Ifort uses by default -O2
optimization and GCC (gfortran, g95, g77) uses -O0.

[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec

For comparison (without thinking too much about the options; on x86-64
Linux):

gfortran [4.3] -march=opteron -ffast-math -funroll-loops -ftree-
vectorize -msse3 -O3 TEST_FPU.f90
total = 19.2 sec

ifort [9.1] -O3 -xW -ipo -static TEST_FPU.f90
total = 17.7 sec

sunf95 [8.3] -w4 -fast -xarch=amd64a -xipo=0 TEST_FPU.f90
total = 15.5 sec

NAG f95 [5.1/gcc 4.1.2] -O4 -ieee=full -Bstatic -march=opteron -ffast-
math -funroll-loops -ftree-vectorize -msse3 TEST_FPU.f90
total = 20.2 sec

g95 [0.91-Feb23/gcc 4.0.3] -march=opteron -ffast-math -funroll-loops -
ftree-vectorize -msse3 -O3 TEST_FPU.f90
total = 31.7 sec

Tobias

Mar 31 '07 #40

Arjen Markus

On 1 apr, 00:32, "Tobias Burnus" <bur...@net-b.dewrote:

On Mar 31, 4:43 pm, p...@see.signature.invalid (Pierre Asselin) wrote:

In comp.lang.fortran Arjen Markus <arjen.mar...@wldelft.nlwrote:
I used the above program with two different compilers on one Linux
machine and got the following results:
[ifort] total = 24.2 sec
[g95] total =124.9 sec

This comparison does not make sense: Ifort uses by default -O2
optimization and GCC (gfortran, g95, g77) uses -O0.

[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec

For comparison (without thinking too much about the options; on x86-64
Linux):

gfortran [4.3] -march=opteron -ffast-math -funroll-loops -ftree-
vectorize -msse3 -O3 TEST_FPU.f90
total = 19.2 sec

ifort [9.1] -O3 -xW -ipo -static TEST_FPU.f90
total = 17.7 sec

sunf95 [8.3] -w4 -fast -xarch=amd64a -xipo=0 TEST_FPU.f90
total = 15.5 sec

NAG f95 [5.1/gcc 4.1.2] -O4 -ieee=full -Bstatic -march=opteron -ffast-
math -funroll-loops -ftree-vectorize -msse3 TEST_FPU.f90
total = 20.2 sec

g95 [0.91-Feb23/gcc 4.0.3] -march=opteron -ffast-math -funroll-loops -
ftree-vectorize -msse3 -O3 TEST_FPU.f90
total = 31.7 sec

Tobias

Oh, I was not aware of the default settings - that explains why there
was little
effect of the optimisation. I should try again on my Linux machine and
see
what happens with the above.

Regards,

Arjen

Apr 1 '07 #41