Is C faster than fortran?

Hi
I've been thinking of learning Fortran as number crunching kinda
language for my Physics degree......but then looking around the
internet, people are saying that the libraries/ Algorithms once used
for number crunching is now slowly converting into C, so do you think
I should stick with C, since I know C already, or should I proceed
learning fortran??

Any advice??
Thanks
Chris

Mar 26 '07 #1
On 26 Mar 2007 06:15:50 -0700, "No*****@gmail.com" <No*****@gmail.com>
wrote:
>Hi
>I've been thinking of learning Fortran as number crunching kinda
>language for my Physics degree......but then looking around the
>internet, people are saying that the libraries/ Algorithms once used
>for number crunching is now slowly converting into C, so do you think
>I should stick with C, since I know C already, or should I proceed
>learning fortran??
This is a bit off topic, but:

There was a thread on the perceived merits of Fortran for scientific
programming in comp.fortran this month in which various parties gave
their views. I posted a summary there and in comp.lang.misc. You could
google for the latter - the title was "The merits of fortran" and the
date was March 14 this annum.

Briefly, though, Fortran is often faster for number crunching and is
probably a more pleasant language to use for that purpose. Besides, it's
always worthwhile learning another language. If you do tackle it, learn
Fortran 95 - the older versions are, ah, antique.

Mar 26 '07 #2
In article <11*********************@y80g2000hsf.googlegroups.com>
No*****@gmail.com <No*****@gmail.com> wrote:
>I've been thinking of learning Fortran ...
Learning additional computer languages is often good. Note (as
someone else did) that there are lots of different versions of
Fortran, though. The current standard is F95.

As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Mar 26 '07 #3
Chris Torek said:

<snip>
>As for the question in the "subject" line, "is C faster than
>fortran?": the answer is the same as the answer to the question,
>"is chocolate ice cream tastier than strawberry?".
And of course the answer depends on the quality of the chocolate and
whether real strawberries are used. It's actually getting quite
difficult to find strawberry-*flavoured* ice cream in UK supermarkets,
let alone ice cream made with actual strawberries.

Using this reasoning, I am forced to the inevitable conclusion that C is
faster, except when it isn't.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Mar 26 '07 #4
On Mar 26, 2:06 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>Chris Torek said:
>
><snip>
>>As for the question in the "subject" line, "is C faster than
>>fortran?": the answer is the same as the answer to the question,
>>"is chocolate ice cream tastier than strawberry?".
>
>And of course the answer depends on the quality of the chocolate and
>whether real strawberries are used. It's actually getting quite
>difficult to find strawberry-*flavoured* ice cream in UK supermarkets,
>let alone ice cream made with actual strawberries.
I work 3 miles away from Ben & Jerry's original factory. They'll have
their annual Free Cone Day in April or May. Care to taste some
Strawberry Cheesecake Icecream? You can also visit Canada, it's two
hours away :-)).
>
>Using this reasoning, I am forced to the inevitable conclusion that C is
>faster, except when it isn't.
GSL is a nice topping...

--
Ioan - Ciprian Tandau
tandau _at_ freeshell _dot_ org (hope it's not too late)
(... and that it still works...)

Mar 26 '07 #5
On Mar 26, 6:15 am, "Noma...@gmail.com" <Noma...@gmail.com> wrote:
>Hi
>I've been thinking of learning Fortran as number crunching kinda
>language for my Physics degree......but then looking around the
>internet, people are saying that the libraries/ Algorithms once used
>for number crunching is now slowly converting into C, so do you think
>I should stick with C, since I know C already, or should I proceed
>learning fortran??
The two main competitors here are C++ and Fortran. Fortran has been
discussed elsethread.
See also (for instance):
http://portal.acm.org/citation.cfm?id=615718
http://www.oonumerics.org/oon/

Of course, you can do high performance work in other languages and C
is not an exception.
Mar 26 '07 #6

"user923005" <dc*****@connx.com> wrote in message
news:11**********************@p77g2000hsh.googlegroups.com...
>On Mar 26, 6:15 am, "Noma...@gmail.com" <Noma...@gmail.com> wrote:
>>Hi
>>I've been thinking of learning Fortran as number crunching kinda
>>language for my Physics degree......but then looking around the
>>internet, people are saying that the libraries/ Algorithms once used
>>for number crunching is now slowly converting into C, so do you think
>>I should stick with C, since I know C already, or should I proceed
>>learning fortran??
>
>The two main competitors here are C++ and Fortran. Fortran has been
>discussed elsethread.
>See also (for instance):
>http://portal.acm.org/citation.cfm?id=615718
>http://www.oonumerics.org/oon/
>
>Of course, you can do high performance work in other languages and C
>is not an exception.
Those are both c++ links. The time you lose there is not during execution,
but development. Their standard is significantly thornier than is C's.
--
LS
Mar 26 '07 #7
<No*****@gmail.com> wrote in message
>I've been thinking of learning Fortran as number crunching kinda
>language for my Physics degree......but then looking around the
>internet, people are saying that the libraries/ Algorithms once used
>for number crunching is now slowly converting into C, so do you think
>I should stick with C, since I know C already, or should I proceed
>learning fortran??
It is a lot easier to build an optimising Fortran 77 compiler than an
optimising C compiler, largely because of the aliasing problem, but also
because call by reference eliminates a lot of overhead.

I have spent far too much of my time translating Fortran 77 programs to C,
because although the Fortran will crunch the numbers faster, it is much
harder to make algorithmic improvements. Then of course I have nearly twenty
years' experience with C as opposed to one and a half with Fortran.

However if you are going to do serious physics-based programming you will
have to learn Fortran some time. You can't always choose the language to
program in, for lots of reasons, technical and social.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Mar 26 '07 #8

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:A7******************************@bt.com...
>Chris Torek said:
>
><snip>
>>As for the question in the "subject" line, "is C faster than
>>fortran?": the answer is the same as the answer to the question,
>>"is chocolate ice cream tastier than strawberry?".
The question would seem to be regional. Chris Torek, IIRC, lives in
cream-based Valley, with Cream of Weber and Aggie Ice Cream.
>And of course the answer depends on the quality of the chocolate and
>whether real strawberries are used. It's actually getting quite
>difficult to find strawberry-*flavoured* ice cream in UK supermarkets,
>let alone ice cream made with actual strawberries.
You can't overlook the portability of Italian ice cream in Europe. It is
not cream-based and, in many persons' informed opinion, tastier. As a
consequence, these people ingest fewer calories and are therefore quicker to
the street vendors than their American counterparts.
>Using this reasoning, I am forced to the inevitable conclusion that C is
>faster, except when it isn't.
How would I put "ice cream" in an array, run through the letters as chars,
and have an integer tab to represent their total?
--
LS
Mar 26 '07 #9
In article <eu*********@news1.newsguy.com>,
Chris Torek <no****@torek.net> wrote:
>As for the question in the "subject" line, "is C faster than
>fortran?": the answer is the same as the answer to the question,
>"is chocolate ice cream tastier than strawberry?".
But the answer to that is obviously "yes", so I don't see how it
applies.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Mar 26 '07 #10
"No*****@gmail.com" <No*****@gmail.com> wrote:
# Hi
# I've been thinking of learning Fortran as number crunching kinda
# language for my Physics degree......but then looking around the
# internet, people are saying that the libraries/ Algorithms once used
# for number crunching is now slowly converting into C, so do you think
# I should stick with C, since I know C already, or should I proceed
# learning fortran??

Fortran is designed to simplify optimisers; it has more potential
for running faster. What the compiler actually does is another matter.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
Title does not dictate behaviour.
Mar 26 '07 #11
On Mar 26, 12:27 pm, "Lane Straatman" <inva...@invalid.net> wrote:
>"user923005" <dcor...@connx.com> wrote in message
>
>news:11**********************@p77g2000hsh.googlegroups.com...
>>On Mar 26, 6:15 am, "Noma...@gmail.com" <Noma...@gmail.com> wrote:
>>>Hi
>>>I've been thinking of learning Fortran as number crunching kinda
>>>language for my Physics degree......but then looking around the
>>>internet, people are saying that the libraries/ Algorithms once used
>>>for number crunching is now slowly converting into C, so do you think
>>>I should stick with C, since I know C already, or should I proceed
>>>learning fortran??
>>The two main competitors here are C++ and Fortran. Fortran has been
>>discussed elsethread.
>>See also (for instance):
>>http://portal.acm.org/citation.cfm?id=615718
>>http://www.oonumerics.org/oon/
>>Of course, you can do high performance work in other languages and C
>>is not an exception.
>
>Those are both c++ links. The time you lose there is not during execution,
>but development. Their standard is significantly thornier than is C's.
That is true, of course. However, I am not aware of any recent C work
for high performance numeric computation. Hence my comment "The two
main competitors here are C++ and Fortran."

A bugaboo when porting from Fortran to C (and vice versa):
The row/column access order that is optimal in one language is not in
the other.
So if you port a program from one language to another, there are
always lots of manual steps if you want it to run fast when you are
done.
Mar 26 '07 #12
user923005 wrote:
>On Mar 26, 12:27 pm, "Lane Straatman" <inva...@invalid.net> wrote:
>That is true, of course. However, I am not aware of any recent C work
>for high performance numeric computation.
Did you ever hear of C99? Most of the performance features are
implemented in current compilers. Much as I hate to admit it, this puts
C at least even with Fortran in respect of availability of compilers
supporting latest standard features.
>
>A bugaboo when porting from Fortran to C (and vice versa):
>The row/column access order that is optimal in one language is not in
>the other.
>So if you port a program from one language to another, there are
>always lots of manual steps if you want it to run fast when you are
>done.

Back in the early days of C, there were plenty of automatic tools, both
commercial and open source. f2c may be as antiquated as C89, but it
avoids performance snafus on translation of Fortran array syntax.
Mar 27 '07 #13
Richard Tobin wrote:
>Chris Torek <no****@torek.net> wrote:
>>As for the question in the "subject" line, "is C faster than
>>fortran?": the answer is the same as the answer to the question,
>>"is chocolate ice cream tastier than strawberry?".
>
>But the answer to that is obviously "yes", so I don't see how it
>applies.
You forgot to bang your yes.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #14
Nelu wrote:
>Richard Heathfield <r...@see.sig.invalid> wrote:
>>Chris Torek said:

>><snip>
>>>As for the question in the "subject" line, "is C faster than
>>>fortran?": the answer is the same as the answer to the question,
>>>"is chocolate ice cream tastier than strawberry?".

>>And of course the answer depends on the quality of the chocolate
>>and whether real strawberries are used. It's actually getting
>>quite difficult to find strawberry-*flavoured* ice cream in UK
>>supermarkets, let alone ice cream made with actual strawberries.

>I work 3 miles away from Ben & Jerry's original factory. They'll
>have their annual Free Cone Day in April or May. Care to taste
>Strawberry Cheesecake Icecream? You can also visit Canada, it's
>two hours away :-)).
Do it soon. Bush's inane passport moves are going to virtually cut
off international travel unless somebody rapidly gains sanity.
This will probably also reduce the viability of the Burlington
economic area. It's going to play hell with Calais ME.

(Note for other nationalities - the areas involved are Burlington
VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #15
"Malcolm McLean" <re*******@btinternet.com> wrote in message
news:FL*********************@bt.com...
><No*****@gmail.com> wrote in message
>>I've been thinking of learning Fortran as number crunching kinda
>>language for my Physics degree......but then looking around the
>>internet, people are saying that the libraries/ Algorithms once used
>>for number crunching is now slowly converting into C, so do you think
>>I should stick with C, since I know C already, or should I proceed
>>learning fortran??
....
>However if you are going to do serious physics-based programming
>you will have to learn Fortran some time. You can't always choose the
>language to program in, for lots of reasons, technical and social.
There's the rub. If you're in the math or physics communities, you're not
going to get very far without at least _learning_ FORTRAN. That's what a lot
of the existing code (which you'll want to modify or reuse) is going to be
in, and it may be all many of your associates know. The academic world
thrives on sharing with your peers, and if you aren't using the same
language as the people you need to share with, you're going to have a harder
time solving your problems.

When in Rome, do as the Romans do.

(Note that the above has absolutely nothing to do with technical reasons to
use one or the other or their relative speeds. You're probably better off
asking in comp.lang.fortran for that kind of stuff.)

S

--
Stephen Sprunk "Those people who think they know everything
CCIE #3723 are a great annoyance to those of us who do."
K5SSS --Isaac Asimov
--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #16

"CBFalconer" <cb********@yahoo.com> wrote in message
news:46***************@yahoo.com...
>Nelu wrote:
>>Richard Heathfield <r...@see.sig.invalid> wrote:
>>>Chris Torek said:

>>><snip>

>>>>As for the question in the "subject" line, "is C faster than
>>>>fortran?": the answer is the same as the answer to the question,
>>>>"is chocolate ice cream tastier than strawberry?".

>>>And of course the answer depends on the quality of the chocolate
>>>and whether real strawberries are used. It's actually getting
>>>quite difficult to find strawberry-*flavoured* ice cream in UK
>>>supermarkets, let alone ice cream made with actual strawberries.

>>I work 3 miles away from Ben & Jerry's original factory. They'll
>>have their annual Free Cone Day in April or May. Care to taste
>>Strawberry Cheesecake Icecream? You can also visit Canada, it's
>>two hours away :-)).

>Do it soon. Bush's inane passport moves are going to virtually cut
>off international travel unless somebody rapidly gains sanity.
>This will probably also reduce the viability of the Burlington
>economic area. It's going to play hell with Calais ME.

>(Note for other nationalities - the areas involved are Burlington
>VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
>St. Lawrence, and Richelieu/Lake Champlain for Burlington.)
The Bush policy is solid in this respect. What I wouldn't give to see our
secretary of state affirm this instead of prattle on with her Pollyanna Iraq
crappola.
--
LS
Mar 27 '07 #17
On Mar 26, 8:26 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
>Nelu wrote:
>>Richard Heathfield <r...@see.sig.invalid> wrote:
>>>Chris Torek said:
>>><snip>
>>>>As for the question in the "subject" line, "is C faster than
>>>>fortran?": the answer is the same as the answer to the question,
>>>>"is chocolate ice cream tastier than strawberry?".
>>>And of course the answer depends on the quality of the chocolate
>>>and whether real strawberries are used. It's actually getting
>>>quite difficult to find strawberry-*flavoured* ice cream in UK
>>>supermarkets, let alone ice cream made with actual strawberries.
>>I work 3 miles away from Ben & Jerry's original factory. They'll
>>have their annual Free Cone Day in April or May. Care to taste
>>Strawberry Cheesecake Icecream? You can also visit Canada, it's
>>two hours away :-)).

>Do it soon. Bush's inane passport moves are going to virtually cut
>off international travel unless somebody rapidly gains sanity.
>This will probably also reduce the viability of the Burlington
>economic area. It's going to play hell with Calais ME.

>(Note for other nationalities - the areas involved are Burlington
>VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
>St. Lawrence, and Richelieu/Lake Champlain for Burlington.)
I am not a US resident so I need a Canadian VISA on my passport
anyway... and it's up from $75 to $100 right now, I believe. But let
me know if you happen to visit the Stowe - Waterbury area.

.... and stop with the politics. Everyone is a liberal in VT and I'm
growing tired of it :-)) (Oh, it's also OT).
--
Ioan - Ciprian Tandau
tandau _at_ freeshell _dot_ org (hope it's not too late)
(... and that it still works...)

Mar 27 '07 #18
On Mon, 26 Mar 2007 19:26:38 -0500, CBFalconer <cb********@yahoo.com>
wrote:
>Nelu wrote:
>>Richard Heathfield <r...@see.sig.invalid> wrote:
>>>Chris Torek said:

>>><snip>

>>>>As for the question in the "subject" line, "is C faster than
>>>>fortran?": the answer is the same as the answer to the question,
>>>>"is chocolate ice cream tastier than strawberry?".

>>>And of course the answer depends on the quality of the chocolate
>>>and whether real strawberries are used. It's actually getting
>>>quite difficult to find strawberry-*flavoured* ice cream in UK
>>>supermarkets, let alone ice cream made with actual strawberries.

>>I work 3 miles away from Ben & Jerry's original factory. They'll
>>have their annual Free Cone Day in April or May. Care to taste
>>Strawberry Cheesecake Icecream? You can also visit Canada, it's
>>two hours away :-)).

>Do it soon. Bush's inane passport moves are going to virtually cut
>off international travel unless somebody rapidly gains sanity.
[OT]
Is there some reason you aren't eligible for a passport? IMO, this
should have been done years ago. BTW, Bush is also lobbying to add 22
more countries to the visa waiver program, which certainly won't hurt
international travel.
[OT]
>This will probably also reduce the viability of the Burlington
>economic area. It's going to play hell with Calais ME.
>
>(Note for other nationalities - the areas involved are Burlington
>VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
>St. Lawrence, and Richelieu/Lake Champlain for Burlington.)
>
>--
>Chuck F (cbfalconer at maineline dot net)
>Available for consulting/temporary embedded and systems.
><http://cbfalconer.home.att.net>
--
Al Balmer
Sun City, AZ
Mar 27 '07 #20
No*****@gmail.com <No*****@gmail.com> wrote:
>I've been thinking of learning Fortran as number crunching kinda
>language for my Physics degree......but then looking around the
>internet, people are saying that the libraries/ Algorithms once used
>for number crunching is now slowly converting into C, so do you think
>I should stick with C, since I know C already, or should I proceed
>learning fortran??
>Any advice??
I think you can quickly learn enough Fortran at a superficial level
to do some number crunching. Just be good and use "implicit none".

Since you already know C, I would say: write your first project in C
but learn enough Fortran to translate it. Make sure the answers
are the same, then race the two and post the results here :-)

Seriously. Fortran compilers can optimize more aggressively than
C because the language semantics are different. C99 plugs this
gap (mostly) with the "restrict" qualifier, but I don't know how
that plays out in practice and I would love to see data.
--
pa at panix dot com
Mar 27 '07 #21
On Mar 27, 11:01 am, "Nelu" <tandaui...@gmail.com> wrote:
>On Mar 26, 8:26 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
>>Nelu wrote:
>>>Richard Heathfield <r...@see.sig.invalid> wrote:
>>>>Chris Torek said:
>>>><snip>
>>>>>As for the question in the "subject" line, "is C faster than
>>>>>fortran?": the answer is the same as the answer to the question,
>>>>>"is chocolate ice cream tastier than strawberry?".
>>>>And of course the answer depends on the quality of the chocolate
>>>>and whether real strawberries are used. It's actually getting
>>>>quite difficult to find strawberry-*flavoured* ice cream in UK
>>>>supermarkets, let alone ice cream made with actual strawberries.
>>>I work 3 miles away from Ben & Jerry's original factory. They'll
>>>have their annual Free Cone Day in April or May. Care to taste
>>>Strawberry Cheesecake Icecream? You can also visit Canada, it's
>>>two hours away :-)).
>>Do it soon. Bush's inane passport moves are going to virtually cut
>>off international travel unless somebody rapidly gains sanity.
>>This will probably also reduce the viability of the Burlington
>>economic area. It's going to play hell with Calais ME.
>>(Note for other nationalities - the areas involved are Burlington
>>VT. USA and Montreal PQ Canada. All accessible by ship, Atlantic,
>>St. Lawrence, and Richelieu/Lake Champlain for Burlington.)

>I am not a US resident so I need a Canadian VISA on my passport
>anyway... and it's up from $75 to $100 right now, I believe. But let
>me know if you happen to visit the Stowe - Waterbury area.

>... and stop with the politics. Everyone is a liberal in VT and I'm
>growing tired of it :-)) (Oh, it's also OT).
Sorry for the double post, I blame it on Google :-)

--
Ioan - Ciprian Tandau
tandau _at_ freeshell _dot_ org (hope it's not too late)
(... and that it still works...)
Mar 27 '07 #22
Nelu wrote:
[... snip unsnipped quote ...]
>
>Sorry for the double post, I blame it on Google :-)
Well, triple, now that you posted it again with the apology. But
who's counting? :-)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     |    <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>
Mar 27 '07 #23
Kenneth Brody said:
>Nelu wrote:
>[... snip unsnipped quote ...]
>>
>>Sorry for the double post, I blame it on Google :-)
>
>Well, triple, now that you posted it again with the apology. But
>who's counting? :-)
Apparently you are. :-)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Mar 27 '07 #24
On Mar 27, 8:41 am, p...@see.signature.invalid (Pierre Asselin) wrote:
>Noma...@gmail.com <Noma...@gmail.com> wrote:
>>I've been thinking of learning Fortran as number crunching kinda
>>language for my Physics degree......but then looking around the
>>internet, people are saying that the libraries/ Algorithms once used
>>for number crunching is now slowly converting into C, so do you think
>>I should stick with C, since I know C already, or should I proceed
>>learning fortran??
>>Any advice??
>
>I think you can quickly learn enough Fortran at a superficial level
>to do some number crunching. Just be good and use "implicit none".
>
>Since you already know C, I would say: write your first project in C
>but learn enough Fortran to translate it. Make sure the answers
>are the same, then race the two and post the results here :-)
>
>Seriously. Fortran compilers can optimize more aggressively than
>C because the language semantics are different. C99 plugs this
>gap (mostly) with the "restrict" qualifier, but I don't know how
>that plays out in practice and I would love to see data.
Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
Via:
dcorbit@DCORBIT64 /f/tmp
$ g95 -O3 -Wall test_fpu.f90
In file test_fpu.f90:83

91 FORMAT (A,I4,2('/',I2.2))
1
Warning (110): Label 91 at (1) defined but not used
In file test_fpu.f90:2293

INTEGER :: i , info , j , l , ncola , nrowa , nrowb
1
Warning (112): Variable 'ncola' at (1) is set but never used
test_fpu.f90: In function 'dtrmv_':
test_fpu.f90:3611: warning: 'kx' may be used uninitialized in this
function

dcorbit@DCORBIT64 /f/tmp
$ ./a
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 15.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 7.4 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 13.8 sec Err= 0.000000000000005
Test4 - Lapack 2 (1001x1001) inverts 10.0 sec Err= 0.000000000000417
total = 46.3 sec
dcorbit@DCORBIT64 /f/tmp
$ g95 --version
G95 (GCC 4.1.2 (g95 0.91!) Mar 21 2007)
Copyright (C) 2002-2005 Free Software Foundation, Inc.

G95 comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of G95
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
Using:
/*
* --------------------------------------------------------------
* TEST_FPU A number-crunching benchmark using matrix inversion.
* Implemented by: David Frank Da********@aol.com
* Gauss routine by: Tim Prince N8**@aol.com
* Crout routine by: James Van Buskirk to****@ix.netcom.com
* F90->C source by: Sergey N. Voronkov se**@ggd.nsu.ru
* Pointer exchange version by: Dieter Buerssner bu***@gmx.de
* --------------------------------------------------------------
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>

/*
* Compiling with NI = 1001 (default) generates pool(51,51,1000) = 20mb.
* If test system pages due to insufficient memory (disk i/o activity
* occurs), abort run and compile with NI = 200, benchmark will adjust
* time x 5.
*/

#define NI 1001
#define NN 51
#define RK8 double
/* below are additional C routines supplied by translator */

void memflt()
{
fputs("Memory allocation error\n", stderr);
exit(EXIT_FAILURE);
}
void alloc_arrays(RK8 ** p[NI], RK8 *** a, RK8 *** b)
{
int i,
j;

for (i = 0; i < NI; i++) {
if ((p[i] = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL)
memflt();
for (j = 0; j < NN; j++)
if ((p[i][j] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL)
memflt();
}
if ((*a = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL ||
(*b = (RK8 **) malloc(NN * sizeof(RK8 *))) == NULL)
memflt();
for (i = 0; i < NN; i++)
if (((*a)[i] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL ||
((*b)[i] = (RK8 *) malloc(NN * sizeof(RK8))) == NULL)
memflt();
}
void random_number(RK8 ** pool[NI])
{
int i,
j,
k;

for (i = 0; i < NI; i++)
for (j = 0; j < NN; j++)
for (k = 0; k < NN; k++)
pool[i][j][k] = (RK8) (rand()) / RAND_MAX;
}
RK8
timesec()
{
return (RK8) (clock()) / CLOCKS_PER_SEC;
}
/* prototype the invert functions that follow exec source */
void Gauss(RK8 ** a, RK8 ** b, int n);
void Crout(RK8 ** a, RK8 ** b, int n);
int rgaussi(RK8 ** a, RK8 ** b, int n);

int main()
{

RK8 **pool[NI]; /* pool of matrices to invert */
RK8 **a,
**ai; /* working matrices use < 256k */
RK8 avg_err,
total_time,
time1;
int i,
j,
n;

char *revision = "01/10/98"; /* Gauss speedup mod */
char invert_id[3][8] =
{
"Gauss", "Crout", "Dieter"};

struct tm *ptm;
time_t crtime;
FILE *fp;

/* Begin by allocating matrix arrays */
alloc_arrays(pool, &a, &ai);

puts("Benchmark running, hopefully as only ACTIVE task");

if ((fp = fopen("test_fpc.dat", "w")) == NULL) {
fprintf(stderr, "Can't open output file!\n");
return EXIT_FAILURE;
}
crtime = time(NULL);
ptm = gmtime(&crtime);

fprintf(fp, "Date run = %02d/%02d/%04d\n",
ptm->tm_mon + 1, ptm->tm_mday, ptm->tm_year + 1900); /* tm_year counts from 1900 */

fputs("Pls supply info below, send to Da********@aol.com\n"
"Tester name/ net address = \n"
"CPU mfg/id/Mhz = \n"
"MEM/CACHE = \n"
"O.S. / COMPILER = \n"
"Additional comments = \n\n\n\n\n", fp);

fprintf(fp, "Results for %s revision using TEST_FPU.C \n",
revision);

srand(time(NULL)); /* set seed to random number based on time */
random_number(pool); /* fill pool with random data ( 0. -1. ) */

for (n = 0; n < 3; n++) { /* for Gauss,Crout algorithms */
time1 = timesec(); /* start benchmark n time */

for (i = 0; i < NI; i++) {
/* get next matrix to invert */
for (j = 0; j < NN; j++)
memcpy(a[j], pool[i][j], sizeof(RK8) * NN);

switch (n) {
case 0:
Gauss(a, ai, NN); /* invert a -> ai; destroys a */
Gauss(ai, a, NN); /* invert ai -> a */
break;
case 1:
Crout(a, ai, NN); /* invert a -> ai; does not destroy a */
Crout(ai, a, NN); /* invert ai -> a */
break;
case 2:
rgaussi(a, ai, NN); /* invert a -> ai; does not destroy a */
rgaussi(ai, a, NN); /* invert ai -> a */
break;
}
}

total_time = timesec() - time1; /* = elapsed time sec. */

/* check accuracy last matrix invert. */
avg_err = 0;
for (i = 0; i < NN; i++)
for (j = 0; j < NN; j++)
avg_err += fabs(a[i][j] - pool[NI - 1][i][j]);

if (NI == 200)
fprintf(fp, "\n%s 5 x 200 x 2 inverts = %6.1f sec.\n",
invert_id[n], 5 * total_time);
else
fprintf(fp, "\n%s 1000 x 2 inverts = %6.1f sec.\n",
invert_id[n], total_time);

fputs("Accuracy of 2 computed numbers\n", fp);
fprintf(fp, "Original = %18.15f %18.15f\n",
pool[NI - 1][0][0], pool[NI - 1][NN - 1][NN - 1]);
fprintf(fp, "Computed = %18.15f %18.15f\n",
a[0][0], a[NN - 1][NN - 1]);
fprintf(fp, "Avg Err. = %18.15f\n", avg_err / (NN * NN));

} /* for Gauss,Crout algorithms */

puts("Results written to: TEST_FPC.DAT");

return EXIT_SUCCESS;
}
/*
* --------------------------------------
* Invert matrix a -> b by Gauss method
* --------------------------------------
*/
void Gauss(RK8 ** a, RK8 ** b, int n)
{
RK8 d,
temp = 0,
c;
int i,
j,
k,
m,
nn,
*ipvt;

if ((ipvt = (int *) malloc(n * sizeof(int))) == NULL)
memflt();

nn = n;
for (i = 0; i < nn; i++)
ipvt[i] = i;

for (k = 0; k < nn; k++) {
temp = 0.;
m = k;
for (i = k; i < nn; i++) {
d = a[k][i];
if (fabs(d) > temp) {
temp = fabs(d);
m = i;
}
}
if (m != k) {
j = ipvt[k];
ipvt[k] = ipvt[m];
ipvt[m] = j;
for (j = 0; j < nn; j++) {
temp = a[j][k];
a[j][k] = a[j][m];
a[j][m] = temp;
}
}
d = 1 / a[k][k];
for (j = 0; j < k; j++) {
c = a[j][k] * d;
for (i = 0; i < nn; i++)
a[j][i] -= a[k][i] * c;
a[j][k] = c;
}
for (j = k + 1; j < nn; j++) {
c = a[j][k] * d;
for (i = 0; i < nn; i++)
a[j][i] -= a[k][i] * c;
a[j][k] = c;
}
for (i = 0; i < nn; i++)
a[k][i] = -a[k][i] * d;
a[k][k] = d;
}

for (i = 0; i < nn; i++)
memcpy(b[ipvt[i]], a[i], sizeof(RK8) * nn);

free(ipvt);
}
/*
* --------------------------------------
* Invert matrix a -> b by Crout method
* --------------------------------------
*/
void Crout(RK8 ** a, RK8 ** b, int n)
{
int i,
j; /* Current row & column */
int maxlok; /* Location of maximum pivot */
int *index; /* Partial pivot record */
RK8 *temp = 0,
the_max;
RK8 tmp,
*ptr;
RK8 *matr = 0;
int k,
ind,
ind2;

if ((index = (int *) malloc(n * sizeof(int))) == NULL ||
(temp = (RK8 *) malloc(n * sizeof(RK8))) == NULL ||
(matr = (RK8 *) malloc(n * n * sizeof(RK8))) == NULL)
memflt();

/* Initialize everything */

for (i = 0; i < n; i++)
index[i] = i;

/* Shuffle matrix */
for (j = 0; j < n; j++) {
for (i = 0; i < j; i++)
b[j][i] = a[j][i];
for (i = j; i < n; i++)
b[j][i] = a[i - j][n - j - 1];
}

/* LU decomposition; reciprocals of diagonal elements in L matrix */
for (j = 0; j < n; j++) {
/* Get current column of L matrix */
for (i = j; i < n; i++) {
tmp = 0;
ind = n - i - 1;
for (k = 0; k < j; k++)
tmp += b[ind][ind + k] * b[j][k];
b[ind][ind + j] -= tmp;
}
maxlok = 0;
the_max = fabs(b[0][j]);
for (i = 1; i < n - j; i++)
if (fabs(b[i][j + i]) >= the_max) {
the_max = fabs(b[i][j + i]);
maxlok = i;
}
/* det = det*b(j+maxlok-1,maxlok) */
b[maxlok][j + maxlok] = 1 / b[maxlok][j + maxlok];

/* Swap biggest element to current pivot position */
if (maxlok + 1 != n - j) {
ind = n - maxlok - 1;
ind2 = index[j];
index[j] = index[ind];
index[ind] = ind2;
for (k = n - maxlok; k < n; k++) {
tmp = b[k][j];
b[k][j] = b[k][ind];
b[k][ind] = tmp;
}
memcpy(temp, &(b[maxlok][maxlok]), sizeof(RK8) * (n -
maxlok));
ptr = &(b[n - j - 1][n - j - 1]);
memmove(&(b[maxlok][maxlok]), ptr, sizeof(RK8) * (j + 1));
for (k = j + 1; k < n - maxlok; k++)
b[maxlok][maxlok + k] = b[k][j];
memcpy(ptr, temp, (j + 1) * sizeof(RK8));
for (k = j + 1; k < n - maxlok; k++)
b[k][j] = temp[k];
}
/* Get current row of U matrix */
ind = n - j - 1;
for (i = j + 1; i < n; i++) {
tmp = 0.;
for (k = 0; k < j; k++)
tmp += b[i][k] * b[ind][ind + k];
b[i][j] = b[ind][n - 1] * (b[i][j] - tmp);
}
} /* END DO LU_outer */

/* Invert L matrix */
for (j = 0; j < n - 1; j++) {
temp[0] = b[n - j - 1][n - 1];
for (i = j + 1; i < n; i++) {
ind = n - i - 1;
tmp = 0.;
for (k = 0; k < i - j; k++)
tmp += temp[k] * b[ind][ind + j + k];
b[ind][ind + j] = -tmp * b[ind][n - 1];
temp[i - j] = b[ind][ind + j];
}
}

/* Reshuffle matrix */
for (i = 0; i < (n + 1) / 3; i++) {
memcpy(temp, &(b[i][2 * (i + 1) - 1]), sizeof(RK8) * (n + 2 -
3 * (i +1)));
for (j = 2 * (i + 1) - 1; j < n - i; j++)
b[i][j] = b[n - j - 1][n - j + i - 1];
ind = n - i - 1;
for (j = i; j < n + 1 - 2 * (i + 1); j++)
b[j][i + j] = b[n - i - j - 1][ind];
for (k = 0; k < n + 2 - 3 * (i + 1); k++)
b[i + 1 + k][ind] = temp[k];
}

/* Invert U matrix */
for (i = 0; i < n - 1; i++) {
for (j = i + 1; j < n; j++) {
tmp = 0.;
for (k = 0; k < j - i - 1; k++)
tmp += temp[k] * b[j][i + 1 + k];
b[j][i] = -b[j][i] - tmp;
temp[j - i - 1] = b[j][i];
}
}

/* Multiply inverses in reverse order */
for (i = 0; i < n - 1; i++) {
for (k = 0; k < n - i - 1; k++)
temp[k] = b[i + 1 + k][i];
for (j = 0; j <= i; j++) {
tmp = 0.;
for (k = 0; k < n - i - 1; k++)
tmp += temp[k] * b[j][i + 1 + k];
b[j][i] += tmp;
}
for (j = i + 1; j < n; j++) {
tmp = 0.;
for (k = j; k < n; k++)
tmp += temp[k - i - 1] * b[j][k];
b[j][i] = tmp;
}
}

/* Straighten out the columns of the result */
for (i = 0; i < n; i++)
memcpy(matr + n * i, b[i], sizeof(RK8) * n);
for (i = 0; i < n; i++)
memcpy(b[index[i]], matr + n * i, sizeof(RK8) * n);

free(index);
free(temp);
free(matr);
}

/*
** This routine is due to bu***@gmx.de (Dieter Buerssner)
** Destroys a, return 0: success, 1: zero pivot, 2: out of mem.
*/
int rgaussi(RK8 ** a, RK8 ** b, int n)
{
int i,
j,
k,
maxj,
t;
RK8 maxel,
pivinv,
tmaxel,
aji;
RK8 *tp,
*ai,
*aj;
/* C99: int ipiv[n]; */
int *ipiv = malloc(n * sizeof *ipiv);

if (ipiv == NULL)
return 2;
for (i = 0; i < n; i++)
ipiv[i] = i;
for (i = 0; i < n; i++) {
maxj = -1;
maxel = 0.0;
/* find pivot element */
for (j = i; j < n; j++)
if ((tmaxel = fabs(a[j][i])) > maxel) {
maxj = j;
maxel = tmaxel;
}
if (maxj < 0) {
free(ipiv);
return 1;
}
/* exchange rows */
if (maxj != i) {
/* just exchange pointers for a */
tp = a[maxj];
a[maxj] = a[i];
a[i] = tp;
t = ipiv[maxj];
ipiv[maxj] = ipiv[i];
ipiv[i] = t;
}
ai = a[i];
pivinv = 1.0 / ai[i];
ai[i] = 1.0;
for (k = 0; k < n; k++)
ai[k] *= pivinv;
for (j = 0; j < n; j++) {
if (j != i) {
aj = a[j];
aji = aj[i];
aj[i] = 0.0;
for (k = 0; k < n; k++)
aj[k] -= aji * ai[k];
}
}
}
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
b[i][ipiv[j]] = a[i][j];
free(ipiv);
return 0;
}

via:
CL /Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE"
/D "_UNICODE" /D "UNICODE" /FD /MD /Zp16 /fp:fast /Fo"Release\\"
/Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64 /Zi /Gr /TP /wd4996
/errorReport:prompt

Date run = 3/27/107
Pls supply info below, send to Da********@aol.com
Tester name/ net address = {Dann Corbit/dc*****@connx.com}
CPU mfg/id/Mhz =
CPU Identification utility v1.11 (c) 1997-2005 Jan Steunebrink

────────────────────────────────────────────────────────────────
CPU Vendor and Model: AMD Athlon 64 2800+-3700+
Internal CPU speed : 2199.4 MHz (using internal Time Stamp Counter)
Clock Multiplier : Available only in Real Mode!
CPU-ID Vendor string: AuthenticAMD
CPU-ID Name string : AMD Athlon(tm) 64 Processor 3400+
CPU-ID Signature : 0F4A
│││└─ Stepping or sub-model no.
││└─ Model: Indicates CPU Model and 486 L1 cache mode
│└─ Family: 4=486, Am5x86,Cx5x86
│ 5=Pentium, Nx586, Cx6x86, K5/K6, C6, mP6
│ 6=PentiumPro/II/III, CxMII/III, Athlon, C3
│ F=Pentium4, Athlon64
└─ Type: 0=Standard, 1=Overdrive, 2=2nd Dual Pentium
Current CPU mode : Protected
Internal (L1) cache : Enabled in Write-Back mode
MEM/CACHE = 2GB physical RAM
O.S. / COMPILER = Windows 2003 / Microsoft (R) 32-bit C/C++ Optimizing
Compiler Version 14.00.50727.42 for 80x86
Additional comments =


Results for 01/10/98 revision using TEST_FPU.C

Gauss 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200264
Avg Err. = 0.000000000000001

Crout 1000 x 2 inverts = 0.7 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200266
Avg Err. = 0.000000000000003

Dieter 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528703 0.364574114200264
Avg Err. = 0.000000000000002

We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.

Mar 27 '07 #25
Al Balmer wrote:
CBFalconer <cb********@yahoo.com> wrote:
.... snip ...
>
>Do it soon. Bush's inane passport moves are going to virtually
cut off international travel unless somebody rapidly gains sanity.
[OT]
Is there some reason you aren't eligible for a passport? IMO, this
should have been done years ago. BTW, Bush is also lobbying to add
22 more countries to the visa waiver program, which certainly won't
hurt international travel.
[OT]
It's just another means of increasing the bureaucracy size and
raising taxes. No practical benefits to anyone or anything. Many
detriments.

During WWII we derided the practice of carrying papers in
dictatorial realms.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Mar 27 '07 #26
user923005 wrote:
On Mar 27, 8:41 am, p...@see.signature.invalid (Pierre Asselin) wrote:
(snip)
>>Seriously. Fortran compilers can optimize more aggressively than
C because the language semantics are different. C99 plugs this
gap (mostly) with the "restrict" qualifier, but I don't know how
that plays out in practice and I would love to see data.
(snip)
We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.
That is what they say, and maybe on the average it is true.

There are too many variables to make it true in general.
First, the programs should be as similar as possible, and also
the data. In the (snipped) programs, the C version uses arrays of
pointers to arrays of pointers to arrays of pointers, which is not
likely true for Fortran, and might make a difference either way.

The programs are doing matrix inversion of random numbers, which
are generated using system dependent generators. There may be some
data dependence to the timing of matrix inversion, especially
if the hardware timing is different for different data.

Optimization is probably more sensitive to the number of registers
available than to the ability to rearrange expressions.
Do both Fortran and C programs have the array subscripts in the
optimal order? Most compilers won't change that.
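
To make the subscript-order point concrete, here is a minimal C sketch. It is not taken from TEST_FPU.C, and both function names are made up for illustration. C stores a matrix row by row, so the first function walks memory sequentially while the second strides by n doubles on every step; Fortran's column-major layout reverses which of the two orders is the friendly one.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an n x n matrix stored contiguously in row-major (C) order.
   The inner loop touches adjacent doubles, the cache-friendly order in C. */
double sum_row_major(const double *m, size_t n)
{
    double s = 0.0;
    assert(m != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += m[i * n + j];
    return s;
}

/* Same sum with the loops exchanged: the inner loop now jumps n doubles
   per step, which is the unfriendly order for C storage. */
double sum_col_major(const double *m, size_t n)
{
    double s = 0.0;
    assert(m != NULL);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += m[i * n + j];
    return s;
}
```

Both functions visit every element exactly once; only the traversal order differs. Timing them on a matrix too large for cache is one way to see how much the order matters on a given machine.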

-- glen

Mar 28 '07 #27
We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.
Perhaps. I see you specified a whole bunch of options for the MSVC++
compiler (of those I only recall that the /G options are for code
generation and the /O options are presumably optimizers), while you
specify just -O3 for g95. Additional options for g95 (as well as other
compilers) may include -ffast-math (equivalent to /fp:fast for VC++?),
-fomit-frame-pointer, -march=athlon (or something similar), and
-funroll-loops.
g95 has almost all optimizations turned off by default.
Even if you do so, do not expect miracles. Two things need to be
noted:
the Fortran code may not be written with speed in mind. Note that
array arithmetic has to create a temporary array if necessary. It is
up to the compiler to determine whether or not it is necessary.
Especially tough might be tricks like this (snipped from TEST_FPU.f90)
b((/m,k/),:) = b((/k,m/),:)

although the swapping here can be done with one temp variable, this is
probably going to create a 2 x size(b,2) temporary array, copy the
right-hand side to it, then copy it to the left-hand side. A more
speed-friendly coding could look like this:

elemental subroutine swap(a,b)
real,intent(inout)::a,b
real:: tmp
tmp = a; a = b; b = tmp
end subroutine
....
call swap(b(m,:),b(k,:))

or the good old DO loop.
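
For comparison on the C side, a minimal sketch of the same idea: exchanging two rows of a contiguous row-major matrix with one scalar temporary instead of a temporary array. swap_rows is a hypothetical helper, not part of TEST_FPU.C (rgaussi() there goes further and only exchanges row pointers).

```c
#include <assert.h>
#include <stddef.h>

/* Exchange rows m and k of a contiguous row-major matrix with ncols
   columns, using a single scalar temporary. */
void swap_rows(double *b, size_t ncols, size_t m, size_t k)
{
    assert(b != NULL);
    if (m == k)
        return;                     /* nothing to exchange */
    double *rm = b + m * ncols;     /* start of row m */
    double *rk = b + k * ncols;     /* start of row k */
    for (size_t j = 0; j < ncols; j++) {
        double tmp = rm[j];
        rm[j] = rk[j];
        rk[j] = tmp;
    }
}
```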

Moreover, it is important to note that g95 isn't the top optimizing
compiler out there; to be honest, it may well be the bottom. It was
not created with aggressive optimizations in mind - and especially
optimizing of complicated array expressions is a tough task. I think
that gfortran (the newer versions) can perform better, but I would
consider more fair a comparison with a commercial Windows compiler,
like Intel Visual Fortran.
Still, it is hard to tell whether a C code is equivalent to some
Fortran code. I did not closely review your C code to see how close it
gets.

regards,
Jaroslav (not remotely a Fortran guru)

Mar 28 '07 #28
On Mar 27, 10:49 pm, "highegg" <high...@gmail.com> wrote:
We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.

Perhaps. I see you specified a whole bunch of options for the MSVC++
compiler (of those I only recall that the /G options are for code
generation and the /O options are presumably optimizers), while you
specify just -O3 for g95. Additional options for g95 (as well as other
compilers) may include -ffast-math (equivalent to /fp:fast for VC++?),
-fomit-frame-pointer, -march=athlon (or something similar), and
-funroll-loops.
g95 has almost all optimizations turned off by default.
Even if you do so, do not expect miracles. Two things need to be
noted:
I have an AMD machine, so I did like you suggest:
dcorbit@DCORBIT64 /f/tmp
$ g95 -O3 -Wall -ffast-math -fomit-frame-pointer -march=athlon -funroll-loops test_fpu.f90
In file test_fpu.f90:83

91 FORMAT (A,I4,2('/',I2.2))
1
Warning (110): Label 91 at (1) defined but not used
In file test_fpu.f90:2293

INTEGER :: i , info , j , l , ncola , nrowa , nrowb
1
Warning (112): Variable 'ncola' at (1) is set but never used
test_fpu.f90: In function 'dtrmv_':
test_fpu.f90:3611: warning: 'kx' may be used uninitialized in this
function

dcorbit@DCORBIT64 /f/tmp
$ ./a
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 12.7 sec Err= 0.000000000000003
Test2 - Crout 2000 (101x101) inverts 7.0 sec Err= 0.000000000000001
Test3 - Crout 2 (1001x1001) inverts 11.6 sec Err= 0.000000000000002
Test4 - Lapack 2 (1001x1001) inverts 8.0 sec Err= 0.000000000000643
total = 39.3 sec

the Fortran code may not be written with speed in mind. Note that
array arithmetic has to create a temporary array if necessary. It is
up to the compiler to determine whether or not it is necessary.
Especially tough might be tricks like this (snipped from TEST_FPU.f90)
b((/m,k/),:) = b((/k,m/),:)

although the swapping here can be done with one temp variable, this is
probably going to create a 2 x size(b,2) temporary array, copy the
right-hand side to it, then copy it to the left-hand side. A more
speed-friendly coding could look like this:

elemental subroutine swap(a,b)
real,intent(inout)::a,b
real:: tmp
tmp = a; a = b; b = tmp
end subroutine
...
call swap(b(m,:),b(k,:))

or the good old DO loop.

Moreover, it is important to note that g95 isn't the top optimizing
compiler out there; to be honest, it may well be the bottom. It was
not created with aggressive optimizations in mind - and especially
optimizing of complicated array expressions is a tough task. I think
that gfortran (the newer versions) can perform better, but I would
consider more fair a comparison with a commercial Windows compiler,
like Intel Visual Fortran.
Don't have that one.
gfortran barfed:
dcorbit@DCORBIT64 /f/tmp
$ gfortran -O3 -Wall test_fpu.f90
In file test_fpu.f90:83

91 FORMAT (A,I4,2('/',I2.2))
1
Warning: Label 91 at (1) defined but not used
test_fpu.f90: In function 'gauss':
test_fpu.f90:106: internal compiler error: in gfc_conv_ss_descriptor,
at fortran/trans-array.c:1235
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

dcorbit@DCORBIT64 /f/tmp
$ gfortran --version
GNU Fortran 95 (GCC 4.0.1 20050608 (prerelease))
Copyright (C) 2005 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

dcorbit@DCORBIT64 /f/tmp

Still, it is hard to tell whether a C code is equivalent to some
Fortran code. I did not closely review your C code to see how close it
gets.

regards,
Jaroslav (not remotely a Fortran guru)

Mar 28 '07 #29
"Nelu" <ta********@gmail.com> wrote:
On Mar 26, 2:06 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
Chris Torek said:
As for the question in the "subject" line, "is C faster than
fortran?": the answer is the same as the answer to the question,
"is chocolate ice cream tastier than strawberry?".
And of course the answer depends on the quality of the chocolate and
whether real strawberries are used. It's actually getting quite
difficult to find strawberry-*flavoured* ice cream in UK supermarkets,
let alone ice cream made with actual strawberries.

I work 3 miles away from Ben & Jerry's original factory. They'll have
their annual Free Cone Day in April or May. Care to taste some
Strawberry Cheesecake Icecream? You can also visit Canada, it's two
hours away :-)).
I prefer real C and real ice cream. Luckily I have a real Italian
gelatero living in town, and a real C Standard printed by Wiley on my
desk.

Richard
Mar 28 '07 #30
"highegg" <hi*****@gmail.com> wrote:
Moreover, it is important to note that g95 isn't the top optimizing
compiler out there; to be honest, it may well be the bottom.
That's been true for me -- g95 execution time is twice that of
Lahey LF95 on my codes, and Lahey is not the fastest
Windows Fortran compiler.

(Need I say, I still consider g95 a wonderful tool and
outstanding accomplishment.)

--
Mike Prager, NOAA, Beaufort, NC
Address spam-trapped; remove color to reply.
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.
Mar 28 '07 #31
In comp.lang.c user923005 <dc*****@connx.com> wrote:
Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
Via:
dcorbit@DCORBIT64 /f/tmp
$ g95 -O3 -Wall test_fpu.f90
[ ... ]
dcorbit@DCORBIT64 /f/tmp
$ ./a
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 15.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 7.4 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 13.8 sec Err= 0.000000000000005
Test4 - Lapack 2 (1001x1001) inverts 10.0 sec Err= 0.000000000000417
total = 46.3 sec
These numbers are quite reasonable. At O(n^3) operations per matrix
inversion, they correspond to 136, 278, 145 and 201 megaflops. I
didn't do an accurate flop count, just 2000*101^3 ops for the first
two and 2*1001^3 for the last two.
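
The arithmetic behind those figures can be sketched in a few lines of C. This is only the crude estimate described above (about n^3 operations per n x n inversion, no exact flop count), and approx_megaflops is a made-up name.

```c
#include <assert.h>

/* Rough rate in megaflops, assuming about n^3 operations for each
   inversion of an n x n matrix (a crude count, not an exact one). */
double approx_megaflops(double inversions, double n, double seconds)
{
    assert(seconds > 0.0);
    return inversions * n * n * n / seconds / 1.0e6;
}
```

For example, approx_megaflops(2000, 101, 15.1) comes to roughly 136 and approx_megaflops(2, 1001, 10.0) to roughly 201, matching the first and last numbers quoted for the g95 run.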

Using:
[ similar C program omitted, except: ]
[ #define NI 1001 ] --number of repeats
[ #define NN 51 ] --matrix size
via:
CL /Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE"
/D "_UNICODE" /D "UNICODE" /FD /MD /Zp16 /fp:fast /Fo"Release\\"
/Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64 /Zi /Gr /TP /wd4996
/errorReport:prompt
Results for 01/10/98 revision using TEST_FPU.C
Gauss 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200264
Avg Err. = 0.000000000000001
Crout 1000 x 2 inverts = 0.7 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528701 0.364574114200266
Avg Err. = 0.000000000000003
Dieter 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.753715628528703 0.364574114200262
Computed = 0.753715628528703 0.364574114200264
Avg Err. = 0.000000000000002
We need a Fortran Guru to show me what I am doing wrong, because there
is no feasible explanation for this other than I do not know how to
get performance out of my Fortran compiler.
From a cursory inspection of the C, these are smaller systems: 2002
inversions of 51x51 matrices compared to the 2000 101x101 and 2
1001x1001 of the Fortran example. The numbers correspond to ~332,
190 and 332 megaflops, but the accuracy may not be so good because
of the short times.

On the Fortran code, I get
gfortran -o fortran -O3 -funroll-loops TEST_FPU.f90
./fortran
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 6.6 sec Err= 0.000000000000003
Test2 - Crout 2000 (101x101) inverts 7.1 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 22.9 sec Err= 0.000000000000001
Test4 - Lapack 2 (1001x1001) inverts 20.8 sec Err= 0.000000000000273
total = 57.3 sec

Approx 312, 290, 87.6 and 96.4 megaflops (on a little laptop, must
be spilling out of cache).

I translated Gauss() to C by hand but not the other two (especially
not Lapack() !). I'll append my code below. I basically transliterated
the algorithm, using C99 VLA's instead of arrays of pointers. I
also transposed all the matrices so the memory access patterns
would be similar in C and in Fortran; this is ok mathematically,
matrix inversion commutes with transposition. I added a unit test,
not shown, to check that multiplying a matrix by the alleged inverse
returned by Gauss() yields the identity matrix.
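
The unit test itself is not shown in the post. A check of that kind could look roughly like the sketch below, which assumes contiguous row-major storage and uses a made-up function name; it is not the author's actual test.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute deviation of the product a * ainv from the identity.
   Both matrices are n x n, contiguous, row-major. */
double max_identity_error(const double *a, const double *ainv, size_t n)
{
    double worst = 0.0;
    assert(a != NULL && ainv != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * ainv[k * n + j];
            /* expected entry is 1 on the diagonal, 0 elsewhere */
            double err = s - (i == j ? 1.0 : 0.0);
            if (err < 0.0)
                err = -err;
            if (err > worst)
                worst = err;
        }
    }
    return worst;
}
```

A test would then require the returned value to stay below some small tolerance; the right tolerance depends on the matrix size and its conditioning.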

I get:

cc -o C -std=c99 -O3 -funroll-loops -Wall TEST_FPU.c
./C
Test1 - Gauss 2000 (101x101) inverts 6.415025 sec Err=0.000000
--rest not implemented--

so comparable to Fortran. Maybe a smidge faster, in fact.

Here is the C version of Gauss(), followed by the original
Fortran. Oh, and the "restrict" keyword doesn't make one
bit of difference.
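
For readers who have not met the qualifier, here is a minimal sketch of what "restrict" promises, using a hypothetical row-update helper shaped like the inner loop of Gauss(). The caller guarantees that x and y do not overlap, which is similar to the assumption a Fortran compiler may make about dummy arguments. Whether that buys any speed depends on the compiler, and in the run above it evidently did not.

```c
#include <assert.h>
#include <stddef.h>

/* y[j] -= c * x[j] for j in [0, n).
   "restrict" is a promise from the caller that x and y do not overlap,
   so the compiler is free to reorder or vectorize the loads and stores. */
void row_update(size_t n, double c,
                const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t j = 0; j < n; j++)
        y[j] -= c * x[j];
}
```

Calling it with overlapping x and y would break the promise and give undefined behaviour, which is why Gauss() copies row k into temp before the update loop.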

**********************************************************************

static inline int isamax(int n, double v[n])
{
int i, imax;
double tmp, max;
for(i= 1, imax= 0, max= fabs(v[0]); i<n; i++) {
tmp= fabs(v[i]);
if(tmp>max) { max= tmp; imax= i; }
}
return imax;
}

static void Gauss(int n, double a[n][n])
{
double (* restrict b)[n]= malloc(n*sizeof(b[0]));
double * restrict temp= malloc(n*sizeof(temp[0]));
int * restrict piv= malloc(n*sizeof(piv[0]));
double c, d;
int i, j, k, m, jmax;

memcpy(b, a, n*sizeof(b[0]));
for(i= 0; i<n; i++) piv[i]= i;

for(k= 0; k<n; k++) {
/* largest element in b[k][k:n] */
jmax= k+isamax(n-k, &b[k][k]);

if(jmax != k) {
/* swap columns k and jmax */
m= piv[k]; piv[k]= piv[jmax]; piv[jmax]= m;
for(i= 0; i<n; i++) {
c= b[i][k];
b[i][k]= b[i][jmax];
b[i][jmax]= c;
}
}

d= 1./b[k][k];

memcpy(temp, b[k], sizeof(b[k]));
for(i= 0; i<n; i++) {
c= b[i][k]*d;
for(j= 0; j<n; j++) {
b[i][j]-= temp[j]*c;
}
b[i][k]= c;
}
for(j= 0; j<n; j++) b[k][j]= -d*temp[j];
b[k][k]= d;
}

for(i= 0; i<n; i++) for(j= 0; j<n; j++) {
a[piv[i]][j]= b[i][j];
}

free(piv); free(temp); free(b);
}

**********************************************************************

MODULE kinds
INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
END MODULE kinds

SUBROUTINE Gauss (a,n) ! Invert matrix by Gauss method
! --------------------------------------------------------------------
USE kinds
IMPLICIT NONE

INTEGER :: n
REAL(RK8) :: a(n,n)

! - - - Local Variables - - -
REAL(RK8) :: b(n,n), c, d, temp(n)
INTEGER :: i, j, k, m, imax(1), ipvt(n)
! - - - - - - - - - - - - - -
b = a
ipvt = (/ (i, i = 1, n) /)

DO k = 1,n
imax = MAXLOC(ABS(b(k:n,k)))
m = k-1+imax(1)

IF (m /= k) THEN
ipvt( (/m,k/) ) = ipvt( (/k,m/) )
b((/m,k/),:) = b((/k,m/),:)
END IF
d = 1/b(k,k)

temp = b(:,k)
DO j = 1, n
c = b(k,j)*d
b(:,j) = b(:,j)-temp*c
b(k,j) = c
END DO
b(:,k) = temp*(-d)
b(k,k) = d
END DO
a(:,ipvt) = b

END SUBROUTINE Gauss

**********************************************************************
--
pa at panix dot com
Mar 30 '07 #32
On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:
In comp.lang.c user923005 <dcor...@connx.com> wrote:
Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
I used the above program with two different compilers on one Linux
machine and got the following results:

% ifort -o test_fpu test_fpu.f90
% test_fpu
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 4.9 sec Err= 0.000000000000003
Test3 - Crout 2 (1001x1001) inverts 9.6 sec Err= 0.000000000000073
Test4 - Lapack 2 (1001x1001) inverts 6.6 sec Err= 0.000000000000537
total = 24.2 sec

and:
% g95 -o test_fpu test_fpu.f90
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 56.5 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 21.1 sec Err= 0.000000000000001
Test3 - Crout 2 (1001x1001) inverts 13.1 sec Err= 0.000000000000001
Test4 - Lapack 2 (1001x1001) inverts 34.2 sec Err= 0.000000000000329
total =124.9 sec

That is a factor of about 5 in the total time!

When I tried the same with optimisation for speed:

ifort: -O2 -ipo -static (derived from -fast, conservative on
optimisation level)
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000030
Test2 - Crout 2000 (101x101) inverts 4.7 sec Err= 0.000000000000069
Test3 - Crout 2 (1001x1001) inverts 9.9 sec Err= 0.000000000000381
Test4 - Lapack 2 (1001x1001) inverts 16.6 sec Err= 0.000000000002275
total = 34.4 sec

(Slower, surprisingly, a second run showed 24.9 total)

g95: -O3 (I did not want to examine all the details of the
optimisation)
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 36.8 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 19.6 sec Err= 0.000000000000000
Test3 - Crout 2 (1001x1001) inverts 23.2 sec Err= 0.000000000000009
Test4 - Lapack 2 (1001x1001) inverts 13.1 sec Err= 0.000000000000822
total = 92.7 sec

In this case we gain about 25% speed - not bad.

So, quite apart from the question whether C is faster or not than
Fortran, there is a very wide spread in what compilers are able to do!

That simply adds another factor to consider.

Regards,

Arjen

Mar 30 '07 #33
On 30 mrt, 09:51, "Arjen Markus" <arjen.mar...@wldelft.nl> wrote:
On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:
In comp.lang.c user923005 <dcor...@connx.com> wrote:
Using:
>http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90

I used the above program with two different compilers on one Linux
machine and got the following results:

% ifort -o test_fpu test_fpu.f90
I tried this on a Windows XP machine too, but the Intel Fortran
compiler version terminated in case 3 with a stack overflow.
I did specify a stack of 16 million bytes (via the /F option),
but that did not work. Should I use an even larger stack?

Regards,

Arjen

Mar 30 '07 #34
On Mar 30, 6:23 am, "Arjen Markus" <arjen.mar...@wldelft.nl> wrote:
On 30 mrt, 09:51, "Arjen Markus" <arjen.mar...@wldelft.nl> wrote:
On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:
In comp.lang.c user923005 <dcor...@connx.com> wrote:
Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
I used the above program with two different compilers on one Linux
machine and got the following results:
% ifort -o test_fpu test_fpu.f90

I tried this on a Windows XP machine too, but the Intel Fortran
compiler version terminated in case 3 with a stack overflow.
I did specify a stack of 16 million bytes (via the /F option),
but that did not work. Should I use an even larger stack?
If I recall correctly, the arrays are about 20 MB.
I am kind of surprised that they are formulated as automatic
variables.
In the C version, they are allocated using malloc(), so they do not
exhaust automatic storage.
Hence the C version does not need excessive stack requirements.

Mar 30 '07 #35
If I recall correctly, the arrays are about 20 MB.
I am kind of surprised that they are formulated as automatic
variables.
That's very common with current Fortran compilers. Automatic arrays
need to be allocated on subroutine entry and deallocated on subroutine
exit, thus they qualify for stack allocation, which is faster than
heap allocation. Compiler-generated temporaries usually also live on
the stack. Allocatable arrays normally go on heap. Some compilers also
offer options to specify the largest array that can be stack-
allocated.
In the C version, they are allocated using malloc(), so they do not
exhaust automatic storage.
Hence the C version does not need excessive stack requirements.
Neither would the Fortran version, if allocatable arrays were used
instead. However, it is a good general idea to increase stack for
Fortran programs - some compilers do that automatically.
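
The C-side counterpart of "use allocatable arrays" is heap allocation. A minimal sketch with hypothetical helper names (not from TEST_FPU.C): with 8-byte doubles, one 1001x1001 matrix takes 1001*1001*8 = 8,016,008 bytes, so a few of them alive at once will not fit in a stack of only a few megabytes.

```c
#include <assert.h>
#include <stdlib.h>

/* Bytes needed for one n x n matrix of doubles. */
size_t matrix_bytes(size_t n)
{
    return n * n * sizeof(double);
}

/* Allocate a zero-filled n x n matrix on the heap.
   Returns NULL on failure; the caller releases it with free(). */
double *alloc_matrix(size_t n)
{
    assert(n > 0);
    return calloc(n * n, sizeof(double));
}
```

Because the storage comes from the heap, the stack limit never enters into it; the price is an explicit free() and a NULL check at each allocation.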

Mar 30 '07 #36
In comp.lang.fortran Arjen Markus <ar**********@wldelft.nl> wrote:
I used the above program with two different compilers on one Linux
machine and got the following results:
[ifort] total = 24.2 sec
[g95] total =124.9 sec
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec
So, quite apart from the question whether C is faster or not than
Fortran
there is a very wide spread in what compilers are able to do!
That simply adds another factor to consider.
Indeed. Good showing for Intel. I want to try ifort when
I have a chance.

For the piece that I translated I was surprised to see gcc edge
gfortran, since they use the same back-end. I suspect gfortran
had trouble optimizing the array operations; when I wrote the C
I had no choice but to expand them into loops. I want to try the
same thing on the Fortran side --but it will take a while before
I get to it.

I don't have ifc so I won't be able to do an Intel-Intel
comparison.

--
pa at panix dot com
Mar 31 '07 #37
Pierre Asselin wrote:
In comp.lang.fortran Arjen Markus <ar**********@wldelft.nl> wrote:

>>I used the above program with two different compilers on one Linux
machine and got the following results:

>>[ifort] total = 24.2 sec
[g95] total =124.9 sec
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec

>>So, quite apart from the question whether C is faster or not than
Fortran
there is a very wide spread in what compilers are able to do!

>>That simply adds another factor to consider.


Indeed. Good showing for Intel. I want to try ifort when
I have a chance.
You might like to try the latest Sun Studio compiler on Linux. On an
AMD64 Solaris box, the performance is significantly faster than f95.

--
Ian Collins.
Mar 31 '07 #38
Pierre Asselin wrote:
In comp.lang.fortran Arjen Markus <ar**********@wldelft.nl> wrote:
>I used the above program with two different compilers on one Linux
machine and got the following results:
>[ifort] total = 24.2 sec
[g95] total =124.9 sec
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec
>So, quite apart from the question whether C is faster or not than
Fortran
there is a very wide spread in what compilers are able to do!
>That simply adds another factor to consider.

Indeed. Good showing for Intel. I want to try ifort when
I have a chance.

For the piece that I translated I was surprised to see gcc edge
gfortran, since they use the same back-end. I suspect gfortran
had trouble optimizing the array operations; when I wrote the C
I had no choice but to expand them into loops. I want to try the
same thing on the Fortran side --but it will take a while before
I get to it.

I don't have ifc so I won't be able to do an Intel-Intel
comparison.
I suspect the difference in array addressing and its influence on
the memory cache might explain most of the differences.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>


Mar 31 '07 #39
On Mar 31, 4:43 pm, p...@see.signature.invalid (Pierre Asselin) wrote:
In comp.lang.fortran Arjen Markus <arjen.mar...@wldelft.nl> wrote:
I used the above program with two different compilers on one Linux
machine and got the following results:
[ifort] total = 24.2 sec
[g95] total =124.9 sec
This comparison does not make sense: Ifort uses by default -O2
optimization and GCC (gfortran, g95, g77) uses -O0.
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec
For comparison (without thinking too much about the options; on x86-64
Linux):

gfortran [4.3] -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 TEST_FPU.f90
total = 19.2 sec

ifort [9.1] -O3 -xW -ipo -static TEST_FPU.f90
total = 17.7 sec

sunf95 [8.3] -w4 -fast -xarch=amd64a -xipo=0 TEST_FPU.f90
total = 15.5 sec

NAG f95 [5.1/gcc 4.1.2] -O4 -ieee=full -Bstatic -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 TEST_FPU.f90
total = 20.2 sec

g95 [0.91-Feb23/gcc 4.0.3] -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 TEST_FPU.f90
total = 31.7 sec

Tobias

Mar 31 '07 #40
On 1 apr, 00:32, "Tobias Burnus" <bur...@net-b.de> wrote:
On Mar 31, 4:43 pm, p...@see.signature.invalid (Pierre Asselin) wrote:
In comp.lang.fortran Arjen Markus <arjen.mar...@wldelft.nl> wrote:
I used the above program with two different compilers on one Linux
machine and got the following results:
[ifort] total = 24.2 sec
[g95] total =124.9 sec

This comparison does not make sense: Ifort uses by default -O2
optimization and GCC (gfortran, g95, g77) uses -O0.
[ifort -O2 -ipo -static] total = 34.4 sec
(Slower, surprisingly, a second run showed 24.9 total)
[g95 -O3] total = 92.7 sec

For comparison (without thinking too much about the options; on x86-64
Linux):

gfortran [4.3] -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 TEST_FPU.f90
total = 19.2 sec

ifort [9.1] -O3 -xW -ipo -static TEST_FPU.f90
total = 17.7 sec

sunf95 [8.3] -w4 -fast -xarch=amd64a -xipo=0 TEST_FPU.f90
total = 15.5 sec

NAG f95 [5.1/gcc 4.1.2] -O4 -ieee=full -Bstatic -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 TEST_FPU.f90
total = 20.2 sec

g95 [0.91-Feb23/gcc 4.0.3] -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 TEST_FPU.f90
total = 31.7 sec

Tobias

Oh, I was not aware of the default settings - that explains why there
was little effect of the optimisation. I should try again on my Linux
machine and see what happens with the above.

Regards,

Arjen
Apr 1 '07 #41
In comp.lang.fortran CBFalconer <cb********@yahoo.com> wrote:
Pierre Asselin wrote:
For the piece that I translated I was surprised to see gcc edge
gfortran, since they use the same back-end.
I suspect the difference in array addressing and its influence on
the memory cache might explain most of the differences.
Maybe, but I transposed the arrays between Fortran and C,
so the access patterns ought to be similar. It was not
a big difference, mind you.
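
The transposition argument can be checked with a little index arithmetic. The sketch below is in Python purely for illustration; the function names and the 3x4 shape are assumptions and are not taken from the benchmark. It computes linear memory offsets under Fortran's column-major layout and C's row-major layout, and compares the order in which a loop nest touches memory when the C array is transposed and when it is not.

```python
def fortran_offset(i, j, n_rows):
    """Column-major offset of A(i, j), one-based, for A declared A(n_rows, n_cols)."""
    return (i - 1) + (j - 1) * n_rows


def c_offset(r, c, n_cols):
    """Row-major offset of a[r][c], zero-based, for a declared a[n_rows][n_cols]."""
    return r * n_cols + c


# Hypothetical shape: Fortran A(3, 4) mirrored by a transposed C array a[4][3].
N, M = 3, 4

# Fortran loop nest with the first index i varying fastest.
fortran_walk = [fortran_offset(i, j, N)
                for j in range(1, M + 1) for i in range(1, N + 1)]

# C loop nest over the transposed array a[j][i], last index varying fastest.
c_walk = [c_offset(j, i, N) for j in range(M) for i in range(N)]

# Same loop order in C without transposing: a[i][j] for a declared a[3][4].
c_untransposed_walk = [c_offset(i, j, M) for j in range(M) for i in range(N)]

print(fortran_walk == c_walk)       # identical, contiguous access order
print(c_untransposed_walk[:4])      # strided access, jumping M elements at a time
```

With the transposition both languages step through memory one element at a time, so cache behaviour alone would not be expected to separate them in this case.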
--
pa at panix dot com
Apr 2 '07 #42
CBFalconer wrote:
During WWII we derided the practice of carrying papers in
dictatorial realms.
We also respected habeas corpus, the Geneva Convention, Congress' right
to declare war, court authorization to spy on citizens, democratically
elected officials. Today all gone...bushit is right down there with adolph.
--
When a place gets crowded enough to require ID's, social collapse is not
far away.
-- Robert A. Heinlein

Apr 3 '07 #43
Michael Delaney <91*@bullitt.com> writes:
CBFalconer wrote:
>During WWII we derided the practice of carrying papers in
dictatorial realms.

We also respected habeas corpus, the Geneva Convention, Congress'
right to declare war, court authorization to spy on citizens,
democratically elected officials. Today all gone...bushit is right
down there with adolph.
This newsgroup discusses the C programming language, not US politics.
There are plenty of newsgroups where your post would be topical. This
clearly is not one of them. Are you a deliberate troll?

(I express no opinion on the content of your article, other than the
fact that it's off-topic.)

Followups set appropriately.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Apr 3 '07 #44
Richard Heathfield wrote: <snip OT>
Nelu wrote: <snip OT>
CBFalconer wrote: <snip OT>
Al Balmer wrote: <snip OT>
CBFalconer wrote: <snip provocative OT>
Michael Delaney wrote: <snip provocative OT>
Keith Thompson wrote: ...Are you a deliberate troll?

Some people might call that entrapment, Keith! ;)

--
Peter

Apr 4 '07 #45
On 30 Mar 2007 00:51:23 -0700, Arjen Markus wrote:
On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:
>In comp.lang.c user923005 <dcor...@connx.com> wrote:
Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90

I used the above program with two different compilers on one Linux
machine and got the following results:

% ifort -o test_fpu test_fpu.f90
% test_fpu
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 4.9 sec Err= 0.000000000000003
Test3 - Crout 2 (1001x1001) inverts 9.6 sec Err= 0.000000000000073
Test4 - Lapack 2 (1001x1001) inverts 6.6 sec Err= 0.000000000000537
total = 24.2 sec

and:
% g95 -o test_fpu test_fpu.f90
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 56.5 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 21.1 sec Err= 0.000000000000001
Test3 - Crout 2 (1001x1001) inverts 13.1 sec Err= 0.000000000000001
Test4 - Lapack 2 (1001x1001) inverts 34.2 sec Err= 0.000000000000329
total =124.9 sec

That is a factor of 6 in the total time!
Usually Intel Fortran uses a quite complete set of optimization options
by default. GNU Fortran (gfortran or g95) does not apply any special
optimization.

I don't have Intel Fortran installed, but the difference between
gfortran with and without optimizations is as follows:

$ gfortran TEST_FPU.f90 -o TEST_FPU
$ ./TEST_FPU
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 18.4 sec Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts 18.6 sec Err= 0.000000000000023
Test3 - Crout 2 (1001x1001) inverts 16.0 sec Err= 0.000000000000031
Test4 - Lapack 2 (1001x1001) inverts 23.9 sec Err= 0.000000000000250
total = 77.0 sec

$ gfortran -O2 -msse3 TEST_FPU.f90 -o TEST_FPU
$ ./TEST_FPU
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.9 sec Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts 4.4 sec Err= 0.000000000000023
Test3 - Crout 2 (1001x1001) inverts 3.9 sec Err= 0.000000000000031
Test4 - Lapack 2 (1001x1001) inverts 3.9 sec Err= 0.000000000000250
total = 16.1 sec

It's about a factor of 5 from no optimization to some optimization. I'm
pretty sure that this result can be improved.
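
For reference, the factors follow directly from the totals quoted in this thread. A minimal sketch in Python, chosen only for the arithmetic; the variable and function names are illustrative:

```python
def speedup(slow_seconds, fast_seconds):
    """Return how many times shorter the second timing is than the first."""
    return slow_seconds / fast_seconds


# Totals quoted in this thread, in seconds.
g95_default, ifort_default = 124.9, 24.2
gfortran_default, gfortran_o2 = 77.0, 16.1

# Per-test gfortran timings (Test1..Test4): default flags vs. -O2 -msse3.
unoptimized = [18.4, 18.6, 16.0, 23.9]
optimized = [3.9, 4.4, 3.9, 3.9]

total_gfortran = round(speedup(gfortran_default, gfortran_o2), 1)
total_g95_vs_ifort = round(speedup(g95_default, ifort_default), 1)
per_test = [round(speedup(s, f), 1) for s, f in zip(unoptimized, optimized)]

print("gfortran default vs -O2 -msse3:", total_gfortran)
print("g95 default vs ifort default:  ", total_g95_vs_ifort)
print("gfortran per-test factors:     ", per_test)
```

The per-test factors are not uniform, so a single total hides which kernels gain the most from optimization.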

By the way, it was run on a Core(TM)2 CPU T7200 @ 2.00GHz Laptop.
Roxo

--
---------------- Non luctari, ludare -------------------+ WYSIWYG Editor ?
Fernando M. Roxo da Motta <ro**@roxo.org | VI !!
Except where explicitly stated I speak on my own behalf.| I see text,
( Usuário Linux registrado #39505 ) | I get text !

Apr 7 '07 #46
Fernando M. Roxo da Motta wrote:
On 30 Mar 2007 00:51:23 -0700, Arjen Markus wrote:
>On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:
>>In comp.lang.c user923005 <dcor...@connx.com> wrote:

Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
I used the above program with two different compilers on one Linux
machine and got the following results:

% ifort -o test_fpu test_fpu.f90
% test_fpu
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 4.9 sec Err= 0.000000000000003
Test3 - Crout 2 (1001x1001) inverts 9.6 sec Err= 0.000000000000073
Test4 - Lapack 2 (1001x1001) inverts 6.6 sec Err= 0.000000000000537
total = 24.2 sec

and:
% g95 -o test_fpu test_fpu.f90
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 56.5 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 21.1 sec Err= 0.000000000000001
Test3 - Crout 2 (1001x1001) inverts 13.1 sec Err= 0.000000000000001
Test4 - Lapack 2 (1001x1001) inverts 34.2 sec Err= 0.000000000000329
total =124.9 sec

That is a factor of 6 in the total time!

Usually Intel Fortran uses a quite complete set of optimization options
by default. GNU Fortran (gfortran or g95) does not apply any special
optimization.

I don't have Intel Fortran installed, but the difference between
gfortran with and without optimizations is as follows:

$ gfortran TEST_FPU.f90 -o TEST_FPU
$ ./TEST_FPU
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 18.4 sec Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts 18.6 sec Err= 0.000000000000023
Test3 - Crout 2 (1001x1001) inverts 16.0 sec Err= 0.000000000000031
Test4 - Lapack 2 (1001x1001) inverts 23.9 sec Err= 0.000000000000250
total = 77.0 sec

$ gfortran -O2 -msse3 TEST_FPU.f90 -o TEST_FPU
$ ./TEST_FPU
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.9 sec Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts 4.4 sec Err= 0.000000000000023
Test3 - Crout 2 (1001x1001) inverts 3.9 sec Err= 0.000000000000031
Test4 - Lapack 2 (1001x1001) inverts 3.9 sec Err= 0.000000000000250
total = 16.1 sec

It's about a factor of 5 from no optimization to some optimization. I'm
pretty sure that this result can be improved.

By the way, it was run on a Core(TM)2 CPU T7200 @ 2.00GHz Laptop.
Yes, I wonder why people don't use the usual vectorizing options at
least for a starting point:
ifort -xW
gfortran -O2 -ftree-vectorize -funroll-loops -msse2

gfortran -msse2 is redundant on 64-bit platforms.

ifort defaults are totally different between 32- and 64-bit targets.
I wonder also why people don't give more details about their platform,
including whether they are running 32- or 64-bit compilation.
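
One low-effort way to capture such details is to print them next to the timings. The following is only a sketch in Python using the standard library; the dictionary keys are made up for illustration, and the pointer size reported is that of the Python interpreter itself, which is an assumption about the machine and says nothing certain about the target the Fortran compiler was building for. Compiler version and flags would still need to be recorded separately.

```python
import platform
import struct
import sys


def platform_report():
    """Collect basic platform facts to post alongside benchmark timings."""
    return {
        "system": platform.system(),        # operating system name
        "machine": platform.machine(),      # hardware architecture string
        "processor": platform.processor(),  # may be empty on some systems
        # Pointer width of this Python build, not of the compiled benchmark.
        "pointer_bits": struct.calcsize("P") * 8,
        "python": sys.version.split()[0],
    }


report = platform_report()
for key, value in report.items():
    print(f"{key:>12}: {value}")
```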
Apr 8 '07 #47
On Sun, 08 Apr 2007 02:19:10 GMT, Tim Prince wrote:
Fernando M. Roxo da Motta wrote:
>On 30 Mar 2007 00:51:23 -0700, Arjen Markus wrote:
>>On 30 mrt, 06:00, p...@see.signature.invalid (Pierre Asselin) wrote:
In comp.lang.c user923005 <dcor...@connx.com> wrote:

Using:
http://gcc.gnu.org/ml/fortran/2005-1...9/TEST_FPU.f90
I used the above program with two different compilers on one Linux
================================8<------------------------------
>$ gfortran -O2 -msse3 TEST_FPU.f90 -o TEST_FPU
$ ./TEST_FPU
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.9 sec Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts 4.4 sec Err= 0.000000000000023
Test3 - Crout 2 (1001x1001) inverts 3.9 sec Err= 0.000000000000031
Test4 - Lapack 2 (1001x1001) inverts 3.9 sec Err= 0.000000000000250
total = 16.1 sec

It's about a factor of 5 from no optimization to some optimization. I'm
pretty sure that this result can be improved.

By the way, it was run on a Core(TM)2 CPU T7200 @ 2.00GHz Laptop.

Yes, I wonder why people don't use the usual vectorizing options at
least for a starting point:
I agree completely. For some time now almost all processors have had some
vectorization support, so it should be the very first thing to look at.
ifort -xW
gfortran -O2 -ftree-vectorize -funroll-loops -msse2
Unrolling can have an important impact too. Simply adding the unroll
option, the timing above becomes:

$ gfortran -O2 -funroll-loops -msse2 TEST_FPU.f90 -o TEST_FPU
$ ./TEST_FPU
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts 3.7 sec Err= 0.000000000000023
Test3 - Crout 2 (1001x1001) inverts 3.5 sec Err= 0.000000000000031
Test4 - Lapack 2 (1001x1001) inverts 3.1 sec Err= 0.000000000000250
total = 13.5 sec
>
gfortran -msse2 is redundant on 64-bit platforms.
Why? I was not aware of that. At least on newer Intel processors you
have the option of SSE3 vector instructions.
>
ifort defaults are totally different between 32- and 64-bit targets.
I wonder also why people don't give more details about their platform,
including whether they are running 32- or 64-bit compilation.
For sure, any information about the platform can be useful to figure out
what is going on.

Roxo

--
---------------- Non luctari, ludare -------------------+ WYSIWYG Editor ?
Fernando M. Roxo da Motta <ro**@roxo.org | VI !!
Except where explicitly stated I speak on my own behalf.| I see text,
( Usuário Linux registrado #39505 ) | I get text !

Apr 8 '07 #48
On Sun, 08 Apr 2007 18:42:23 +0200, Fernando M. Roxo da Motta wrote:

In advance, let me say "I can't help myself, mea culpa, I'm sorry". It's
been one of those days, I need to inject some humor somewhere. I'm also
a HUGE fan of Fortran - I use it every day - I don't particularly care
for C. What I'm about to say has probably been said in this thread
already and I just missed it. Having said that...

Of course C is faster than Fortran.
Fortran is from the dark ages (of computing) and
C is the speed of light.

Thanks, I feel better now. We now return you to your regularly scheduled
discussion.

--
+-------------------+---------------------------------------------------+
Bruce Bowler | I tried reality once, but it was too confusing
+-------------------+---------------------------------------------------+
Apr 9 '07 #49
Bruce Bowler wrote:
On Sun, 08 Apr 2007 18:42:23 +0200, Fernando M. Roxo da Motta wrote:
....
Of course C is faster than Fortran.
Fortran is from the dark ages (of computing) and
C is the speed of light.
Well, C is a case-sensitive language (whose name is
upper case), while the speed of light is c (lower case).
Physically, C (upper case) is a coulomb: a moderate
amount of electrical charge. How that corresponds to
speed at all is questionable.

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
Apr 9 '07 #50
