Bytes | Software Development & Data Engineering Community
Computation slower with float than double.

Hello to everybody.

I'm benchmarking a red-black Gauss-Seidel algorithm on two-dimensional
grids of different sizes and types, and I get some strange results when I
change the computation from double to float.

Here are the timings for tests with different grid SIZE and type:

SIZE 128 256 512

float 2.20s 2.76s 7.86s

double 2.30s 2.47s 2.59s

As you can see, when the grid size reaches 256 nodes the float version's
time increases drastically.

What could be the problem? Could it be the cache? Shouldn't float
computation always be faster than double?

Hope to receive an answer as soon as possible,
Thanks

Michele Guidolin.
P.S.

Here are some more information about the test:

The code that I'm testing is shown below; the double version is identical
(except that the constants are 0.25 rather than 0.25f).

------------- CODE -------------

float u[SIZE][SIZE];
float rhs[SIZE][SIZE];

inline void gs_relax(int i, int j)
{
    u[i][j] = ( rhs[i][j] +
                0.0f  * u[i][j]   +
                0.25f * u[i+1][j] +
                0.25f * u[i-1][j] +
                0.25f * u[i][j+1] +
                0.25f * u[i][j-1] );
}

void gs_step_fusion(void)
{
    int i, j;

    /* update the red points: */
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(1, j);
    }
    for (i = 2; i < SIZE-1; i++)
    {
        for (j = 1 + (i+1)%2; j < SIZE-1; j = j+2)
        {
            gs_relax(i, j);
            gs_relax(i-1, j);
        }
    }
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(SIZE-2, j);
    }
}
---------------CODE--------------

I'm testing this code on this machine:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping : 1
cpu MHz : 3192.311
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
monitor ds_cpl cid
bogomips : 6324.22

with Hyper-Threading enabled, on Linux 2.6.8.

The compiler is gcc 3.4.4 and the flags are:
CFLAGS = -g -O2 -funroll-loops -msse2 -march=pentium4 -Wall
Nov 14 '05 #1
in comp.lang.c i read:

I'm doing some benchmark about a red black Gauss Seidel algorithm with 2
dimensional grid

As you can see when the grid has a size of 256 node the code with float
type increase the time drastically.

What could be the problem? could be the cache? Should the float
computation always fastest than double?


most likely your system does all floating point computations using a
precision greater than float, then reduces the result when the value must
be stored, which happens more often as you increase the size of the table.

--
a signature
Nov 14 '05 #2


Michele Guidolin wrote:
Hello to everybody.

I'm doing some benchmark about a red black Gauss Seidel algorithm with 2
dimensional grid of different size and type, I have some strange result
when I change the computation from double to float.

Here are the time of test with different grid SIZE and type:

SIZE 128 256 512

float 2.20s 2.76s 7.86s

double 2.30s 2.47s 2.59s

As you can see when the grid has a size of 256 node the code with float
type increase the time drastically.

I see a modest increase at 256 and a huge increase at 512.
Have there been any transcription errors?

I also see that the code you didn't show probably accounts
for the lion's share of the running time, which casts suspicion
on drawing too many conclusions from a couple of experiments.
The running time of the posted code should increase (roughly)
as the square of SIZE, so changing SIZE from 128 to 512 should
inflate its running time by a factor of (about) sixteen. Yet
this supposed sixteen-fold increase added only 0.29 seconds to
the running time for "double;" a straightforward calculation
(based on data of unknown accuracy, to be sure) suggests that
the rest of the program accounts for 89% or more of the time
in that case, and even more in the other two.

... and if such a large portion of the total time resides
"elsewhere," it would be unwise to draw too many conclusions
until the contributions of "elsewhere" are better characterized,
or better controlled for (e.g., by repeated experiment and
statistical analysis).
What could be the problem? could be the cache? Should the float
computation always fastest than double?


Cache might be a problem. So might alignment, or other
competing processes on the machine. If you're reading the
initial data from a file, perhaps one test paid the penalty of
actually reading from the disk while the others benefitted from
the file system's cache. Or maybe the disk is just beginning
to go sour, and the O/S relocated an entire track of data in
the middle of one test. Or maybe the phase of the moon wasn't
propitious.

Should float always be faster than double? No, the C language
Standard is silent on matters of speed (which makes the entire
discussion off-topic here, or at least slightly so). You've shown
some puzzling data, but you need more data and more analysis to
draw good conclusions, and the results you eventually get will
most likely be relevant only to the system you got them on, and
not to the C language. I'd suggest further experimentation, and
a change to a newsgroup devoted to your system, where the experts
on your system's quirks hang out.

--
Er*********@sun.com

Nov 14 '05 #3
Michele Guidolin wrote:
.... snip ...
Here are the time of test with different grid SIZE and type:

SIZE 128 256 512
float 2.20s 2.76s 7.86s
double 2.30s 2.47s 2.59s

As you can see when the grid has a size of 256 node the code with
float type increase the time drastically.

What could be the problem? could be the cache? Should the float
computation always fastest than double?


C real computations are always done as doubles by default. When
you specify floats you are primarily constricting the storage, and
are causing float->double->float conversions to be done. These are
eating up the time.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 14 '05 #4
On Mon, 06 Jun 2005 20:54:56 +0000, CBFalconer wrote:
Michele Guidolin wrote:
... snip ...

Here are the time of test with different grid SIZE and type:

SIZE 128 256 512
float 2.20s 2.76s 7.86s
double 2.30s 2.47s 2.59s

As you can see when the grid has a size of 256 node the code with
float type increase the time drastically.

What could be the problem? could be the cache? Should the float
computation always fastest than double?


C real computations are always done as doubles by default.


That was true in K&R C, but not in standard C. An implementation CAN
perform calculations in greater precision than the representation of the
type but it is not required to.
When
you specify floats you are primarily constricting the storage, and
are causing float->double->float conversions to be done. These are
eating up the time.


Perhaps. But on common architectures it is typically the case that float
operations are performed using float precision, or else loading/storing a
float-sized object in memory to/from a wider register is no more expensive
than a double-sized object in memory.

Lawrence



Nov 14 '05 #5


CBFalconer wrote:
Michele Guidolin wrote:

... snip ...
Here are the time of test with different grid SIZE and type:

SIZE 128 256 512
float 2.20s 2.76s 7.86s
double 2.30s 2.47s 2.59s

As you can see when the grid has a size of 256 node the code with
float type increase the time drastically.

What could be the problem? could be the cache? Should the float
computation always fastest than double?

C real computations are always done as doubles by default. When
you specify floats you are primarily constricting the storage, and
are causing float->double->float conversions to be done. These are
eating up the time.


That was true in pre-Standard days, but ever since
C89 the implementation has been allowed to use `float'
arithmetic when only `float' operands are involved. Not
all implementations do so (and I don't know whether the
O.P.'s does), but it's no longer a certainty that the
conversions are occurring. C99 6.3.1.8 or C89 3.2.1.5;
I don't have the section number for C90.

--
Er*********@sun.com

Nov 14 '05 #6
In article <ne********************@weblab.ucd.ie>,
Michele Guidolin <"michele dot guidolin at ucd dot ie"> wrote:
Hello to everybody.

I'm doing some benchmark about a red black Gauss Seidel algorithm with 2
dimensional grid of different size and type, I have some strange result
when I change the computation from double to float.

Here are the time of test with different grid SIZE and type:

SIZE 128 256 512

float 2.20s 2.76s 7.86s

double 2.30s 2.47s 2.59s


As a rule of thumb: Accessing array elements at a distance that is a
large power of two is asking for trouble (performance wise).

Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
500?
Nov 14 '05 #7
Christian Bau wrote:

SIZE 128 256 512

float 2.20s 2.76s 7.86s

double 2.30s 2.47s 2.59s

As a rule of thumb: Accessing array elements at a distance that is a
large power of two is asking for trouble (performance wise).

Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
500?

OK! I tried some more tests with different grid SIZEs. In the previous
message I forgot to say that the number of loop iterations depends on the
grid SIZE, so the times for different SIZEs shouldn't really be compared
as directly proportional.

------- code -------
ITERATIONS = ((int)(pow(2.0,28.0))/(pow((double)SIZE,2.0)));

gettimeofday(&submit_time, 0);

for(iter=0; iter<ITERATIONS; iter++)
    gs_step_fusion();

gettimeofday(&complete_time, 0);
------- code -------

Moreover, the timing covers only the loop itself and not other things,
like data initialization and printing of results.

The new timings are:

SIZE 100 200 300 400 500 513
Float 2.17s 2.44s 3.35s 5.82s 8.37s 7.98s
Double 2.32s 2.34s 2.57s 2.63s 2.63s 2.65s

When I use a profiler, it shows that 95% of the time is spent in these
two calls:

for(j=1+(i+1)%2; j<SIZE-1; j=j+2)
{
    gs_relax(i,j);   // 45%
    gs_relax(i-1,j); // 45%
}

So I still don't understand why the float version is so slow.
Any help?

Thanks for the answers.

Michele Guidolin
Nov 14 '05 #8
On Tue, 07 Jun 2005 11:46:54 +0100, Michele Guidolin wrote:
Christian Bau wrote:

SIZE 128 256 512

float 2.20s 2.76s 7.86s

double 2.30s 2.47s 2.59s

As a rule of thumb: Accessing array elements at a distance that is a
large power of two is asking for trouble (performance wise).

Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
500?

OK! I tried some more test with different SIZE of grid, in the precedent
message I forgot to say that the number of loop is proportional of SIZE
of grid, but the different time between two different SIZE shouldn't be
considerate realy proportonial.

-------code ----
ITERATIONS = ((int)(pow(2.0,28.0))/(pow((double)SIZE,2.0)));


It is better to do integer calculations in integer arithmetic if you can.
Also consider that C only requires int to be able to represent numbers in
the range -32767 to 32767. So you might use something like

ITERATIONS = (1L << 28) / ((long)SIZE * SIZE);
gettimeofday(&submit_time, 0);

gettimeofday() isn't standard C. You can use the standard clock() function
to measure CPU time used.
for(iter=0; iter<ITERATIONS; iter++)
    gs_step_fusion();

gettimeofday(&complete_time, 0);
-------code -----

Moreover the time considerer only the loop itself and not other things,
like data initialization and print of result.

The new time test are:

SIZE 100 200 300 400 500 513
Float 2.17s 2.44s 3.35s 5.82s 8.37s 7.98s
Double 2.32s 2.34s 2.57s 2.63s 2.63s 2.65s

When I use a profiler it show me that the 95% of time is on this two
function:

for(j=1+(i+1)%2; j<SIZE-1; j=j+2)
What is i? Is this an inner loop?
{
gs_relax(i,j); // 45%
gs_relax(i-1,j); // 45%
This suggests that you need to look in gs_relax to see what is happening.
}
}
So I still doesn't understand why the float version is going so slowy.
Any help?


You have yet to show any code that accesses float or double data.

Lawrence
Nov 14 '05 #9

"Michele Guidolin" <"michele dot guidolin at ucd dot ie"> wrote in message
news:ne********************@weblab.ucd.ie...
Christian Bau wrote:

SIZE 128 256 512

float 2.20s 2.76s 7.86s

double 2.30s 2.47s 2.59s

As a rule of thumb: Accessing array elements at a distance that is a
large power of two is asking for trouble (performance wise).

Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
500?

OK! I tried some more test with different SIZE of grid, in the precedent
message I forgot to say that the number of loop is proportional of SIZE
of grid, but the different time between two different SIZE shouldn't be
considerate realy proportonial.

-------code ----
ITERATIONS = ((int)(pow(2.0,28.0))/(pow((double)SIZE,2.0)));

gettimeofday(&submit_time, 0);

for(iter=0; iter<ITERATIONS; iter++)
    gs_step_fusion();

gettimeofday(&complete_time, 0);
-------code -----

Moreover the time considerer only the loop itself and not other things,
like data initialization and print of result.

The new time test are:

SIZE 100 200 300 400 500 513
Float 2.17s 2.44s 3.35s 5.82s 8.37s 7.98s
Double 2.32s 2.34s 2.57s 2.63s 2.63s 2.65s

When I use a profiler it show me that the 95% of time is on this two
function:

for(j=1+(i+1)%2; j<SIZE-1; j=j+2)
{
    gs_relax(i,j);   // 45%
    gs_relax(i-1,j); // 45%
}

So I still doesn't understand why the float version is going so slowy.
Any help?

I hesitated to attempt an answer, as I wasn't certain whether your options
invoke SSE code generation. Several other answers seemed to imply that
people thought so, but weren't certain. Maybe attacking the problem more
directly makes it off topic for c.l.c, but I've already seen plenty of
answers which don't look like pure Standard C information.

When you divide your grid more finely, are you running into gradual
underflow? If so, what happens when you invoke abrupt underflow, as

gcc -O2 -funroll-loops -march=pentium4 -mfpmath=sse -ffast-math

might do? Most compilers have gradual underflow on by default, since it is
required by the IEEE standard, and turn it off either via a specific
option or as part of some "fast" package.
Gradual underflow is quite slow on early P4 steppings, in case you didn't
believe this question could go far OFF TOPIC.
Nov 14 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.
