Computation slower with float than double

Michele Guidolin

Hello to everybody.

I'm benchmarking a red-black Gauss-Seidel algorithm on two-dimensional
grids of different sizes and element types, and I get some strange results
when I change the computation from double to float.

Here are the test times for different grid SIZEs and types:

SIZE     128     256     512

float    2.20s   2.76s   7.86s

double   2.30s   2.47s   2.59s

As you can see, from a grid size of 256 nodes upward the running time of
the float code increases drastically.

What could be the problem? Could it be the cache? Shouldn't float
computation always be faster than double?

Hope to receive an answer as soon as possible,
Thanks

Michele Guidolin.
P.S.

Here is some more information about the test:

The code that I'm testing is shown below; the double version is the same
(except that the constants are 0.25 rather than 0.25f).

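------------- CODE -------------

/* SIZE is the compile-time grid dimension (128, 256, 512, ... in the tests). */
float u[SIZE][SIZE];
float rhs[SIZE][SIZE];

/* Relax one grid point from its right-hand side and its four neighbours.
   "static" lets the inline definition link under C99 rules as well. */
static inline void gs_relax(int i, int j)
{
    u[i][j] = ( rhs[i][j] +
                0.0f  * u[i][j]   +
                0.25f * u[i+1][j] +
                0.25f * u[i-1][j] +
                0.25f * u[i][j+1] +
                0.25f * u[i][j-1] );
}

void gs_step_fusion(void)
{
    int i, j;

    /* update the red points: */

    /* first interior row, odd columns */
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(1, j);
    }
    /* fused sweep: relax row i, then the same columns of row i-1 */
    for (i = 2; i < SIZE-1; i++)
    {
        for (j = 1+(i+1)%2; j < SIZE-1; j = j+2)
        {
            gs_relax(i, j);
            gs_relax(i-1, j);
        }
    }
    /* last interior row, odd columns */
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(SIZE-2, j);
    }
}
---------------CODE--------------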

I'm testing this code on this machine:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping : 1
cpu MHz : 3192.311
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
monitor ds_cpl cid
bogomips : 6324.22

with Hyper-Threading enabled, on Linux 2.6.8.

The compiler is gcc 3.4.4 and the flags are:
CFLAGS = -g -O2 -funroll-loops -msse2 -march=pentium4 -Wall

Nov 14 '05


Michele Guidolin

Lawrence Kirby wrote:

>> Moreover, the timing covers only the loop itself and not other things
>> such as data initialization and printing of results.
>>
>> The new test times are:
>>
>> SIZE     100    200    300    400    500    513
>> Float    2.17s  2.44s  3.35s  5.82s  8.37s  7.98s
>> Double   2.32s  2.34s  2.57s  2.63s  2.63s  2.65s
>>
>> When I use a profiler, it shows that 95% of the time is spent in these
>> two function calls:
>>
>> for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
>
> What is i? Is this an inner loop?
>
>> {
>> gs_relax(i,j); // 45%
>> gs_relax(i-1,j); // 45%
>
> This suggests that you need to look in gs_relax to see what is happening.
>
>> }
>> }
>> So I still don't understand why the float version is so slow.
>> Any help?
>
> You have yet to show any code that accesses float or double data.
>
> Lawrence

gs_relax simply performs a red-black Gauss-Seidel relaxation.
I already posted the code in the first message, but I post it again.
The double version is exactly the same (with the constant 0.25 instead of
0.25f).

I really don't understand why the float version gets so slow with
SIZE > 300. Maybe a gcc bug?
If someone has an idea, it would be much appreciated.
Thanks
Michele.

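------------- CODE -------------

/* SIZE is the compile-time grid dimension (100 ... 513 in the tests). */
float u[SIZE][SIZE];
float rhs[SIZE][SIZE];

/* Relax one grid point from its right-hand side and its four neighbours.
   "static" lets the inline definition link under C99 rules as well. */
static inline void gs_relax(int i, int j)
{
    u[i][j] = ( rhs[i][j] +
                0.0f  * u[i][j]   +
                0.25f * u[i+1][j] +
                0.25f * u[i-1][j] +
                0.25f * u[i][j+1] +
                0.25f * u[i][j-1] );
}

void gs_step_fusion(void)
{
    int i, j;

    /* update the red points: */

    /* first interior row, odd columns */
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(1, j);
    }
    /* fused sweep: relax row i, then the same columns of row i-1 */
    for (i = 2; i < SIZE-1; i++)
    {
        for (j = 1+(i+1)%2; j < SIZE-1; j = j+2)
        {
            gs_relax(i, j);
            gs_relax(i-1, j);
        }
    }
    /* last interior row, odd columns */
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(SIZE-2, j);
    }
}
---------------CODE--------------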

Nov 14 '05 #11

Christian Bau

Michele Guidolin wrote:

> I realy don't understand why the float version is going so slowly whit a
> SIZE > 300. Maybe gcc bug?

Is the "double" version at SIZE > 300 slow as well?

Slowness would be expected when things exceed cache size. 300x300 floats
would be 360,000 bytes. The "double" version should slow down a bit
earlier.

Nov 14 '05 #12

Dik T. Winter

Tim Prince writes:
....

> When you divide your grid more finely, are you running into gradual
> underflow?

That might very well be the case. In float that happens much earlier than
in double.

> If so, what happens when you invoke abrupt underflow, as
> gcc -O2 -funroll-loops -march=pentium4 -mfpmath=sse -ffast-math
> might do?
> Another option would be to shift the origin of the coordinate system.
> Most compilers have gradual underflow on as a default, since it
> is required according to IEEE standard, and turn it off either by a specific
> option or as a part of some "fast" package.
> Gradual underflow is quite slow on early P4 steppings, in case you didn't
> believe this question could go far OFF TOPIC.

The main reason is that gradual underflow on most systems is not handled
by the processor, but by software. And that requires interrupts.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Nov 14 '05 #13

robert.thorpe


You may be better off asking this question on a gcc-specific group
such as gnu.gcc.help. It may be an eccentricity of a specific GCC
version.

Also, do not test this function with all zeros in the arrays.
Floating-point units often treat zero specially.

Nov 14 '05 #14
