C# (float) cast is costly for speed if not used appropriately

Folks,

We ran into a pretty significant performance penalty when casting floats.
We've identified a code workaround that we wanted to pass along, but we were
also wondering whether others have run into this and whether there is a
better solution.

-jeff
.....
I'd like to share findings regarding C# (float) cast.

While converting double to float, we found several slowdowns.
We realized the C# (float) cast can be costly if not used appropriately.

------------------------------------------------------------
Slow cases
------------------------------------------------------------
(A)
private void someMath(float[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        output[i] = (float)Math.Log10(input[i]); // <--- inline (float) cast is slow!
    }
}

(B)
private void Copy(double[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        output[i] = (float)input[i]; // <--- inline (float) cast is slow!
    }
}

In these examples, the "inline" (float) casts are executed on the same line as
other operations, such as Math.Log10() or a simple fetch from the input array.

These are slow, even in a Release build.
(A): It takes 3 to 6% longer than the double[] case. ;-)
(B): It takes twice(!) as long as the double[] case. ;-)

In my understanding, and according to articles on the Net, the slowdown comes
from writing the intermediate value back to memory, as follows. The extra
trips are costly.

(A) CPU/FPU  +-- fetch -- Math.Log10 --+    +-- (float) --+
             |                         |    |             |
             |                         V    |             V
    memory   input              written back to heap    output
                                Extra memory access!

(B) CPU/FPU  +-- fetch --+    +-- (float) --+
             |           |    |             |
             |           V    |             V
    memory   input  written back to heap  output
                    Extra memory access!

------------------------------------------------------------
Fast cases
------------------------------------------------------------

To avoid the extra memory access, we can use a temporary variable to store
the intermediate data.
The temporary variable is allocated in a CPU register, which keeps the loop
fast.

(C)
private void someMath(float[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        double tmp = Math.Log10(input[i]); // <-- store in a temporary variable in CPU register
        output[i] = (float)tmp;            // <-- then (float) cast. Fast!
    }
}

(D)
private void Copy(double[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        double tmp = input[i];  // <-- store in a temporary variable in CPU register
        output[i] = (float)tmp; // <-- then (float) cast. Fast!
    }
}

In these improved versions, the intermediate data is not written back to
memory.
The improved versions are actually slightly faster than the double[] case.
(C): 1% faster than the double[] case. :)
(D): 3% faster than the double[] case. :)

(C) CPU/FPU  +-- fetch -- Math.Log10 -- stays in -- (float) --+
             |                        CPU register            |
             |                            Fast!               |
             |                                                V
    memory   input                                         output

(D) CPU/FPU  +-- fetch -- stays in -- (float) --+
             |          CPU register            |
             |              Fast!               |
             |                                  V
    memory   input                           output

OK, this is what we found from benchmarking and googling.

The same thing can be said for ArraySegment<float> arrays as well.
This is because the issue relates to the float values in the array, not the
array itself.
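
For illustration, here is a minimal sketch of how the temporary-variable form
of (D) might look when the data arrives as ArraySegment values. The class
name SegmentCopy and the length check are assumptions made for this example,
not code from our project, and we have not timed this variant separately.

```csharp
using System;

public static class SegmentCopy
{
    // Same pattern as (D), applied to array segments: read each element into
    // a double local first, then cast that local to float.
    public static void Copy(ArraySegment<double> input, ArraySegment<float> output)
    {
        if (output.Count < input.Count)
        {
            throw new ArgumentException("output segment is shorter than input segment");
        }

        double[] src = input.Array;
        float[] dst = output.Array;
        int srcOffset = input.Offset;
        int dstOffset = output.Offset;
        int count = input.Count;

        for (int i = 0; i < count; i++)
        {
            double tmp = src[srcOffset + i];   // intermediate value kept in a local
            dst[dstOffset + i] = (float)tmp;   // cast applied to the local
        }
    }

    public static void Main()
    {
        double[] data = { 0.5, 1.5, 2.5, 3.5 };
        float[] result = new float[4];

        // Convert only the two middle elements; the rest of result stays at 0.
        Copy(new ArraySegment<double>(data, 1, 2), new ArraySegment<float>(result, 1, 2));

        foreach (float value in result)
        {
            Console.WriteLine(value);
        }
    }
}
```

The segment's backing array, offset and count are read once before the loop
so that the loop body is the same two statements as in (D).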

You could say this is a .NET compiler optimization issue.
If you know of optimization flags or anything else that can fix this on the
compiler side, please let us know.
That would be a great help!
(By the way, a simple Release build does not help.)

Otherwise, we will need to optimize our code by hand using the temporary
variable technique shown in the examples.
Well, we have many instances of this kind of "inline" cast in our code.
Dec 31 '07 #1
Arnie <je************ *****@msn.com> wrote:
> We ran into a pretty significant performance penalty when casting floats.

To be honest, it doesn't really sound that significant to me. Read
on...

> We've identified a code workaround that we wanted to pass along but also was
> wondering if others had experience with this and if there is a better
> solution.

<snip>

> I'd like to share findings regarding C# (float) cast.
>
> As we convert double to float, we found several slow down issues.
> We realized C# (float) cast can be costly if not used appropriately.

<snip>

> In my understanding and articles on the Net, the slow down comes from
> writing intermediate value back to memory as follows. The extra trips
> are costly.

I see no reason to believe that there's an extra value written to the
*heap* (rather than the stack), and no reason why the JIT shouldn't use
a register for the intermediate value without an explicit local
variable.

<snip>

I have included a short but complete program below which uses an array
of a million elements and iterates each method a thousand times. Here
are the results on my laptop:

Log10Fast: 64489ms
Log10Slow: 70420ms
CopyFast: 3841ms
CopySlow: 4070ms

So your optimisation improves things by about 10% for the Log10 case
and about 5% for the Copy case.
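
The program itself is not reproduced in this thread. As a rough sketch only,
a harness along the lines described (a million-element array, each method
run repeatedly and timed) might look like the following. The method names
follow the labels in the results above; the Time helper, the Action-based
lambdas, the fill values and the reduced pass count of 100 are assumptions
for this example and are not taken from the original program.

```csharp
using System;
using System.Diagnostics;

public static class CastBenchmark
{
    // "Slow" form from the thread: the cast shares a line with the array read.
    public static void CopySlow(double[] input, float[] output)
    {
        int length = input.Length;
        for (int i = 0; i < length; i++)
        {
            output[i] = (float)input[i];
        }
    }

    // "Fast" form from the thread: read into a double local, then cast.
    public static void CopyFast(double[] input, float[] output)
    {
        int length = input.Length;
        for (int i = 0; i < length; i++)
        {
            double tmp = input[i];
            output[i] = (float)tmp;
        }
    }

    public static void Log10Slow(float[] input, float[] output)
    {
        int length = input.Length;
        for (int i = 0; i < length; i++)
        {
            output[i] = (float)Math.Log10(input[i]);
        }
    }

    public static void Log10Fast(float[] input, float[] output)
    {
        int length = input.Length;
        for (int i = 0; i < length; i++)
        {
            double tmp = Math.Log10(input[i]);
            output[i] = (float)tmp;
        }
    }

    // Runs body once untimed (so JIT compilation is excluded), then
    // iterations more times, and returns the elapsed milliseconds.
    public static long Time(Action body, int iterations)
    {
        body();
        Stopwatch watch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            body();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int Size = 1000000;    // a million elements, as described above
        const int Iterations = 100;  // assumption: fewer passes than the thousand described

        double[] doubles = new double[Size];
        float[] floats = new float[Size];
        float[] output = new float[Size];
        for (int i = 0; i < Size; i++)
        {
            doubles[i] = i + 1;  // positive values so Log10 is defined
            floats[i] = i + 1;
        }

        Console.WriteLine("Log10Fast: {0}ms", Time(() => Log10Fast(floats, output), Iterations));
        Console.WriteLine("Log10Slow: {0}ms", Time(() => Log10Slow(floats, output), Iterations));
        Console.WriteLine("CopyFast: {0}ms", Time(() => CopyFast(doubles, output), Iterations));
        Console.WriteLine("CopySlow: {0}ms", Time(() => CopySlow(doubles, output), Iterations));
    }
}
```

Timings from a sketch like this only mean something in an optimised
(Release) build run outside the debugger, and they will vary with the
runtime and CPU.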

> Otherwise, we will need to optimize our code by hand using temporary
> variable technique as in the example.
> Well, we have many instances of this kind of "inline" casts in our code.

And have you any reason to believe that's *actually* the bottleneck in
your code? Do you regularly convert a billion floats and care about
200ms of performance loss?

I don't understand why the results are as they are (it would be worth
looking at the JITted, optimised code to find out) - but even so, I
certainly wouldn't start micro-optimising all over the place. Find out
where the *actual* bottleneck in your code is, and consider reducing
readability/simplicity for the sake of performance just in the most
significant parts. Don't start doing it all over the place, which
sounds like the course of action you're considering at the moment.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Jan 1 '08 #2
Thanks for the feedback, Jon.

This is a mature system where they are "wringing" out the last bit of
performance.

It is a scientific test instrument (spectrum analyzer), so they are acquiring
and converting extremely large chunks of data (waveforms). Some runs can
acquire as much as 500MB of data at a time.

So they have "progressed" to the point where they are looking at the right
optimization spots in their code. Casting from double[] is indeed 2x as slow
without the optimization, which is quite different from the 5-10% case you
demonstrated. Again, the only thing they changed was assigning a local
variable, hence their curiosity about what the C#/JIT compiler is doing.

I think our premise is that, given this single change, there would be no
performance difference if the compiler were taking advantage of every
reasonable performance optimization.

Time to look at the IL and see what is going on.

-jeff
Jan 3 '08 #3
Arnie <je************ *****@msn.com> wrote:
> This is a mature system where they are "wringing" out the last bit of
> performance.

Hmm... it still sounds dubious to me. I doubt you'll really see
performance benefits which are significant in the context of the whole
app. Mind you, it sounds like you're not seeing the same behaviour as
me to start with, so hey...

> It is a scientific test instrument (spectrum analyzer), so they are acquiring
> and converting extremely large chunks of data (waveforms). Some runs can
> acquire as much as 500MB of data at a time.

500MB isn't that much though, in the context of the tests I was doing -
it was using a billion points of data, which would be 8GB. The copy was
then only taking 4 seconds, and making the change only shaved off a
very small amount.

> So they have "progressed" to the point where they are looking at the right
> optimization spots in their code. Casting from double[] is indeed 2x as slow
> without the optimization, which is quite different from the 5-10% case you
> demonstrated.

So can you give a short but complete program which *does* demonstrate
the 2x difference?

Just as a thought, which CLR are you using? I'm on the 2.0, on x86. If
you're using 1.0, 1.1, or 2.0 on x64, that could account for some
differences.
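
A quick way to answer that question on the machine in question is to print
the runtime version and the pointer size from inside the process. This is
only a sketch; the class and method names are made up for the example, and
it reports what the current process is running on, nothing more.

```csharp
using System;

public static class RuntimeInfo
{
    // Builds a one-line description of the running CLR version and whether
    // the process is 32-bit or 64-bit (IntPtr.Size is 4 or 8 bytes).
    public static string Describe()
    {
        int bits = IntPtr.Size * 8;
        return "CLR " + Environment.Version + ", " + bits + "-bit process";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running this in the same build configuration as the benchmark shows whether
the 2x result and the 5-10% result came from comparable environments.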

> Again, the only thing they changed was assigning a local
> variable, hence their curiosity about what the C#/JIT compiler is doing.
>
> I think our premise is that, given this single change, there would be no
> performance difference if the compiler were taking advantage of every
> reasonable performance optimization.
>
> Time to look at the IL and see what is going on.

The IL doesn't show much. It's the optimised assembly you need to be
looking at, really. cordbg is your friend - but don't forget to tell it
to perform JIT optimisations. SOS may help too. I don't envy you...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Jan 3 '08 #4
