floating point over-/under-flow

Mark L Pappin

<puts on Compiler Vendor hat>

I've recently discovered that [at least two of] our compilers don't
make any attempt to handle floating point overflow in add/subtract/
multiply/divide, with the result that programmers who don't range-
check their data can e.g. multiply two very tiny values and end up
with a very large one. This is (AFAICT) quite fine according to the
Standard but is certainly a QoI issue and I'd like to raise ours.

I believe that it's the programmer's job to know what possible ranges
his values could take, and to check they're sane before performing
operations upon them. If this is done, then overflow and underflow
can never happen. Rarely is this done, of course, so the compromise
suggested by TPTB is to offer a safety net in the form of adding
inexpensive checks to the generated code which can propagate status
back to the user.

The approach I'm taking is to detect overflow or underflow and set a
flag (in the implementation namespace) as appropriate, but leave the
[invalid] value in the result. This way, if it really is important to
the user to eke that last bit (pun not intended) of information out of
the operation then the way is clear for them to do so. For example, a
multiplication which overflows will end up with a value pow(2,256)
smaller than the correct [unrepresentable] value, thanks to wraparound
of our 8-bit signed binary exponent, but the mantissa will still have
full precision - some recovery may be possible.
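
A rough host-side model of the wraparound described above may make the pow(2,256) factor concrete. It is only a sketch: it assumes float is an IEEE-754 single, models the exponent as wrapping modulo 256, and does not model the reserved exponent encodings. The function and flag names are invented for illustration and are not the compiler's actual identifiers.

```c
#include <float.h>
#include <math.h>

/* Hypothetical status flags; the names are invented for this sketch. */
static int fp_overflow_flag;
static int fp_underflow_flag;

/* Model of a single-precision multiply whose binary exponent wraps
   modulo 256 instead of saturating. The result is returned as a double
   so that every wrapped value can be inspected on the host.
   Assumes finite inputs and an IEEE-754 single-precision float. */
double wrapping_mul(float a, float b)
{
    int ea, eb, e;
    double m;

    if (a == 0.0f || b == 0.0f)
        return 0.0;

    /* Split each operand into a fraction in [0.5, 1) and an exponent. */
    m = frexp((double)a, &ea) * frexp((double)b, &eb);
    m = frexp(m, &e);              /* renormalise the fraction product */
    e += ea + eb;

    if (e > FLT_MAX_EXP)           /* true result too large for float */
        fp_overflow_flag = 1;
    else if (e < FLT_MIN_EXP)      /* true result too small for float */
        fp_underflow_flag = 1;

    /* Wrap the biased exponent (e + 126) into 8 bits, then unbias it. */
    e = (int)((unsigned)(e + 126) & 0xFFu) - 126;

    return ldexp(m, e);            /* fraction keeps its full precision */
}
```

In this model, multiplying pow(2,100) by itself yields pow(2,-56) with the overflow flag set, and scaling that result by pow(2,256) recovers pow(2,200). Multiplying pow(2,-100) by itself yields pow(2,56), the "tiny * tiny gives huge" case.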

An alternative approach (since we use an IEEE-754-compatible
representation already) is to go the whole hog and implement
Infinities, NaNs, and Denormals. At this time we don't want to do this
because it appears to be a significant cost (see below) for not much
return. In any case, how to deal with those types of value in C is
still not defined, so any solution of this type would still be Not C.

I'm interested to hear the c.l.c set of viewpoints on silent vs. noisy
underflow and overflow - in this case, coercing values to 0 or Inf,
vs. leaving them as is and setting a user-checkable flag.
('Very-noisy' would be throwing an exception of some kind, also
undefined by the Standards.) What practices do you use to avoid
hitting overflow and underflow? Is our plan to let the dodgy value
continue to exist of any conceivable value?

For those who've read this far: we make cross-compilers for [often
tiny; <256 bytes of RAM is common] CPUs commonly used in embedded
systems, and we do our damnedest to make them meet C90 and are inching
toward C99 in places. Floating point has become popular, some
deficiencies have been found, and Muggins put his hand up to fix them.

mlp

Nov 14 '05 #1

Tim Prince

"Mark L Pappin" <ml*@acm.org> wrote in message
news:m3******** ****@Claudio.Messina...

> <puts on Compiler Vendor hat>
>
> I've recently discovered that [at least two of] our compilers don't
> make any attempt to handle floating point overflow in add/subtract/
> multiply/divide, with the result that programmers who don't range-
> check their data can e.g. multiply two very tiny values and end up
> with a very large one. .....
>
> The approach I'm taking is to detect overflow or underflow and set a
> flag (in the implementation namespace) as appropriate, but leave the
> [invalid] value in the result. This way, if it really is important to
> the user to eke that last bit (pun not intended) of information out of
> the operation then the way is clear for them to do so. For example, a
> multiplication which overflows will end up with a value pow(2,256)
> smaller than the correct [unrepresentable] value, thanks to wraparound
> of our 8-bit signed binary exponent, but the mantissa will still have
> full precision - some recovery may be possible.
>
> .... I'm interested to hear the c.l.c set of viewpoints on silent vs. noisy
> underflow and overflow - in this case, coercing values to 0 or Inf,
> vs. leaving them as is and setting a user-checkable flag.
> ('Very-noisy' would be throwing an exception of some kind, also
> undefined by the Standards.) What practices do you use to avoid
> hitting overflow and underflow? Is our plan to let the dodgy value
> continue to exist of any conceivable value?

Long, long ago, before C, the usual practice for such hardware was to throw
an exception. If the application wished to continue, it had to provide an
exception handler. Thus, the application would make the choice whether to
set 0 or Inf and continue silently, or issue a diagnostic, or take a branch
to process the out-of-range value. Continuing without fixing it up seems
unlikely to be useful. When IEEE-754 came in, the argument was that this
exception handling is an unnecessary burden.

Nov 14 '05 #2

Kevin Bracey

In message <m3************ @Claudio.Messina>
Mark L Pappin <ml*@acm.org> wrote:

> An alternative approach (since we use an IEEE-754-compatible
> representation already) is to go the whole hog and implement
> Infinities, NaNs, and Denormals. At this time we don't want to do this
> because it appears to be a significant cost (see below) for not much
> return. In any case, how to deal with those types of value in C is
> still not defined, so any solution of this type would still be Not C.

Actually, C99 does fully define how to deal with all those values for
IEEE754, in Annex F. Only "signalling NaNs" aren't covered.

The main problem with implementation of NaNs etc is the extra cost
of handling them as INPUTs to operations. All your operations will need
to be able to detect them and propagate them accordingly, otherwise there's
no point generating them in the first place. This can be expensive.

I'd be tempted to say that the basic operations should signal SIGFPE in the
event of an error (i.e. enable the Invalid Operation, Divide By Zero and
Overflow traps, in IEEE754 terminology), and kill the program. This means NaNs and
Infinities can't get into the system. I've used such systems quite
extensively.

If you intend to just set a flag and continue execution, then I feel it's
important that some sort of indicator value be left in the result, rather
than just leaving it wrapped. If you can't manage "NaN" or "Inf", then at the
very least it should be "HUGE_VAL", as per the <math.h> functions.

As for denormals, flushing all tiny results to zero is a reasonable thing to
do if you can't implement denormals properly. Kahan wouldn't approve, as it
leaves a massive hole in the number line, but it's far better than your
current "tiny * tiny -> huge".
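
The two fallbacks suggested here (an indicator value on overflow, flush to zero on underflow) could look roughly like the following. This is a sketch under stated assumptions, not the vendor's runtime: it assumes IEEE-754 float and double on the host, finite inputs, and uses HUGE_VALF, the C99 float counterpart of HUGE_VAL. The names checked_mulf and range_error_flag are hypothetical.

```c
#include <float.h>
#include <math.h>

/* Hypothetical status flag; the name is invented for this sketch. */
static int range_error_flag;

/* Multiply two floats, replacing an out-of-range result with an
   indicator value instead of leaving a wrapped exponent behind.
   Assumes finite inputs. */
float checked_mulf(float a, float b)
{
    /* With IEEE-754 types the double product of two floats is exact:
       24 + 24 significand bits fit in 53, and the exponent range of
       double comfortably covers float * float. */
    double p = (double)a * (double)b;

    if (fabs(p) > FLT_MAX) {                 /* overflow: saturate */
        range_error_flag = 1;
        return p < 0.0 ? -HUGE_VALF : HUGE_VALF;
    }
    if (p != 0.0 && fabs(p) < FLT_MIN) {     /* underflow: flush to zero */
        range_error_flag = 1;
        return p < 0.0 ? -0.0f : 0.0f;
    }
    return (float)p;                         /* in range: round to float */
}
```

The comparison against FLT_MAX is slightly conservative, since a product just above FLT_MAX that would round back down to it is reported as an overflow. A real soft-float routine would more likely test the exponent directly rather than widen to double.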

By the way, the overflow flags you're describing are defined in <fenv.h> in
C99 - there's no need for you to invent your own, I believe.

You may want to consider the FENV #pragmas. I'm not sure that FENV_ACCESS
will give you any leeway on shrinking your code size, but you should have a
look at that area. Otherwise some sort of compiler option or private #pragma
to suppress the extra code bloat of proper range checking will probably be in
order in your situation.

> Floating point has become popular, some deficiencies have been found, and
> Muggins put his hand up to fix them.

Well, this will be an important life lesson for you.

--
Kevin Bracey, Principal Software Engineer
Tematic Ltd Tel: +44 (0) 1223 503464
182-190 Newmarket Road Fax: +44 (0) 1728 727430
Cambridge, CB5 8HE, United Kingdom WWW: http://www.tematic.com/

Nov 14 '05 #3

Gordon Burditt

> I've recently discovered that [at least two of] our compilers don't
> make any attempt to handle floating point overflow in add/subtract/
> multiply/divide, with the result that programmers who don't range-
> check their data can e.g. multiply two very tiny values and end up
> with a very large one. This is (AFAICT) quite fine according to the
> Standard but is certainly a QoI issue and I'd like to raise ours.

What floating point hardware / software allows you to multiply two
small floating-point numbers and end up with a huge one? The
implementations I know of either end up with zero or a small number.
IEEE-754 certainly requires that. You also can't multiply two large
numbers and end up with a small one on any hardware I know of.
Overflow sticks at either Inf or the largest possible value, or
traps. Now you can still end up with an answer that is a large
number of orders of magnitude wrong mathematically, but it won't
appear to be a small, reasonable result when it overflows.

I consider your hardware / software emulation broken, although ANSI
C doesn't.

> I believe that it's the programmer's job to know what possible ranges
> his values could take, and to check they're sane before performing
> operations upon them.

The problem here is that individually reasonable values may produce
a collectively unreasonable result. For example, linear interpolation
using two points that are nearly identical. Roundoff error in
decimal conversion can also change a reasonable situation to an
unreasonable one (e.g. change division by zero, which you've handled,
to division by what was supposed to be zero but for roundoff error,
which you didn't because in a corner case it was more than expected).
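
The interpolation case can be sketched as follows. The inputs are each within float range, yet the slope between two nearly identical x values is not. The sketch assumes IEEE-754 float arithmetic where overflow produces an infinity (which is not what the compilers under discussion currently do); interpolate is a hypothetical helper, not code from the thread.

```c
#include <math.h>

/* Linear interpolation through (x0, y0) and (x1, y1), evaluated at x.
   Returns 0 and stores the result in *y on success, or -1 if the two
   points coincide or the result is not a finite number. */
int interpolate(float x0, float y0, float x1, float y1, float x, float *y)
{
    float dx = x1 - x0;
    float slope, r;

    if (dx == 0.0f)
        return -1;              /* the case a programmer usually guards */

    slope = (y1 - y0) / dx;     /* may overflow when dx is merely tiny */
    r = y0 + (x - x0) * slope;

    if (!isfinite(r))           /* catches Inf and NaN after the fact */
        return -1;

    *y = r;
    return 0;
}
```

With the points (0, 0) and (1e-30, 1e10) evaluated at x = 1, the zero check passes but the slope overflows, so only the check on the result catches the problem.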

> If this is done, then overflow and underflow
> can never happen. Rarely is this done, of course, so the compromise
> suggested by TPTB is to offer a safety net in the form of adding
> inexpensive checks to the generated code which can propagate status
> back to the user.
>
> The approach I'm taking is to detect overflow or underflow and set a
> flag (in the implementation namespace) as appropriate, but leave the
> [invalid] value in the result.

I believe the appropriate action is to substitute +Inf, -Inf,
+DBL_MAX, or -DBL_MAX for overflow, or cause a trap.

Underflow to zero is problematical as it provides no clear way to
check for an error, but then it isn't always obvious when an underflow
*IS* an error. Is 0.10 - 0.10 underflowing to zero an error?
Or is it a case of "I had a dime and I spent it"?

> This way, if it really is important to
> the user to eke that last bit (pun not intended) of information out of
> the operation then the way is clear for them to do so. For example, a
> multiplication which overflows will end up with a value pow(2,256)
> smaller than the correct [unrepresentable] value, thanks to wraparound
> of our 8-bit signed binary exponent, but the mantissa will still have
> full precision - some recovery may be possible.

Do you know if anyone is actually trying to do that kind of recovery?
It's very, very system-specific. How often do you deal in numbers
that are within 10 orders of magnitude of DBL_MAX? I consider it
dangerous.

> An alternative approach (since we use an IEEE-754-compatible
> representation already) is to go the whole hog and implement
> Infinities, NaNs, and Denormals.

Overflow to +/- DBL_MAX might cover most of the issues if the
whole IEEE implementation is too expensive.

> At this time we don't want to do this
> because it appears to be a significant cost (see below) for not much
> return. In any case, how to deal with those types of value in C is
> still not defined, so any solution of this type would still be Not C.

How costly is it to check for the binary exponent overflow
and overflow to +/- DBL_MAX and underflow to 0?
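
One rough way to gauge that cost: in a software multiply the exponent check comes down to an integer add and two compares on values the routine already has in hand. The sketch below works on the bit pattern of an IEEE-754 single, assumes float is that format, handles only zero and normal inputs, and ignores the rare extra carry from rounding. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Biased 8-bit exponent field of an IEEE-754 single. */
static unsigned exp_field(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (unsigned)((bits >> 23) & 0xFFu);
}

/* 24-bit significand with the implicit leading 1 restored. */
static uint32_t significand(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bits & 0x7FFFFFu) | 0x800000u;
}

/* Returns +1 if a * b would overflow, -1 if it would underflow,
   and 0 if the product's exponent fits in the 8-bit field. */
int mul_range_check(float a, float b)
{
    unsigned ea = exp_field(a), eb = exp_field(b);
    uint64_t p;
    int e;

    if (ea == 0 || eb == 0)
        return 0;                   /* zero operand: product is zero */

    e = (int)ea + (int)eb - 127;    /* biased exponent of the product */

    /* Significand product lies in [2^46, 2^48); at or above 2^47 it
       needs one normalising shift, which bumps the exponent by one. */
    p = (uint64_t)significand(a) * significand(b);
    if (p >> 47)
        e += 1;

    if (e >= 255)
        return 1;                   /* caller substitutes +/- maximum */
    if (e <= 0)
        return -1;                  /* caller substitutes zero */
    return 0;
}
```

A multiply routine has to form the exponent sum and the significand product anyway, so the extra work in this model is the two final comparisons plus the substitution of the clamped value.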

> I'm interested to hear the c.l.c set of viewpoints on silent vs. noisy
> underflow and overflow - in this case, coercing values to 0 or Inf,
> vs. leaving them as is and setting a user-checkable flag.
> ('Very-noisy' would be throwing an exception of some kind, also
> undefined by the Standards.) What practices do you use to avoid
> hitting overflow and underflow?

In most cases, range-checking the result to within sane values is
sufficient, given overflow to +/- Inf or +/- DBL_MAX. Legitimate
values are usually smaller in absolute value than the 10th root of
DBL_MAX.

It would be really nasty having 1e+200 * 1e+200 end up being 1e+4
(two outrageous values ending up with a sane-looking product).
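
Such a range check on the result might be as small as this. It is a sketch that takes the 10th-root-of-DBL_MAX rule of thumb literally; the threshold is an assumption to be tuned per application, and is_sane_magnitude is a hypothetical name.

```c
#include <float.h>
#include <math.h>

/* Returns nonzero if v is small enough to be a plausible result.
   Inf fails the comparison because it exceeds the limit, and NaN fails
   because every ordered comparison with NaN is false. */
int is_sane_magnitude(double v)
{
    const double limit = pow(DBL_MAX, 0.1);   /* 10th root of DBL_MAX */
    return fabs(v) <= limit;
}
```

This check only helps if overflow sticks at Inf or DBL_MAX. If the exponent wraps, an overflowed product can land back inside the sane range and pass unnoticed.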

> Is our plan to let the dodgy value
> continue to exist of any conceivable value?

I think it is of negative value, unless someone really needs to
use the full range. How often do you deal with numbers like 1e+300?

If someone really wants to check this hidden flag, could they not
get the dodgy value from some other hidden place, where the
normal result is DBL_MAX?

> For those who've read this far: we make cross-compilers for [often
> tiny; <256 bytes of RAM is common] CPUs commonly used in embedded
> systems, and we do our damnedest to make them meet C90 and are inching
> toward C99 in places. Floating point has become popular, some
> deficiencies have been found, and Muggins put his hand up to fix them.

Gordon L. Burditt

Nov 14 '05 #4
