How to determine which of the generic decimal datatypes to use.

John Bentley:
INTRO
The phrase "decimal number" within a programming context is ambiguous. It could
refer to the decimal datatype or the related but separate concept of a generic
decimal number. "Decimal Number" sometimes serves to distinguish Base 10
numbers, eg "15", from Base 2 numbers, Eg "1111". At other times "Decimal
Number" serves to differentiate a number from an integer. For the rest of this
post I shall only use either "Generic Decimal Number " or "decimal datatype" for
clarity.

DEFINITIONS
Generic Decimal Number: a base 10 number with a fractional part represented with
digits. A Generic Decimal Number may be implemented with any of several
datatypes including the decimal datatype, a double, a single, a string.

Decimal Datatype: the .Net (or other programming language) decimal datatype.

ISSUES
When programming with generic decimal numbers there are a few key issues to
consider:

1 How to round a number.
2 How to determine which of the generic decimal datatypes to use: Single;
Double; or Decimal
3 How to determine if two generic decimal numbers are equal.
4 How to work with fractions that cannot be represented accurately as generic
decimal numbers.

These are interrelated issues but for the moment I'm interested in 2.

Would you like to tell me the rules you use when deciding to use the floating
point datatypes (Single and Double) versus the Fixed Point/Scaled Integer Datatype
(Decimal)? By all means address other related issues if it helps in the
answering of this question. Although I would be interested in answers that
relate to .NET specifically, I'm interested more in the general
mathematical/computational ideas that govern the choice.
Jul 19 '05 #1
17 Replies


John Bentley <no*****@nowhere.com> wrote:
These are interrelated issues but for the moment I'm interested in 2.

Would you like to tell me the rules you use when deciding to use the floating
point datatypes (Single and Double) V the Fixed Point/Scaled Integer Datatype
(Decimal)? By all means address other related issues if it helps in the
answering of this question. Although I would be interested in answers that
relate to .NET specifically, I'm interested more in the general
mathematical/computational ideas that govern the choice.


http://www.pobox.com/~skeet/csharp/floatingpoint.html gives some of the
details of floating point numbers, and some suggestions as to when to
use what. It's not exactly what you were after, but hopefully you'll
find it useful.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #2

> http://www.pobox.com/~skeet/csharp/floatingpoint.html gives some of
the details of floating point numbers, and some suggestions as to
when to
use what. It's not exactly what you were after, but hopefully you'll
find it useful.


Jon, that is exactly the type of thing I was after. Yours is a well written
article. Standby 1 or 2 days: I am digesting the issues you raise there. I will
have some questions. Thanks for publishing it.
Jul 19 '05 #3

John Bentley <no*****@nowhere.com> wrote:
http://www.pobox.com/~skeet/csharp/floatingpoint.html gives some of
the details of floating point numbers, and some suggestions as to
when to
use what. It's not exactly what you were after, but hopefully you'll
find it useful.
Jon, that is exactly the type of thing I was after. Yours is a well written
article.


Thanks - that's very kind of you.
Standby 1 or 2 days: I am digesting the issues you raise there. I will
have some questions. Thanks for publishing it.


Questions are more than welcome - they'll suggest ways I could expand
the article, for one thing :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #4

http://www.pobox.com/~skeet/csharp/floatingpoint.html gives some of
the details of floating point numbers, and some suggestions as to
when to
use what.

I address these points to Jon Skeet but, of course, any body may have some
worthy insights they wish to contribute.
Questions are more than welcome - they'll suggest ways I could expand
the article, for one thing :)


You mention "There are points to note about decimal, but this article doesn't go
into them ..." and perhaps my questioning might expand your article in that
direction? :)

Since having done a bit more reading I have decided to speak of "nonintegrals"
rather than "Generic Decimal Number"

So my definition is now:

A nonintegral: any number with a fractional part represented with
digits. A nonintegral may be implemented with any of several
datatypes including the decimal datatype, a double, a single, a string. We can
have a nonintegral in base 10 or base 2.

You've given me an important epiphany:

1/10 or 0.1 cannot be represented, exactly, in Base 2.

So, to repeat your story in my own words:

There are many fractions that can be represented exactly in Base 10 but CANNOT
in Base 2, like 0.1. A Floating point datatype displays in Base 10 but stores
its information ultimately in Base 2. Therefore a floating point datatype
cannot, exactly, represent 0.1 (and many other fractions).

Which is why when we run (we are biased toward different languages perhaps):

' Need DoubleConverter from
' http://www.yoda.arachsys.com/csharp/DoubleConverter.cs
Sub NonintegralEqualityTest()
    Dim nonintegral As Double
    Dim i As Integer

    For i = 1 To 10
        nonintegral += 0.1
    Next i

    Debug.WriteLine("nonintegral: " & nonintegral)

    Debug.WriteLine("nonintegral Actual: " _
        & DoubleConverter.ToExactString(nonintegral))
    Debug.WriteLine("(nonintegral = 1): " & (nonintegral = 1))
End Sub

.... We get:

nonintegral: 1
nonintegral Actual: 0.99999999999999988897769753748434595763683319091796875
(nonintegral = 1): False
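
The same experiment can be sketched outside .NET. The following is a minimal illustration in Python (a choice made for this sketch, not the thread's language), relying on the fact that Python's float uses the same IEEE 754 double format as System.Double. The helper name add_repeatedly is hypothetical; Decimal(float) plays the role that DoubleConverter.ToExactString plays in the VB code.

```python
from decimal import Decimal


def add_repeatedly(step: float, times: int) -> float:
    """Add `step` to a running total `times` times, as the VB loop does."""
    total = 0.0
    for _ in range(times):
        total += step
    return total


total = add_repeatedly(0.1, 10)

# repr() gives the shortest string that round-trips to the same double;
# Decimal(float) converts the stored binary value exactly, digit for digit.
print(repr(total))      # 0.9999999999999999
print(Decimal(total))   # 0.99999999999999988897769753748434595763683319091796875
print(Decimal(0.1))     # 0.1000000000000000055511151231257827021181583404541015625
print(total == 1.0)     # False
```

The exact value agrees with the "Actual" line above. The displayed "1" in the VB output is presumably the result of default formatting rounding to fewer significant digits than the double actually holds.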
However if we Test working with a third in a slightly new procedure:
Sub NonintegralEqualityTestAThird()
    Dim nonintegral As Double
    Dim i As Integer

    For i = 1 To 3
        nonintegral += 0.1 / 0.3
    Next i

    Debug.WriteLine("nonintegral: " & nonintegral)
    Debug.WriteLine("nonintegral Actual: " _
        & DoubleConverter.ToExactString(nonintegral))
    Debug.WriteLine("(nonintegral = 1): " & (nonintegral = 1))
End Sub

.... We get:
nonintegral: 1
nonintegral Actual: 1
(nonintegral = 1): True

Let's test working with a third but with a Decimal data type
Sub NonintegralEqualityTestAThird()
    Dim nonintegral As Decimal
    Dim i As Integer

    For i = 1 To 3
        ' D is the literal type character for Decimal NOT double in VB.NET
        nonintegral += 0.1D / 0.3D
    Next i

    Debug.WriteLine("nonintegral: " & nonintegral)
    Debug.WriteLine("nonintegral Actual: " _
        & DoubleConverter.ToExactString(nonintegral))
    Debug.WriteLine("(nonintegral = 1): " & (nonintegral = 1))
End Sub

.... We Get:
nonintegral: 0.9999999999999999999999999999
nonintegral Actual: 1
(nonintegral = 1): False

These tests seem to imply that while 1/3 cannot be represented in Base 10 it can
be represented in Base 2. Can you confirm that 1/3 can be represented exactly in
Base 2? Is this it: 0.010101011?

Whether 1/3 can be represented exactly in Base 2 or not contradicts nothing
you've said and is probably unimportant as:

"Whatever base you come up with, you'll have the same problem with some
numbers - and in particular, "irrational" numbers (numbers which can't be
represented as fractions) like the mathematical constants pi and e are always
going to give trouble."

Therefore we can have this rule for when working with nonintegrals:

Never use the equal operator to test for the equality of nonintegrals (whether a
floating point or fixed point datatype, like Decimal). Instead use a custom
EqualEnough(x,y,tolerance) function (see below).

' F is the literal type character for Single in VB.NET
Private Const mDefaultTolerance As Single = 0.000001F

' Returns: Whether two nonintegral numbers are close enough to be
'          deemed equal.
' Remarks: Never use the equality operator, =, to test for equality with
'          nonintegral datatypes.
' Params:
'   x, y
'       Nonintegral numbers in any order.
'   tolerance
'       The largest amount by which the numbers may differ
'       and still be considered equal. Eg 0.001
'
' Example:
'   If EqualEnough(d,1) then
'
' Created: 31 Aug 2003
' John Bentley jo************@yahoo.com.au
' +61 (0)40 912 4414
Overloads Function EqualEnough(ByVal x As Double, ByVal y As Double, _
        Optional ByVal tolerance As Double _
        = CDbl(mDefaultTolerance)) As Boolean

    Return (Math.Abs(x - y) <= tolerance)
End Function

Overloads Function EqualEnough(ByVal x As Single, ByVal y As Single, _
        Optional ByVal tolerance As Single _
        = mDefaultTolerance) As Boolean

    Return (Math.Abs(x - y) <= tolerance)
End Function

' The tolerance is a Decimal here so that the comparison is carried out
' in Decimal rather than being converted to Single.
Overloads Function EqualEnough(ByVal x As Decimal, ByVal y As Decimal, _
        Optional ByVal tolerance As Decimal _
        = CDec(mDefaultTolerance)) As Boolean

    Return (Math.Abs(x - y) <= tolerance)
End Function
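
For comparison, here is a rough Python counterpart of the tolerance test (Python is used only for illustration; equal_enough is a hypothetical name that mirrors EqualEnough). It also shows one limitation worth keeping in mind: a fixed absolute tolerance ignores the magnitude of the values, which is why a relative test such as the standard library's math.isclose is sometimes preferred.

```python
import math
from decimal import Decimal


def equal_enough(x, y, tolerance=1e-6):
    """Absolute-tolerance test: True when x and y differ by at most `tolerance`."""
    return abs(x - y) <= tolerance


total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)              # False
print(equal_enough(total, 1.0))  # True

# A fixed absolute tolerance ignores magnitude: these two values differ by a
# factor of two, yet pass. A relative test (math.isclose) rejects them.
print(equal_enough(1e-9, 2e-9))  # True
print(math.isclose(1e-9, 2e-9))  # False

# The same helper works for decimal values when the tolerance is a Decimal too.
almost_one = Decimal("0.9999999999999999999999999999")
print(equal_enough(almost_one, Decimal(1), Decimal("0.000001")))  # True
```

Which tolerance is "close enough" remains a judgment for the application at hand; the default of 1e-6 is simply carried over from the VB code.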

I'm still pursuing the question of How to determine which of the nonintegral
Datatypes to use: Floating point Datatypes versus Fixed Point (I know you say
that a decimal is really a floating point). Your suggestion, if I can represent
it oversimply, to use Floating point for Scientific apps and Fixed for Financial
apps, is a helpful one. However, I'm trying to grasp the issue a little more by
understanding the nature of a Decimal Datatype

What is this beast the Decimal Datatype? Is it that while System.Double and
System.Single are stored in Base 2, a System.Decimal is stored, somehow, in Base
10? This would seem, to my present naive understanding, impossible as the CPU
ultimately works in machine code, that is, 0s and 1s.

For if we run
Sub NonintegralEqualityTest()
    ' Note this is now a Decimal rather than a Double
    Dim nonintegral As Decimal
    Dim i As Integer

    For i = 1 To 10
        ' D is literal type character for Decimal in VB.NET
        nonintegral += 0.1D
    Next i

    Debug.WriteLine("nonintegral: " & nonintegral)
    Debug.WriteLine("nonintegral Actual: " _
        & DoubleConverter.ToExactString(nonintegral))
    Debug.WriteLine("(nonintegral = 1): " & (nonintegral = 1))
End Sub

We get:
nonintegral: 1
nonintegral Actual: 1
(nonintegral = 1): True

This shows, perhaps, an essential difference between a floating point datatype
and a fixed point datatype (can we stick with "float" V "fixed" point as a
convenient distinction?) A fixed point datatype, which in .NET is the Decimal
Datatype, is said to be a "scaled" number. What does this mean? Does being a
"scaled" number give it the magic powers (or make it Base 10 somehow) that
enables it to give the above results?

Enough for now :)

Jul 19 '05 #5

John Bentley <no*****@nowhere.com> wrote:
Questions are more than welcome - they'll suggest ways I could expand
the article, for one thing :)
You mention "There are points to note about decimal, but this article doesn't go
into them ..." and perhaps my questioning might expand your article in that
direction? :)


Or into another article :)
Since having done a bit more reading I have decided to speak of "nonintegrals"
rather than "Generic Decimal Number"

So my definition is now:

A nonintegral: any number with a fractional part represented with
digits. A nonintegral may be implemented with any of several
datatypes including the decimal datatype, a double, a single, a string. We can
have a nonintegral in base 10 or base 2.
I personally think it's better to leave it without the last sentence,
or a modified one - a nonintegral can be *represented* (often
imprecisely) in base 10 or base 2. Numbers themselves don't
fundamentally have a base though - they're just numbers. Don't worry if
you don't see what I mean - it's a slightly philosophical distinction
to make, but the mathematician in me wants to make it :)
You've given me an important epiphany:

1/10 or 0.1 cannot be represented, exactly, in Base 2.

So, to repeat your story in my own words:

There are many fractions that can be represented exactly in Base 10 but CANNOT
in Base 2, like 0.1. A Floating point datatype displays in Base 10 but stores
its information ultimately in Base 2. Therefore a floating point datatype
cannot, exactly, represent 0.1 (and many other fractions).
Yup.
Which is why when we run (we are biased toward different languages perhaps):
<snip - adding 0.1 ten times>
... We get:

nonintegral: 1
nonintegral Actual: 0.99999999999999988897769753748434595763683319091796875
(nonintegral = 1): False
Exactly.
However if we Test working with a third in a slightly new procedure:
<snip - adding a 0.1/0.3 three times>
... We get:
nonintegral: 1
nonintegral Actual: 1
(nonintegral = 1): True
Hmm... that's interesting, as a third isn't represented exactly in
binary either. I think it's just a coincidence, to be honest. If you
print out the exact double of each stage, you get:

0.33333333333333337034076748750521801412105560302734375
0.6666666666666667406815349750104360282421112060546875
1

Note that the first two don't sum to 1, which the third number would
suggest.

Basically there have been various stages where accuracy has been lost,
but they've *happened* to cancel each other out, whereas they didn't
before.

Note that if you keep going (ie keep adding a third) you don't get to
2. The sequence is:

0.33333333333333337034076748750521801412105560302734375
0.6666666666666667406815349750104360282421112060546875
1
1.333333333333333481363069950020872056484222412109375
1.66666666666666696272613990004174411296844482421875
2.000000000000000444089209850062616169452667236328125
Let's test working with a third but with a Decimal data type
<snip adding a third as a decimal>
... We Get:
nonintegral: 0.9999999999999999999999999999
nonintegral Actual: 1
(nonintegral = 1): False

These tests seem to imply that while 1/3 cannot be represented in Base 10 it can
be represented in Base 2. Can you confirm that 1/3 can be represented exactly in
Base 2? Is this it: 0.010101011?
See above.
Whether 1/3 can be represented exactly in Base 2 or not contradicts nothing
you've said
Actually, it does slightly - but only if you look closely. As I say
when introducing DoubleConverter, *every* double value can exactly be
represented in decimal, and clearly 1/3 can't - therefore 1/3 can't be
represented exactly in binary.
and is probably unimportant as:

"Whatever base you come up with, you'll have the same problem with some
numbers - and in particular, "irrational" numbers (numbers which can't be
represented as fractions) like the mathematical constants pi and e are always
going to give trouble."

Therefore we can have this rule for when working with nonintegrals:

Never use the equal operator to test for the equality of nonintegrals (whether a
floating point or fixed point datatype, like Decimal). Instead use a custom
EqualEnough(x,y,tolerance) function (see below).
<snip>

Yes. I'll definitely add something about this to the article.
I'm still pursuing the question of How to determine which of the nonintegral
Datatypes to use: Floating point Datatypes versus Fixed Point (I know you say
that a decimal is really a floating point). Your suggestion, if I can represent
it oversimply, to use Floating point for Scientific apps and Fixed for Financial
apps, is a helpful one. However, I'm trying to grasp the issue a little more by
understanding the nature of a Decimal Datatype

What is this beast the Decimal Datatype? Is it that while System.Double and
System.Single are stored in Base 2, a System.Decimal is Stored, somehow, in Base
10? This would seem, to my present naive understanding, impossible as the CPU
ultimately works in machine code, that is, 0s and 1s.
Looks like it's time for the decimal datatype article then, doesn't it?
:)

If you have a look in the MSDN you'll find more information, but
basically a decimal is 96 bits of integer information, 1 bit of sign,
and 5 bits of exponent (which aren't all used - the exponent goes from
0 to 28, but is always treated as negative - to get big numbers, you
use a small exponent (eg 0) and a big mantissa). I'll go into more
detail in the article :)
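
The layout just described can be sketched as a small model. This is an illustration in Python under the assumptions stated in the post (96-bit integer, sign bit, exponent from 0 to 28 applied as a negative power of ten); the names decimal_value, MAX_MANTISSA and MAX_SCALE are hypothetical and this is not the .NET implementation.

```python
from fractions import Fraction

MAX_MANTISSA = 2**96 - 1  # 96 bits of integer information
MAX_SCALE = 28            # exponent runs from 0 to 28 and is treated as negative


def decimal_value(negative: bool, mantissa: int, scale: int) -> Fraction:
    """Exact value of (-1)^sign * mantissa * 10^(-scale)."""
    if not 0 <= mantissa <= MAX_MANTISSA:
        raise ValueError("mantissa does not fit in 96 bits")
    if not 0 <= scale <= MAX_SCALE:
        raise ValueError("scale must be between 0 and 28")
    value = Fraction(mantissa, 10**scale)
    return -value if negative else value


# 0.1 is simply mantissa 1 with scale 1, so it is stored exactly.
print(decimal_value(False, 1, 1))             # 1/10

# The largest value uses the biggest mantissa and the smallest scale.
print(decimal_value(False, MAX_MANTISSA, 0))  # 79228162514264337593543950335

# The smallest non-zero value uses mantissa 1 and the largest scale (1E-28).
print(decimal_value(False, 1, MAX_SCALE) == Fraction(1, 10**28))  # True
```

The two extremes agree with the approximate minimum and maximum quoted elsewhere in this thread for decimal (1.000E-028 and 7.923E+028).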

<snip>
This shows, perhaps, an essential difference between a floating point datatype
and a fixed point datatype (can we stick with "float" V "fixed" point as a
convenient distinction?)
We could, but it would be inaccurate :) Fixed point is where the
exponent is always assumed to have the same value. For instance, you
could have a very simple decimal fixed point data type where the
exponent was always -2 - a stored value of 1586 would therefore just
represent 15.86.
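
That very simple fixed point type might look like the following in Python (an illustrative sketch only; FIXED_EXPONENT and from_stored are made-up names). The stored value is a plain integer, and the exponent never changes, so arithmetic on stored values is ordinary integer arithmetic.

```python
from decimal import Decimal

FIXED_EXPONENT = -2  # always the same: the stored integer counts hundredths


def from_stored(stored: int) -> Decimal:
    """Interpret a stored integer as stored * 10^FIXED_EXPONENT."""
    return Decimal(stored).scaleb(FIXED_EXPONENT)


print(from_stored(1586))         # 15.86

# Adding two fixed point values is just integer addition of the stored values.
print(from_stored(1586 + 1014))  # 26.00
```

In a floating point type the exponent is stored alongside each value instead of being assumed, which is the difference being drawn here.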
A fixed point datatype, which in .NET is the Decimal
Datatype, is said to be a "scaled" number. What does this mean? Does being a
"scaled" number give it the magic powers (or make it Base 10 somehow) that
enables it to give the above results?


Where did you get the scaled term from? If I had more context I could
perhaps answer the question better :)

I'll modify the existing article and start on the decimal type one...
thanks for the feedback.

By the way, I've been trying out different bits of CSS, so if you go
back to the article and it looks strange, just hit refresh and
hopefully the new version of the CSS will load and all will be well. If
the problem doesn't go away, mail me and I'll check :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #6

John Bentley <no*****@nowhere.com> wrote:

<snip>
I get different results with:
Sub NonintegralEqualityTestAThird()
    Dim nonintegral As Decimal
    Dim i As Integer

    For i = 1 To 42
        ' D is the literal type character for
        ' Decimal NOT double in VB.NET
        nonintegral += 0.1D / 0.3D
        Debug.WriteLine( _
            DoubleConverter.ToExactString(nonintegral))
    Next i
End Sub
Hang on - I'm confused now, as I thought before you were showing a very
similar program (using decimals rather than doubles) which *didn't* get
to 1 correctly.

(My program was using doubles, by the way.)
If there weren't more to know about System.Double and System.Single versus
System.Decimal then this would be bizarre. If we look at the range of values
(just the positives) each data type can hold we have (approximately):

           Min Non Zero    Max
decimal:   1.000E-028      7.923E+028
single:    1.401E-045      3.403E+038
double:    4.941E-324      1.798E+308

double can hold a far smaller number than decimal. On that basis one
would expect double to return exact results more often, be less prone to round
off errors, and be the datatype of choice for financial transactions. (I'm just
highlighting where my confusion lies).
Ah, but look at it the other way: double stores a much larger range,
despite holding it in a smaller amount of memory. That *must* mean that
it's representing values more sparsely (if you see what I mean).
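
One way to see this sparseness is to measure the gap between neighbouring doubles at different magnitudes. The sketch below uses Python's math.ulp (available from Python 3.9); gap_to_next is a hypothetical wrapper name, and Python's float is assumed to be the same IEEE 754 double as System.Double.

```python
import math  # math.ulp needs Python 3.9 or later


def gap_to_next(x: float) -> float:
    """Spacing between x and the neighbouring double of larger magnitude."""
    return math.ulp(x)


for x in (1.0, 1e16, 1e28):
    print(x, gap_to_next(x))
# 1.0 2.220446049250313e-16
# 1e+16 2.0
# 1e+28 2199023255552.0

# Near 1e16 the doubles are 2 apart, so adding 1 changes nothing.
print(1e16 + 1.0 == 1e16)  # True
```

Near the top of the decimal range (around 1e28) neighbouring doubles are roughly 2.2e12 apart, whereas a 96-bit integer mantissa with exponent 0 can still represent every whole number there.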
Actually, it does slightly - but only if you look closely. As I say
when introducing DoubleConverter, *every* double value can exactly be
represented in decimal, and clearly 1/3 can't


Wait a moment! Your DoubleConverter was written specifically to reveal the
actual representation of doubles NOT decimals. Could it be that our use of your
DoubleConverter class on Decimals is leading us astray?


Almost certainly - I didn't actually see that you were calling
DoubleConverter with a Decimal. That would indeed make a huge
difference, as there'd be a conversion from decimal to double involved
to start with. I think it's best not to think too much further along
those lines :)
Do we not have to wait
(pending your Decimal article perhaps :) )
No longer pending: see http://wwww.pobox.com/~skeet/csharp/decimal.html
Shall we ask not whether "*every* double value can exactly be represented in
decimal" but whether "every number that can be exactly represented as a double
be also exactly represented as a decimal?"

To repeat: can every number that can be exactly represented as a double be also
exactly represented as a decimal?
No - because they have a different range. On the other hand, every
number which can be exactly represented as a double *and* is within the
decimal range *and* which has 28 or fewer significant digits in its
exact decimal string representation can be exactly represented as a
decimal.
Conversely, are there some numbers that can have exact representation as a
decimal datatype but not as a double datatype?
0.1 for a start :)
- therefore 1/3 can't be
represented exactly in binary.


That wouldn't follow, I offer.


It does: Suppose 1/3 can be exactly represented as a double.

Then, from the statement "every double value can exactly be
represented in decimal" (not necessarily in the decimal type, but as a
decimal string representation) there must be an exact decimal string
representation of 1/3. However, as we know that there *isn't* an exact
decimal string representation of 1/3, our original supposition must be
wrong.
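
The underlying rule can be stated more generally: a fraction in lowest terms has a finite expansion in a given base only if every prime factor of its denominator also divides the base. A small Python sketch of that check follows (terminates_in_base is a hypothetical helper, and it assumes unlimited digits, so it says nothing about the precision limits of any particular datatype).

```python
from fractions import Fraction
from math import gcd


def terminates_in_base(value: Fraction, base: int) -> bool:
    """True if `value` can be written with finitely many digits in `base`."""
    denominator = value.denominator  # Fraction is always in lowest terms
    common = gcd(denominator, base)
    while common > 1:
        # Strip every factor the denominator shares with the base.
        while denominator % common == 0:
            denominator //= common
        common = gcd(denominator, base)
    return denominator == 1


tenth = Fraction(1, 10)
third = Fraction(1, 3)

print(terminates_in_base(tenth, 10))  # True
print(terminates_in_base(tenth, 2))   # False: the factor 5 does not divide 2
print(terminates_in_base(third, 10))  # False
print(terminates_in_base(third, 2))   # False
print(terminates_in_base(third, 3))   # True
```

Because 2 divides 10, anything that terminates in base 2 also terminates in base 10, which is the fact the proof above relies on. The reverse does not hold, as 0.1 shows.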
I'm happy though, to accept without proof that
1/3 can't be represented exactly in binary.


I included the above anyway, just in case it helps. :)
This shows, perhaps, an essential difference between a floating
point datatype and a fixed point datatype (can we stick with "float"
V "fixed" point as a convenient distinction?)


We could, but it would be inaccurate :) Fixed point is where the
exponent is always assumed to have the same value. For instance, you
could have a very simple decimal fixed point data type where the
exponent was always -2 - a stored value of 1586 would therefore just
represent 15.86.


Accepted, and this explanation teaches me. Could you suggest an alternative
vocabulary to distinguish between the two groups of datatypes?


Decimal floating point and binary floating point is the best I can come
up with - both of them are effectively of the form

sign * mantissa * base^exponent

where the base is 2 for binary and 10 for decimal. The particular
details of how the mantissa, sign and exponent are stored (bias, etc)
is a separate matter, IMO.
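
For the binary case, the three parts can be pulled out of a double and recombined. The following Python sketch assumes the standard IEEE 754 double layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits) and handles normal numbers only; double_fields and from_fields are hypothetical names.

```python
import struct
from fractions import Fraction


def double_fields(value: float):
    """Split a double into (sign, biased exponent, 52-bit fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction


def from_fields(sign: int, biased_exponent: int, fraction: int) -> Fraction:
    """Recombine as sign * mantissa * 2^exponent (normal numbers only)."""
    mantissa = (1 << 52) + fraction   # restore the implicit leading 1 bit
    exponent = biased_exponent - 1075  # remove bias (1023) and the 52 fraction bits
    magnitude = mantissa * Fraction(2) ** exponent
    return -magnitude if sign else magnitude


print(double_fields(1.0))  # (0, 1023, 0)

# The recombined value equals the stored double exactly, but not 1/10.
fields = double_fields(0.1)
print(from_fields(*fields) == Fraction(0.1))      # True
print(from_fields(*fields) == Fraction(1, 10))    # False
```

The bias and the implicit leading bit are exactly the kind of storage detail described above as a separate matter; the form sign * mantissa * base^exponent is unaffected by them.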
Where did you get the scaled term from? If I had more context I could
perhaps answer the question better :)


I'm using the term too loosely, repeating other loose uses of it in various
archived newsgroup articles.

It could come from
"Decimal variables are stored as signed 128-bit (16-byte) integers scaled by a
variable power of 10"

http://msdn.microsoft.com/library/de...blr7/html/vada
tdecimal.asp


Right. Basically "scaled" here means the "multiply by base^exponent" as
far as I can see. Note the *variable* power of 10 bit - that's the
equivalent to the word "floating" in "floating point".

Hope the new article helps... again, feedback is welcome. (I've updated
the previous article with some talk about comparisons, as well.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #7

Jon Skeet,

Your last post and new article on the decimal datatype taught me more.
I still have a lot to digest from these.

Standby about 3 days for a fuller response. I am writing my own articles that
parallel yours in many ways.

In short (and as a preview) the insight that you've given me is precisely the
distinction Binary versus Decimal Datatype: all datatypes are ultimately
represented as binary bits; however, for nonintegral numbers the base of the
scaling is different. For Doubles and Singles ("Binary Datatypes") it is a base
2 scale, for decimals ("Decimal Datatypes") it is base 10.

This is why 0.1 can be represented exactly in a decimal datatype but not in a
binary datatype.

To help me with this could I trouble you/ challenge you to write a function to
return the Binary representation of a Double?

Eg (To Copy straight from your example)
In: 46.42829231507700882275457843206822872161865234375
Out: 0100000001000111001101101101001001001000010101110011000100100011

I was about to attempt it myself after successfully doing this for a Decimal
(with thanks to your hint about Decimal.GetBits). Alas the job appears a bit
tricky. I dare say you will be able to whip something up though (?)


Jul 19 '05 #8

>
Ah, that's easy :)

using System;
using System.Text;

public class Test
{
    static void Main()
    {
        Console.WriteLine (ToDoubleBitsString
            (46.42829231507700882275457843206822872161865234375d));
    }

    static string ToDoubleBitsString(double d)
    {
        long bits = BitConverter.DoubleToInt64Bits(d);
        return Convert.ToString (bits, 2).PadLeft (64, '0');
    }
}


Brilliant! :)

Bravo Jon. Now I have all the tools (I think) to digest your articles and the
issue. Expect fuller post in 3 days.

Jul 19 '05 #9

> -------------------------------------------------
John Bentley wrote (at this indent level):
-------------------------------------------------
Ah, that's easy :)

using System;
using System.Text;

public class Test
{
    static void Main()
    {
        Console.WriteLine (ToDoubleBitsString
            (46.42829231507700882275457843206822872161865234375d));
    }

    static string ToDoubleBitsString(double d)
    {
        long bits = BitConverter.DoubleToInt64Bits(d);
        return Convert.ToString (bits, 2).PadLeft (64, '0');
    }
}


Brilliant! :)

Bravo Jon. Now I have all the tools (I think) to digest your articles
and the issue. Expect fuller post in 3 days.


Jon. Probably another 3 days :)
Jul 21 '05 #10

Jon, I have finally come to understand more of your excellent posts and
articles
http://www.yoda.arachsys.com/csharp/decimal.html
http://www.yoda.arachsys.com/csharp/floatingpoint.html

Recall, and to orientate others, my interest springs from a wish to know the
answer to three questions when working with nonintegrals:
1. How do you round a number?
2. How do you determine if two nonintegrals are equal?
3. Which nonintegral datatype should you choose?

1. I have answered and will publish (to the web) my own article in about 3
months.
2. Has finally been cemented through this discussion with you.
3. Is still unclear.
-------------------------------------------------
Jon Skeet wrote (at this indent level):
-------------------------------------------------
I get different results with:
Sub NonintegralEqualityTestAThird()
    Dim nonintegral As Decimal
    Dim i As Integer

    For i = 1 To 42
        ' D is the literal type character for
        ' Decimal NOT double in VB.NET
        nonintegral += 0.1D / 0.3D
        Debug.WriteLine( _
            DoubleConverter.ToExactString(nonintegral))
    Next i
End Sub


Hang on - I'm confused now, as I thought before you were showing a
very
similar program (using decimals rather than doubles) which *didn't*
get
to 1 correctly.

(My program was using doubles, by the way.)


I was confused and probably drew you into my confusion. Now I am less confused
(at least).

Once again, the key insight I have so far acquired from your articles and posts
is that although all nonintegral datatypes are finally stored as bits, as a
binary number, there is, conceptually, an intervening stage. We could call that
stage the "scaling stage". One datatype group is scaled using a base 2 number
and the other is scaled using a base 10 number. So your point makes absolute
sense to me now...
Could you suggest an
alternative vocabulary to distinguish between the two groups of
datatypes?


Decimal floating point and binary floating point is the best I can
come
up with - both of them are effectively of the form

sign * mantissa * base^exponent

where the base is 2 for binary and 10 for decimal. The particular
details of how the mantissa, sign and exponent are stored (bias, etc)
is a separate matter, IMO.


My current inclination, then, is to use "Binary scaled nonintegral Datatype"
versus "Decimal scaled nonintegral datatype". Or, for short, a "Binary scaled
Datatype" versus "Decimal scaled datatype". I prefer these terms over
your current ones because:
1. They emphasise that the "binary" versus "decimal" refers to this "scaling"
intermediate representation. Calling it a "Binary nonintegral", for example,
could refer to how the number is stored in the chip, which would not
differentiate it, or suggest that the developer should employ it only for base 2
numbers, which would be misleading.
2. Nonintegral over floating point is a mere preference for emphasising the
generic nature of the number rather than the details of the implementation (a
small point probably worth no debate, but feel free if you wish).

Back to clearing up the confusion around NonintegralEqualityTestAThird(). It's
clear now why a binary scaled datatype can't exactly represent 0.1 and a decimal
scaled datatype can. It's also clear now why neither can exactly represent 1/3.
The function NonintegralEqualityTestAThird() does bear this out (modified
slightly in whatever way you like for clarity and optional multiple looping).
Whether you run it for the Decimal datatype or a Double datatype it fails for
both. The only difference is that for a Double datatype it sometimes succeeds.
For example:

Sub NonintegralEqualityTestAThirdDouble()
    Dim nonintegral As Double
    Dim i As Integer
    Dim testTotal As Double = 12.0R

    Dim loops As Integer = 3
    For i = 1 To loops
        ' R is the literal type character for
        ' Double (NOT D) in VB.NET
        nonintegral += 1.0R / 3.0R
        Debug.WriteLine(nonintegral & " (displayed)")
        Debug.WriteLine(ToExactString(nonintegral) & " (Stored)")
        Debug.WriteLine("")
    Next i

    testTotal = Convert.ToDouble(loops \ 3)
    Debug.WriteLine("testTotal : " & testTotal)
    Debug.WriteLine("nonintegral: " & nonintegral)
    Debug.WriteLine("(nonintegral = testTotal): " _
        & (nonintegral = testTotal))
    Debug.WriteLine("nonintegral almost equal: " _
        & AlmostEqual(nonintegral, testTotal))
End Sub

Gives:
***********************************
0.333333333333333 (displayed)
0.333333333333333314829616256247390992939472198486328125 (Stored)

0.666666666666667 (displayed)
0.66666666666666662965923251249478198587894439697265625 (Stored)

1 (displayed)
1 (Stored)

testTotal : 1
nonintegral: 1
(nonintegral = testTotal): True
nonintegral almost equal: True
**********************************

But if you change loops to 6 you get:
*************************************
..... [Output snipped]
1 (displayed)
1 (Stored)

1.33333333333333 (displayed)
1.3333333333333332593184650249895639717578887939453125 (Stored)

1.66666666666667 (displayed)
1.666666666666666518636930049979127943515777587890625 (Stored)

2 (displayed)
1.9999999999999997779553950749686919152736663818359375 (Stored)

testTotal : 2
nonintegral: 2
(nonintegral = testTotal): False
nonintegral almost equal: True
**********************************

This finding of a double sometimes appearing to represent 1/3 exactly and
sometimes not, as you have previously written, need not trouble us. (Perhaps,
just like Einstein, we can discard data that doesn't conveniently fit with
theory :) )

It is enough to know that for BOTH binary scaled datatypes and decimal scaled
datatypes 1/3 (and other fractions) are never stored exactly, they only ever are
sometimes APPARENTLY stored exactly.

Or as you say "Whatever base you come up with, you'll have the same problem with
some numbers - and in particular, "irrational" numbers (numbers which can't be
represented as fractions)"

From the point of view of asking "How do you determine if two nonintegrals are
equal?" it is enough to know that for both you can't reliably ask that question.
Instead you must not ask if they are equal but ask are they close enough. So to
repeat (a point which you have already incorporated into your article) a
principle we can come up with:

Never compare two nonintegral values to see if they are equal or not equal.
Instead, always check to see if the numbers are nearly equal for whatever
purpose you are engaged. This applies as much to decimal scaled datatypes as
binary scaled datatypes.

I consider now "How do you determine if two nonintegrals are equal?" closed. The
short answer is: don't. To answer it properly it was necessary to understand the
differences between the nonintegral datatypes.

However, the question of "Which nonintegral datatype should you choose?"
is still open.

In order to answer this can we now agree that the following statement from
Microsoft is wrong?

"The Decimal value type is appropriate for financial calculations requiring
large numbers of significant integral and fractional digits and no round-off
errors."
VS Documentation/MSDN > .NET Framework Class Library > Decimal Structure

The part that is wrong, I claim, is that the decimal value type is not immune
from round off errors in financial applications. Where 1/3 is involved, for
example, you might well get a round off error. It is just less prone to round
off errors than a binary scaled datatype because it will have more exact
representations (true or false?).

If you have a scholarship fund of $100, for example, that is to be divided
equally between 3 students then it's not clear who will get the 1 cent.

Am I right to claim that the MS statement is wrong or is there a way of
contextualising the statement to make it both true and expose a deeper
understanding?
If we look at the range of values
(just the positives) each data type can hold we have (approximately):

           Min Non Zero    Max
decimal:   1.000E-028      7.923E+028
single:    1.401E-045      3.403E+038
double:    4.941E-324      1.798E+308
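
These figures can be cross-checked from the formats themselves. The following is a rough sketch in Python, not .NET. It assumes that the decimal type is a 96-bit integer scaled by a power of 10 between 10^0 and 10^-28, and that single and double are IEEE 754 binary32 and binary64. The constant names are invented for the example.

```python
# Largest value and smallest positive value of each format, from its layout.
DECIMAL_MAX = 2**96 - 1                   # largest 96-bit integer, scale 10**0
DECIMAL_MIN = 10.0**-28                   # integer 1, scale 10**-28 (as a float, for display)
SINGLE_MAX = (2 - 2.0**-23) * 2.0**127    # largest finite binary32
SINGLE_MIN = 2.0**-149                    # smallest positive (subnormal) binary32
DOUBLE_MAX = (2 - 2.0**-52) * 2.0**1023   # largest finite binary64
DOUBLE_MIN = 2.0**-1074                   # smallest positive (subnormal) binary64

rows = [
    ("decimal", DECIMAL_MIN, float(DECIMAL_MAX)),
    ("single", SINGLE_MIN, SINGLE_MAX),
    ("double", DOUBLE_MIN, DOUBLE_MAX),
]
for name, smallest, largest in rows:
    print(f"{name:8} {smallest:.3E}  {largest:.3E}")
```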

double can hold a far smaller number than decimal. On that
basis one would expect double to return exact results more often, be
less prone to round off errors, and be the datatype of choice for
financial transactions. (I'm just highlighting where my confusion
lies).


Ah, but look at it the other way: double stores a much larger range,
despite holding it in a smaller amount of memory. That *must* mean
that
it's representing values more sparsely (if you see what I mean).


I do (I believe) see what you mean, but perhaps not clearly the truth of it.

Could we then write that 0.1 is such a sparse value under the double datatype?
It has trouble representing this and many other values exactly where the
decimal datatype does not.
Shall we ask not whether "*every* double value can exactly be
represented in decimal" but whether "every number that can be
exactly represented as a double can also be exactly represented as a
decimal?"

To repeat: can every number that can be exactly represented as a
double be also exactly represented as a decimal?


No - because they have a different range. On the other hand, every
number which can be exactly represented as a double *and* is within
the
decimal range *and* which has 28 or fewer significant digits in its
exact decimal string representation can be exactly represented as a
decimal.


Thanks, that helps. I concur.

Now could a financial application have a number which:
1. can be represented exactly as a double; and
2. is inside the decimal range ( -7.923E+028 < x < +7.923E+028 ); and
3. has *more than* 28 significant digits to the right of the decimal point.
?

More simply, are there numbers that could arise in a financial application that
can be exactly represented by a double but can't be represented at all by a
decimal?

If that is true then we ought then ask: are there numbers that *would arise as a
matter of course* in a financial application that can be exactly represented by
a double but can't be represented at all by a decimal?

If that is true then it would seem difficult to choose the decimal over the
double for financial application.
Conversely, are there some numbers that can have exact representation
as a decimal datatype but not as a double datatype?


0.1 for a start :)


Yes indeed :)

The above questions still stand though.

Looking at a scientific application now. If it is true that a decimal is less
prone to round off error then wouldn't you *definitely* use this for a
scientific application? When building a skyscraper, for example, it is
important when adding the floors to find the total height that there is no
massive error. Furthermore, it would be unlikely that any girder on any floor
will be built to a tolerance requiring more than 28 decimal places (or even 20).

< Snip proof of 1/3 not representable as a binary number >

[ The term "Scaled" ]
It could come from
"Decimal variables are stored as signed 128-bit (16-byte) integers
scaled by a variable power of 10"

http://msdn.microsoft.com/library/de...blr7/html/vadatdecimal.asp


Right. Basically "scaled" here means the "multiply by base^exponent"
as
far as I can see. Note the *variable* power of 10 bit - that's the
equivalent to the word "floating" in "floating point".


Yes. I see all this now and can see the sense of each of the terms "scaled",
"variable", "floating".
Jul 21 '05 #11

P: n/a
John Bentley <no*****@nowhere.com> wrote:
My current inclination is to use, then "Binary scaled nonintegral Datatype"
versus "Decimal scaled nonintegral datatype". Or, for short, a "Binary scaled
Datatype" versus "Decimal scaled datatype". I am preferring these terms over
your current ones as:
<snip>

Fair enough. I still prefer my terms as I think they're more commonly
used, but it's really just a matter of taste - the main thing is that
we can understand each other :)

One alternative way of putting it for me would be "floating binary
point" or "floating decimal point" - how does that grab you?
It is enough to know that for BOTH binary scaled datatypes and decimal scaled
datatypes 1/3 (and other fractions) are never stored exactly, they only ever are
sometimes APPARENTLY stored exactly.

Or as you say "Whatever base you come up with, you'll have the same problem with
some numbers - and in particular, "irrational" numbers (numbers which can't be
represented as fractions)"
It's worth noting that 1/3 *isn't* an irrational number though - and
indeed if we had a base 3 floating point number type, it would be
exactly representable (but 1/2 wouldn't be).

<snip>
However, the question of "Which nonintegral datatype should you choose?"
is still open.

In order to answer this can we now agree that the following statement from
Microsoft is wrong?

"The Decimal value type is appropriate for financial calculations requiring
large numbers of significant integral and fractional digits and no round-off
errors."
VS Documentation/MSDN > .NET Framework Class Library > Decimal Structure

The part that is wrong, I claim, is that the decimal value type is not immune
from round off errors in financial applications. Where 1/3 is involved, for
example, you might well get a round off error. It is just less prone to round
off errors than a binary scaled datatype because it will have more exact
representations (true or false?).

If you have a scholarship fund of $100, for example, that is to be divided
equally between 3 students then it's not clear who will get the 1 cent.

Am I right to claim that the MS statement is wrong or is there a way of
contextualising the statement to make it both true and expose a deeper
understanding?
I think the difference is that most of the time financial calculations
*don't* require divisions like that. They usually include a lot of
addition and subtraction, and multiplication - but rarely actual
division. On the other hand, I haven't done much financial work, so
that's really speculation. Certainly as soon as division comes in,
you're likely to get inaccuracies.
Ah, but look at it the other way: double stores a much larger range,
despite holding it in a smaller amount of memory. That *must* mean
that
it's representing values more sparsely (if you see what I mean).


I do (I believe) see what you mean, but perhaps not clearly the truth of it.

Could we then write that 0.1 is such a sparse value under the double datatype?
It has trouble representing this and many other values exactly where the
decimal datatype does not.


I'm not sure what you mean by a "sparse value" here. Put it this way:
Suppose we had two data types, one of which represented a thousand
numbers between 0 and 10000, and another of which represented a hundred
numbers between 0 and 10000000 - there will be more of a "gap" between
represented numbers in the second type than in the first type, on
average.
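
One way to see this sparseness in practice is to measure the gap between neighbouring doubles at different magnitudes. This is a sketch in Python (3.9 or later, for math.ulp), on the assumption that Python's float matches Double. The function name double_gap is invented for the example.

```python
import math

def double_gap(value):
    """Distance from a positive double to the next representable double above it."""
    return math.ulp(value)

# The gap grows with the magnitude of the number.
for value in (1.0, 1000.0, 1e15, 1e16, 1e300):
    print(f"near {value:g} the gap is {double_gap(value):g}")
```

Near 1e16 neighbouring doubles are 2 apart, so not even every whole number is representable there. A decimal type with 28 digits after the point can, by contrast, step in units of 1e-28 near 1, where the double gap is about 2.2e-16.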
No - because they have a different range. On the other hand, every
number which can be exactly represented as a double *and* is within
the
decimal range *and* which has 28 or fewer significant digits in its
exact decimal string representation can be exactly represented as a
decimal.


Thanks, that helps. I concur.

Now could a financial application have a number which:
1. can be represented exactly as a double; and
2. is inside the decimal range ( -7.923E+028 < x < +7.923E+028 ); and
3. has *more than* 28 significant digits to the right of the decimal point.
?


It could - but I suspect it's unlikely.
More simply, are there numbers that could arise in a financial application that
can be exactly represented by a double but can't be represented at all by a
decimal?
I think it's unlikely, and you'd be *extremely* unlikely to actually
know for sure that such a number would be exactly represented - it
would be pure fluke.
If that is true then we ought then ask: are there numbers that *would arise as a
matter of course* in a financial application that can be exactly represented by
a double but can't be represented at all by a decimal?
That's basically not true.
The above questions still stand though.

Looking at a scientific application now. If it is true that a decimal is less
prone to round off error then wouldn't you *definitely* use this for a
scientific application? When building a skyscraper, for example, it is
important when adding the floors to find the total height that there is no
massive error. Furthermore, it would be unlikely that any girder on any floor
will be built to a tolerance requiring more than 28 decimal places (or even 20).


There will be no *massive* error using double. The error is likely to
be smaller than the engineers would be able to cope with anyway. The
real world doesn't tend to be as precise as a double, in other words.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #12

P: n/a
> -------------------------------------------------
Jon Skeet wrote (at this indent level):
------------------------------------------------- John Bentley at this level.
Note some snipping without notification.
My current inclination is to use, then "Binary scaled nonintegral
Datatype" versus "Decimal scaled nonintegral datatype". Or, for
short, a "Binary scaled Datatype" versus "Decimal scaled datatype".
I am preferring these terms over your current ones as:


Fair enough. I still prefer my terms as I think they're more commonly
used, but it's really just a matter of taste - the main thing is that
we can understand each other :)


That, absolutely, is the main thing.
One alternative way of putting it for me would be "floating binary
point" or "floating decimal point" - how does that grab you?


I've just thought of a better reason to avoid the "floating" in these phrases.
It is true that the three .Net datatypes we've been talking about are all
floating point datatypes. However there has been, in VBA for example, a
nonintegral datatype that was a fixed point datatype. That was the currency
datatype that was fixed at 4 decimal places. Presumably, being a currency, the
scaling would have been decimal (base 10). Presumably, all that was stored was
the mantissa. The implicit exponent would have always been -4, that is, there
would have always been a Base 10, decimal, scaling of 10^-4 (= / 10,000).

If we are interested in coming up with the most generic terms, that which could
apply to any computer language, and that which gets at a common quality, then a
datatype's being a "floating datatype" is not the important thing. In languages
like VBA we have fixed and floating point datatypes but the important difference
is not this one. It is the base of the scaling. You could, for example, have a
language with fixed and floating point datatypes all of which scale with the
same base. Maybe in the future, when processors are 100 times faster than they
are now, all nonintegrals will scale in base 10. (Maybe then there will be only
one nonintegral datatype to choose from)

As an aside I have thought of a quick way to appreciate the difference between a
Base 2 scaling and a Base 10 scaling. Base 2 scaling is sliding the (binary)
point while the mantissa is still in base 2. Base 10 scaling is sliding the
(decimal) point after the mantissa has been converted to base 10. You know this,
I'm just drilling it into my own understanding.
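
The three kinds of scaling discussed here can be sketched side by side. This is an illustration in Python, not .NET: float stands in for Double, decimal.Decimal for a decimal scaled type, and a plain integer with an implied exponent of -4 for a Currency-style fixed point type as described above. The variable names are made up.

```python
from decimal import Decimal
from fractions import Fraction

# Binary scaling: the stored value is an integer times a power of 2.
numerator, denominator = (0.1).as_integer_ratio()
binary_exponent = -(denominator.bit_length() - 1)   # denominator is a power of 2
print(numerator, "* 2 **", binary_exponent)         # not exactly one tenth

# Decimal scaling: the stored value is an integer times a power of 10.
sign, digits, decimal_exponent = Decimal("0.1").as_tuple()
print(digits, "* 10 **", decimal_exponent)          # exactly one tenth

# Fixed decimal scaling: only the integer is stored, the exponent is always -4.
CURRENCY_EXPONENT = -4
stored_integer = 1257800                            # stands for 125.7800
value = Fraction(stored_integer, 10 ** -CURRENCY_EXPONENT)
print(value == Fraction("125.78"))                  # True
```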
It is enough to know that for BOTH binary scaled datatypes and
decimal scaled datatypes 1/3 (and other fractions) are never stored
exactly, they only ever are sometimes APPARENTLY stored exactly.

Or as you say "Whatever base you come up with, you'll have the same
problem with some numbers - and in particular, "irrational" numbers
(numbers which can't be represented as fractions)"


It's worth noting that 1/3 *isn't* an irrational number though - and
indeed if we had a base 3 floating point number type, it would be
exactly representable (but 1/2 wouldn't be).


Yes, thanks. That clears a confusion. 1/3 is representable as a fraction. I've
just done it. It is just not exactly representable as a decimal with a finite
number of digits in Base 10, but is in Base 3. PI and e are not representable as
a fraction. Yes, 1/3 *isn't* an irrational number.
However, the question of "Which nonintegral datatype should
you choose?" is still open.

In order to answer this can we now agree that the following
statement from Microsoft is wrong?

"The Decimal value type is appropriate for financial calculations
requiring large numbers of significant integral and fractional
digits and no round-off errors."
VS Documentation/MSDN > .NET Framework Class Library > Decimal
Structure

The part that is wrong, I claim, is that the decimal value type is
not immune from round off errors in financial applications. Where
1/3 is involved, for example, you might well get a round off error.
It is just less prone to round off errors than a binary scaled
datatype because it will have more exact representations (true or
false?).

If you have a scholarship fund of $100, for example, that is to be
divided equally between 3 students then it's not clear who will get
the 1 cent.

Am I right to claim that the MS statement is wrong or is there a way
of contextualising the statement to make it both true and expose a
deeper understanding?


I think the difference is that most of the time financial calculations
*don't* require divisions like that. They usually include a lot of
addition and subtraction, and multiplication - but rarely actual
division. On the other hand, I haven't done much financial work, so
that's really speculation. Certainly as soon as division comes in,
you're likely to get inaccuracies.

If we think about our savings accounts then I agree that division never comes in
(as far as I can see). We deposit and withdraw exact amounts most of the time.
Occasionally we get an interest payment. Unless the bank is cruel to its
developers the interest figure will be able to be exactly represented in base
10, something like 4.1% as opposed to 4 1/3 %

125.78 * ' Initial Balance
04.1%
-------
5.15698 +
125.78
--------
130.93698 ' Final Balance
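
The same sum can be reproduced exactly with a decimal scaled type. The sketch below uses Python's decimal.Decimal as a stand-in for the .NET decimal (an assumption, since the two types are not identical) and shows how different rounding choices give different balances.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

balance = Decimal("125.78")
rate = Decimal("0.041")                  # 4.1%

interest = balance * rate                # exact: 5.15698
final_balance = balance + interest       # exact: 130.93698

cents = Decimal("0.01")
truncated = final_balance.quantize(cents, rounding=ROUND_DOWN)
rounded = final_balance.quantize(cents, rounding=ROUND_HALF_EVEN)
four_places = final_balance.quantize(Decimal("0.0001"), rounding=ROUND_HALF_EVEN)

print(final_balance)   # 130.93698
print(truncated)       # 130.93
print(rounded)         # 130.94
print(four_places)     # 130.9370
```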

As an aside there are a few interesting issues here. What, exactly is your
entitlement? 130.93, 130.94, 130.9370, 130.93698? I imagine Jon Skeet walking
into his local branch and asking "What is my exact balance?", teller replies,
Jon enquires further, "I mean my exact string representation, I can supply the
algorithm if you wish."

I speculate, and this might require a new thread, that there is some IEEE
standard for financial transactions which says that currency amounts shall be
stored to 4 decimal places, rounded according to "banker's rounding" (which is
toward the even number). So, for example, if an interest payment yields an
intermediate balance of 157.34865 this gets *stored* in your account as
157.3486. The 0.0086 is kept there for future rounding operations. However if
you closed your account with 157.3486 in it you will be actually handed 157.35.
If you closed your account on a day with 157.3428 in it, you will get back
157.34.
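
The rounding scheme speculated about above can be sketched as follows. This is Python with decimal.Decimal, and it only illustrates round-half-to-even at 4 and then 2 decimal places; it is not a statement of any actual banking standard. The name round_money is invented.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_money(amount, places):
    """Round amount to the given number of decimal places, ties going to the even digit."""
    quantum = Decimal(1).scaleb(-places)    # 10 ** -places
    return amount.quantize(quantum, rounding=ROUND_HALF_EVEN)

stored = round_money(Decimal("157.34865"), 4)
print(stored)                                  # 157.3486 (tie, 6 is even)
print(round_money(Decimal("157.34855"), 4))    # 157.3486 (tie, 5 is odd so round up)
print(round_money(stored, 2))                  # 157.35
print(round_money(Decimal("157.3428"), 2))     # 157.34
```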

The question here, as an aside: Is there a standard for financial transactions
which specifies a maximum number of digits to which you should store currency
amounts? (I have posted this in a new thread:
Standard for financial applications specifying maximum number of decimal
places?)

Back to our main thread. My point is, yes for some financial applications,
division may never come into things.
However, in many it will. Instead of the scholarship example think of a
company's annual profit that must be paid out to shareholders as dividends.

I've misled us a little bit with this scholarship example (which we could easily
change into a dividend example). Wondering what we do with the extra cent is a
problem that is not due to any limitation of the decimal datatype but rather to
the nature of currency: it is a quantity that, at the point where money has to
change hands, requires exactness at a limited precision. If we had a magic
computer with an infinitely precise datatype we would still have the same
problem.

The problem arises not only for division, but as my previous interest example
shows, for multiplication too. How should we specify the problem? Perhaps, it is
only a problem of how we should round a more precise number to a less precise
number.
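
One possible rule for the leftover cent, sketched in Python with decimal.Decimal standing in for the decimal datatype: round each share down to the cent, then hand the leftover cents to the first recipients. The rule and the name split_evenly are assumptions for illustration, not an accounting standard, and the sketch expects a non-negative total given in whole cents.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")

def split_evenly(total, parts):
    """Split total into shares that differ by at most one cent and add up to total."""
    base = (total / parts).quantize(CENT, rounding=ROUND_DOWN)
    leftover_cents = int((total - base * parts) / CENT)
    return [base + CENT if i < leftover_cents else base for i in range(parts)]

shares = split_evenly(Decimal("100.00"), 3)
print(shares)        # [Decimal('33.34'), Decimal('33.33'), Decimal('33.33')]
print(sum(shares))   # 100.00
```

Who receives the extra cent is decided by the rule, not by the datatype.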
Ah, but look at it the other way: double stores a much larger
range,
despite holding it in a smaller amount of memory. That *must* mean
that
it's representing values more sparsely (if you see what I mean).


I do (I believe) see what you mean, but perhaps not clearly the
truth of it.

Could we then write that 0.1 is such a sparse value under the double
datatype? It has trouble representing this and many other values
exactly where the decimal datatype does not.


I'm not sure what you mean by a "sparse value" here. Put it this way:
Suppose we had two data types, one of which represented a thousand
numbers between 0 and 10000, and another of which represented a
hundred
numbers between 0 and 10000000 - there will be more of a "gap" between
represented numbers in the second type than in the first type, on
average.


Let us, so that we can have more numbers that are representable by both
datatypes, have:
SuperCool 50 (Decimal Analogy): 11 numbers between 0 and 50 (0, 5, 10, 15, ...
50)
SuperCool 100 (Double Analogy): 6 numbers between 0 and 100 (0, 20, 40, 60, 80,
100)

In common we have (0, 20, 40)

Under this scheme, yes, the datatype SuperCool 100 takes up less memory and also
is representing numbers more sparsely (I like this phrase of yours).

However, If we ask "what number can SuperCool 50 hold that SuperCool 100 cannot,
within the range of SuperCool 50?" we can give a number: 5, for example. If, on
the other hand, I ask you "what number can a Decimal hold that a Double cannot,
within the range (and precision) of a Decimal?" can you come up with any
number?
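
The SuperCool analogy is small enough to write out in full. A sketch in Python, with the two invented types modelled as plain sets of the values they can hold:

```python
# Every value each toy datatype can represent.
super_cool_50 = set(range(0, 51, 5))      # 0, 5, 10, ..., 50
super_cool_100 = set(range(0, 101, 20))   # 0, 20, 40, ..., 100

in_common = sorted(super_cool_50 & super_cool_100)
only_in_50 = sorted(super_cool_50 - super_cool_100)

print(len(super_cool_50), len(super_cool_100))   # 11 6
print(in_common)                                 # [0, 20, 40]
print(only_in_50)                                # [5, 10, 15, 25, 30, 35, 45, 50]
```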
No - because they have a different range. On the other hand, every
number which can be exactly represented as a double *and* is within
the
decimal range *and* which has 28 or fewer significant digits in its
exact decimal string representation can be exactly represented as a
decimal.


Thanks, that helps. I concur.

Now could a financial application have a number which:
1. can be represented exactly as a double; and
2. is inside the decimal range ( -7.923E+028 < x < +7.923E+028 ); and
3. has *more than* 28 significant digits to the right of the decimal
point. ?


It could - but I suspect it's unlikely.

If it turns out that there is a Standard for storing a maximum number of digits
in a financial application then it will never happen. That is, the decimal
datatype's precision would be big enough, presumably, to hold intermediate
numbers during a calculation before being rounded off to the nearest (say)
fourth decimal place.

Looking at a scientific application now. If it is true that a decimal
is less prone to round off error then wouldn't you *definitely* use
this for a scientific application? When building a skyscraper,
for example, it is important when adding the floors to find the
total height that there is no massive error. Furthermore, it would be
unlikely that any girder on any floor will be built to a tolerance
requiring more than 28 decimal places (or even 20).


There will be no *massive* error using double. The error is likely to
be smaller than the engineers would be able to cope with anyway. The
real world doesn't tend to be as precise as a double, in other words.


I suspect that no matter how far we wish to pursue this discussion the basic
rule of thumb will be: for Financial apps use the Decimal datatype, for
scientific apps use a double.

I won't mind even if I can't see the exhaustive list of reasons why this would be
so. The final motivation is to guide my .NET programming choices.
With this end in mind we could shift our discussion more directly around
the programming rules and work backwards rather than starting with axioms and
moving toward the rules.

A fuller list of rules could be:

Choosing between a Binary Scaled Datatype (single, double in VB.NET) and Decimal
Scaled data type (decimal) is governed by considerations:

1. Size: Use Binary Scaled Datatype (doubles and singles) if you need to store a
number that is too large or too small for a Decimal Scaled Datatype (decimal).

2. Exactness: If you deal with quantities that start life with, and require,
exact representation, like the price of a shirt, use Binary Scaled Datatype
(doubles and singles). If you deal with quantities that start life with
imprecise representations, and can never have precise representation, like the
length of a diameter of a tyre, use floating point data types (double and single
in VB.NET). Rough guide: For Financial applications use the fixed point
datatype, for scientific applications use floating point data types.

3. Tolerating round off errors: The Decimal Scaled Datatype (Decimal Datatype)
is less prone to round off errors.

4. Performance: Floating datatypes (Double and Single in VB.NET) are in the
order of 40 times faster than the Decimal Scaled Datatype(Decimal Datatype).

What do ya reckon?

A further issue: Double V Single (Float in C#). At the moment I'd be inclined to
always choose a double by default, even though it is slower and takes more
memory. In any given app, its speed is always apparent while an inaccuracy
might not be until a disaster occurs. With processor power and memory becoming
larger and cheaper I think coding optimizations become less important. In
database apps, to take a specific type of app, the major speed bottlenecks will
be in the number of records (and the number of fields) coming across the network
rather than choosing a double over a single.
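
The accuracy side of this trade-off can be illustrated without .NET. The sketch below uses Python's struct module to round values to IEEE 754 binary32, on the assumption that this mirrors Single. The name to_single is invented for the example, and the sketch says nothing about speed.

```python
import struct

def to_single(value):
    """Round a binary64 float to the nearest binary32 value, returned as a float."""
    return struct.unpack("f", struct.pack("f", value))[0]

# How far the single nearest to 0.1 lies from the double nearest to 0.1.
single_error = abs(to_single(0.1) - 0.1)   # about 1.5e-9

# Accumulated error after adding 0.1 a thousand times in each precision.
total_single = 0.0
total_double = 0.0
for _ in range(1000):
    total_single = to_single(total_single + to_single(0.1))
    total_double += 0.1

print(single_error)
print(abs(total_single - 100.0), abs(total_double - 100.0))
```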

In any case have you done any performance tests of Doubles V Singles?


Jul 21 '05 #13

P: n/a
John Bentley <no*****@nowhere.com> wrote:
One alternative way of putting it for me would be "floating binary
point" or "floating decimal point" - how does that grab you?
I've just thought of a better reason to avoid the "floating" in these phrases.
It is true that the three .Net datatypes we've been talking about are all
floating point datatypes. However there has been, in VBA for example, a
nonintegral datatype that was a fixed point datatype. That was the currency
datatype that was fixed at 4 decimal places. Presumably, being a currency, the
scaling would have been decimal (base 10). Presumably, all that was stored was
the mantissa. The implicit exponent would have always been -4, that is, there
would have always been a Base 10, decimal, scaling of 10^-4 (= / 10,000).

If we are interested in coming up with the most generic terms, that which could
apply to any computer language, and that which gets at a common quality, then a
datatype's being a "floating datatype" is not the important thing. In languages
like VBA we have fixed and floating point datatypes but the important difference
is not this one. It is the base of the scaling.


Which is why it's important to say "decimal floating point" for decimal
:)
You could, for example, have a
language with fixed and floating point datatypes all of which scale with the
same base.
Indeed.
Maybe in the future, when processors are 100 times faster than they
are now, all nonintegrals will scale in base 10. (Maybe then there will be only
one nonintegral datatype to choose from)

As an aside I have thought of a quick way to appreciate the difference between a
Base 2 scaling and a Base 10 scaling. Base 2 scaling is sliding the (binary)
point while the mantissa is still in base 2. Base 10 scaling is sliding the
(decimal) point after the mantissa has been converted to base 10. You know this,
I'm just drilling it into my own understanding.
Right - that was the idea of "floating binary point" and "floating
decimal point" (rather than the versions with the first two words
reversed).
I think the difference is that most of the time financial calculations
*don't* require divisions like that. They usually include a lot of
addition and subtraction, and multiplication - but rarely actual
division. On the other hand, I haven't done much financial work, so
that's really speculation. Certainly as soon as division comes in,
you're likely to get inaccuracies.

If we think about our savings accounts then I agree that division never comes in
(as far as I can see). We deposit and withdraw exact amounts most of the time.
Occasionally we get an interest payment. Unless the bank is cruel to its
developers the interest figure will be able to be exactly represented in base
10, something like 4.1% as opposed to 4 1/3 %


Exactly.
125.78 * ' Initial Balance
04.1%
-------
5.15698 +
125.78
--------
130.93698 ' Final Balance

As an aside there are a few interesting issues here. What, exactly is your
entitlement? 130.93, 130.94, 130.9370, 130.93698? I imagine Jon Skeet walking
into his local branch and asking "What is my exact balance?", teller replies,
Jon enquires further, "I mean my exact string representation, I can supply the
algorithm if you wish."
I'm pretty sure that the exact balance is 130.93 - I suspect banks keep
the remainders of interest, and actually make a fair amount of money on
it. Depending on the bank, of course :)
I speculate, and this might require a new thread, that there is some IEEE
standard for financial transactions which says that currency amounts shall be
stored to 4 decimal places, rounded according to "banker's rounding" (which is
toward the even number). So, for example, if an interest payment yields an
intermediate balance of 157.34865 this gets *stored* in your account as
157.3486. The 0.0086 is kept there for future rounding operations. However if
you closed your account with 157.3486 in it you will be actually handed 157.35.
If you closed your account on a day with 157.3428 in it, you will get back
157.34.
That's possible too, certainly - I'm not really qualified to comment,
I'm afraid.
The question here, as an aside: Is there a standard for financial transactions
which specifies a maximum number of digits to which you should store currency
amounts? (I have posted this in a new thread:
Standard for financial applications specifying maximum number of decimal
places?)
I suspect there are different standards for different situations.
Back to our main thread. My point is, yes for some financial applications,
division may never come into things.
However, in many it will. Instead of the scholarship example think of a
company's annual profit that must be paid out to shareholders as dividends.
I suspect there are standards for exactly how those are worked out -
and the number of accurate digits given by the decimal type is likely
to be fine to work out the dividend with appropriate rules.

<snip>
The problem arises not only for division, but as my previous interest example
shows, for multiplication too. How should we specify the problem? Perhaps, it is
only a problem of how we should round a more precise number to a less precise
number.
Exactly - that problem is always going to be there, and requires
appropriate business rules.
I'm not sure what you mean by a "sparse value" here. Put it this way:
Suppose we had two data types, one of which represented a thousand
numbers between 0 and 10000, and another of which represented a
hundred
numbers between 0 and 10000000 - there will be more of a "gap" between
represented numbers in the second type than in the first type, on
average.


Let us, so that we can have more numbers that are representable by both
datatypes, have:
SuperCool 50 (Decimal Analogy): 11 numbers between 0 and 50 (0, 5, 10, 15, ...
50)
SuperCool 100 (Double Analogy): 6 numbers between 0 and 100 (0, 20, 40, 60, 80,
100)

In common we have (0, 20, 40)

Under this scheme, yes, the datatype SuperCool 100 takes up less memory and also
is representing numbers more sparsely (I like this phrase of yours).


Yes.
However, If we ask "what number can SuperCool 50 hold that SuperCool 100 cannot,
within the range of SuperCool 50?" we can give a number: 5, for example. If, on
the other hand, I ask you "what number can a Decimal hold that a Double cannot,
within the range (and precision) of a Decimal?" can you come up with any
number?
I presume you meant to ask the other way round - but it's the
*precision* part that gets us into trouble. For instance, take the
closest double to 0.1:

0.1000000000000000055511151231257827021181583404541015625

That's certainly within the *range* of decimal, but isn't exactly
representable as a decimal. If you could clarify exactly what you mean
by "within the precision of a decimal" I could possibly answer better.
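
The exact expansion quoted above can be reproduced, and its digits counted, with a short Python sketch. It assumes Python's float is the same binary64 format as Double. Decimal(float) converts the stored binary value exactly, without rounding. The helper name significant_digits is made up.

```python
from decimal import Decimal

def significant_digits(value):
    """Number of significant digits in the exact decimal expansion of a float."""
    return len(Decimal(value).as_tuple().digits)

print(Decimal(0.1))                # the exact value of the double closest to 0.1
print(significant_digits(0.1))     # 55, far more than 28
print(significant_digits(0.375))   # 3, so this double fits a 28-digit decimal exactly
```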
Now could a financial application have a number which:
1. can be represented exactly as a double; and
2. is inside the decimal range ( -7.923E+028 < x < +7.923E+028 ); and
3. has *more than* 28 significant digits to the right of the decimal
point. ?


It could - but I suspect it's unlikely.


If it turns out that there is a Standard for storing a maximum number of digits
in a financial application then it will never happen. That is, the decimal
datatype's precision would be big enough, presumably, to hold intermediate
numbers during a calculation before being rounded off to the nearest (say)
fourth decimal place.


Well, it depends on what that maximum number of digits is, and how
large the values can get. The units of some currencies are very small -
for instance, $1(US) is worth over 100 Japanese Yen - so the number for
the GDP of Japan when expressed in Yen is going to be a pretty big
number. If that number needed to be expressed to 10 decimal places,
that might be a problem. However, it's unlikely that a huge number
needs to be expressed with that kind of accuracy - and if it does,
you're likely to know about it when designing the app, and accommodate
by using a custom type.
There will be no *massive* error using double. The error is likely to
be smaller than the engineers would be able to cope with anyway. The
real world doesn't tend to be as precise as a double, in other words.


I suspect that no matter how far we wish to pursue this discussion the basic
rule of thumb will be: for Financial apps use the Decimal datatype, for
scientific apps use a double.


Yes.
I won't mind even if I can't see the exhaustive list of reasons why this would be
so. The final motivation is to guide my .NET programming choices.
With this end in mind we could shift our discussion more directly around
the programming rules and work backwards rather than starting with axioms and
moving toward the rules.

A fuller list of rules could be:

Choosing between a Binary Scaled Datatype (single, double in VB.NET) and Decimal
Scaled data type (decimal) is governed by considerations:

1. Size: Use Binary Scaled Datatype (doubles and singles) if you need to store a
number that is too large or too small for a Decimal Scaled Datatype (decimal).
Yes - although with very very large and very very small numbers, you
need to look quite carefully at the accuracy anyway, as things will
start to get less accurate as you reach denormal numbers or the top
end.
2. Exactness: If you deal with quantities that start life with, and require,
exact representation, like the price of a shirt, use Binary Scaled Datatype
(doubles and singles).
I think you meant decimal scaled datatype (decimal) here. It's also
only appropriate for quantities that start life with an exact *decimal*
representation.
If you deal with quantities that start life with
imprecise representations, and can never have precise representation, like the
length of a diameter of a tyre, use floating point data types (double and single
in VB.NET). Rough guide: For Financial applications use the fixed point
datatype, for scientific applications use floating point data types.
Yes.
3. Tolerating round off errors: The Decimal Scaled Datatype (Decimal Datatype)
is less prone to round off errors.
Hmm... I think I'd want to see a bit more detail about exactly what you
mean before agreeing to that.
4. Performance: Floating datatypes (Double and Single in VB.NET) are in the
order of 40 times faster than the Decimal Scaled Datatype(Decimal Datatype).

What do ya reckon?
That's about right - although that 40x faster was only based on a
single benchmark I did (assuming you've just taken it from my page).
A further issue: Double V Single (Float in C#). At the moment I'd be inclined to
always choose a double by default, even though it is slower and takes more
memory.
It may well be just as fast, or even possibly faster - many processors
do all floating point arithmetic at double precision anyway, so there
may be less conversion cost to start with.
In any given app, its speed is always apparent while an inaccuracy
might not be until a disaster occurs. With processor power and memory becoming
larger and cheaper I think coding optimizations become less important. In
database apps, to take a specific type of app, the major speed bottlenecks will
be in the number of records (and the number of fields) coming across the network
rather than choosing a double over a single.

In any case have you done any performance tests of Doubles V Singles?


I haven't, to be honest - mostly because I'd never really consider
using a single unless I had a particularly good reason to - I agree
with your reasoning here.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #14

> -------------------------------------------------
Jon Skeet wrote (at this indent level):
-------------------------------------------------
John Bentley Wrote:
In languages like VBA we have fixed and
floating point datatypes but the important difference is not this
one. It is the base of the scaling.
Which is why it's important to say "decimal floating point" for
decimal :)


:) Your perspective is that even a fixed point datatype's point floats.
.. I have thought of a quick way to appreciate the
difference between a Base 2 scaling and a Base 10 scaling. Base 2
scaling is sliding the (binary) point while the mantissa is still in
base 2. Base 10 scaling is sliding the (decimal) point after the
mantissa has been converted to base 10. You know this, I'm just
drilling it into my own understanding.


Right - that was the idea of "floating binary point" and "floating
decimal point" (rather than the versions with the first two words
reversed).


I see. You are using "floating", perhaps, where I might use "scaling". We both
mean the same. I'd suggest that using "floating" as you have might force the
loss of distinction between a fixed point datatype and a floating point
datatype. Maybe that's the "political agenda" :) you have: to show others that
the distinction is misleading to begin with. But we would need to distinguish
between datatypes whose represented number has a set number of decimal places
versus datatypes where the represented number has a varying number of decimal
places.

The question here, as an aside: Is there a standard for financial
transactions which specifies a maximum number of digits to which you
should store currency amounts? (I have posted this in a new thread:
Standard for financial applications specifying maximum number of
decimal places?)


I suspect there are different standards for different situations.


I have posted in this newsgroup but will post in alt.accounting.
The problem arises not only for division, but as my previous
interest example shows, for multiplication too. How should we
specify the problem? Perhaps it is only a problem of how we
should round a more precise number to a less precise number.


Exactly - that problem is always going to be there, and requires
appropriate business rules.


Yes. That's a more fundamental way of understanding it: it's merely a business
rules problem.
I'm not sure what you mean by a "sparse value" here. Put it this
way:
Suppose we had two data types, one of which represented a thousand
numbers between 0 and 10000, and another of which represented a
hundred
numbers between 0 and 10000000 - there will be more of a "gap"
between represented numbers in the second type than in the first
type, on
average.


Let us, so that we can have more numbers that are representable by
both datatypes, have:
SuperCool 50 (Decimal Analogy): 11 numbers between 0 and 50 (0, 5,
10, 15, ... 50)
SuperCool 100 (Double Analogy): 6 numbers between 0 and 100 (0, 20,
40, 60, 80, 100)

In common we have (0, 20, 40)

Under this scheme, yes, the datatype SuperCool 100 takes up less
memory and also is representing numbers more sparsely (I like this
phrase of yours).


Yes.
However, If we ask "what number can SuperCool 50 hold that SuperCool
100 cannot, within the range of SuperCool 50?" we can give a number:
5, for example. If, on the other hand, I ask you "what number can a
Decimal hold that a Double cannot, within the range (and precision)
of a Decimal?" can you come up with any number?


I presume you meant to ask the other way round -


Thanks for being careful with what I wrote versus what I intended. However
in this case I did have it the right way around. Thanks to your answer, though,
I believe I can clarify the issue. Perhaps even to the stage of agreement.

Your claim is that the double type (example SuperCool 100) represents values
more sparsely. The decimal type (example SuperCool 50) takes more memory, has a
smaller range, and has fewer decimal places to represent numbers. This must mean,
so goes your argument, it represents values more densely (though not
necessarily more numbers), just like SuperCool 50. That is, between any two
consecutive numbers that a double type (eg SuperCool 100) can hold there will be
many more numbers that the decimal type (eg SuperCool 50) can hold, provided we
are within the range of the decimal type (eg SuperCool 50)

So with our example datatypes let's pick an interval within the range of
SuperCool 50 (Decimal), but let's look first at an interval between two
SuperCool 100 (double) numbers: between 20 and 40. Between 20 and 40:
* SuperCool 100 (Double) can only store two numbers. 20 and 40.
* SuperCool 50 (Decimal) can store many more numbers. 20, 25, 30, 35, 40.
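
The toy comparison above can be checked mechanically. What follows is only a minimal sketch, written in Python rather than VB.NET; the names supercool_50, supercool_100 and representable_between are made up for illustration and simply model each toy type as the set of values it can hold.

```python
# Toy model of the two hypothetical datatypes from the post.
supercool_50 = set(range(0, 51, 5))      # 0, 5, 10, ..., 50   (decimal analogy)
supercool_100 = set(range(0, 101, 20))   # 0, 20, 40, ..., 100 (double analogy)

# Values that both toy types can represent.
common = sorted(supercool_50 & supercool_100)


def representable_between(values, low, high):
    """Return the representable values in the closed interval [low, high]."""
    return sorted(v for v in values if low <= v <= high)


between_100 = representable_between(supercool_100, 20, 40)
between_50 = representable_between(supercool_50, 20, 40)

print(common)        # [0, 20, 40]
print(between_100)   # [20, 40]
print(between_50)    # [20, 25, 30, 35, 40]
```

Within the shared range, the type with the smaller range has the finer grid, which is the sense of "more densely" used in this discussion.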

....
but it's the
*precision* part that gets us into trouble. For instance, take the
closest double to 0.1:

0.1000000000000000055511151231257827021181583404541015625

That's certainly within the *range* of decimal, but isn't exactly
representable as a decimal. If you could clarify exactly what you mean
by "within the precision of a decimal" I could possibly answer better.


By "within the precision of a decimal" I meant, confusingly and unclearly, to
the number of decimal places that a Decimal Datatype can hold.

So with our .NET datatypes we can ask: are there two consecutive numbers that a
double type can hold for which there will be many more numbers that a decimal
type can hold, provided we are within the range of the decimal type and talking
of numbers with 28 decimal places or less?

I can now answer this question: Yes

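Sub DatatypeDensities()
    ' Start beside the double closest to 0.1 and step in units of 1E-26.
    Dim number As Decimal = 0.10000000000000000555111512D
    Dim interval As Decimal = 0.00000000000000000000000001D
    Dim i As Integer

    For i = 1 To 10
        Debug.WriteLine("")
        Debug.WriteLine("Decimal Exact: " & number)
        ' DoubleConverter.ToExactString is an external helper, not part of the
        ' .NET Framework; CDbl makes the narrowing conversion explicit.
        Debug.WriteLine("Double Exact: " & DoubleConverter.ToExactString(CDbl(number)))
        Debug.WriteLine("")
        number += interval
    Next i
End Sub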

Gives:

Decimal Exact: 0.10000000000000000555111512
Double Exact: 0.1000000000000000055511151231257827021181583404541015625
Decimal Exact: 0.10000000000000000555111513
Double Exact: 0.1000000000000000055511151231257827021181583404541015625
Decimal Exact: 0.10000000000000000555111514
Double Exact: 0.1000000000000000055511151231257827021181583404541015625

.....

Note the loop does not traverse an entire interval between two numbers that are
representable in Double. It is enough to show that there are many consecutive
numbers storable as Decimal Datatypes between two consecutive numbers that are
storable as a Double type. Enough because we only wish to support your claim
that the double type represents numbers more sparsely.
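
For anyone without VB.NET to hand, the same experiment can be approximated in Python. This is a sketch under assumptions, not a port: Python's float is an IEEE 754 double, while decimal.Decimal is only a stand-in for the .NET Decimal (it is a different type with a configurable precision). Decimal(float) plays the role of an exact-string converter.

```python
from decimal import Decimal

# Same starting value and step as the VB.NET loop above.
start = Decimal("0.10000000000000000555111512")
step = Decimal("0.00000000000000000000000001")

# Ten consecutive values that differ only in the 26th decimal place.
values = [start + i * step for i in range(10)]

# Convert each one to the nearest double and collect the distinct results.
doubles = {float(v) for v in values}

# Decimal(float) expands the stored double exactly, digit for digit.
exact = Decimal(float(start))

print(len(set(values)))   # 10 distinct decimal values
print(len(doubles))       # 1 distinct double
print(exact)              # 0.1000000000000000055511151231257827021181583404541015625
```

Ten distinct decimal values collapse onto a single double, which is the sparseness the loop was meant to show.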

Where was my confusion? Over the meaning of precision.

In

MSDN > KB > (Complete) Tutorial to Understand IEEE Floating-Point Errors

we have ".. Precision refers to the number of digits that you can represent"

But in

MSDN > .NET Framework Class Library > Decimal Structure

we have
"Conversions from Decimal to Single or Double are narrowing conversions that
might lose precision but not information about the magnitude of the converted
value."

If precision only meant the number of digits you can represent then
converting a Decimal to a double would be an increase in precision. A double can
hold more decimal places. If we take a more general understanding of precision,
that a higher precision means an ability to hold finer distinctions, then it
becomes clear that the Decimal datatype is more precise even though it can
represent numbers with a lower number of decimal places. The loop above shows
this.

The confusion comes from failing (as I did) to see the difference between
conceptual numbers and numbers that can be stored in a datatype. When talking
about conceptual numbers, that is any number we can come up with in our head,
then a higher precision number will necessarily be one with a greater number of
decimal places. In a datatype just because you have more decimal places to play
with doesn't mean you are a more precise datatype.

A fuller list of rules could be:

Choosing between a Binary Scaled Datatype (single, double in VB.NET)
and Decimal Scaled data type (decimal) is governed by considerations:

1. Size: Use Binary Scaled Datatype (doubles and singles) if you
need to store a number that is too large or too small for a Decimal
Scaled Datatype (decimal).


Yes - although with very very large and very very small numbers, you
need to look quite carefully at the accuracy anyway, as things will
start to get less accurate as you reach denormal numbers or the top
end.
2. Exactness: If you deal with quantities that start life with, and
require, exact representation, like the price of a shirt, use Binary
Scaled Datatype (doubles and singles).


I think you meant decimal scaled datatype (decimal) here. It's also
only appropriate for quantities that start life with an exact
*decimal* representation.


Yes I did mean the decimal scaled datatype. Thanks.
If you deal with quantities that start life with
imprecise representations, and can never have precise
representation, like the length of a diameter of a tyre, use
floating point data types (double and single in VB.NET). Rough
guide: For Financial applications use the fixed point datatype, for
scientific applications use floating point data types.


Yes.


Again, sorry I meant:

Exactness: If you deal with quantities that start life with, and require, exact
representation, like the price of a shirt, use Decimal Scaled Datatype. If you
deal with quantities that start life with imprecise representations, and can
never have exact representation, like the length of a diameter of a tyre, use
Binary Scaled data types (double and single in VB.NET). Rule of Thumb: For
Financial applications use the Decimal Scaled datatypes, for scientific
applications use Binary Scaled data types.
3. Tolerating round off errors: The Decimal Scaled Datatype (Decimal
Datatype) is less prone to round off errors.


Hmm... I think I'd want to see a bit more detail about exactly what
you
mean before agreeing to that.


The Decimal Scaled Datatype can exactly store many numbers that the Binary
Scaled Datatype cannot. For example, 0.1.
As previously shown when looping 0.1 in an addition the Decimal Datatype will
come out exactly, the double will not.
Neither can exactly store a third. When looping a third, and testing the result,
both reveal a round off error.

Therefore both are prone to round off errors but the Decimal Scaled Datatype less
so.
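
A small sketch of the two cases just described, in Python rather than VB.NET. The assumptions: Python's float behaves as a .NET Double (both are IEEE 754 doubles), and decimal.Decimal at its default 28-digit precision stands in for the .NET Decimal. The variable names are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction

# Add 0.1 ten times as a binary double (Python float).
double_total = 0.0
for _ in range(10):
    double_total += 0.1

# Add 0.1 ten times as a base-10 decimal.
decimal_total = Decimal("0")
for _ in range(10):
    decimal_total += Decimal("0.1")

print(double_total == 1.0)             # False
print(decimal_total == Decimal("1"))   # True

# Neither type stores one third exactly: compare each stored value
# with the true fraction 1/3.
double_third = 1.0 / 3.0
decimal_third = Decimal(1) / Decimal(3)   # rounded to the context precision

print(Fraction(double_third) == Fraction(1, 3))    # False
print(Fraction(decimal_third) == Fraction(1, 3))   # False
```

Whether a repeated sum of thirds happens to land back on a round number depends on how the individual roundings fall, so the sketch compares the stored values themselves rather than a looped total.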

Recall I disagree with the "no round-off" errors in

"The Decimal value type is appropriate for financial calculations requiring
large numbers of significant integral and fractional digits and no round-off
errors."

The round off errors, we agreed, were due to business rule considerations.
However, to say there are no round off errors in the Decimal type, and leave it
at that, would be to lull readers into a false sense of security (you might not bother
with the business rules).

MSDN > .NET Framework Class Library > Decimal Structure
4. Performance: Floating datatypes (Double and Single in VB.NET) are
in the order of 40 times faster than the Decimal Scaled
Datatype(Decimal Datatype).

What do ya reckon?


That's about right - although that 40x faster was only based on a
single benchmark I did (assuming you've just taken it from my page).


Yes. I'm just taking this from your page. In the article that I'm storing the
results of our discussion I do reference your page and quote you as saying
"quickly devised test". If I get around to publishing the article, I will
properly quote your qualifications about the test.
A further issue: Double V Single (Float in C#). At the moment I'd be
inclined to always choose a double by default, even though it is
slower and takes more memory.


It may well be just as fast, or even possibly faster - many processors
do all floating point arithmetic at double precision anyway, so there
may be less conversion cost to start with.
In any given app, its speed is always apparent while an inaccuracy
might not be until a disaster occurs. With processor power and
memory becoming larger and cheaper I think coding optimizations
become less important. In database apps, to take a specific type of
app, the major speed bottlenecks will be in the number of records
(and the number of fields) coming across the network rather than
choosing a double over a single.

In any case have you done any performance tests of Doubles V Singles?


I haven't, to be honest - mostly because I'd never really consider
using a single unless I had a particularly good reason to - I agree
with your reasoning here.


The whole motivation for starting down the nonintegral road was in response to
my building a Stopwatch class to time code. I'm close to returning to that
project and should then be able to test it for myself :)
Jul 21 '05 #15

John Bentley <no*****@nowhere.com> wrote:
However, If we ask "what number can SuperCool 50 hold that SuperCool
100 cannot, within the range of SuperCool 50?" we can give a number:
5, for example. If, on the other hand, I ask you "what number can a
Decimal hold that a Double cannot, within the range (and precision)
of a Decimal?" can you come up with any number?
I presume you meant to ask the other way round -


Thanks for being careful with what I wrote versus what I intended.


In that case, an example is trivial: 0.1. That is a number which
decimal can hold (precisely) but which double can't (precisely).
However
in this case I did have it the right way around. Thanks to your answer, though,
I believe I can clarify the issue. Perhaps even to the stage of agreement.

Your claim is that the double type (example SuperCool 100) represents values
more sparsely. The decimal type (example SuperCool 50) takes more memory, has a
smaller range, and has fewer decimal places to represent numbers. This must mean,
so goes your argument, it represents values more densely (though not
necessarily more numbers), just like SuperCool 50. That is, between any two
consecutive numbers that a double type (eg SuperCool 100) can hold there will be
many more numbers that the decimal type (eg SuperCool 50) can hold, provided we
are within the range of the decimal type (eg SuperCool 50)
No - that's putting a level of precision in that I wouldn't necessarily
claim. It may be true, but I wouldn't like to say that for sure.
Doubles are more dense at some places and less dense at others - but
*overall* decimal is more dense.
but it's the
*precision* part that gets us into trouble. For instance, take the
closest double to 0.1:

0.1000000000000000055511151231257827021181583404541015625

That's certainly within the *range* of decimal, but isn't exactly
representable as a decimal. If you could clarify exactly what you mean
by "within the precision of a decimal" I could possibly answer better.


By "within the precision of a decimal" I meant, confusingly and unclearly, to
the number of decimal places that a Decimal Datatype can hold.


And that's fairly well documented for decimal: 28 or 29. It's far
harder for double, as it can vary between "none at all" (for very large
numbers) and "quite a lot" for very small numbers.
So with our .NET datatypes we can ask: are there two consecutive numbers that a
double type can hold for which there will be many more numbers that a decimal
type can hold, provided we are within the range of the decimal type and talking
of numbers with 28 decimal places or less?
Again, talking about numbers "with 28 decimal places or less" feels odd
to me, because the double closest to 0.1, for instance, has more than
28 decimal places - many, many doubles will.

<snip>
Note the loop does not traverse an entire interval between two numbers that are
representable in Double.
Yes, although I could give you opposite examples as well: 0 and
Double.Epsilon are two doubles with *no* decimals between them.
It is enough to show that there are many consecutive
numbers storable as Decimal Datatypes between two consecutive numbers that are
storable as a Double type. Enough because we only wish to support your claim
that the double type represents numbers more sparsely.
More sparsely *in general* though, which is not the same as more
sparsely at every point.
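
To put rough numbers on "denser in some places, sparser in others", here is a hedged sketch in Python (3.9 or later, for math.ulp). Python's float is an IEEE 754 double like the .NET Double; the step of 1e-28 for the decimal type is an assumption based on the 28 decimal places mentioned in this thread, and decimal_step, gap_near_tenth and smallest_double are illustrative names.

```python
import math
from decimal import Decimal

# Assumed finest step of the decimal type near 0.1 (28 decimal places).
decimal_step = Decimal("1e-28")

# Gap between the double closest to 0.1 and the next double above it.
gap_near_tenth = Decimal(math.ulp(0.1))

# Smallest positive double, the counterpart of Double.Epsilon.
smallest_double = Decimal(math.ulp(0.0))

print(gap_near_tenth > decimal_step)    # True: decimal is denser near 0.1
print(smallest_double < decimal_step)   # True: double is denser next to 0

# Roughly how many decimal steps fit into one double gap near 0.1.
print(gap_near_tenth / decimal_step)
```

The first comparison matches the loop example from the earlier post; the second matches the 0 and Double.Epsilon counterexample.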
Where was my confusion? Over the meaning of precision.

In

MSDN > KB > (Complete) Tutorial to Understand IEEE Floating-Point Errors

we have ".. Precision refers to the number of digits that you can represent"
Which is fine when you're talking about decimal, but not when talking
about doubles - and when you're talking about numbers in general, and
whether or not they can be represented as doubles/decimals, it doesn't
make as much sense.
But in

MSDN > .NET Framework Class Library > Decimal Structure

we have
"Conversions from Decimal to Single or Double are narrowing conversions that
might lose precision but not information about the magnitude of the converted
value."

If precision only meant the number of digits you can represent then
converting a Decimal to a double would be an increase in precision. A double can
hold more decimal places.
Or fewer, depending on the scale.
If we take a more general understanding of precision,
that a higher precision means an ability to hold finer distinctions, then it
becomes clear that the Decimal datatype is more precise even though it can
represent numbers with a lower number of decimal places. The loop above shows
this.
Partially, yes.
The confusion comes from failing (as I did) to see the difference between
conceptual numbers and numbers that can be stored in a datatype. When talking
about conceptual numbers, that is any number we can come up with in our head,
then a higher precision number will necessarily be one with a greater number of
decimal places. In a datatype just because you have more decimal places to play
with doesn't mean you are a more precise datatype.
To me, precision is a measure of the difference between a theoretical
exact value and the represented value closest to it. The greater the
precision, the smaller the difference is. In that sense, the precision
of a conceptual number doesn't make sense, because there's no
representation involved.
Again, sorry I meant:

Exactness: If you deal with quantities that start life with, and require, exact
representation, like the price of a shirt, use Decimal Scaled Datatype. If you
deal with quantities that start life with imprecise representations, and can
never have exact representation, like the length of a diameter of a tyre, use
Binary Scaled data types (double and single in VB.NET). Rule of Thumb: For
Financial applications use the Decimal Scaled datatypes, for scientific
applications use Binary Scaled data types.
Fine.
3. Tolerating round off errors: The Decimal Scaled Datatype (Decimal
Datatype) is less prone to round off errors.


Hmm... I think I'd want to see a bit more detail about exactly what
you
mean before agreeing to that.


The Decimal Scaled Datatype can exactly store many numbers that the Binary
Scaled Datatype cannot.


The reverse is true as well though.
For example, 0.1.
As previously shown when looping 0.1 in an addition the Decimal Datatype will
come out exactly, the double will not.
True - but when looping the double closest to 0.1 in an addition, the
double will come out exactly, the decimal will not (as it can't exactly
represent the double closest to 0.1 in the first place).
Neither can exactly store a third. When looping a third, and testing the result,
both reveal a round off error.

Therefore both are prone to round off errors but the Decimal Scaled Datatype less
so.
Only because you chose in the above to increment by an amount which is
exactly representable in decimal but not in double.
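
The point about the double closest to 0.1 not fitting in a decimal can be sketched as follows. This is Python, not VB.NET, and it rests on an assumption: rounding to 28 decimal places with decimal.Decimal is used as a stand-in for storing the value in a .NET Decimal. The names exact_double and as_decimal_28 are illustrative.

```python
from decimal import Decimal

# Exact expansion of the double closest to 0.1 (55 decimal places).
exact_double = Decimal(0.1)

# The same value rounded to 28 decimal places, the assumed limit of the
# decimal type discussed in this thread.
as_decimal_28 = exact_double.quantize(Decimal("1e-28"))

print(exact_double)                    # 0.1000000000000000055511151231257827021181583404541015625
print(as_decimal_28)                   # 0.1000000000000000055511151231
print(as_decimal_28 == exact_double)   # False: digits were lost
```

So the asymmetry runs both ways: 0.1 is exact in decimal and inexact in double, while the double's own stored value is exact in double and inexact in decimal.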
Recall I disagree with the "no round-off" errors in

"The Decimal value type is appropriate for financial calculations requiring
large numbers of significant integral and fractional digits and no round-off
errors."

The round off errors, we agreed, were due to business rule considerations.
However, to say there are no round off errors in the Decimal type, and leave it
at that, would be to lull readers into a false sense of security (you might not bother
with the business rules).


True.
That's about right - although that 40x faster was only based on a
single benchmark I did (assuming you've just taken it from my page).


Yes. I'm just taking this from your page. In the article that I'm storing the
results of our discussion I do reference your page and quote you as saying
"quickly devised test". If I get around to publishing the article, I will
properly quote your qualifications about the test.


Goodo :)
I haven't, to be honest - mostly because I'd never really consider
using a single unless I had a particularly good reason to - I agree
with your reasoning here.


The whole motivation for starting down the nonintegral road was in response to
my building a Stopwatch class to time code. I'm close to returning to that
project and should then be able to test it for myself :)


You might want to look at
http://www.pobox.com/~skeet/csharp/benchmark.html

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #16

Jon,
-------------------------------------------------
Jon Skeet wrote (at this indent level):
-------------------------------------------------
John Bentley <no*****@nowhere.com> wrote:

No - that's putting a level of precision in that I wouldn't
necessarily
claim. It may be true, but I wouldn't like to say that for sure.
Doubles are more dense at some places and less dense at others - but
*overall* decimal is more dense.
OK. That teaches me.
Note the loop does not traverse an entire interval between two
numbers that are representable in Double.
Yes, although I could give you opposite examples as well: 0 and
Double.Epsilon are two doubles with *no* decimals between them.


Good example.
It is enough to show that there are many consecutive
numbers storable as Decimal Datatypes between two consecutive
numbers that are storable as a Double type. Enough because we only
wish to support your claim that the double type represents numbers
more sparsely.


More sparsely *in general* though, which is not the same as more
sparsely at every point.


Yes. It is good to reiterate your point like that.
Where was my confusion? Over the meaning of precision.

In

MSDN > KB > (Complete) Tutorial to Understand IEEE Floating-Point
Errors

we have ".. Precision refers to the number of digits that you can
represent"


Which is fine when you're talking about decimal, but not when talking
about doubles


With doubles, we can be interested in a greater or lesser number of decimal
places. For example, we might round a double to 2 decimal places and say that we
are *using* a double with a precision to two decimal places (even though it
still might be *storing* many more).
- and when you're talking about numbers in general, and
whether or not they can be represented as doubles/decimals, it doesn't
make as much sense.
I'd offer numbers in general, or "conceptual numbers" as I call them, can have a
specified number of decimal places. If I ask how far it is to Paris, you could
reply "About 20 km" or "20.5 km". One of these conceptual numbers has a greater
precision (thought of strictly as the number of decimal places).

The confusion comes from failing (as I did) to see the difference
between conceptual numbers and numbers that can be stored in a
datatype. When talking about conceptual numbers, that is any number
we can come up with in our head, then a higher precision number will
necessarily be one with a greater number of decimal places. In a
datatype just because you have more decimal places to play with
doesn't mean you are a more precise datatype.
To me, precision is a measure of the difference between a theoretical
exact value and the represented value closest to it.


I'd offer, a conceptual number has either a range (explicit or implied) or not.
If not it is an exact value.

"About 20 km to Paris" may have an implied range of plus or minus 5km. "There
are 20 people in the room" is a sentence with an exact conceptual number.

A number that has a range can have a greater or less precision (of finer or
coarser level of differentiation). We might say that an exact number is
absolutely precise.

Secondly, and following on, I'd offer that we do well to distinguish precision
from accuracy. We could change your definition a bit.

For conceptual numbers with a range: Precision is a measure of the difference
between a theoretical exact value and the max and min that specifies the range.

For exact conceptual numbers. Precision is absolute, the represented
(conceptual) number being the same as the theoretical exact value.

Accuracy could be how well a conceptual number captures an exact value,
where that exact value is a truth claim.

A number, for example, can be very precise but wildly inaccurate:

"There are 20.763 km (+/- 0.001 km) left to go before Paris" might be very
precise but inaccurate if we are in Hanoi.

On the other hand a number can be completely accurate but very imprecise:

"Paris is between 1 metre and 20 light years away" would be completely
accurate, if we are in Hanoi, but very imprecise.

I might be confusing the issue :) and you'll have to be particularly tolerant
(pun intended) today as I'm very sleepy.

The greater the
precision, the smaller the difference is.
Yes, it is homing in on an exact value (if the conceptual number is not already an
exact number).
In that sense, the precision
of a conceptual number doesn't make sense, because there's no
representation involved.


I suppose a conceptual number either has no representation (as in "Here
follows a number: 2.6") or can have representation (namely when it represents a truth
claim: "Paris is 2.6 km away").

If precision is a difference between a conceptual number and an exact
theoretical value, or between the limits of the range of a conceptual number
and an exact theoretical value, then:
For conceptual numbers that have no representation, yes, there is no exact
theoretical value to speak of. That is, there is no value that the conceptual
number "aims at". So 2.45 is NOT more precise than 2.6 because 2.45 is not
closer to anything.

For conceptual numbers that do represent something, then there is an exact
theoretical value which is aimed at. Imagine Paris is 3 km away, exactly. 2.45
*would be* more precise than 2.6, even though it is more inaccurate. That is,
2.45 is moving closer to an exact value than 2.6 despite that movement being
toward an exact value that is further away from the truth.

The Decimal Scaled Datatype can exactly store many numbers that
the Binary Scaled Datatype cannot.


The reverse is true as well though.
For example, 0.1.
As previously shown when looping 0.1 in an addition the Decimal
Datatype will come out exactly, the double will not.


True - but when looping the double closest to 0.1 in an addition, the
double will come out exactly, the decimal will not (as it can't
exactly
represent the double closest to 0.1 in the first place).
Neither can exactly store a third. When looping a third, and testing
the result, both reveal a round off error.

Therefore both are prone to round off errors but the Decimal Scaled
Datatype less so.


Only because you chose in the above to increment by an amount which is
exactly representable in decimal but not in double.


And is it not the case that since we generally code in base 10 numbers, and
culturally we work in base 10, especially in financial apps, the numbers that
we choose to increment by (and perform other operations with) are more likely to
be exactly representable in the Decimal Datatype? By "code in base 10 numbers" I
mean we assign base 10 numbers to variables: Dim x As Double = 8.5 rather than
Dim x As Double = BinToDec("1000.1").

Would you agree that the Decimal type is less prone to round off errors than the
double in, say, a financial application?
The whole motivation for starting down the nonintegral road was in
response to my building a Stopwatch class to time code. I'm close to
returning to that project and should then be able to test it for
myself :)


You might want to look at
http://www.pobox.com/~skeet/csharp/benchmark.html


It's clear I would do well just to read your entire web site :)

I've done two things since the last post. Got my benchmarking code working and
consolidated my understanding of attributes. So after a very superficial look
at your benchmarking code I can say that I especially like the way you use
custom attributes to flag which procedures to time.

Just for your information the way I've done things is like this:
1. I use procedure delegates to reference the procedures.
2. I have set it up to always compare two procedures.

So I have one line to call my class:

Dim pt As New PerformanceTester(AddressOf MethodA, AddressOf MethodB)

Which produces output like this:

24 timings taken for each procedure.
MethodA
Average Time: 0.00823492697163094 Seconds
MethodB
Average Time: 0.00600536055094526 Seconds
MethodA is slower than MethodB by:
A factor of 1.37
A Speed of 0.002 seconds(s), 2 milliseconds(ms)

MethodB is the fastest.

3. I use the QueryPerformanceCounter to give near millisecond (0.001) accuracy
(I use the term with trepidation). See Microsoft KB Articles Q172338 and
Q306978.

My tests bear this out

with...
Sub MethodC()
Thread.CurrentThread.Sleep(100)
End Sub

Sub MethodD()
Thread.CurrentThread.Sleep(50)
End Sub

I get ...
124 timings taken for each procedure.
MethodC
Average Time: 0.0995711790546095 Seconds
MethodD
Average Time: 0.0494590072021678 Seconds

MethodC is slower than MethodD by:
A factor of 2.01
A Speed of 0.050 seconds(s), 50 milliseconds(ms)

MethodD is the fastest.

4. When I tested my code with two procedures containing identical code I found that
different times would be returned. This was true even though I ran each
procedure twice, changing the order of the calls. The first method called would be
the fastest. I fixed this (I wouldn't say "solved" because it was a bit of a stab
in the dark) by inserting this line before testing the procedures:
Threading.Thread.CurrentThread.Sleep(1)

You seem to have a better understanding of what is going on, dealing with
garbage collection before your tests. Garbage collection and JIT compiling
are something I'm trying to avoid learning about :)

5. Taking inspiration from Ken Getz et al. in the Access 2000 Developer's Handbook, I call
each procedure twice but within a loop. The number of iterations is determined
by initially comparing a current average with a previous average. When these
averages get sufficiently close we have determined the number of iterations we
need. We then restart the test for both procedures to yield a final average for
each.

I do, however, have a big problem with which you may be able to help. (This risks
requiring a new newsgroup thread.)

Dim pt As New PerformanceTester(AddressOf MethodA, AddressOf MethodB)

This works fine within my standard library. However, when I compile my standard library
as a class library and reference it in another project, I get:

"A first chance exception of type 'System.NullReferenceException'
occurred in johnbentley.standardlibrary.dll Additional information: Object
reference not set to an instance of an object."

It seems I can't use delegates across an assembly boundary. Do you know of some
quick way around this, or will I simply need to put the time into understanding
AppDomains, Remoting, and Serialization?
That's about right - although that 40x faster was only based on a
single benchmark I did (assuming you've just taken it from my page).


Yes. I'm just taking this from your page. In the article in which I'm
storing the results of our discussion I do reference your page and
quote you as saying "quickly devised test". If I get around to
publishing the article, I will properly quote your qualifications
about the test.


Goodo :)


My results with a quickly devised test yielded a difference only on the order of
3.5: Decimal was 3.5x slower. My test was a simple addition of an integral value, e.g.

Sub DecimalSpeed()
Dim i As Integer
Dim x As Decimal

For i = 1 To 500000
x += Convert.ToDecimal(i)
Next
End Sub

What sort of calculation did you try?

I haven't, to be honest - mostly because I'd never really consider
using a single unless I had a particularly good reason to - I agree
with your reasoning here.


In my quickly devised test of Double vs. Single, Double was only 1.5 times
slower.
Jul 21 '05 #17

P: n/a
John Bentley <no*****@nowhere.com> wrote:

<snip>
- and when you're talking about numbers in general, and
whether or not they can be represented as doubles/decimals, it doesn't
make as much sense.
I'd offer that numbers in general, or "conceptual numbers" as I call them, can have a
specified number of decimal places. If I ask "How far is it to Paris?" you could
reply "About 20 km" or "20.5 km". One of these conceptual numbers has a greater
precision (thought of strictly as the number of decimal places).


But the only numbers involved there are "20km" and "20.5km". It could
well be that the "20km" is just as exact as the "20.5km" - it's because
you've put the "about" in there that it's obvious that it's not.

I think we need to be clear about whether or not we consider the
conceptual number "20" to be the same as the conceptual number
"20.0000". I do - but I suspect you don't.
To me, precision is a measure of the difference between a theoretical
exact value and the represented value closest to it.


I'd offer, a conceptual number has either a range (explicit or implied) or not.
If not it is an exact value.


I would always consider a conceptual number to be exact.
"About 20 km to Paris" may have an implied range of plus or minus 5km. "There
are 20 people in the room" is a sentence with an exact conceptual number.
Whereas I would say that the "about 20km" isn't a number; it's a number
*and* an implied statement about the likely difference between the
number and the actual distance to Paris.

It's all a matter of definition though.

<snip>
A number, for example, can be very precise but wildly inaccurate:
Ah - no, I'd say that a number in itself is never inaccurate, because a
number is just a number. It's the statement about distance which is
inaccurate, not the number itself.

<snip>
I might be confusing the issue :) and you'll have to be particularly tolerant
(pun intended) today as I'm very sleepy.
It's all really in the realms of definition and even philosophy, to be
honest - I'm no more right than you, or vice versa.
In that sense, the precision
of a conceptual number doesn't make sense, because there's no
representation involved.


I suppose a conceptual number either has no representation (as in "Here follows
a number: 2.6") or can have representation (namely when it represents a truth
claim: "Paris is 2.6 km away").


Right - I only consider the former to be a number; the latter is a
statement *involving* a number, and it's the statement which is
inaccurate, not the number itself.
If precision is a difference between a conceptual number and an exact
theoretical value, or between the limits of the range of a conceptual number
and an exact theoretical value, then:
For conceptual numbers that have no representation, yes, there is no exact
theoretical value to speak of. That is, there is no value that the conceptual
number "aims at". So 2.45 is NOT more precise than 2.6 because 2.45 is not
closer to anything.
Exactly.
Therefore both are prone to round-off errors, but the Decimal Scaled
Datatype less so.


Only because you chose in the above to increment by an amount which is
exactly representable in decimal but not in double.


And is it not the case that, since we generally code in base 10 numbers and
culturally we work in base 10 (especially in financial apps, say), the numbers
we choose to increment by (and perform other operations with) are more likely to
be exactly representable in the Decimal datatype? By "code in base 10 numbers" I
mean we assign base 10 literals to variables: Dim d As Decimal = 8.5D rather than
Dim d As Decimal = BinToDec(1000.1).

Would you agree that the Decimal type is less prone to round-off errors than the
Double in, say, a financial application?


Yes - the "in financial applications" being utterly necessary to the
truth of the statement.
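
To make the point above concrete, here is a small sketch. It is written in Python rather than VB.NET, on the assumption that Python's `float` (a binary double) and `decimal.Decimal` behave closely enough to .NET's Double and Decimal for this purpose; the helper name `sum_repeated` is made up for illustration. It shows that an increment of 0.1 drifts in binary floating point but not in decimal, while one third rounds in decimal too.

```python
from decimal import Decimal


def sum_repeated(step, count):
    """Add `step` to a zero of the same type `count` times and return the total."""
    total = type(step)(0)
    for _ in range(count):
        total += step
    return total


# 0.1 is exactly representable in base 10 but not in base 2.
float_total = sum_repeated(0.1, 10)
decimal_total = sum_repeated(Decimal("0.1"), 10)

print(float_total == 1.0)             # False: the binary sum has drifted
print(decimal_total == Decimal("1"))  # True: every step was exact

# One third is not exactly representable in base 10 either,
# so the decimal type also shows a round-off error here.
third = Decimal(1) / Decimal(3)
print(third * 3 == Decimal(1))        # False
```

So the decimal type is only "less prone" to round-off when the values involved happen to be short base 10 fractions, which is typically the case for money.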
The whole motivation for starting down the nonintegral road was in
response to my building a Stopwatch class to time code. I'm close to
returning to that project and should then be able to test it for
myself :)


You might want to look at
http://www.pobox.com/~skeet/csharp/benchmark.html


It's clear I would do well just to read your entire web site :)


As ever, any comments would be welcome :)

(I'm just in the process of trying to make the front page somewhat
clearer, so don't be alarmed if it changes.)
I've done two things since the last post: got my benchmarking code working and
consolidated my understanding of attributes. So after a very superficial look
at your benchmarking code I can say that I especially like the way you use
custom attributes to flag which procedures to time.

Just for your information, the way I've done things is like this:
<snip>

Right. That sounds a fine way of doing things - it all depends on
exactly how you want to use the tests.
3. I use the QueryPerformanceCounter to give near millisecond (0.001) accuracy
(I use the term with trepidation). See Microsoft KB Articles Q172338 and
Q306978.
Yes - I don't use QueryPerformanceCounter at the moment in order to
keep the code portable, but I could in the future. Note that if you
need that kind of level of precision, your benchmarks probably aren't
running for long enough anyway.
4. When I tested my code with two procedures containing identical code I found that
different times would be returned. This was true even though I ran each
procedure twice, changing the order of the calls. The first method called would be
the fastest. I fixed this (I wouldn't say "solved" because it was a bit of a stab
in the dark) by inserting this line before testing the procedures:
Threading.Thread.CurrentThread.Sleep(1)
You should just call Thread.Sleep(1) if you're going to call Sleep at
all - it's a static (shared) method, not an instance method. The above
would not compile in C#.
You seem to have a better understanding of what is going on, dealing with
garbage collection before your tests. Garbage collection and JIT compiling
are something I'm trying to avoid learning about :)
Calling the garbage collector explicitly (twice, allowing it to run
finalizers etc between calls) might help here. I don't think sleeping
is generally a good idea though.
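
As a rough illustration of collecting before timing, here is a Python sketch (not the .NET code under discussion). In .NET the equivalent sequence would presumably be `GC.Collect()`, `GC.WaitForPendingFinalizers()`, `GC.Collect()`; here Python's `gc.collect()` stands in for it, and the helper name `timed_call` is hypothetical.

```python
import gc
import time


def timed_call(procedure):
    """Collect garbage, then return the seconds taken by one call to `procedure`."""
    # Collect twice so that objects released by finalizers during the
    # first pass can be reclaimed by the second.
    gc.collect()
    gc.collect()
    start = time.perf_counter()
    procedure()
    return time.perf_counter() - start


elapsed = timed_call(lambda: sum(range(100_000)))
print(f"{elapsed:.6f} seconds")
```

The idea is simply that each timed procedure starts from a similar heap state, so leftover garbage from the previous test is less likely to be charged to the next one.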
5. Taking inspiration from Ken Getz et al. in the Access 2000 Developer's Handbook, I call
each procedure twice but within a loop. The number of iterations is determined
by initially comparing a current average with a previous average.
Do you mean that you're calling the *delegate* within the loop? If so,
that may well be affecting the benchmark slightly - I think it's better
to bring the loop as close to the tested code as possible. I suspect it
won't make a huge difference though.
When these
averages get sufficiently close we have determined the number of iterations we
need. We then restart the test for both procedures to yield a final average for
each.
Right - at the moment I get the developer to specify how many
iterations to run for, to avoid the problem above.
I do, however, have a big problem with which you may be able to help. (This risks
requiring a new newsgroup thread.)

Dim pt As New PerformanceTester(AddressOf MethodA, AddressOf MethodB)

This works fine within my standard library. However, when I compile my standard library
as a class library and reference it in another project, I get:

"A first chance exception of type 'System.NullReferenceException'
occurred in johnbentley.standardlibrary.dll Additional information: Object
reference not set to an instance of an object."

It seems I can't use delegates across an assembly boundary. Do you know of some
quick way around this, or will I simply need to put the time into understanding
AppDomains, Remoting, and Serialization?
If it's within the same AppDomain, just in a different assembly, it
shouldn't be a problem - but I'd need to see more code to work out what
was going on.
My results with a quickly devised test yielded a difference only on the order of
3.5: Decimal was 3.5x slower. My test was a simple addition of an integral value, e.g.

Sub DecimalSpeed()
Dim i As Integer
Dim x As Decimal

For i = 1 To 500000
x += Convert.ToDecimal(i)
Next
End Sub

What sort of calculation did you try?


Multiplication, I believe. However, your test above isn't very good -
most of the time will be spent performing Convert.ToDecimal or
Convert.ToDouble. I would change the test to take two doubles (or two
decimals) and just add *those* repeatedly. Note that which
doubles/decimals you add may well have an effect on the speed - for
instance, numbers of radically different scales (10000000 and 0.0001)
may take longer to add than similar numbers.
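
A sketch of the kind of test suggested above, with the conversion done once outside the loop so that only the additions are timed. It is in Python, with `float` and `decimal.Decimal` standing in for .NET's Double and Decimal, so the absolute ratio it prints says nothing about .NET; the function name and the operand values are arbitrary choices for illustration.

```python
import time
from decimal import Decimal


def time_additions(a, b, iterations):
    """Add `b` to a running total (starting at `a`) `iterations` times.

    Returns (elapsed_seconds, final_total). The operands are created by
    the caller, so no conversion cost is included in the timing.
    """
    total = a
    start = time.perf_counter()
    for _ in range(iterations):
        total += b
    elapsed = time.perf_counter() - start
    return elapsed, total


float_seconds, float_total = time_additions(1.5, 2.25, 200_000)
decimal_seconds, decimal_total = time_additions(Decimal("1.5"), Decimal("2.25"), 200_000)

print(f"float:   {float_seconds:.4f} s")
print(f"decimal: {decimal_seconds:.4f} s")
```

Swapping in operands of very different scales (for instance 10000000 and 0.0001) would be one way to check whether scale affects the timing.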
I haven't, to be honest - mostly because I'd never really consider
using a single unless I had a particularly good reason to - I agree
with your reasoning here.


In my quickly devised test of Double vs. Single, Double was only 1.5 times
slower.


I'm surprised it was even that much slower - but the same effect as
above may have been going on.

I might work on a little benchmark suite for these things.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #18
