Floating point calculation

I'm working on an ARM (ARM9) system which has no floating-point
coprocessor and no floating-point libraries, but it does support
long long int (64 bits).
Can you point me to a link that discusses ways to emulate floating
point calculations with just long int or long long int? For example,
if I have the formula X=(1-b)*Y + b*Z in the floating-point domain, I
can calculate X with just long ints (some data may be lost in the final
division, but that's OK).

Floating Point:
X=(1-b)*Y + b*Z
/* 'b' is a floating-point variable with 4 decimal digits of precision,
in the range 0 to 1; X, Y and Z are unsigned int */

With long int:
I can emulate the above calculation as:
X=((10000-10000*b)*Y +10000*b*Z)/10000

I need a link that discusses this or any similar approach.

--
-Vinoth

Nov 14 '05 #1

Vinoth wrote:
[original question snipped]


One way of faking floating point that I thought of is to find free
multi-precision math libraries -- GNU mp and the library that comes
with GNU bc and dc come to mind -- since those libraries treat the
numbers as arrays of digits.
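A minimal sketch of the "array of digits" idea, assuming non-negative numbers stored one decimal digit per element, least-significant digit first. The function name digits_add is hypothetical; this is not the GNU MP or bc API (GNU MP actually works on arrays of machine-word "limbs" rather than decimal digits).

```c
#include <stddef.h>

/* Hypothetical helper: add two non-negative decimal numbers stored as
   arrays of digits (0..9), least-significant digit first.  Both inputs
   have n digits; sum must have room for n + 1 digits. */
void digits_add(const unsigned char *a, const unsigned char *b,
                unsigned char *sum, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i] + b[i] + carry;   /* at most 9 + 9 + 1 = 19 */
        sum[i] = (unsigned char)(d % 10);
        carry = d / 10;
    }
    sum[n] = (unsigned char)carry;          /* final carry digit */
}
```

Adding 9876 and 5432 this way, with digits {6,7,8,9} and {2,3,4,5}, yields {8,0,3,5,1}, i.e. 15308. Treating the lowest few digits as lying after an implied decimal point turns this into decimal fixed-point addition.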

Gregory Pietsch

Nov 14 '05 #2
Vinoth wrote:
[original question snipped]

Hi, I'm a bit loath to step in here (I'm lurking in c.l.c, not a
C expert), but couldn't you implement floating point using those long
longs? Write fmul, fdiv, fadd, etc. functions that split a long long
into sign, exponent and mantissa fields and operate on those.

Multiplication works as in standard index form (multiply the mantissas,
add the exponents). For addition, you first scale the numbers so they
have the same exponent, add the mantissas, then normalize the result.
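A toy sketch of the scheme just described, assuming a made-up format (not IEEE 754) in which a value is mant * 2^exp, the mantissa is kept normalized to 31 bits so that two of them can be multiplied in a long long, and low bits are simply truncated. All names (soft_float, sf_mul, sf_add, ...) are hypothetical; there is no rounding, overflow checking, or handling of special values.

```c
/* Toy format, NOT IEEE 754: value = mant * 2^exp.  After normalizing,
   mant is 0 or 2^30 <= |mant| < 2^31, so a product of two mantissas
   always fits in a long long.  Low bits are truncated, never rounded. */
typedef struct {
    long long mant;
    int exp;
} soft_float;

static soft_float sf_normalize(soft_float x)
{
    if (x.mant == 0) {
        x.exp = 0;
        return x;
    }
    while (x.mant >= (1LL << 31) || x.mant <= -(1LL << 31)) {
        x.mant /= 2;                     /* drop one low bit */
        x.exp++;
    }
    while (x.mant < (1LL << 30) && x.mant > -(1LL << 30)) {
        x.mant *= 2;                     /* move the mantissa up a bit */
        x.exp--;
    }
    return x;
}

soft_float sf_from_int(long long n)
{
    soft_float x = { n, 0 };
    return sf_normalize(x);
}

/* Multiply the mantissas, add the exponents. */
soft_float sf_mul(soft_float a, soft_float b)
{
    soft_float r = { a.mant * b.mant, a.exp + b.exp };
    return sf_normalize(r);
}

/* Align the smaller-exponent operand to the larger exponent, then add. */
soft_float sf_add(soft_float a, soft_float b)
{
    if (a.mant == 0) return b;
    if (b.mant == 0) return a;
    if (a.exp < b.exp) {
        soft_float t = a;
        a = b;
        b = t;
    }
    int shift = a.exp - b.exp;
    b.mant = (shift > 62) ? 0 : b.mant / (1LL << shift);
    soft_float r = { a.mant + b.mant, a.exp };
    return sf_normalize(r);
}

/* Truncate toward zero; assumes the result fits in a long long. */
long long sf_to_int(soft_float x)
{
    if (x.exp >= 0)
        return x.mant * (1LL << x.exp);
    if (x.exp < -62)
        return 0;
    return x.mant / (1LL << -x.exp);
}
```

For example, sf_to_int(sf_mul(sf_from_int(6), sf_from_int(7))) gives 42, and sf_to_int(sf_add(sf_from_int(1000), sf_from_int(-1))) gives 999.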

There is an IEEE specification for floating point (e.g. google "IEEE
floating-point") that includes rules for what the bit patterns mean,
the representation of very small numbers (denormals, a special case for
values too close to zero to be normalized), infinities, etc. It is
probably better to follow it than to come up with your own scheme. I
don't know if this is worth the effort for you, or if there are
drawbacks I haven't thought of, but I don't see why you couldn't do
all this in standard C.
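To illustrate "masking off" the fields, here is a sketch that splits an IEEE 754 single-precision bit pattern (1 sign bit, 8 exponent bits biased by 127, 23 fraction bits) using only integer operations, so no FPU is involved. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of an IEEE 754 single-precision value. */
typedef struct {
    unsigned sign;      /* 1 bit                                       */
    unsigned exponent;  /* 8 bits, biased by 127                       */
    uint32_t fraction;  /* 23 bits; an implicit leading 1 applies when
                           the exponent is neither 0 nor 255           */
} ieee_single_fields;

/* Split a 32-bit pattern with shifts and masks only. */
ieee_single_fields ieee_single_split(uint32_t bits)
{
    ieee_single_fields f;
    f.sign     = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xFFu);
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For 0x40490FDB (the single-precision pattern closest to pi) this yields sign 0, exponent 128 (2^1 once the bias is removed) and fraction 0x490FDB; for 0xBF800000 (-1.0) it yields sign 1, exponent 127 and fraction 0.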

This link seems good:
http://stevehollasch.com/cgindex/coding/ieeefloat.html

HTH, all the best,
Rob M

--
Rob Morris: arr emm four four five [at] cam dot ac dot uk
Nov 14 '05 #3


Vinoth wrote:
[original question snipped]


Your "emulation" should work fine, if the products and
sum in the numerator don't grow too large for `long'. If
you know enough about the ranges of Y and Z to be sure this
won't happen, all is well. If not, you can use `long long'
for the intermediate results:

X = ((10000LL - 10000LL*b) * Y + 10000LL*b * Z) / 10000LL;

There are a number of possible improvements you may want
to consider. The first is to get rid of those `10000LL*b'
computations, which is easy: instead of storing `b' itself,
store `10000 * b' in a `long' variable called `B':

X = ((10000LL - B) * Y + (long long)B * Z) / 10000LL;
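Wrapped in a function so it can be tested, the pre-scaled form might look like this. The name blend_dec is hypothetical, and B is assumed to lie in 0..10000.

```c
/* X = ((10000 - B) * Y + B * Z) / 10000 with B = b * 10000.
   long long keeps the intermediate products from overflowing a
   32-bit long; the final division truncates. */
unsigned blend_dec(unsigned Y, unsigned Z, long B)
{
    return (unsigned)(((10000LL - B) * Y + (long long)B * Z) / 10000LL);
}
```

blend_dec(100, 200, 2500) gives 125 (b = 0.25), and blend_dec(4000000000u, 0, 5000) gives 2000000000, even though the intermediate product 5000 * 4000000000 would not fit in a 32-bit long.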

Rearranging the expression with a little algebra can
eliminate one of the multiplications and permit a little more
of the computation to use plain `long' instead of `long long'
(which may be faster, especially if `long long' is emulated
in software):

X = Y + (long)(((long long)Z - Y) * B / 10000LL);  /* widen Z first: Y and Z are unsigned */
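A testable sketch of the rearranged form (the function name is hypothetical). Because the OP's Y and Z are unsigned int, the difference is taken in long long; a plain Z - Y would wrap around to a huge positive value whenever Z < Y.

```c
/* X = Y + (Z - Y) * B / 10000, with B = b * 10000 in 0..10000.
   C99 integer division truncates toward zero. */
unsigned blend_dec2(unsigned Y, unsigned Z, long B)
{
    long long diff = (long long)Z - Y;   /* may be negative */
    return (unsigned)(Y + diff * B / 10000LL);
}
```

One caveat: when Z < Y and the division is inexact, this form and the one before it can differ by one, because `/` truncates toward zero. For Y = 200, Z = 100, B = 3333 the previous form gives 166 and this one gives 167.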

If you change the scaling factor from 10000 to something
that's a power of two, you can replace the division with a
shift. 16384 (1 << 14) is pretty close to your original
10000, so assuming that `B' is now `b * 16384' you'd have

X = Y + (long)( (((long long)Z - Y) * B) >> 14 );  /* widen Z first: Y and Z are unsigned */
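The shift version assumes B already holds b * 16384. If b arrives as the original four-decimal value (B10 = b * 10000), it can be converted once, without floating point, using a hypothetical helper like this one, which rounds to the nearest step of 1/16384:

```c
/* Convert B10 = b * 10000 (0..10000) to B14 = b * 16384,
   rounding to nearest.  10000 * 16384 fits in a 32-bit long. */
long dec4_to_q14(long B10)
{
    return (B10 * 16384L + 5000L) / 10000L;
}
```

For instance, 0.5 (B10 = 5000) becomes 8192 and 1.0 (B10 = 10000) becomes 16384. Since 1/16384 is finer than 0.0001, the conversion itself loses at most half a step.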

There's a potential trap here: if `Z - Y' is negative
so the product being shifted is also negative, C doesn't
specify exactly what happens with the right shift. Since
you're only concerned with one implementation you could
check whether it does what you want. If it doesn't, or
if you want to be sure the code will work elsewhere, too,
you could make sure that no negative numerators appear:

if (Z >= Y)
X = Y + (long)( ((Z - Y) * (long long)B) >> 14 );
else
X = Y - (long)( ((Y - Z) * (long long)B) >> 14 );
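The if/else above truncates. As a sketch of one possible refinement (not part of the code above; the function name is hypothetical), adding half of 2^14 before the shift rounds to nearest while still keeping every shifted value non-negative:

```c
/* X = Y + (Z - Y) * B / 16384, rounded to nearest, B = b * 16384
   (0 <= B <= 16384).  Branching on Z >= Y keeps the value being
   shifted non-negative, so the right shift is well defined. */
unsigned blend_q14_round(unsigned Y, unsigned Z, long B)
{
    const long long half = 1LL << 13;   /* 0.5 in units of 2^-14 */
    if (Z >= Y)
        return Y + (unsigned)((((long long)(Z - Y)) * B + half) >> 14);
    else
        return Y - (unsigned)((((long long)(Y - Z)) * B + half) >> 14);
}
```

For Y = 0, Z = 3 and b = 0.5 (B = 8192) the exact answer 1.5 comes out as 2, where the truncating version gives 1; halfway cases are rounded toward Z.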

This is about as far as you can go with portable C --
which is a shame, really, because some machines are capable
of better. For example, there may be an instruction (or
instruction sequence) to multiply two 32-bit numbers and
yield a 64-bit product, but C has no operator that multiplies two
`long's directly into a `long long' (you have to widen an operand
first). If you used 32-bit scaling instead of
the 14 bits shown above, the second term would simply be the
high-order 32 bits of the 64-bit product and the machine might
be able to extract it without shifting, but C has no portable
way to perform such dissections. It's possible that a smart
optimizing compiler might be able to exploit such capabilities
of the machine (I'd especially recommend looking into the
possibility of 32-bit scaling), but there are no guarantees.
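A sketch of the 32-bit scaling idea, assuming B32 = b * 2^32 is held in a uint32_t, so b can be at most (2^32 - 1) / 2^32 and never exactly 1. The scaled term is the high half of a 32 x 32 -> 64-bit product; whether a compiler turns the cast-and-multiply into a single widening instruction (such as ARM's UMULL) is, as noted above, not guaranteed. The function name is hypothetical.

```c
#include <stdint.h>

/* X = Y + (Z - Y) * B32 / 2^32, truncating.  The >> 32 just keeps
   the high word of the 64-bit product. */
uint32_t blend_q32(uint32_t Y, uint32_t Z, uint32_t B32)
{
    if (Z >= Y)
        return Y + (uint32_t)(((uint64_t)(Z - Y) * B32) >> 32);
    else
        return Y - (uint32_t)(((uint64_t)(Y - Z) * B32) >> 32);
}
```

With B32 = 0x80000000 (b = 0.5), blend_q32(100, 200, B32) and blend_q32(200, 100, B32) both give 150.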

What you're doing with the "emulation" is called "fixed-
point arithmetic," and the techniques can be applied in more
sophisticated form -- to get a properly-rounded answer, for
example, or to deal with numbers that have both integer and
fractional parts. A small amount of research may give you
some good ideas ...
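As an illustration of numbers with both integer and fractional parts, here is a sketch of a signed Q16.16 format (16 integer bits, 16 fractional bits) with a multiply that rounds to nearest. The type and function names are hypothetical, not taken from any particular library, and there is no overflow checking.

```c
#include <stdint.h>

/* Hypothetical Q16.16 type: value = raw / 65536. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)

/* Valid for |n| < 32768. */
q16_16 q16_from_int(int n) { return (q16_16)n * Q16_ONE; }

/* Same scale on both sides, so addition is a plain integer add. */
q16_16 q16_add(q16_16 a, q16_16 b) { return a + b; }

/* Multiply in 64 bits, then divide by 2^16, rounding to nearest with
   ties away from zero.  Division is used instead of >> so negative
   products are handled portably. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    int64_t p = (int64_t)a * b;
    p += (p >= 0) ? (Q16_ONE / 2) : -(Q16_ONE / 2);
    return (q16_16)(p / Q16_ONE);
}

/* Pre-scale the dividend by 2^16 so the quotient keeps 16 fraction
   bits; b must be non-zero.  Truncates toward zero. */
q16_16 q16_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q16_ONE) / b);
}
```

For example, 3 * 0.5 gives 98304 (1.5 in Q16.16), and 1 / 4 gives 16384 (0.25).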

--
Er*********@sun.com

Nov 14 '05 #4
Thanks to all for the information. I'm interested in trying out all the
basic operations in fixed-point arithmetic. Can you point me to a free
library? Google didn't help much.

"Eric Sosman" <er*********@sun.com> wrote in message
news:d6**********@news1brm.Central.Sun.COM...


[Eric Sosman's previous reply, quoted in full, snipped]

Nov 14 '05 #5


Vinoth wrote:
Thanks to all for the information. I'm interested in trying out all basic
operations on fixed-point arithmetic. Can you point to some free library
available? Google didn't help much.


Wow! You must be an awfully fast reader to have
studied those "about 26,200" results in less than three
hours! I'm afraid I can't offer more help than those
26,200 articles can -- and since you've already found
them inadequate it follows that I'm inadequate, too.
Sorry.

(You might also want to Google for "top-posting."
The "about 84,400" articles won't take *you* very long,
and may convey something useful.)

--
Er*********@sun.com

Nov 14 '05 #6
On 19 May 2005 "Vinoth" <no*********@emailaddress.com> wrote:
Thanks to all for the information. I'm interested in trying out all basic
operations on fixed-point arithmetic. Can you point to some free library
available? Google didn't help much.


[Previous messages removed]

Please do not top post to newsgroups.

---druck

--
The ARM Club Free Software - http://www.armclub.org.uk/free/
The 32bit Conversions Page - http://www.quantumsoft.co.uk/druck/
Nov 14 '05 #7
Vinoth wrote:

Thanks to all for the information. I'm interested in trying out all basic
operations on fixed-point arithmetic. Can you point to some free library
available? Google didn't help much.


App Note 33 on this page:

http://www.arm.com/documentation/App...tes/index.html

contains a basic introduction to implementing fixed-point
binary arithmetic on ARM cores.

Chris
Nov 14 '05 #8
