
How are Math.Cos & Math.Sin implemented?

Hi,

I am writing a program that will perform a lot of Math.Cos & Math.Sin
operations. I am afraid this will be a source of performance problems.

Does anybody know how Math.Cos & Math.Sin are implemented?
I suppose they just look up a huge pre-computed table, which might be
quick. I tried caching the cos/sin of all possible angles in my own array; it
turns out to be much faster to call Math.Cos & Math.Sin all the time.

Oct 13 '06 #1
15 Replies


It uses an algorithm. Tables only give finite precision, and it takes
something like 2*Pi*10^7 values to get the same precision as a float from a
good algorithm. You can reduce the table size by using symmetry and such, but
you introduce overhead when you do that too, and it's still on the
same order.

Ultimately it's your choice. You choose a table for speed and waste memory, or
you use an algorithm for precision and do not waste memory.
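
The symmetry idea can be sketched roughly as below. This is only an illustration of a table that stores one quadrant and mirrors it for the other three; it is not how Math.Sin is implemented, and the class name QuarterWaveSine and the resolution of 1024 steps per quadrant are assumptions made for the example.

```csharp
using System;

// Hypothetical helper: sine lookup from a table covering only 0..90 degrees,
// using symmetry to serve the remaining three quadrants.
static class QuarterWaveSine
{
    const int StepsPerQuadrant = 1024;                 // assumed resolution
    static readonly double[] Table = BuildTable();

    static double[] BuildTable()
    {
        double[] t = new double[StepsPerQuadrant + 1];  // include the 90-degree endpoint
        for (int i = 0; i <= StepsPerQuadrant; i++)
            t[i] = Math.Sin(i * (Math.PI / 2) / StepsPerQuadrant);
        return t;
    }

    // index counts steps of (90 / StepsPerQuadrant) degrees around the full circle.
    public static double Sin(int index)
    {
        int full = 4 * StepsPerQuadrant;
        index = ((index % full) + full) % full;         // wrap into 0..full-1
        int quadrant = index / StepsPerQuadrant;
        int offset = index % StepsPerQuadrant;
        switch (quadrant)
        {
            case 0: return Table[offset];                       // sin(x)
            case 1: return Table[StepsPerQuadrant - offset];    // sin(90 + x) = sin(90 - x)
            case 2: return -Table[offset];                      // sin(180 + x) = -sin(x)
            default: return -Table[StepsPerQuadrant - offset];  // sin(270 + x) = -sin(90 - x)
        }
    }
}
```

The wrap, divide and switch on every call are the kind of overhead mentioned above: the table is four times smaller, but each lookup does more work than a plain array index.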
Oct 13 '06 #2

On 12 Oct 2006 19:59:05 -0700, "Morgan Cheng"
<mo************@gmail.com> wrote:
>Anybody knows how Math.cos & Math.Sin is implemented?
Not at all. That is, all modern personal computer CPUs have a
built-in math coprocessor that directly provides trigonometric
functions. The Math.* methods simply forward calls to these
optimized hardware facilities. So it's extremely unlikely that you'll
get better speed by writing a software algorithm in C# or even C++.
--
http://www.kynosarges.de
Oct 13 '06 #3

Morgan Cheng <mo************@gmail.com> wrote:
> I am writing a program that will take a lot of Math.Cos & Math.Sin
> operation. I am afraid this will be source of performance impact.
Whenever you have performance fears, run tests. Most of the time, in my
experience, you'll find that performance fears about individual bits of
code are unfounded.

In this case, a quick test on my laptop showed Math.Sin being called
1,000,000,000 times in less than 2 seconds. Just how often is your
program going to call the trig methods?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Oct 13 '06 #4

> In this case, a quick test on my laptop showed Math.Sin being called
> 1,000,000,000 times in less than 2 seconds. Just how often is your
> program going to call the trig methods?
With different arguments each time, or were most of the calls optimized
away?

That's 500 sine computations per microsecond. A microsecond is maybe 2400
clock cycles on your Pentium. I don't recall if the Pentium clock is
divided down. Even if it's not, 4.8 clock cycles per sine computation is
not quite credible.
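
The arithmetic behind that objection can be written out as a small sketch. The 2.4 GHz clock is the guess made in the post, not a measured value, and the class name CycleEstimate is made up for this example.

```csharp
using System;

static class CycleEstimate
{
    // Clock cycles available per call, given the number of calls completed,
    // the elapsed time in seconds and the clock rate in Hz.
    public static double CyclesPerCall(double calls, double seconds, double clockHz)
    {
        return clockHz * seconds / calls;
    }
}
```

CycleEstimate.CyclesPerCall(1e9, 2.0, 2.4e9) gives about 4.8 cycles per call, which is the figure quoted above.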

Oct 14 '06 #5

Michael A. Covington <lo**@ai.uga.edu.for.address> wrote:
>> In this case, a quick test on my laptop showed Math.Sin being called
>> 1,000,000,000 times in less than 2 seconds. Just how often is your
>> program going to call the trig methods?
>
> With different arguments each time, or were most of the calls optimized
> away?
>
> That's 500 sine computations per microsecond. A microsecond is maybe 2400
> clock cycles on your Pentium. I don't recall if the Pentium clock is
> divided down. Even if it's not, 4.8 clock cycles per sine computation is
> not quite credible.
You're right. Here's a somewhat better test - I suspect things were
being optimised out before and I was too sleepy to notice. Oops!

using System;

class Test
{
    static void Main()
    {
        // Each call depends on the previous result, so nothing can be optimised away.
        double total = 0.23;

        DateTime start = DateTime.Now;
        for (int i = 0; i < 100000000; i++)
        {
            total += Math.Sin(total);
            total += Math.Cos(total);
        }
        DateTime end = DateTime.Now;

        Console.WriteLine(end - start);
        Console.WriteLine(total);
    }
}

This is harder work, of course - 2 trig operations and 2 additions per
iteration. The timing on my box is 12 seconds for the 100,000,000 iterations.
Not as fast as before, but still likely to be fast enough for the OP
not to have to worry :)
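
For anyone repeating the measurement, here is a hedged variant of the same loop timed with System.Diagnostics.Stopwatch instead of DateTime.Now. The class and method names (TrigBenchmark, Run) are made up for this sketch; the loop body is the same dependent sin/cos chain as above.

```csharp
using System;
using System.Diagnostics;

static class TrigBenchmark
{
    // Runs the dependent sin/cos chain and returns the final total so the
    // work cannot be optimised away; the elapsed time is reported via 'elapsed'.
    public static double Run(int iterations, out TimeSpan elapsed)
    {
        double total = 0.23;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            total += Math.Sin(total);
            total += Math.Cos(total);
        }
        watch.Stop();
        elapsed = watch.Elapsed;
        return total;
    }
}
```

Calling TrigBenchmark.Run(100000000, out elapsed) and printing both elapsed and the returned total reproduces the test; the timing will of course differ from machine to machine.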

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Oct 14 '06 #6


Following is exactly what the JIT has produced from the loop in release mode;
the figures between () are the instruction latencies (here for AMD64,
yours may vary).

dd0424        fld   qword ptr [esp]    (4)
d9fe          fsin                     (93)
dc0424        fadd  qword ptr [esp]    (6)
dd1c24        fstp  qword ptr [esp]    (2)
dd0424        fld   qword ptr [esp]    (4)
d9ff          fcos                     (92)
dc0424        fadd  qword ptr [esp]    (6)
dd1c24        fstp  qword ptr [esp]    (2)
83c601        add   esi,1              (1)
81fe00e1f505  cmp   esi,5F5E100h       (4)
7cda          jl    00cb00a0           (1)

That's a total of 215 clock cycles per loop iteration. On my box, with a clock
cycle of ~0.4329 ns, that would account for ~93 ns per iteration, or 9.3 sec.
for 100,000,000 iterations. Actually the test runs in 8.59 sec., because some
of the instructions execute in parallel.
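
The estimate can be reproduced with a few lines of code. This is just the sum-and-multiply arithmetic from the post, using the latency figures from the listing; it assumes strictly sequential execution, which is why it comes out above the measured time. The class name LoopCostEstimate is made up for this sketch.

```csharp
using System;

static class LoopCostEstimate
{
    // Per-instruction latencies copied from the listing (AMD64 figures quoted in the post).
    static readonly int[] Latencies = { 4, 93, 6, 2, 4, 92, 6, 2, 1, 4, 1 };

    public static int CyclesPerIteration()
    {
        int sum = 0;
        foreach (int cycles in Latencies) sum += cycles;
        return sum;
    }

    // Estimated seconds for the whole loop, given the clock period in nanoseconds.
    public static double EstimatedSeconds(int iterations, double clockPeriodNs)
    {
        return CyclesPerIteration() * clockPeriodNs * iterations * 1e-9;
    }
}
```

CyclesPerIteration() returns 215, and EstimatedSeconds(100000000, 0.4329) comes to roughly 9.3 seconds.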

Willy.

Oct 14 '06 #7

Thanks for your clarification.
I did some experimentation too. It shows sin/cos doesn't take many CPU
cycles, but I still prefer to pre-compute the needed sin/cos values in two
arrays and fetch them later, since I am implementing the Hough
transform, which needs cos/sin in an X*Y loop (X & Y are the image width
and height). Accessing an array is always supposed to be faster than
a Math.Cos or Math.Sin function call, right?


Oct 16 '06 #8

In my case, I don't need the cos/sin values of arbitrary angles. I just need 0,
1/4, 2/4, 3/4, ..., 359+3/4 degrees. So, I precompute them and put them in
two arrays, cos[360*4] and sin[360*4].
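
A minimal sketch of such tables, assuming quarter-degree steps indexed from 0 to 360*4 - 1. The class name QuarterDegreeTables is made up for this example.

```csharp
using System;

static class QuarterDegreeTables
{
    public const int StepsPerDegree = 4;
    public const int Size = 360 * StepsPerDegree;   // 1440 entries: 0, 0.25, ..., 359.75 degrees

    public static readonly double[] Cos = new double[Size];
    public static readonly double[] Sin = new double[Size];

    static QuarterDegreeTables()
    {
        for (int i = 0; i < Size; i++)
        {
            // Entry i corresponds to i / 4 degrees, converted to radians.
            double radians = i * Math.PI / (180.0 * StepsPerDegree);
            Cos[i] = Math.Cos(radians);
            Sin[i] = Math.Sin(radians);
        }
    }
}
```

With this layout, the sine of 90 degrees is QuarterDegreeTables.Sin[90 * 4], and no trig call happens after the tables are built.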

Oct 16 '06 #9

On Thu, 12 Oct 2006 23:37:12 -0500, "Jon Slaughter"
<Jo***********@Hotmail.com> wrote:
> It uses an algorithm. Tables only produce finite precision and it takes
> something like 2*Pi*10^7 values to get the same precision of a float from a
> good algorithm. You can reduce the table size by using symmetry and such but
> you end up introducing overhead when you do that too and its still on the
> same order.
>
> Ultimately its your choice. You choose a table for speed and waste memory or
> you use an algorithm for precision and not waste memory.
Many years ago, I implemented SIN and COS tables as 16 bit values
rather than floating point. This was in assembler, mind you, not a
modern language. For the graphic resolution required at the time, 16
bits was quite enough. When calculating an X/Y position, I stored the
remainder and used that in the next calculation to cut down on
positional errors. It worked a treat, and was blindingly fast too. I
suspect that nowadays, 24 bit values might be required, but the
principle remains the same.
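
As a rough illustration of that principle in C# rather than assembler: the sketch below stores sine values as 16-bit integers and carries the fractional remainder from one step to the next. The scale factor of 2^14, the one-degree resolution and all names are assumptions for this example, not details from the original implementation.

```csharp
using System;

// Hypothetical fixed-point sketch: sine values stored as 16-bit integers
// scaled by 2^14, with the fractional remainder carried between steps.
static class FixedPointTrig
{
    const int Shift = 14;
    const int Scale = 1 << Shift;                  // 16384, so +/-1.0 fits in a short
    static readonly short[] SinTable = BuildTable();

    static short[] BuildTable()
    {
        short[] table = new short[360];            // one entry per degree (assumed resolution)
        for (int d = 0; d < 360; d++)
            table[d] = (short)Math.Round(Math.Sin(d * Math.PI / 180.0) * Scale);
        return table;
    }

    public static short Sin(int degrees) { return SinTable[((degrees % 360) + 360) % 360]; }
    public static short Cos(int degrees) { return Sin(degrees + 90); }

    // Advances one coordinate by distance * factor / 2^14, keeping the leftover
    // fraction in 'remainder' so it is not lost on the next call.
    public static int Step(int position, int distance, short factor, ref int remainder)
    {
        int acc = remainder + distance * factor;
        remainder = acc & (Scale - 1);             // low 14 bits: fraction to carry forward
        return position + (acc >> Shift);          // arithmetic shift: whole units moved
    }
}
```

For example, stepping a Y coordinate ten times by distance 1 at 30 degrees, passing the same remainder variable each time, moves it by 5 units in total. Without the carried remainder every single step would truncate to 0 and the point would never move.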


Oct 16 '06 #10

"Morgan Cheng" <mo************@gmail.com> wrote in message
news:11**********************@i3g2000cwc.googlegroups.com...
| Thanks for your clarification.
| I did some experimentation too. It shows sin/cos doesn't take much cpu
| cycles, but I still prefer to pre-compute needed sin/cos value in two
| array, and fetch them later. Since I am implementing Hough
| Transformation, which needs cos/sin in a X*Y loop(X & Y are image width
| and height). Accessing an array is always supposed to be faster than
| Math.Cos & Math.Sin function call, right?
Could be, but keep in mind that using a table look-up might introduce some
hidden costs.
I wouldn't worry about this kind of micro-optimization; it is more important
to take care of good algorithm design and implementation. If it ever turns
out to be a bottleneck, you can switch to a table look-up once you are sure
about the performance gains after proper profiling.

Willy.
Oct 16 '06 #11

Morgan Cheng wrote:
> In my case, I don't need cos/sin value of any angles. I just need 0,
> 1/4, 2/4, 3/4....359+3/4 degrees. So, I precompute them and put them in
> two array cos[360*4] and sin[360*4].
In which case the array lookup is obviously faster. But also
has nothing to do with the general case.

Arne
Oct 17 '06 #12

Arne Vajhøj wrote:
> Morgan Cheng wrote:
>> In my case, I don't need cos/sin value of any angles. I just need 0,
>> 1/4, 2/4, 3/4....359+3/4 degrees. So, I precompute them and put them in
>> two array cos[360*4] and sin[360*4].
>
> In which case the array lookup is obviously faster. But also
> has nothing to do with the general case.
Implementation of the Hough transform.
It is something like:

double angle = 0.0;
double angleStep = Math.PI / (180*4);
for (int x = 0; x < image.Width; ++x)
    for (int y = 0; y < image.Height; ++y)
    {
        if (some condition)
        {
            double radius = x * Math.Cos(angle) + y * Math.Sin(angle);
            ....
        }
        angle += angleStep;
    }

I just rushed out the code; it is perhaps not accurate for Hough.
cos & sin are computed over and over again in a two-dimensional loop.
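
For comparison, here is a hedged sketch of the usual Hough line accumulator, in which every candidate pixel votes once per angle and the cos/sin values are computed once per angle up front. It is not the poster's code: the bool[,] edge map, the HoughSketch and Accumulate names and the rounding of the radius to whole pixels are all assumptions made for illustration.

```csharp
using System;

static class HoughSketch
{
    // Votes into a (theta, rho) accumulator for every marked pixel.
    // edges[x, y] == true marks a pixel that passed the (unspecified) condition.
    public static int[,] Accumulate(bool[,] edges, int thetaSteps)
    {
        int width = edges.GetLength(0);
        int height = edges.GetLength(1);

        // Precompute cos/sin once per angle rather than once per pixel.
        double[] cos = new double[thetaSteps];
        double[] sin = new double[thetaSteps];
        for (int t = 0; t < thetaSteps; t++)
        {
            double angle = t * Math.PI / thetaSteps;      // 0 <= angle < pi
            cos[t] = Math.Cos(angle);
            sin[t] = Math.Sin(angle);
        }

        // rho ranges over -maxRho..maxRho, so shift it by maxRho to get an index.
        int maxRho = (int)Math.Ceiling(Math.Sqrt(width * width + height * height));
        int[,] votes = new int[thetaSteps, 2 * maxRho + 1];

        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
            {
                if (!edges[x, y]) continue;
                for (int t = 0; t < thetaSteps; t++)
                {
                    double radius = x * cos[t] + y * sin[t];
                    votes[t, (int)Math.Round(radius) + maxRho]++;
                }
            }
        return votes;
    }
}
```

With thetaSteps set to 180*4 this matches the quarter-degree resolution discussed in the thread, and the trig functions are called only 2 * thetaSteps times regardless of the image size.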

Oct 17 '06 #13

Willy Denoyette [MVP] wrote:
> Could be, but keep in mind that using a table look-up might introduce some
> hidden costs.
You mean the cost introduced by the array bounds check?

Oct 17 '06 #14

Boundary checks are quite fast, as you can see at
http://blogs.msdn.com/ricom/archive/...12/663642.aspx.

Tables can, potentially, induce more cache misses and even page faults...

Oct 17 '06 #15

"Morgan Cheng" <mo************@gmail.com> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...
| > Could be, but keep in mind that using a table look-up might introduce some
| > hidden costs.
| You mean cost introduced by array boundary check?

Those are not hidden; they are (really small) fixed costs. Hidden costs are
things like L1/L2 data cache misses, which are hard to measure and unpredictable.
Using a lookup table will certainly speed up the calculation, but large
tables will introduce cache misses, and these might well be significant enough
to reduce the performance gain considerably. So keep the tables as small as
possible: don't store doubles, use floats instead.
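
To put a number on the float-versus-double suggestion, here is a small sketch of a quarter-degree sine table stored as floats, together with the size of the raw element data for each element type. The class name FloatTrigTable is made up for this example, and the byte counts ignore the array object header.

```csharp
using System;

static class FloatTrigTable
{
    public const int Size = 360 * 4;                 // quarter-degree steps, as in the thread

    public static readonly float[] Sin = Build();

    static float[] Build()
    {
        float[] table = new float[Size];
        for (int i = 0; i < Size; i++)
            table[i] = (float)Math.Sin(i * Math.PI / (180.0 * 4));
        return table;
    }

    // Raw element storage for one table of each element type.
    public static int BytesAsFloat  { get { return Size * sizeof(float); } }   // 4 bytes per entry
    public static int BytesAsDouble { get { return Size * sizeof(double); } }  // 8 bytes per entry
}
```

One table takes 5,760 bytes as floats against 11,520 bytes as doubles, so the float version touches half as many cache lines, at the price of roughly seven significant decimal digits per value.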

Willy.


Oct 17 '06 #16
