Bytes | Software Development & Data Engineering Community
Multiplications by powers of 2

I am writing a program (precisely an image processing filter) which computes
several multiplications of floating-point numbers by constant powers of 2.
This means adding an integer constant to the exponent without changing the
mantissa, and I believed that this was simpler than multiplying two
floating-point numbers. Instead, if I write the program using the C "ldexp"
function, it runs much slower and produces the same output.

Since the program should run on a Pentium 4 processor, I downloaded a manual
from Intel's website
http://www.intel.com/design/pentium4/manuals/248966.htm
and indeed floating-point multiplication ("fmul" in assembler) has a latency
of 8 clock cycles, while changing the exponent ("fscale" in assembler) has a
latency of 60 cycles. This makes no sense to me.

I would like to ask if anyone knows a fast method for multiplying a
floating-point number by a constant power of 2, or if I should use standard
multiplication. I must use floating-point arithmetic because the input data
can span several orders of magnitude and the program also computes square
roots and exponentials. Speed is important, because the program should
process high-resolution video.

I tried to add an integer to the exponent using a union, but this is still
a bit slower than pure floating-point multiplication. I guess this happens
because the processor uses different registers for integer and
floating-point operations, therefore some clock cycles are wasted in moving
data around. Also, with this approach underflows give incorrect results, e.g.
"multiplying" 0.0 by 2^(-1) gives -Inf. Maybe I could gain some speed by
using SSE instructions, which (as far as I know) use the same set of
registers for both integer and floating-point operations, but this will not
solve the underflow problem.

I hope someone could help me. Thanks in advance.

An Electronics Engineering PhD student
Jun 27 '06 #1
Lisa Simpson wrote:
I would like to ask if anyone knows a fast method for multiplying a
floating-point number by a constant power of 2,
or if I should use standard multiplication.


Use standard multiplication.
If one of the operands of the multiplication operator
is a constant expression with a value of 2,
then your compiler has all the information it needs
to do any optimization that there is to be done.

--
pete
Jun 27 '06 #2
Lisa Simpson wrote:
I am writing a program (precisely an image processing filter) which computes
several multiplications of floating-point numbers by constant powers of 2.
This means adding an integer constant to the exponent without changing the
mantissa, and I believed that this was simpler than multiplying two
floating-point numbers. Instead, if I write the program using the C "ldexp"
function, it runs much slower and produces the same output.

Since the program should run on a Pentium 4 processor, I downloaded a manual
from Intel's website
http://www.intel.com/design/pentium4/manuals/248966.htm
and indeed floating-point multiplication ("fmul" in assembler) has a latency
of 8 clock cycles, while changing the exponent ("fscale" in assembler) has a
latency of 60 cycles. This makes no sense to me.
Why not? Common operations are optimized; uncommon ones are not. Just
because you think one operation ought to be faster because it's conceptually
simpler doesn't mean that will be the case.
I would like to ask if anyone knows a fast method for multiplying a
floating-point number by a constant power of 2, or if I should use standard
multiplication.
You should use multiplication, since you're trying to multiply. :-)

Leave optimization to the compiler. If timing shows you a program isn't
fast enough for your purposes, use profiling to determine where the
bottleneck is, and find ways to remove it. The bottlenecks are almost never
primitive operations, but rather the algorithm that uses them.
I must use floating-point arithmetic because the input data
can span several orders of magnitude and the program also computes square
roots and exponentials. Speed is important, because the program should
process high-resolution video.
"Speed is important" is not the same as "I know how important the speed is,
what the current speed is, and where the speed isn't sufficient".

Speed is almost never *unimportant*, but that doesn't mean you should
optimize prematurely.
I tried to add an integer to the exponent using a union, but this is still
a bit slower than pure floating-point multiplication. I guess this happens
because the processor uses different registers for integer and
floating-point operations, therefore some clock cycles are wasted in moving
data around. Also in this way underflows give incorrect results, e.g.
"multiplying" 0.0 by 2^(-1) gives -Inf. Maybe I could gain some speed by
using SSE instructions, which (as far as I know) use the same set of
registers for both integer and floating-point operations, but this will not
solve the underflow problem.

SSE may very well speed up things, but not for the reasons you mention. SSE
is designed to quickly apply a single operation to multiple sets of data.
Rather than multiply two numbers, SSE could be used to multiply vectors of
numbers, effectively doing many multiplications in concert. Modern compilers
can even make use of this transparently in some circumstances.

How to use SSE and comparable vectorizing technologies is off-topic to this
ng, but there's plenty of information out there. This seems much more
relevant to your problem than trying to optimize multiplication.

S.
Jun 27 '06 #3
"Lisa Simpson" wrote:
I am writing a program (precisely an image processing filter) which computes
several multiplications of floating-point numbers by constant powers of 2.
This means adding an integer constant to the exponent without changing the
mantissa, and I believed that this was simpler than multiplying two
floating-point numbers. Instead, if I write the program using the C "ldexp"
function, it runs much slower and produces the same output.

Since the program should run on a Pentium 4 processor, I downloaded a manual
from Intel's website
http://www.intel.com/design/pentium4/manuals/248966.htm
and indeed floating-point multiplication ("fmul" in assembler) has a latency
of 8 clock cycles, while changing the exponent ("fscale" in assembler) has a
latency of 60 cycles. This makes no sense to me.

I would like to ask if anyone knows a fast method for multiplying a
floating-point number by a constant power of 2, or if I should use standard
multiplication. I must use floating-point arithmetic because the input data
can span several orders of magnitude and the program also computes square
roots and exponentials. Speed is important, because the program should
process high-resolution video.


Two very generic, very vague, and probably useless suggestions:

(a) Could you perform a chain of calculations knowing what error is
accumulating and correct the result only once at the end?

(b) Maybe you could get faster results using fixed-point / multiple-
precision arithmetic, converting to floating point only when
needed. (Doubtful, but worth checking nevertheless.)
Jun 27 '06 #4
Lisa Simpson wrote:
Since the program should run on a Pentium 4 processor, I downloaded a manual
from Intel's website
http://www.intel.com/design/pentium4/manuals/248966.htm
and indeed floating-point multiplication ("fmul" in assembler) has a latency
of 8 clock cycles, while changing the exponent ("fscale" in assembler) has a
latency of 60 cycles. This makes no sense to me.


It's probably a case of Intel optimizing their processors to handle
common operations at the expense of uncommon ones. The benchmark
programs will only use the common operations themselves, and that
translates into better performance results against the competition.

Inmatarian.
Jun 27 '06 #5
Lisa Simpson (in e7**********@news.cineca.it) said:

| Since the program should run on a Pentium 4 processor, I downloaded
| a manual from Intel's website
| http://www.intel.com/design/pentium4/manuals/248966.htm
| and indeed floating-point multiplication ("fmul" in assembler) has
| a latency of 8 clock cycles, while changing the exponent ("fscale"
| in assembler) has a latency of 60 cycles. This makes no sense to me.

I suspect that you're more qualified to answer this question than I;
but it does appear that 8 clocks constitute a "bargain".

--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto
Jun 28 '06 #6
Hi,
Look into the Q15/Q31 fixed-point formats for these operations.
You will probably get the best results using those formats.
Reddy

Jun 28 '06 #7
> If one of the operands of the multiplication operator
is a constant expression with a value of 2,
then your compiler has all the information it needs
to do any optimization that there is to be done.


I examined the assembly code generated by the compiler (GCC 4.1.0, with
optimization turned on). If I multiply by 2, the compiler is smart enough to
replace the multiplication with an addition, but if I multiply by 0.5 (or
divide by 2) it uses standard floating-point multiplication. Unfortunately,
we can't afford to buy a commercial compiler.
Jun 28 '06 #8
Lisa Simpson wrote:
If one of the operands of the multiplication operator
is a constant expression with a value of 2,
then your compiler has all the information it needs
to do any optimization that there is to be done.


I examined the assembly code generated by the
compiler (GCC 4.1.0, with
optimization turned on). If I multiply by 2,
the compiler is smart enough to
replace the multiplication with an addition,
but if I multiply by 0.5 (or divide by 2)
it uses standard floating-point multiplication. Unfortunately,
we can't afford to buy a commercial compiler.


If you examine the bit representation
of your floating point numbers,
then maybe you can figure something out.

/* BEGIN output from bitstr.c */

0.001100 = 00111010100100000010110111100000
0.002200 = 00111011000100000010110111100000
0.004400 = 00111011100100000010110111100000
0.008800 = 00111100000100000010110111100000
0.017600 = 00111100100100000010110111100000
0.035200 = 00111101000100000010110111100000
0.070400 = 00111101100100000010110111100000
0.140800 = 00111110000100000010110111100000
0.281600 = 00111110100100000010110111100000
0.563200 = 00111111000100000010110111100000
1.126400 = 00111111100100000010110111100000
2.252800 = 01000000000100000010110111100000
4.505600 = 01000000100100000010110111100000
9.011200 = 01000001000100000010110111100000
18.022400 = 01000001100100000010110111100000
36.044800 = 01000010000100000010110111100000
72.089600 = 01000010100100000010110111100000
144.179199 = 01000011000100000010110111100000
288.358398 = 01000011100100000010110111100000
576.716797 = 01000100000100000010110111100000
1153.433594 = 01000100100100000010110111100000
2306.867188 = 01000101000100000010110111100000
4613.734375 = 01000101100100000010110111100000
9227.468750 = 01000110000100000010110111100000
18454.937500 = 01000110100100000010110111100000

/* END output from bitstr.c */

/* BEGIN bitstr.c */

#include <limits.h>
#include <stdio.h>

#define STRING "%12f = %s\n"
#define E_TYPE float
#define P_TYPE double
#define INITIAL 0.0011f
#define FINAL 10000
#define INC(E) ((E) *= 2)

typedef E_TYPE e_type;
typedef P_TYPE p_type;

void bitstr(char *str, const void *obj, size_t n);

int main(void)
{
    e_type e;
    char ebits[CHAR_BIT * sizeof e + 1];

    puts("\n/* BEGIN output from bitstr.c */\n");
    e = INITIAL;
    bitstr(ebits, &e, sizeof e);
    printf(STRING, (p_type)e, ebits);
    while (FINAL > e) {
        INC(e);
        bitstr(ebits, &e, sizeof e);
        printf(STRING, (p_type)e, ebits);
    }
    puts("\n/* END output from bitstr.c */");
    return 0;
}

void bitstr(char *str, const void *obj, size_t n)
{
    unsigned mask;
    const unsigned char *byte = obj;

    while (n-- != 0) {
        mask = ((unsigned char)-1 >> 1) + 1;
        do {
            *str++ = (char)(mask & byte[n] ? '1' : '0');
            mask >>= 1;
        } while (mask != 0);
    }
    *str = '\0';
}

/* END bitstr.c */
--
pete
Jun 28 '06 #9
Lisa Simpson (in e7**********@news.cineca.it) said:

|| If one of the operands of the multiplication operator
|| is a constant expression with a value of 2,
|| then your compiler has all the information it needs
|| to do any optimization that there is to be done.
|
| I examined the assembly code generated by the compiler (GCC 4.1.0,
| with optimization turned on). If I multiply by 2, the compiler is
| smart enough to replace the multiplication with an addition, but if
| I multiply by 0.5 (or divide by 2) it uses standard floating-point
| multiplication. Unfortunately, we can't afford to buy a commercial
| compiler.

Very interesting. Have you considered working with the maintainer(s)
of the relevant gcc modules to complete the symmetry? Getting what
you'd like to have /may/ be easier than you might expect - and the
benefits could extend to a great many people...

--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto
Jun 28 '06 #10
Lisa Simpson wrote:
If one of the operands of the multiplication operator
is a constant expression with a value of 2,
then your compiler has all the information it needs
to do any optimization that there is to be done.


I examined the assembly code generated by the compiler (GCC 4.1.0, with
optimization turned on). If I multiply by 2, the compiler is smart enough to
replace the multiplication with an addition, but if I multiply by 0.5 (or
divide by 2) it uses standard floating-point multiplication. Unfortunately,
we can't afford to buy a commercial compiler.

Doesn't that look like a reasonable thing for gcc to do? In some
special circumstances, if you are doing no other floating point
operations on the data, and you know there will be no exponent range
problems, you might get slightly faster results by integer addition. It
seems unlikely that such situations could be recognized by a compiler;
if you find an advantage in it, you could write the highly non-portable
code yourself.
Jun 28 '06 #11
"Lisa Simpson" <li**********@springfield.com> wrote in message
news:e7**********@news.cineca.it...
If one of the operands of the multiplication operator
is a constant expression with a value of 2,
then your compiler has all the information it needs
to do any optimization that there is to be done.


I examined the assembly code generated by the compiler (GCC 4.1.0, with
optimization turned on). If I multiply by 2, the compiler is smart enough
to replace the multiplication with an addition, but if I multiply by 0.5
(or divide by 2) it uses standard floating-point multiplication.
Unfortunately, we can't afford to buy a commercial compiler.


I'd suggest posting a question on a GCC-specific list, like gnu.gcc.help.
The community is pretty responsive to "simple" optimizations like this once
someone points them out.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin

Jun 28 '06 #12
In article <44**********************@free.teranews.com> "Stephen Sprunk" <st*****@sprunk.org> writes:
"Lisa Simpson" <li**********@springfield.com> wrote in message
news:e7**********@news.cineca.it...

....
I examined the assembly code generated by the compiler (GCC 4.1.0, with
optimization turned on). If I multiply by 2, the compiler is smart enough
to replace the multiplication with an addition, but if I multiply by 0.5
(or divide by 2) it uses standard floating-point multiplication.
Unfortunately, we can't afford to buy a commercial compiler.


I'd suggest posting a question on a GCC-specific list, like gnu.gcc.help.
The community is pretty responsive to "simple" optimizations like this once
someone points them out.


Multiplication by 2 is easy to transform into addition of the number to
itself. There is no such easy transformation for division by 2. You
can try manipulating the exponent, but that works only if the
exponent is within a certain range, something the compiler does not
know. I do not think there is any compiler that would transform
division by 2 into something else.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Jun 28 '06 #13
On Wed, 28 Jun 2006 11:50:21 +0200, "Lisa Simpson"
<li**********@springfield.com> wrote:
If one of the operands of the multiplication operator
is a constant expression with a value of 2,
then your compiler has all the information it needs
to do any optimization that there is to be done.
I examined the assembly code generated by the compiler (GCC 4.1.0, with
optimization turned on). If I multiply by 2, the compiler is smart enough to
replace the multiplication with an addition, but if I multiply by 0.5 (or
divide by 2) it uses standard floating-point multiplication.


I must be missing something - what did you expect it to do? As you
have already discovered, manipulating the exponent is not efficient.
Unfortunately,
we can't afford to buy a commercial compiler.


--
Al Balmer
Sun City, AZ
Jun 28 '06 #14
"Dik T. Winter" <Di********@cwi.nl> wrote in message
news:J1********@cwi.nl...
In article <44**********************@free.teranews.com> "Stephen Sprunk"
<st*****@sprunk.org> writes:
"Lisa Simpson" <li**********@springfield.com> wrote in message
news:e7**********@news.cineca.it... ...
I examined the assembly code generated by the compiler (GCC 4.1.0,
with
optimization turned on). If I multiply by 2, the compiler is smart
enough
to replace the multiplication with an addition, but if I multiply by
0.5
(or divide by 2) it uses standard floating-point multiplication.
Unfortunately, we can't afford to buy a commercial compiler.


I'd suggest posting a question on a GCC-specific list, like
gnu.gcc.help.
The community is pretty responsive to "simple" optimizations like this
once
someone points them out.


Multiplication by 2 is easy to transform to addition of the number with
itself. There is no such easy transformation for division by 2. You
can try manipulation of the exponent, but that works only if the
exponent is within certain ranges, something the compiler does not
know. I do not think there is any compiler that would division by 2
transform to something else.


How about multiplication by 0.5? (Since we are talking about floating
point).

Jun 28 '06 #15
