When to use std::pow(x,n) instead of times x for n times?

Peng Yu

Hi,

I'm wondering if there is any general guideline on when to using
something like
std::pow(x, n)
rather than
x * x * x * ... * x (n x's).

Thanks,
Peng

Sep 10 '08 #1

Subscribe Post Reply

2207

Jack Klein

On Tue, 9 Sep 2008 17:30:04 -0700 (PDT), Peng Yu <Pe*******@gmail.com>
wrote in comp.lang.c++:

Hi,

I'm wondering if there is any general guideline on when to using
something like
std::pow(x, n)
rather than
x * x * x * ... * x (n x's).

Thanks,
Peng

In addition to what Alf said, here's my advice:

ALWAYS use std::pow(x, n) instead of trying to multiply 'x' by
multiple times when 'n' is not integral.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html

Sep 10 '08 #2

jellybean stonerfish

On Tue, 09 Sep 2008 17:30:04 -0700, Peng Yu wrote:

Hi,

I'm wondering if there is any general guideline on when to using
something like
std::pow(x, n)
rather than
x * x * x * ... * x (n x's).

Thanks,
Peng

It can be hard to represent fractional powers in x*x notation.

sf

Sep 10 '08 #3

Juha Nieminen

Peng Yu wrote:

I'm wondering if there is any general guideline on when to using
something like
std::pow(x, n)
rather than
x * x * x * ... * x (n x's).

If you need to calculate that function millions of times per second,
then using the latter form can be considerably faster up to a certain n
(after which std::pow() becomes faster). The maximum n for which the
latter form is faster than std::pow() can be surprisingly large,
depending on the code (eg. something like n=8 might not be far-fetched).

Of course this is heavily system-dependent so there's no rule.

Sep 10 '08 #4

gpderetta

On Sep 10, 4:24*pm, Juha Nieminen <nos...@thanks.invalidwrote:

Peng Yu wrote:
I'm wondering if there is any general guideline on when to using
something like
std::pow(x, n)
rather than
x * x * x * ... * x (n x's).

* If you need to calculate that function millions of times per second,
then using the latter form can be considerably faster up to a certain n
(after which std::pow() becomes faster).

Why? There is no reason for the compiler not to transform pow(x,
<integral-constant>) to the latter form if it were actually faster
(and in fact some compilers do).

--
gpd

Sep 10 '08 #5

stan

Juha Nieminen wrote:

Peng Yu wrote:
>I'm wondering if there is any general guideline on when to using
something like
std::pow(x, n)
rather than
x * x * x * ... * x (n x's).

If you need to calculate that function millions of times per second,
then using the latter form can be considerably faster up to a certain n
(after which std::pow() becomes faster). The maximum n for which the
latter form is faster than std::pow() can be surprisingly large,
depending on the code (eg. something like n=8 might not be far-fetched).

Of course this is heavily system-dependent so there's no rule.

It's always best to profile prior to optimization attempts. Rules of
thumb and "common sense" are frequently wrong; measurements seldom are
seldom wrong.

Sep 11 '08 #6

Juha Nieminen

gpderetta wrote:

Why? There is no reason for the compiler not to transform pow(x,
<integral-constant>) to the latter form if it were actually faster
(and in fact some compilers do).

Some compilers might be able to do that optimizations, others aren't.
And if n is a variable, then it cannot optimize it. (At most the pow()
function itself might have optimizations in it, but in my experience it
doesn't: With most compilers it just generates the FPU opcodes necessary
to calculate the result.)

But we don't have to speculate about this as it's trivially easy to
test in practice. Go ahead and try it.

Sep 11 '08 #7

gpderetta

On Sep 11, 6:39 pm, Juha Nieminen <nos...@thanks.invalidwrote:

gpderetta wrote:
Why? There is no reason for the compiler not to transform pow(x,
<integral-constant>) to the latter form if it were actually faster
(and in fact some compilers do).

Some compilers might be able to do that optimizations, others aren't.
And if n is a variable, then it cannot optimize it.

and if it is variable, you can't write an explicit expression either.
You could use a for loop

(At most the pow()
function itself might have optimizations in it, but in my experience it
doesn't: With most compilers it just generates the FPU opcodes necessary
to calculate the result.)

Today hand optimizations are tomorrow pessimizations. Let the compiler
do its job.

The usual rule apply: use pow, and only if the profiler tells it is a
bottleneck, try to optimize it by hand.

>
But we don't have to speculate about this as it's trivially easy to
test in practice. Go ahead and try it.

I had already tried. 'pow(x, 16)' is inlined exactly as four
multiplies, at least with a recent gcc.

--
gpd

Sep 11 '08 #8

Juha Nieminen

gpderetta wrote:

> Some compilers might be able to do that optimizations, others aren't.
And if n is a variable, then it cannot optimize it.

and if it is variable, you can't write an explicit expression either.
You could use a for loop

In my experience even performing a set of multiplications in a loop
while interpreting bytecode can be faster than a single std::pow() call,
up to a certain exponent.

I have made a function parser/interpreter, and in practice eg.
interpreting the function "x*x*x*x" (which it bytecompiles to three
multiplications) is faster than "x^4" (which it bytecompiles to one
std::pow() call). std::pow() can be incredibly slow.

Sep 12 '08 #9

Peng Yu

On Sep 11, 4:56 pm, gpderetta <gpdere...@gmail.comwrote:

On Sep 11, 6:39 pm, Juha Nieminen <nos...@thanks.invalidwrote:

gpderetta wrote:
Why? There is no reason for the compiler not to transformpow(x,
<integral-constant>) to the latter form if it were actually faster
(and in fact some compilers do).

Some compilers might be able to do that optimizations, others aren't.
And if n is a variable, then it cannot optimize it.

and if it is variable, you can't write an explicit expression either.
You could use a for loop

(At most thepow()
function itself might have optimizations in it, but in my experience it
doesn't: With most compilers it just generates the FPU opcodes necessary
to calculate the result.)

Today hand optimizations are tomorrow pessimizations. Let the compiler
do its job.

The usual rule apply: usepow, and only if the profiler tells it is a
bottleneck, try to optimize it by hand.

But we don't have to speculate about this as it's trivially easy to
test in practice. Go ahead and try it.

I had already tried. 'pow(x, 16)' is inlined exactly as four
multiplies, at least with a recent gcc.

Would you please let me know the details the procedure on how you
figure this out? Sometimes I want to know what the compiler compile
the code to.

Thanks,
Peng

Sep 13 '08 #10

Peng Yu

On Sep 13, 4:09 pm, "Alf P. Steinbach" <al...@start.nowrote:

* Peng Yu:

Sometimes I want to know what the compiler compile
the code to.

e.g.

g++ -S -masm=intel x.cpp

It is pretty hard to figure out what part of assembly code is
associated with a give portion of source code. For example, the
following C++ and assembly code. How do I figure out where the pow
functions are at in the code?

Thanks,
Peng

$cat main.cc main.s
#include <cmath>
#include <iostream>

int main() {
double x = 1;
std::cout << "pox(x, 1) = " << std::pow(x, 1) << std::endl;
std::cout << "pox(x, 2) = " << std::pow(x, 2) << std::endl;
std::cout << "pox(x, 3) = " << std::pow(x, 3) << std::endl;
std::cout << "pox(x, 4) = " << std::pow(x, 4) << std::endl;
std::cout << "pox(x, 5) = " << std::pow(x, 5) << std::endl;
std::cout << "pox(x, 6) = " << std::pow(x, 6) << std::endl;
std::cout << "pox(x, 7) = " << std::pow(x, 7) << std::endl;
std::cout << "pox(x, 8) = " << std::pow(x, 8) << std::endl;
std::cout << "pox(x, 9) = " << std::pow(x, 9) << std::endl;
std::cout << "pox(x, 10) = " << std::pow(x, 10) << std::endl;
std::cout << "pox(x, 11) = " << std::pow(x, 11) << std::endl;
std::cout << "pox(x, 12) = " << std::pow(x, 12) << std::endl;
std::cout << "pox(x, 13) = " << std::pow(x, 13) << std::endl;
std::cout << "pox(x, 14) = " << std::pow(x, 14) << std::endl;
std::cout << "pox(x, 15) = " << std::pow(x, 15) << std::endl;
std::cout << "pox(x, 16) = " << std::pow(x, 16) << std::endl;
}
.file "main.cc"
.intel_syntax
.section .ctors,"aw",@progbits
.align 8
.quad _GLOBAL__I_main
.text
.align 2
.type _Z41__static_initialization_and_destruction_0ii,
@function
_Z41__static_initialization_and_destruction_0ii:
..LFB1504:
push %rbp
..LCFI0:
mov %rbp, %rsp
..LCFI1:
sub %rsp, 16
..LCFI2:
mov DWORD PTR [%rbp-4], %edi
mov DWORD PTR [%rbp-8], %esi
cmp DWORD PTR [%rbp-4], 1
jne .L5
cmp DWORD PTR [%rbp-8], 65535
jne .L5
mov %edi, OFFSET FLAT:_ZSt8__ioinit
call _ZNSt8ios_base4InitC1Ev
mov %edx, OFFSET FLAT:__dso_handle
mov %esi, 0
mov %edi, OFFSET FLAT:__tcf_0
call __cxa_atexit
..L5:
leave
ret
..LFE1504:
.size _Z41__static_initialization_and_destruction_0ii, .-
_Z41__static_initialization_and_destruction_0ii
..globl __gxx_personality_v0
.align 2
.type _GLOBAL__I_main, @function
_GLOBAL__I_main:
..LFB1506:
push %rbp
..LCFI3:
mov %rbp, %rsp
..LCFI4:
mov %esi, 65535
mov %edi, 1
call _Z41__static_initialization_and_destruction_0ii
leave
ret
..LFE1506:
.size _GLOBAL__I_main, .-_GLOBAL__I_main
.align 2
.type __tcf_0, @function
__tcf_0:
..LFB1505:
push %rbp
..LCFI5:
mov %rbp, %rsp
..LCFI6:
sub %rsp, 16
..LCFI7:
mov QWORD PTR [%rbp-8], %rdi
mov %edi, OFFSET FLAT:_ZSt8__ioinit
call _ZNSt8ios_base4InitD1Ev
leave
ret
..LFE1505:
.size __tcf_0, .-__tcf_0
..globl __powidf2
.section .text._ZSt3powdi,"axG",@progbits,_ZSt3powdi,comdat
.align 2
.weak _ZSt3powdi
.type _ZSt3powdi, @function
_ZSt3powdi:
..LFB54:
push %rbp
..LCFI8:
mov %rbp, %rsp
..LCFI9:
sub %rsp, 32
..LCFI10:
movsd QWORD PTR [%rbp-8], %xmm0
mov DWORD PTR [%rbp-12], %edi
mov %edi, DWORD PTR [%rbp-12]
movlpd %xmm0, QWORD PTR [%rbp-8]
call __powidf2
movsd QWORD PTR [%rbp-24], %xmm0
mov %rax, QWORD PTR [%rbp-24]
mov QWORD PTR [%rbp-24], %rax
movlpd %xmm0, QWORD PTR [%rbp-24]
leave
ret
..LFE54:
.size _ZSt3powdi, .-_ZSt3powdi
.section .rodata
..LC1:
.string "pox(x, 1) = "
..LC2:
.string "pox(x, 2) = "
..LC3:
.string "pox(x, 3) = "
..LC4:
.string "pox(x, 4) = "
..LC5:
.string "pox(x, 5) = "
..LC6:
.string "pox(x, 6) = "
..LC7:
.string "pox(x, 7) = "
..LC8:
.string "pox(x, 8) = "
..LC9:
.string "pox(x, 9) = "
..LC10:
.string "pox(x, 10) = "
..LC11:
.string "pox(x, 11) = "
..LC12:
.string "pox(x, 12) = "
..LC13:
.string "pox(x, 13) = "
..LC14:
.string "pox(x, 14) = "
..LC15:
.string "pox(x, 15) = "
..LC16:
.string "pox(x, 16) = "
.text
.align 2
..globl main
.type main, @function
main:
..LFB1496:
push %rbp
..LCFI11:
mov %rbp, %rsp
..LCFI12:
push %rbx
..LCFI13:
sub %rsp, 24
..LCFI14:
movabs %rax, 4607182418800017408
mov QWORD PTR [%rbp-16], %rax
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 1
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC1
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 2
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC2
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 3
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC3
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 4
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC4
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 5
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC5
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 6
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC6
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 7
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC7
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 8
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC8
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 9
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC9
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 10
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC10
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 11
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC11
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 12
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC12
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 13
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC13
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 14
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC14
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 15
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC15
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 16
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC16
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES 5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostr eamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %eax, 0
add %rsp, 24
pop %rbx
leave
ret
..LFE1496:
.size main, .-main
.local _ZSt8__ioinit
.comm _ZSt8__ioinit,1,1
.weakref _Z20__gthrw_pthread_oncePiPFvvE,pthread_once
.weakref
_Z27__gthrw_pthread_getspecificj,pthread_getspecif ic
.weakref
_Z27__gthrw_pthread_setspecificjPKv,pthread_setspe cific
.weakref
_Z22__gthrw_pthread_createPmPK14pthread_attr_tPFPv S3_ES3_,pthread_create
.weakref _Z22__gthrw_pthread_cancelm,pthread_cancel
.weakref
_Z26__gthrw_pthread_mutex_lockP15pthread_mutex_t,p thread_mutex_lock
.weakref
_Z29__gthrw_pthread_mutex_trylockP15pthread_mutex_ t,pthread_mutex_trylock
.weakref
_Z28__gthrw_pthread_mutex_unlockP15pthread_mutex_t ,pthread_mutex_unlock
.weakref
_Z26__gthrw_pthread_mutex_initP15pthread_mutex_tPK 19pthread_mutexattr_t,pthread_mutex_init
.weakref
_Z26__gthrw_pthread_key_createPjPFvPvE,pthread_key _create
.weakref
_Z26__gthrw_pthread_key_deletej,pthread_key_delete
.weakref
_Z30__gthrw_pthread_mutexattr_initP19pthread_mutex attr_t,pthread_mutexattr_init
.weakref
_Z33__gthrw_pthread_mutexattr_settypeP19pthread_mu texattr_ti,pthread_mutexattr_settype
.weakref
_Z33__gthrw_pthread_mutexattr_destroyP19pthread_mu texattr_t,pthread_mutexattr_destroy
.section .eh_frame,"a",@progbits
..Lframe1:
.long .LECIE1-.LSCIE1
..LSCIE1:
.long 0x0
.byte 0x1
.string "zPR"
.uleb128 0x1
.sleb128 -8
.byte 0x10
.uleb128 0x6
.byte 0x3
.long __gxx_personality_v0
.byte 0x3
.byte 0xc
.uleb128 0x7
.uleb128 0x8
.byte 0x90
.uleb128 0x1
.align 8
..LECIE1:
..LSFDE1:
.long .LEFDE1-.LASFDE1
..LASFDE1:
.long .LASFDE1-.Lframe1
.long .LFB1504
.long .LFE1504-.LFB1504
.uleb128 0x0
.byte 0x4
.long .LCFI0-.LFB1504
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI1-.LCFI0
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE1:
..LSFDE3:
.long .LEFDE3-.LASFDE3
..LASFDE3:
.long .LASFDE3-.Lframe1
.long .LFB1506
.long .LFE1506-.LFB1506
.uleb128 0x0
.byte 0x4
.long .LCFI3-.LFB1506
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI4-.LCFI3
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE3:
..LSFDE5:
.long .LEFDE5-.LASFDE5
..LASFDE5:
.long .LASFDE5-.Lframe1
.long .LFB1505
.long .LFE1505-.LFB1505
.uleb128 0x0
.byte 0x4
.long .LCFI5-.LFB1505
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI6-.LCFI5
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE5:
..LSFDE7:
.long .LEFDE7-.LASFDE7
..LASFDE7:
.long .LASFDE7-.Lframe1
.long .LFB54
.long .LFE54-.LFB54
.uleb128 0x0
.byte 0x4
.long .LCFI8-.LFB54
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI9-.LCFI8
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE7:
..LSFDE9:
.long .LEFDE9-.LASFDE9
..LASFDE9:
.long .LASFDE9-.Lframe1
.long .LFB1496
.long .LFE1496-.LFB1496
.uleb128 0x0
.byte 0x4
.long .LCFI11-.LFB1496
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI12-.LCFI11
.byte 0xd
.uleb128 0x6
.byte 0x4
.long .LCFI14-.LCFI12
.byte 0x83
.uleb128 0x3
.align 8
..LEFDE9:
.ident "GCC: (GNU) 4.1.2 20061115 (prerelease) (Debian
4.1.1-21)"
.section .note.GNU-stack,"",@progbits

Sep 14 '08 #11

Kai-Uwe Bux

Peng Yu wrote:

On Sep 13, 4:09 pm, "Alf P. Steinbach" <al...@start.nowrote:
>* Peng Yu:

Sometimes I want to know what the compiler compile
the code to.

e.g.

g++ -S -masm=intel x.cpp

It is pretty hard to figure out what part of assembly code is
associated with a give portion of source code.

Nobody said, it would be easy.

For example, the
following C++ and assembly code. How do I figure out where the pow
functions are at in the code?

[snip]

Did you try modifying one line of code and seeing which portion of the
assembly changed?
Best

Kai-Uwe Bux

Sep 14 '08 #12

Peng Yu

On Sep 13, 7:50 pm, Kai-Uwe Bux <jkherci...@gmx.netwrote:

Peng Yu wrote:
On Sep 13, 4:09 pm, "Alf P. Steinbach" <al...@start.nowrote:
* Peng Yu:

Sometimes I want to know what the compiler compile
the code to.

e.g.

g++ -S -masm=intel x.cpp

It is pretty hard to figure out what part of assembly code is
associated with a give portion of source code.

Nobody said, it would be easy.

For example, the
following C++ and assembly code. How do I figure out where the pow
functions are at in the code?

[snip]

Did you try modifying one line of code and seeing which portion of the
assembly changed?

g++ -O3 -S -masm=intel main.cc

I tried to compile the code and the variant of it. And I got the
difference by diff. But I still have difficulty to understand what it
does. Is there a way to annotate the C++ code in the assembly code?

Thanks,
Peng

Sep 14 '08 #13

=?UTF-8?B?RXJpayBXaWtzdHLDtm0=?=

On 2008-09-13 23:06, Peng Yu wrote:

On Sep 11, 4:56 pm, gpderetta <gpdere...@gmail.comwrote:
>On Sep 11, 6:39 pm, Juha Nieminen <nos...@thanks.invalidwrote:

gpderetta wrote:
Why? There is no reason for the compiler not to transformpow(x,
<integral-constant>) to the latter form if it were actually faster
(and in fact some compilers do).

Some compilers might be able to do that optimizations, others aren't.
And if n is a variable, then it cannot optimize it.

and if it is variable, you can't write an explicit expression either.
You could use a for loop

(At most thepow()
function itself might have optimizations in it, but in my experience it
doesn't: With most compilers it just generates the FPU opcodes necessary
to calculate the result.)

Today hand optimizations are tomorrow pessimizations. Let the compiler
do its job.

The usual rule apply: usepow, and only if the profiler tells it is a
bottleneck, try to optimize it by hand.

But we don't have to speculate about this as it's trivially easy to
test in practice. Go ahead and try it.

I had already tried. 'pow(x, 16)' is inlined exactly as four
multiplies, at least with a recent gcc.

Would you please let me know the details the procedure on how you
figure this out? Sometimes I want to know what the compiler compile
the code to.

In Visual Studio you can run the program in the debugger and then bring
up the assembly code and it will show you can step through it and switch
back and forth between the code and assembly code. I would imagine you
can do similar things in other IDEs and in gdb.

--
Erik WikstrÃ¶m

Sep 14 '08 #14

Ian Collins

Peng Yu wrote:

On Sep 13, 7:50 pm, Kai-Uwe Bux <jkherci...@gmx.netwrote:
>Peng Yu wrote:
>>On Sep 13, 4:09 pm, "Alf P. Steinbach" <al...@start.nowrote:
* Peng Yu:
Sometimes I want to know what the compiler compile
the code to.
e.g.
g++ -S -masm=intel x.cpp
It is pretty hard to figure out what part of assembly code is
associated with a give portion of source code.
Nobody said, it would be easy.

>>For example, the
following C++ and assembly code. How do I figure out where the pow
functions are at in the code?
[snip]

Did you try modifying one line of code and seeing which portion of the
assembly changed?

g++ -O3 -S -masm=intel main.cc

I tried to compile the code and the variant of it. And I got the
difference by diff. But I still have difficulty to understand what it
does. Is there a way to annotate the C++ code in the assembly code?

Some compilers (Sun CC for example) do so by default. If you are
working on Solaris or Linux, give it a try.

--
Ian Collins.

Sep 14 '08 #15

Peng Yu

On Sep 14, 4:37 am, Erik Wikström <Erik-wikst...@telia.comwrote:

On 2008-09-13 23:06, Peng Yu wrote:

On Sep 11, 4:56 pm, gpderetta <gpdere...@gmail.comwrote:
On Sep 11, 6:39 pm, Juha Nieminen <nos...@thanks.invalidwrote:

gpderetta wrote:
Why? There is no reason for the compiler not to transformpow(x,
<integral-constant>) to the latter form if it were actually faster
(and in fact some compilers do).

Some compilers might be able to do that optimizations, others aren't.
And if n is a variable, then it cannot optimize it.

and if it is variable, you can't write an explicit expression either.
You could use a for loop

(At most thepow()
function itself might have optimizations in it, but in my experienceit
doesn't: With most compilers it just generates the FPU opcodes necessary
to calculate the result.)

Today hand optimizations are tomorrow pessimizations. Let the compiler
do its job.

The usual rule apply: usepow, and only if the profiler tells it is a
bottleneck, try to optimize it by hand.

But we don't have to speculate about this as it's trivially easy to
test in practice. Go ahead and try it.

I had already tried. 'pow(x, 16)' is inlined exactly as four
multiplies, at least with a recent gcc.

Would you please let me know the details the procedure on how you
figure this out? Sometimes I want to know what the compiler compile
the code to.

In Visual Studio you can run the program in the debugger and then bring
up the assembly code and it will show you can step through it and switch
back and forth between the code and assembly code. I would imagine you
can do similar things in other IDEs and in gdb.

Shall there be a problem if I use -O3 option? The source code and the
assembly code might have one-one relationship.

Thanks,
Peng

Sep 14 '08 #16

=?UTF-8?B?RXJpayBXaWtzdHLDtm0=?=

On 2008-09-14 15:47, Peng Yu wrote:

On Sep 14, 4:37 am, Erik WikstrÃ¶m <Erik-wikst...@telia.comwrote:
>On 2008-09-13 23:06, Peng Yu wrote:

On Sep 11, 4:56 pm, gpderetta <gpdere...@gmail.comwrote:
On Sep 11, 6:39 pm, Juha Nieminen <nos...@thanks.invalidwrote:

>Today hand optimizations are tomorrow pessimizations. Let the compiler
do its job.

>The usual rule apply: usepow, and only if the profiler tells it is a
bottleneck, try to optimize it by hand.

But we don't have to speculate about this as it's trivially easy to
test in practice. Go ahead and try it.

>I had already tried. 'pow(x, 16)' is inlined exactly as four
multiplies, at least with a recent gcc.

Would you please let me know the details the procedure on how you
figure this out? Sometimes I want to know what the compiler compile
the code to.

In Visual Studio you can run the program in the debugger and then bring
up the assembly code and it will show you can step through it and switch
back and forth between the code and assembly code. I would imagine you
can do similar things in other IDEs and in gdb.

Shall there be a problem if I use -O3 option? The source code and the
assembly code might have one-one relationship.

There might be a problem if the compiler decides to remove some code
completely, otherwise no.

--
Erik WikstrÃ¶m

Sep 14 '08 #17

Sherm Pendley

Peng Yu <Pe*******@gmail.comwrites:

g++ -O3 -S -masm=intel main.cc

I tried to compile the code and the variant of it. And I got the
difference by diff. But I still have difficulty to understand what it
does. Is there a way to annotate the C++ code in the assembly code?

Yes, and I'd also disable optimization, which can make the generated
asm code more difficult to follow.

g++ -S -masm=intel -fverbose-asm main.cc

sherm--

--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net

Sep 14 '08 #18

Peng Yu

On Sep 14, 10:49 am, Sherm Pendley <spamt...@dot-app.orgwrote:

Peng Yu <PengYu...@gmail.comwrites:
g++ -O3 -S -masm=intel main.cc

I tried to compile the code and the variant of it. And I got the
difference by diff. But I still have difficulty to understand what it
does. Is there a way to annotate the C++ code in the assembly code?

Yes, and I'd also disable optimization, which can make the generated
asm code more difficult to follow.

g++ -S -masm=intel -fverbose-asm main.cc

But I have to enable the optimization, because I want to know how the
compiler optimize std::pow(x,n).

Thanks,
Peng

Sep 14 '08 #19

When to use std::pow(x,n) instead of times x for n times?

Similar topics