P: n/a

* Alex Vinokur: Consider the following statement: n+i, where i = 1 or 0.
Is there more fast method for computing n+i than direct computing that sum?
That depends on the types involved.
For built-in numeric types, direct computation is probably fastest.
Measure if you're in doubt (and it really matters).

A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in email?  

"Alex Vinokur" <al****@bigfoot.com> writes: Consider the following statement: n+i, where i = 1 or 0.
Is there more fast method for computing n+i than direct computing that sum?
The best way to compute n+0 is n.
The best way to compute n+1 is n+1; if the CPU provides something
faster than a general add instruction, the compiler will generate it
for you.

Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.  

In article <37*************@individual.net>,
Alex Vinokur <al****@bigfoot.com> wrote: Consider the following statement: n+i, where i = 1 or 0.
Is there more fast method for computing n+i than direct computing that sum?
Assuming n and i are ints, not on a modern general purpose computer.
Addition typically takes one cycle, once the operands are in
registers.
Any attempt to use a conditional will almost certainly be much slower.
For more details, try a newsgroup for the processor you're interested
in, or maybe comp.arch.
 Richard  

"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message news:cv***********@pcnews.cogsci.ed.ac.uk... In article <37*************@individual.net>, Alex Vinokur <al****@bigfoot.com> wrote:
Consider the following statement: n+i, where i = 1 or 0.
Is there more fast method for computing n+i than direct computing that sum?
Assuming n and i are ints, not on a modern general purpose computer. Addition typically takes one cycle, once the operands are in registers.
Any attempt to use a conditional will almost certainly be much slower.
For more details, try a newsgroup for the processor you're interested in, or maybe comp.arch.
 Richard
I need that in C/C++ program.

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com http://mathforum.org/library/view/10978.html http://sourceforge.net/users/alexvn  

Alex Vinokur wrote: "Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message news:cv***********@pcnews.cogsci.ed.ac.uk...
In article <37*************@individual.net>, Alex Vinokur <al****@bigfoot.com> wrote:
Consider the following statement: n+i, where i = 1 or 0.
Is there more fast method for computing n+i than direct computing that sum?
Assuming n and i are ints, not on a modern general purpose computer. Addition typically takes one cycle, once the operands are in registers.
Any attempt to use a conditional will almost certainly be much slower.
For more details, try a newsgroup for the processor you're interested in, or maybe comp.arch.
I need that in C/C++ program.
Well, there is no general truth helping you along to a portable,
always perfect solution.
If you want to optimise your code for speed, use a profiler to
determine which functions are called how often and take how much
time. Then you know _where_ you lose your time.
After that, try to find algorithms which reduce the number
of calls to small functions which take a good part of the overall
time and reduce the time spent in "big" functions taking much time.
If you afterwards really find that optimising code with
'n+0' and 'n+1' would be the best possible micro-optimisation
to gain some more cycles, then you should try to write as many
'n+0's/'n's and 'n+1's as possible explicitly in your code
instead of using 'n+i'. The compiler will optimise that if the
code has the potential for optimisation.
Afterwards, use the profiler to determine whether this actually
makes a difference.
Probably not much.
If you think you can do better than the compiler, then follow
Richard's suggestion about comp.arch.*
Cheers
Michael

EMail: Mine is a gmx dot de address.  

In article <37*************@individual.net>,
Alex Vinokur <al****@bigfoot.com> wrote:
:Consider the following statement:
:n+i, where i = 1 or 0.
:Is there more fast method for computing n+i than direct computing that sum?
It depends on the costs you assign to the various operations -- a
matter which is architecture dependent. Integer addition is usually one of
the fastest things a computer does. Suppose you were able to find a
two-instruction sequence that looked faster for that particular case: it
is still very likely to be slower overall, because internally the CPU has
to perform an integer addition in order to find the address of the
second instruction.
Have you perhaps omitted some important facts about the circumstances?
For example, are you microprogramming, or is this a theory question
at the micro-level where each comparison and change of a bit in
the implementation of the 'addition' operation is to be counted?
Is this an assignment in designing an IC which is faster for these
particular cases than building a full-blown adder circuit would be?

Reviewers should be required to produce a certain number of
negative reviews -- like police given quotas for handing out
speeding tickets. -- The Audio Anarchist

Alex Vinokur wrote: Consider the following statement:
n + i, where i = 1 or 0.
Is there more fast method for computing n + i than direct computing that sum?
No.
But a good optimizing compiler should be able to
replace n + 0 with n and compute n + 1 with an increment instruction.

"Walter Roberson" <ro******@ibd.nrccnrc.gc.ca> wrote in message news:cv**********@canopus.cc.umanitoba.ca... In article <37*************@individual.net>, Alex Vinokur <al****@bigfoot.com> wrote: :Consider the following statement: :n+i, where i = 1 or 0.
:Is there more fast method for computing n+i than direct computing that sum?
It depends on the costs you assign to the various operations -- a matter which is architecture dependent. Integer addition is usually one of the fastest things a computer does. Suppose you were able to find a two-instruction sequence that looked faster for that particular case: it is still very likely to be slower overall, because internally the CPU has to perform an integer addition in order to find the address of the second instruction.
Have you perhaps omitted some important facts about the circumstances? For example, are you microprogramming, or is this a theory question at the micro-level where each comparison and change of a bit in the implementation of the 'addition' operation is to be counted? Is this an assignment in designing an IC which is faster for these particular cases than building a full-blown adder circuit would be?
I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
The algorithm can be seen at http://groupsbeta.google.com/group/...e76b12150613a1
Function AddUnits() contains a line
n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster?
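
To show where a line like that usually sits, here is a minimal sketch of a multi-digit addition loop. It is not the AddUnits() from the linked code: the function name `add_into`, the base of 10^9 per unit, and the `std::uint32_t` storage are all assumptions made for illustration. The addition with the carry is only one of several operations per unit; the comparison, the subtraction, and the store are there too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical base: each vector element holds one base-1e9 "unit",
// least significant unit first. Not taken from the OP's code.
constexpr std::uint32_t kBase = 1000000000u;

// Adds b into a in place, propagating a carry that is always 0 or 1.
void add_into(std::vector<std::uint32_t>& a, const std::vector<std::uint32_t>& b) {
    if (a.size() < b.size()) a.resize(b.size(), 0u);
    std::uint32_t carry = 0u;
    for (std::size_t k = 0; k < a.size(); ++k) {
        assert(carry <= 1u);
        // Largest possible sum is 999999999 + 999999999 + 1, which fits in 32 bits.
        const std::uint32_t sum = a[k] + (k < b.size() ? b[k] : 0u) + carry;
        carry = (sum >= kBase) ? 1u : 0u;
        a[k] = carry ? sum - kBase : sum;
    }
    if (carry) a.push_back(1u);  // the result grew by one unit
}
```

Calling `add_into(a, b)` with `a = {999999999, 999999999}` and `b = {1}` carries through both units and appends a third.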

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com http://mathforum.org/library/view/10978.html http://sourceforge.net/users/alexvn  

In article <cv**********@canopus.cc.umanitoba.ca>,
Walter Roberson <ro******@ibd.nrccnrc.gc.ca> wrote:
In article <37*************@individual.net>,
Alex Vinokur <al****@bigfoot.com> wrote:
:Consider the following statement:
:n+i, where i = 1 or 0.
:Is there more fast method for computing n+i than direct computing that sum?
It depends on the costs you assign to the various operations -- a
matter which is architecture dependent.
There is a possibility that would be slower in any real
architecture that I've ever heard of, but which could be faster
under very narrow circumstances.
(n&1) ? (n+i) : (n|i)
The narrow circumstances under which this could be faster are:
- this is within a tight loop that fits within the processor's
  primary instruction cache
- the processor has a "move conditional" operation that
  avoids taking an actual branch when the operations are
  simple enough and the result is being used arithmetically
  instead of to control a branch
- at the microcode level, the processor "runs free"
  when working from instruction cache, processing each
  instruction as fast as possible instead of working
  on a bus-cycle system (which is needed in most cases
  when anything outside the primary cache is being referenced)
- the cost of the bitwise AND operation plus the cost of the
  comparison to 0 plus the cost of the bitwise OR operation
  is lower than the cost of a full addition
I have heard of one architecture (I don't recall which)
that had a "move conditional" operation that took
a test condition and two arithmetic operations as operands,
and would start doing the two arithmetic operations in
parallel at the same time it was doing the test; when the
result of the test was available, it would abort the false
branch if it was not already finished, with the result
being whichever of the arithmetic expressions was selected
by the condition.
Please note how narrow these conditions are: you would
have to know a LOT about your processor to make this kind
of optimization: the expression I give above will be slower
than a straight addition on nearly every architecture.
Addition is usually hard-coded through a series of
transistors, with the carry circuit taking most of the
landscape. It's hard to beat transistor-level speeds
by using multiple instructions.
I have heard that some architectures internally
optimize +0 and +1; there would be no way to beat that...
but again you would need to know intimate details of
the architecture.
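
A small check of why the alternative gives the same answer, reading the expression above as (n&1) ? (n+i) : (n|i), which is what the mention of a bitwise OR suggests. The function names are made up for this sketch. It only confirms that the two forms agree; it says nothing about which one is faster on a given machine.

```cpp
#include <cassert>

// Direct form: a plain addition.
inline unsigned add_direct(unsigned n, unsigned i) { return n + i; }

// Alternative form: when n is even its low bit is clear, so OR-ing in
// i (0 or 1) produces the same value as adding it. Odd n falls back to +.
inline unsigned add_select(unsigned n, unsigned i) {
    assert(i <= 1u);
    return (n & 1u) ? (n + i) : (n | i);
}

// Returns true if both forms agree for every n in [0, limit) and i in {0, 1}.
inline bool forms_agree(unsigned limit) {
    for (unsigned n = 0; n < limit; ++n)
        for (unsigned i = 0; i <= 1u; ++i)
            if (add_direct(n, i) != add_select(n, i)) return false;
    return true;
}
```

Note that the alternative still contains a full addition on the odd path, so at best it avoids the adder for half of the inputs.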

I don't know if there's destiny,
but there's a decision!  Wim Wenders (WoD)  

"Alex Vinokur" <al****@bigfoot.com> wrote in message
news:37*************@individual.net... Function AddUnits() contains a line n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster?
What fraction of your program's total execution time does this statement
consume?
Until you know the answer to this question, you don't know whether it's even
worth trying to change it, let alone the best way of doing so.  

Alex Vinokur wrote: I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula. The algorithm can be seen at
http://groupsbeta.google.com/group/...e76b12150613a1
Function AddUnits() contains a line
n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster?
No.  

On 2005-02-18 12:54:57 -0500, "Alex Vinokur" <al****@bigfoot.com> said: "Walter Roberson" <ro******@ibd.nrccnrc.gc.ca> wrote in message news:cv**********@canopus.cc.umanitoba.ca... In article <37*************@individual.net>, Alex Vinokur <al****@bigfoot.com> wrote: :Consider the following statement: :n+i, where i = 1 or 0.
:Is there more fast method for computing n+i than direct computing that sum?
It depends on the costs you assign to the various operations -- a matter which is architecture dependent. Integer addition is usually one of the fastest things a computer does. Suppose you were able to find a two-instruction sequence that looked faster for that particular case: it is still very likely to be slower overall, because internally the CPU has to perform an integer addition in order to find the address of the second instruction.
Have you perhaps omitted some important facts about the circumstances? For example, are you microprogramming, or is this a theory question at the micro-level where each comparison and change of a bit in the implementation of the 'addition' operation is to be counted? Is this an assignment in designing an IC which is faster for these particular cases than building a full-blown adder circuit would be?
I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula. The algorithm can be seen at http://groupsbeta.google.com/group/...e76b12150613a1
Function AddUnits() contains a line n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster?
No, the question is: "Is that line the bottleneck?" How do you know
that line is the problem? Have you measured the performance of your
code?

Clark S. Cox, III cl*******@gmail.com  

"Alex Vinokur" <al****@bigfoot.com> writes:
[...] I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula. The algorithm can be seen at http://groupsbeta.google.com/group/...e76b12150613a1
Function AddUnits() contains a line n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster?
Use the maximum optimization level your compiler provides (you're
probably already doing this). Use a better compiler if you can find
one. Use a faster computer. Kick other users off the system so you
get 100% of the CPU.
As others have mentioned, there's little point in trying to optimize
this one line unless you've actually made measurements that indicate
that it's a bottleneck. Even if you've done that, there's no reliable
portable way in standard C to improve the performance of that line of
code.
It's conceivable that a compiler can generate better code if it
happens to know that carry_s is either 0 or 1. It might be able to
infer this by dataflow analysis, depending on how carry_s is set. If
you have a C99 compiler, making carry_s a _Bool <OT>or bool if you're
using C++</OT> might help (or it might hurt).
The following:
n1 += n2;
if (carry_s) n1++;
might give you better or worse performance, or exactly the same,
depending on the CPU architecture, the compiler, and the phase of the
moon. (The "if" is likely to cause a branch, which can screw up
pipelining -- or the compiler may be able to use some special CPU
instruction that does exactly what's needed.)
If this line really is a serious bottleneck, you might consider
writing it in several equivalent ways and choosing among them with a
macro:
#if METHOD == 1
    n1 += (n2 + carry_s);
#elif METHOD == 2
    n1 += n2;
    if (carry_s) n1++;
#elif METHOD == 3
    /* something else */
#else
#error METHOD is undefined or invalid.
#endif
For a given platform, try compiling and running your program with each
defined METHOD, and *measure the results*. (You can also examine the
assembly listing; this can tell you if two methods result in the same
code, but won't necessarily tell you which is better unless you're an
expert in the particular CPU.) Expect the tradeoffs to change with
the next release of the compiler or a different version of the CPU.
Or, if you don't care about portability, you can code it in assembly
language (which we can't help you with here). Consider using your
compiler's output as a guide.
Again, all this assumes that that one line really is a serious
bottleneck. The only way to know this is to profile your code. If
it's not a bottleneck, just write it as straightforwardly as possible
and spend your effort elsewhere.
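
As a rough illustration of "measure the results", the two methods above can be wrapped in functions and timed with the standard `std::chrono` clock. The names `add_method1`, `add_method2`, and `time_method` are invented for this sketch, and the loop is an artificial workload, not the OP's program, so the numbers it produces only hint at what a real profile would show.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// METHOD 1 from the post: a single expression.
inline std::uint64_t add_method1(std::uint64_t n1, std::uint64_t n2, unsigned carry_s) {
    assert(carry_s <= 1u);
    n1 += (n2 + carry_s);
    return n1;
}

// METHOD 2 from the post: add, then conditionally increment.
inline std::uint64_t add_method2(std::uint64_t n1, std::uint64_t n2, unsigned carry_s) {
    assert(carry_s <= 1u);
    n1 += n2;
    if (carry_s) n1++;
    return n1;
}

// Hypothetical timing helper: applies f repeatedly and returns the elapsed
// time in seconds. The running total is stored through sink so the compiler
// is less likely to discard the loop as dead code.
template <typename F>
double time_method(F f, std::uint64_t iterations, std::uint64_t* sink) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t k = 0; k < iterations; ++k)
        acc = f(acc, k, static_cast<unsigned>(k & 1u));
    const auto stop = std::chrono::steady_clock::now();
    *sink = acc;
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as `time_method(add_method1, 100000000u, &result)` returns the elapsed seconds; comparing the two `result` values first confirms that both methods compute the same thing before any timing is trusted.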

Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.  

In article <2005021814290016807%clarkcox3@gmailcom>,
Clark S. Cox III <cl*******@gmail.com> wrote:
:On 2005-02-18 12:54:57 -0500, "Alex Vinokur" <al****@bigfoot.com> said:
:> n1 += (n2 + carry_s); // carry_s == 0 or 1
:> The question is if is it possible to make that line to work faster?
:No, the question is: "Is that line the bottleneck?"
I profiled his code here on a particular platform. The line he is
asking about is the fastest part of that function. The startup code
for the function itself is a hair slower; the line after the above line
takes about 3 times as long as the +=, and the code for the return statement
after that takes a bit longer still.

Entropy is the logarithm of probability  Boltzmann  

* Alex Vinokur: I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula. The algorithm can be seen at http://groupsbeta.google.com/group/...e76b12150613a1
Function AddUnits() contains a line n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster?
Attacking an optimization problem at the level of fundamental additions is
seldom a Good Idea.
Thinking about what goes on is almost always a Better Idea.
Almost any way of computing Fibonacci numbers is faster than the recursive
formula. But you're not using the recursive formula directly, you're summing
iteratively, storing results in a std::vector of std::vector. Most of the
time will, I gather, be spent in internal new and delete operations, and in
the operating system's virtual memory swapping to and from disk, so
possibly you can optimize _a lot_ by first computing the approximate Fib
number using double arithmetic (check out the Golden Ratio), then allocate
just what you need of memory for that single number, and then compute the
number exactly.
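
One way to read that suggestion is to estimate the size of Fib(n) from the approximation Fib(n) ~ phi^n / sqrt(5), where phi is the Golden Ratio. The helper name `fib_decimal_digits` is an assumption for this sketch, and the estimate comes from double arithmetic, so it could be off by one where Fib(n) lies very close to a power of ten.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Estimated number of decimal digits of Fib(n), from
// log10(Fib(n)) ~ n * log10(phi) - log10(sqrt(5)).
inline std::size_t fib_decimal_digits(unsigned long n) {
    if (n < 2) return 1;  // Fib(0) = 0 and Fib(1) = 1 are single digits
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
    const double log10_fib = n * std::log10(phi) - 0.5 * std::log10(5.0);
    assert(log10_fib >= 0.0);
    return static_cast<std::size_t>(std::floor(log10_fib)) + 1;
}
```

The result can be passed to something like `std::vector::reserve` so that storage for the final number is allocated once, up front.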
Hth.,
 Alf

A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in email?  

In article <cv**********@canopus.cc.umanitoba.ca>,
Walter Roberson <ro******@ibd.nrccnrc.gc.ca> wrote:
In article <2005021814290016807%clarkcox3@gmailcom>,
Clark S. Cox III <cl*******@gmail.com> wrote:
:No, the question is: "Is that line the bottleneck?"
I profiled his code here on a particular platform. The line he is
asking about is the fastest part of that function.
I recompiled with aggressive optimizations, interprocedural analysis,
loop unrolling, and telling the compiler it was okay to mix code
together in ways that make it difficult to tell exactly which line you
are on.
When I turned on all those optimizations, a sample run with hardware
profiling counted 9724 against the line the OP pointed out,
3195 against the next line, and 637 against the return.
Thus, if you were naive about what the profiling output really means in
the face of high optimization, then you could end up drawing the
conclusion that it was the add that was slow.

Submillibarn resolution biohyperdimensional plasmatic space
polyimaging is just around the corner.  Corry Lee Smith  

In article <42****************@news.individual.net>,
Alf P. Steinbach <al***@start.no> wrote:
:Almost any way of computing Fibonacci numbers is faster than the recursive
:formula. But you're not using the recursive formula directly, you're summing
:iteratively, storing results in a std::vector of std::vector. Most of the
:time will, I gather, be spent in internal new and delete operations, and in
:the operating system's virtual memory swapping to and from disk,
That's a good thought, but my profiling experiments on his code show
that the amount of time spent in those areas is in the noise level,
with the arithmetic functions of the routine the OP indicated
being the bottleneck.
The line he indicated is not the bottleneck, but my experiments show
that if you are using high optimization in combination with profiling,
the profiler can end up accounting the addition line as if it
was about 3/4 of the execution time. It's an artifact of loop
unrolling and similar.

Cannot open .signature: Permission denied  

* Walter Roberson: In article <42****************@news.individual.net>, Alf P. Steinbach <al***@start.no> wrote: :Almost any way of computing Fibonacci numbers is faster than the recursive :formula. But you're not using the recursive formula directly, you're summing :iteratively, storing results in a std::vector of std::vector. Most of the :time will, I gather, be spent in internal new and delete operations, and in :the operating system's virtual memory swapping to and from disk,
That's a good thought, but my profiling experiments on his code show that the amount of time spent in those areas is in the noise level, with the arithmetic functions of the routine the OP indicated being the bottleneck.
Did you profile for _large_ Fib numbers, numbers much greater than can be
represented by ordinary 'long', which is what the code seems to be all about?
And does your profiler account for out-of-process time such as e.g. swapping?
Profiling is a tricky business, and without analyzing that code in detail
(I just skimmed it) it seemed to me to have at least O(n log n) memory
consumption for computation of a Fib number n...
The line he indicated is not the bottleneck, but my experiments show that if you are using high optimization in combination with profiling, the profiler can end up accounting the addition line as if it was about 3/4 of the execution time. It's an artifact of loop unrolling and similar.
Yes... ;)

A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in email?  

In article <42****************@news.individual.net>,
Alf P. Steinbach <al***@start.no> wrote:
* Walter Roberson:
> That's a good thought, but my profiling experiments on his code show
> that the amount of time spent in those areas is in the noise level,
> with the arithmetic functions of the routine the OP indicated
> being the bottleneck.
Did you profile for _large_ Fib numbers, numbers much greater than can be
represented by ordinary 'long', which is what the code seems to be all about?
75000
:And does your profiler account for out-of-process time such as e.g. swapping?
I was using a hardware program counter sampler, so out-of-band times
would not have been included.
The system I was running on has over a gigabyte of available memory; it
would take rather some time to get to the point of swapping.
:Profiling is a tricky business, and without analyzing that code in detail
:(I just skimmed it) it seemed to me to have at least O(n log n) memory
:consumption for computation of a Fib number n...
Experimentally, it is O(n^2) in time.
O(n log n) would take about n = 2^26 to fill a gigabyte.
On the system I am testing on (which is a decade old, 250 MHz),
the elapsed time is fairly close to 1 minute for n = 150000.
At that rate, it would take very close to 139 days of computation
to fill a gigabyte.
Other architectures would, of course, have different constants of
proportionality.
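
The arithmetic behind those figures can be reproduced in a few lines. The function names are invented for this sketch, and the inputs are the ones quoted above: 1 minute at n = 150000, quadratic growth in time, and n = 2^26 as the point where n log2 n reaches the order of a gigabyte.

```cpp
#include <cassert>
#include <cmath>

// Scales a measured run time to a larger n, assuming time grows as n^2.
inline double extrapolate_quadratic_minutes(double measured_minutes,
                                            double measured_n,
                                            double target_n) {
    assert(measured_n > 0.0);
    const double ratio = target_n / measured_n;
    return measured_minutes * ratio * ratio;
}

// Converts minutes to days.
inline double minutes_to_days(double minutes) { return minutes / (60.0 * 24.0); }

// n * log2(n), the memory growth estimate under discussion.
inline double n_log2_n(double n) { return n * std::log2(n); }
```

With n = 2^26 = 67108864, the ratio to 150000 is about 447.4, its square is about 200160 minutes, and dividing by 1440 minutes per day gives roughly 139 days. At the same n, n log2 n is 26 * 2^26, about 1.7 * 10^9.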

Rome was built one paycheck at a time.  Walter Roberson  

"Clark S. Cox III" <cl*******@gmail.com> wrote in message news:2005021814290016807%clarkcox3@gmailcom... On 2005-02-18 12:54:57 -0500, "Alex Vinokur" <al****@bigfoot.com> said:
[snip] I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula. The algorithm can be seen at http://groupsbeta.google.com/group/...e76b12150613a1
Function AddUnits() contains a line n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster?
No, the question is: "Is that line the bottleneck?"
I think function AddUnits() is the bottleneck. So, it is worth improving the speed of that function.
How do you know that line is the problem? Have you measured the performance of your code?
Yes. The program contains a simple built-in speed-measurement tool.
[snip]

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com http://mathforum.org/library/view/10978.html http://sourceforge.net/users/alexvn  

"Alf P. Steinbach" <al***@start.no> wrote in message news:42****************@news.individual.net... * Alex Vinokur: I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula. The algorithm can be seen at http://groupsbeta.google.com/group/...e76b12150613a1
Function AddUnits() contains a line n1 += (n2 + carry_s); // carry_s == 0 or 1
The question is if is it possible to make that line to work faster? Attacking an optimization problem at the level of fundamental additions is seldom a Good Idea.
Thinking about what goes on is almost always a Better Idea.
Almost any way of computing Fibonacci numbers is faster than the recursive formula.
I am interested in creating a fast C++ algorithm using the primary recursive formula.
But you're not using the recursive formula directly, you're summing iteratively, storing results in a std::vector of std::vector.
Results are stored in the std::vector Fibonacci::fibs_.
Computing Fib(n):
If Fibonacci::fibs_.size() > n then Fib(n) = Fibonacci::fibs_[n];
else Fib(n) is computed according to the primary recursive formula.
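
The lookup-or-compute scheme described here can be sketched as follows. This is a stand-in, not the actual Fibonacci class: `FibCache`, `get`, and `cached` are hypothetical names, and 64-bit integers replace the big-number type, which limits the sketch to n <= 93.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the OP's class: caches every value computed so far.
class FibCache {
public:
    // Returns Fib(n), extending the cache with the recurrence
    // Fib(k) = Fib(k-1) + Fib(k-2) only when n is not stored yet.
    std::uint64_t get(std::size_t n) {
        assert(n <= 93);  // Fib(94) no longer fits in 64 bits
        while (fibs_.size() <= n) {
            const std::size_t k = fibs_.size();
            fibs_.push_back(fibs_[k - 1] + fibs_[k - 2]);
        }
        return fibs_[n];
    }

    // Number of values currently stored.
    std::size_t cached() const { return fibs_.size(); }

private:
    std::vector<std::uint64_t> fibs_{0, 1};  // Fib(0) and Fib(1)
};
```

After `get(10)` the cache holds eleven values, so a later `get(5)` is a plain vector lookup with no additions at all.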
[snip]

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com http://mathforum.org/library/view/10978.html http://sourceforge.net/users/alexvn  

"Walter Roberson" <ro******@ibd.nrccnrc.gc.ca> wrote in message news:cv**********@canopus.cc.umanitoba.ca... In article <cv**********@canopus.cc.umanitoba.ca>, Walter Roberson <ro******@ibd.nrccnrc.gc.ca> wrote: In article <37*************@individual.net>, Alex Vinokur <al****@bigfoot.com> wrote:
:Consider the following statement: :n+i, where i = 1 or 0. :Is there more fast method for computing n+i than direct computing that sum?
It depends on the costs you assign to the various operations -- a matter which is architecture dependent.
There is a possibility that would be slower in any real architecture that I've ever heard of, but which could be faster under very narrow circumstances.
(n&1) ? (n+i) : (n|i)
[snip]
In g++ 3.3.3 (Mingw) on Windows 2000 that doesn't improve the speed.

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com http://mathforum.org/library/view/10978.html http://sourceforge.net/users/alexvn  

"Alex Vinokur" <al****@bigfoot.com> wrote: "Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message news:cv***********@pcnews.cogsci.ed.ac.uk... In article <37*************@individual.net>, Alex Vinokur <al****@bigfoot.com> wrote:
n+i, where i = 1 or 0.
Is there more fast method for computing n+i than direct computing that sum?
Assuming n and i are ints, not on a modern general purpose computer. Addition typically takes one cycle, once the operands are in registers.
For more details, try a newsgroup for the processor you're interested in, or maybe comp.arch.
I need that in C/C++ program.
_Need_ that? Are you sure? Have you measured it? Frankly, I doubt that,
even if you _do_ find a highly system-specific measure to perhaps shave
a millicycle or so off your addition, the effect will be very
noticeable. If your program spends a lot of time adding 0 or 1 to an
integer, it is highly probable that either you have an inefficient
(probably multiple) loop in which this is merely the inner statement, or
your problem is just that time-consuming.
Richard

Date asked: Jul 23 '05
