
# Fast addition for n+1 or n+0

Consider the following statement:

n+i, where i = 1 or 0.

Is there a faster method for computing n+i than computing that sum directly?

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #1
24 Replies

* Alex Vinokur:
> Consider the following statement:
> n+i, where i = 1 or 0.
> Is there a faster method for computing n+i than computing that sum directly?

That depends on the types involved. For built-in numeric types, direct computation is probably fastest. Measure if you're in doubt (and it really matters).

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Jul 23 '05 #2

"Alex Vinokur" writes:
> Consider the following statement:
> n+i, where i = 1 or 0.
> Is there a faster method for computing n+i than computing that sum directly?

The best way to compute n+0 is n. The best way to compute n+1 is n+1; if the CPU provides something faster than a general add instruction, the compiler will generate it for you.

--
Keith Thompson (The_Other_Keith) ks***@mib.org
San Diego Supercomputer Center <*>
We must do something. This is something. Therefore, we must do this.

Jul 23 '05 #3

Alex Vinokur wrote:
> Consider the following statement:
> n+i, where i = 1 or 0.
> Is there a faster method for computing n+i than computing that sum directly?

Assuming integers, hardware addition is implemented simply using full adders, or with faster schemes such as carry lookahead.

n+0 has no carries, so it is fast; many compilers will constant-fold it to n.
n+1 has potentially m carries in m-bit arithmetic.

Full adder: http://isweb.redwoods.cc.ca.us/INSTR...logic/full.htm
Carry look ahead: http://www.seas.upenn.edu/~ee201/lab...kAheadF01.html

gtoomey
www.gregorytoomey.com

Jul 23 '05 #4
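The carry behaviour described in reply #4 can be illustrated in software. The following is only a sketch of a ripple-carry adder built from full-adder logic; the names `RippleResult` and `ripple_add` are made up for this illustration, and real hardware does all of this in parallel circuitry rather than in a loop.

```cpp
#include <cassert>
#include <cstdint>

// Result of a simulated 32-bit ripple-carry addition (hypothetical helper type).
struct RippleResult {
    std::uint32_t sum;  // the sum, wrapped modulo 2^32
    int carries;        // number of bit positions that produced a carry-out
};

// Adds a and b one bit at a time through a chain of full adders and
// counts how many positions generate a carry.
RippleResult ripple_add(std::uint32_t a, std::uint32_t b) {
    std::uint32_t sum = 0;
    std::uint32_t carry = 0;
    int carries = 0;
    for (int bit = 0; bit < 32; ++bit) {
        const std::uint32_t x = (a >> bit) & 1u;
        const std::uint32_t y = (b >> bit) & 1u;
        const std::uint32_t s = x ^ y ^ carry;        // full-adder sum bit
        carry = (x & y) | (x & carry) | (y & carry);  // full-adder carry-out
        sum |= s << bit;
        carries += static_cast<int>(carry);
    }
    // Sanity check: the simulation must agree with the built-in addition.
    assert(sum == static_cast<std::uint32_t>(a + b));
    return {sum, carries};
}
```

With this sketch, `ripple_add(n, 0)` never produces a carry, while `ripple_add(n, 1)` produces one carry for every trailing 1 bit of `n`, up to all 32 positions when `n` is `0xFFFFFFFF`.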

In article <37*************@individual.net>, Alex Vinokur wrote:
> Consider the following statement:
> n+i, where i = 1 or 0.
> Is there a faster method for computing n+i than computing that sum directly?

Assuming n and i are ints, not on a modern general purpose computer. Addition typically takes one cycle, once the operands are in registers. Any attempt to use a conditional will almost certainly be much slower.

For more details, try a newsgroup for the processor you're interested in, or maybe comp.arch.

-- Richard

Jul 23 '05 #5

"Richard Tobin" wrote in message news:cv***********@pc-news.cogsci.ed.ac.uk...
> In article <37*************@individual.net>, Alex Vinokur wrote:
>> Consider the following statement:
>> n+i, where i = 1 or 0.
>> Is there a faster method for computing n+i than computing that sum directly?
>
> Assuming n and i are ints, not on a modern general purpose computer. Addition typically takes one cycle, once the operands are in registers. Any attempt to use a conditional will almost certainly be much slower.
>
> For more details, try a newsgroup for the processor you're interested in, or maybe comp.arch.
>
> -- Richard

I need that in a C/C++ program.

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #6

Alex Vinokur wrote:
> "Richard Tobin" wrote in message news:cv***********@pc-news.cogsci.ed.ac.uk...
>> In article <37*************@individual.net>, Alex Vinokur wrote:
>>> Consider the following statement:
>>> n+i, where i = 1 or 0.
>>> Is there a faster method for computing n+i than computing that sum directly?
>>
>> Assuming n and i are ints, not on a modern general purpose computer. Addition typically takes one cycle, once the operands are in registers. Any attempt to use a conditional will almost certainly be much slower.
>>
>> For more details, try a newsgroup for the processor you're interested in, or maybe comp.arch.
>
> I need that in a C/C++ program.

Well, there is no general truth that leads to a portable, always perfect solution.

If you want to optimise your code for speed, use a profiler to determine which functions are called how often and how much time they take. Then you know _where_ you lose your time. After that, try to find algorithms which reduce the number of calls to small functions that account for a good part of the overall time, and which reduce the time spent in "big" functions that take much time.

If you afterwards really find that optimising code with 'n+0' and 'n+1' would be the best possible micro-optimisation to gain some more cycles, then you should try to write as many 'n+0's/'n's and 'n+1's as possible explicitly in your code instead of using 'n+i'. The compiler will optimise that if the code has the potential for optimisation. Afterwards, use the profiler to determine whether this actually makes a difference. Probably not much.

If you think you can do better than the compiler, then follow Richard's suggestion about comp.arch.*

Cheers
Michael

--
E-Mail: Mine is a gmx dot de address.

Jul 23 '05 #7

In article <37*************@individual.net>, Alex Vinokur wrote:
> Consider the following statement:
> n+i, where i = 1 or 0.
> Is there a faster method for computing n+i than computing that sum directly?

It depends on the costs you assign to the various operations -- a matter which is architecture dependent.

Integer addition is usually one of the fastest things a computer does. Suppose you were able to find a two-instruction sequence that looked faster for that particular case: it is very likely to be slower in practice, because internally the CPU has to perform an integer addition in order to find the address of the second instruction.

Have you perhaps omitted some important facts about the circumstances? For example, are you microprogramming, or is this a theory question at the micro-level where each comparison and change of a bit in the implementation of the 'addition' operation is to be counted? Is this an assignment in designing an IC which is faster for these particular cases than building a full-blown adder circuit would be?

--
Reviewers should be required to produce a certain number of negative reviews - like police given quotas for handing out speeding tickets. -- The Audio Anarchist

Jul 23 '05 #8

Alex Vinokur wrote:
> Consider the following statement:
> n + i, where i = 1 or 0.
> Is there a faster method for computing n + i than computing that sum directly?

No. But a good optimizing compiler should be able to replace n + 0 with n and replace n + 1 with ++n.

Jul 23 '05 #9

"Walter Roberson" wrote in message news:cv**********@canopus.cc.umanitoba.ca...
> In article <37*************@individual.net>, Alex Vinokur wrote:
>> Consider the following statement:
>> n+i, where i = 1 or 0.
>> Is there a faster method for computing n+i than computing that sum directly?
>
> It depends on the costs you assign to the various operations -- a matter which is architecture dependent.
>
> Integer addition is usually one of the fastest things a computer does. Suppose you were able to find a two-instruction sequence that looked faster for that particular case: it is very likely to be slower in practice, because internally the CPU has to perform an integer addition in order to find the address of the second instruction.
>
> Have you perhaps omitted some important facts about the circumstances? For example, are you microprogramming, or is this a theory question at the micro-level where each comparison and change of a bit in the implementation of the 'addition' operation is to be counted? Is this an assignment in designing an IC which is faster for these particular cases than building a full-blown adder circuit would be?

I would like to optimize (for speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.

The algorithm can be seen at http://groups-beta.google.com/group/...e76b12150613a1

Function AddUnits() contains the line

n1 += (n2 + carry_s); // carry_s == 0 or 1

The question is whether it is possible to make that line work faster.

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #10
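AddUnits() itself is not reproduced in this thread, so the following is only a guess at the kind of loop such a line usually sits in: a limb-by-limb big-number addition where the carry between limbs is always 0 or 1. The name `add_in_place`, the base of 10^9 and the little-endian limb order are assumptions made for this sketch, not details taken from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed representation: little-endian limbs, each in the range [0, kBase).
constexpr std::uint32_t kBase = 1000000000u;  // 10^9 fits in 32 bits

// Adds rhs into lhs in place, passing a carry of 0 or 1 from limb to limb.
void add_in_place(std::vector<std::uint32_t>& lhs,
                  const std::vector<std::uint32_t>& rhs) {
    if (lhs.size() < rhs.size()) lhs.resize(rhs.size(), 0u);
    std::uint32_t carry = 0;  // always 0 or 1
    for (std::size_t k = 0; k < lhs.size(); ++k) {
        const std::uint32_t n2 = (k < rhs.size()) ? rhs[k] : 0u;
        assert(lhs[k] < kBase && n2 < kBase);
        // Same shape as the line in question; at most 2*(10^9 - 1) + 1 < 2^32.
        std::uint32_t n1 = lhs[k] + n2 + carry;
        carry = (n1 >= kBase) ? 1u : 0u;
        if (carry) n1 -= kBase;
        lhs[k] = n1;
    }
    if (carry) lhs.push_back(1u);  // the number grew by one limb
}
```

In a loop like this the addition is one instruction among a comparison, a conditional subtraction and two memory accesses per limb, which is why the replies keep asking for measurements before touching that single line.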

Walter Roberson wrote:
> In article <37*************@individual.net>, Alex Vinokur wrote:
>> Consider the following statement:
>> n+i, where i = 1 or 0.
>> Is there a faster method for computing n+i than computing that sum directly?
>
> It depends on the costs you assign to the various operations -- a matter which is architecture dependent.

There is a possibility that would be slower on any real architecture that I've ever heard of, but which could be faster under very narrow circumstances:

(n&1) ? (n+i) : (n|i)

The narrow circumstances under which this could be faster are:

- this is within a tight loop that fits within the processor's primary instruction cache
- the processor has a "move conditional" operation that avoids taking an actual branch when the operations are simple enough and the result is being used arithmetically instead of to control a branch
- at the microcode level, the processor "runs free" when working from instruction cache, processing each instruction as fast as possible instead of working on a bus-cycle system (which is needed in most cases when anything outside the primary cache is being referenced)
- the combined cost of the bitwise AND operation, the comparison to 0, and the bitwise OR operation is lower than the cost of a full addition

I have heard of one architecture (I don't recall which) that had a "move conditional" operation that took a test condition and two arithmetic operations as operands, and would start doing the two arithmetic operations in parallel at the same time it was doing the test; when the result of the test was available, it would abort the false branch if it was not already finished, with the result being whichever of the arithmetic expressions was selected by the condition.

Please note how narrow these conditions are: you would have to know a LOT about your processor to make this kind of optimization. The expression I give above will be slower than a straight addition on nearly every architecture. Addition is usually hard-coded through a series of transistors, with the carry circuit taking most of the landscape. It's hard to beat transistor-level speeds by using multiple instructions.

I have heard that some architectures internally optimize +0 and +1; there would be no way to beat that... but again you would need to know intimate details of the architecture.

--
I don't know if there's destiny, but there's a decision! -- Wim Wenders (WoD)

Jul 23 '05 #11
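The expression from reply #11 can at least be checked for correctness, independent of whether it is ever faster. The sketch below wraps both forms in functions; `add_direct` and `add_conditional` are names invented for this illustration, and `unsigned` is used so that wrap-around at the top of the range is well defined.

```cpp
#include <cassert>

// Direct form: a single integer addition.
unsigned add_direct(unsigned n, unsigned i) {
    assert(i <= 1u);  // the thread only considers i == 0 or i == 1
    return n + i;
}

// Conditional form quoted in the post.
// When n is even its low bit is clear, so OR-ing in i (0 or 1) gives the
// same result as adding it; when n is odd it falls back to the addition.
unsigned add_conditional(unsigned n, unsigned i) {
    assert(i <= 1u);
    return (n & 1u) ? (n + i) : (n | i);
}
```

Both functions return the same value for every `n` when `i` is 0 or 1. Whether the second one is ever cheaper is a separate question that only a measurement on the target processor can answer.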

"Alex Vinokur" wrote in message news:37*************@individual.net...
> Function AddUnits() contains the line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
> The question is whether it is possible to make that line work faster.

What fraction of your program's total execution time does this statement consume?

Until you know the answer to this question, you don't know whether it's even worth trying to change it, let alone the best way of doing so.

Jul 23 '05 #12

Alex Vinokur wrote:
> I would like to optimize (for speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
>
> The algorithm can be seen at http://groups-beta.google.com/group/...e76b12150613a1
>
> Function AddUnits() contains the line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
> The question is whether it is possible to make that line work faster.

No.

Jul 23 '05 #13

On 2005-02-18 12:54:57 -0500, "Alex Vinokur" said:
> "Walter Roberson" wrote in message news:cv**********@canopus.cc.umanitoba.ca...
>> In article <37*************@individual.net>, Alex Vinokur wrote:
>>> Consider the following statement:
>>> n+i, where i = 1 or 0.
>>> Is there a faster method for computing n+i than computing that sum directly?
>>
>> It depends on the costs you assign to the various operations -- a matter which is architecture dependent.
>>
>> Integer addition is usually one of the fastest things a computer does. Suppose you were able to find a two-instruction sequence that looked faster for that particular case: it is very likely to be slower in practice, because internally the CPU has to perform an integer addition in order to find the address of the second instruction.
>>
>> Have you perhaps omitted some important facts about the circumstances? For example, are you microprogramming, or is this a theory question at the micro-level where each comparison and change of a bit in the implementation of the 'addition' operation is to be counted? Is this an assignment in designing an IC which is faster for these particular cases than building a full-blown adder circuit would be?
>
> I would like to optimize (for speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
>
> The algorithm can be seen at http://groups-beta.google.com/group/...e76b12150613a1
>
> Function AddUnits() contains the line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
> The question is whether it is possible to make that line work faster.

No, the question is: "Is that line the bottleneck?"

How do you know that line is the problem? Have you measured the performance of your code?

--
Clark S. Cox, III
cl*******@gmail.com

Jul 23 '05 #14

In article <2005021814290016807%clarkcox3@gmailcom>, Clark S. Cox III wrote:
> On 2005-02-18 12:54:57 -0500, "Alex Vinokur" said:
>> n1 += (n2 + carry_s); // carry_s == 0 or 1
>> The question is whether it is possible to make that line work faster.
>
> No, the question is: "Is that line the bottleneck?"

I profiled his code here on a particular platform. The line he is asking about is the -fastest- part of that function. The startup code for the function itself is a hair slower; the line after the above line takes about 3 times as long as the +=, and the code for the return statement after that takes a bit longer still.

--
Entropy is the logarithm of probability -- Boltzmann

Jul 23 '05 #16

* Alex Vinokur:
> I would like to optimize (for speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
>
> The algorithm can be seen at http://groups-beta.google.com/group/...e76b12150613a1
>
> Function AddUnits() contains the line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
> The question is whether it is possible to make that line work faster.

Attacking an optimization problem at the level of fundamental additions is seldom a Good Idea.

Thinking about what goes on is almost always a Better Idea.

Almost any way of computing Fibonacci numbers is faster than the recursive formula. But you're not using the recursive formula directly, you're summing iteratively, storing results in a std::vector of std::vector. Most of the time will, I gather, be spent in internal new and delete operations, and in the operating system's virtual memory swapping to and from disk, so possibly you can optimize _a lot_ by first computing the approximate Fib number using double arithmetic (check out the Golden Ratio), then allocating just the memory you need for that single number, and then computing the number exactly.

Hth.,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Jul 23 '05 #17
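The sizing idea in reply #17 can be sketched with Binet's approximation, Fib(n) ≈ φ^n / √5, where φ is the golden ratio. The function name `estimated_fib_digits` is made up for this example, and the result is an estimate in double precision, so a careful caller would reserve a digit or two of headroom rather than rely on it exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Estimated number of decimal digits in Fib(n), using
// log10(Fib(n)) ~ n * log10(phi) - log10(sqrt(5)).
std::size_t estimated_fib_digits(std::size_t n) {
    if (n < 2) return 1;  // Fib(0) = 0 and Fib(1) = 1 are single digits
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
    const double log10_fib =
        static_cast<double>(n) * std::log10(phi) - 0.5 * std::log10(5.0);
    assert(log10_fib >= 0.0);  // holds for every n >= 2
    return static_cast<std::size_t>(std::floor(log10_fib)) + 1;
}
```

A caller could pass the estimate (plus headroom) to something like `std::vector::reserve` before running the exact computation, so that the digit storage is allocated once instead of growing repeatedly.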

Walter Roberson wrote:
> In article <2005021814290016807%clarkcox3@gmailcom>, Clark S. Cox III wrote:
>> No, the question is: "Is that line the bottleneck?"
>
> I profiled his code here on a particular platform. The line he is asking about is the -fastest- part of that function.

I recompiled with aggressive optimizations, interprocedural analysis, loop unrolling, and telling the compiler it was okay to mix code together in ways that make it difficult to tell exactly which line you are on.

When I turned on all those optimizations, a sample run with hardware profiling counted 9724 against the line the OP pointed out, 3195 against the next line, and 637 against the return.

Thus, if you were naive about what the profiling output really means in the face of high optimization, then you could end up drawing the conclusion that it was the add that was slow.

--
Sub-millibarn resolution bio-hyperdimensional plasmatic space polyimaging is just around the corner. -- Corry Lee Smith

Jul 23 '05 #18

In article <42****************@news.individual.net>, Alf P. Steinbach wrote:
> Almost any way of computing Fibonacci numbers is faster than the recursive formula. But you're not using the recursive formula directly, you're summing iteratively, storing results in a std::vector of std::vector. Most of the time will, I gather, be spent in internal new and delete operations, and in the operating system's virtual memory swapping to and from disk,

That's a good thought, but my profiling experiments on his code show that the amount of time spent in those areas is in the noise level, with the arithmetic functions of the routine the OP indicated being the bottleneck.

The line he indicated is not the bottleneck, but my experiments show that if you are using high optimization in combination with profiling, the profiler can end up accounting the addition line as if it were about 3/4 of the execution time. It's an artifact of loop unrolling and similar.

--
Cannot open .signature: Permission denied

Jul 23 '05 #19

* Walter Roberson:
> In article <42****************@news.individual.net>, Alf P. Steinbach wrote:
>> Almost any way of computing Fibonacci numbers is faster than the recursive formula. But you're not using the recursive formula directly, you're summing iteratively, storing results in a std::vector of std::vector. Most of the time will, I gather, be spent in internal new and delete operations, and in the operating system's virtual memory swapping to and from disk,
>
> That's a good thought, but my profiling experiments on his code show that the amount of time spent in those areas is in the noise level, with the arithmetic functions of the routine the OP indicated being the bottleneck.

Did you profile for _large_ Fib numbers, numbers much greater than can be represented by ordinary 'long', which is what the code seems to be all about?

And does your profiler account for out-of-process time such as e.g. swapping?

Profiling is a tricky business, and without analyzing that code in detail (I just skimmed it) it seemed to me to have at least O(n log n) memory consumption for computation of a Fib number n...

> The line he indicated is not the bottleneck, but my experiments show that if you are using high optimization in combination with profiling, the profiler can end up accounting the addition line as if it were about 3/4 of the execution time. It's an artifact of loop unrolling and similar.

Yes... ;-)

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Jul 23 '05 #20

In article <42****************@news.individual.net>, Alf P. Steinbach wrote:
> * Walter Roberson:
>> That's a good thought, but my profiling experiments on his code show that the amount of time spent in those areas is in the noise level, with the arithmetic functions of the routine the OP indicated being the bottleneck.
>
> Did you profile for _large_ Fib numbers, numbers much greater than can be represented by ordinary 'long', which is what the code seems to be all about?

75000

> And does your profiler account for out-of-process time such as e.g. swapping?

I was using a hardware program counter sampler, so out-of-band times would not have been included. The system I was running on has over a gigabyte of available memory... it would take rather some time to get to the point of swapping.

> Profiling is a tricky business, and without analyzing that code in detail (I just skimmed it) it seemed to me to have at least O(n log n) memory consumption for computation of a Fib number n...

Experimentally, it is O(n^2) in time.

O(n log n) would take about n = 2^26 to fill a gigabyte. On the system I am testing on (which is a decade old, 250 MHz), the elapsed time is fairly close to 1 minute for n = 150000. At that rate, it would take very close to 139 days of computation to fill a gigabyte. Other architectures would, of course, have different constants of proportionality.

--
Rome was built one paycheck at a time. -- Walter Roberson

Jul 23 '05 #21
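The 139-day figure in reply #21 follows from scaling the measured time quadratically. The helper below is a sketch of that arithmetic only; `extrapolate_quadratic_minutes` is a name invented here, and the inputs are the ones quoted in the post (about 1 minute at n = 150000, target n = 2^26).

```cpp
#include <cassert>

// Extrapolates a run time under quadratic scaling:
//   t(n) = t_ref * (n / n_ref)^2
double extrapolate_quadratic_minutes(double t_ref_minutes,
                                     double n_ref,
                                     double n) {
    assert(n_ref > 0.0);
    const double ratio = n / n_ref;
    return t_ref_minutes * ratio * ratio;
}
```

Calling `extrapolate_quadratic_minutes(1.0, 150000.0, 67108864.0)` and dividing by the 1440 minutes in a day gives a value very close to 139, which matches the estimate in the post.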

"Clark S. Cox III" wrote in message news:2005021814290016807%clarkcox3@gmailcom...
> On 2005-02-18 12:54:57 -0500, "Alex Vinokur" said:
> [snip]
>> I would like to optimize (for speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
>>
>> The algorithm can be seen at http://groups-beta.google.com/group/...e76b12150613a1
>>
>> Function AddUnits() contains the line
>> n1 += (n2 + carry_s); // carry_s == 0 or 1
>> The question is whether it is possible to make that line work faster.
>
> No, the question is: "Is that line the bottleneck?"

I think function AddUnits() is the bottleneck. So it is worth improving the speed of that function.

> How do you know that line is the problem? Have you measured the performance of your code?

Yes. The program contains a (simple) built-in speed measurement tool.

[snip]

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #22

"Alf P. Steinbach" wrote in message news:42****************@news.individual.net...
> * Alex Vinokur:
>> I would like to optimize (for speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
>>
>> The algorithm can be seen at http://groups-beta.google.com/group/...e76b12150613a1
>>
>> Function AddUnits() contains the line
>> n1 += (n2 + carry_s); // carry_s == 0 or 1
>> The question is whether it is possible to make that line work faster.
>
> Attacking an optimization problem at the level of fundamental additions is seldom a Good Idea.
>
> Thinking about what goes on is almost always a Better Idea.
>
> Almost any way of computing Fibonacci numbers is faster than the recursive formula.

I am interested in creating a fast C++ algorithm using the primary recursive formula.

> But you're not using the recursive formula directly, you're summing iteratively, storing results in a std::vector of std::vector.

Results are stored in std::vector Fibonacci::fibs_.

Computing Fib(n):
If Fibonacci::fibs_.size() > n, then Fib(n) = Fibonacci::fibs_[n];
else Fib(n) is computed according to the primary recursive formula.

[snip]

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #23
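The lookup-or-extend scheme described in reply #23 might look roughly like the sketch below. It is not the original Fibonacci class: `FibCache` is a hypothetical name, only the member name `fibs_` comes from the post, and 64-bit integers stand in for the big-number type, so the sketch is valid only up to Fib(93).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache of Fibonacci numbers, extended on demand.
class FibCache {
public:
    unsigned long long get(std::size_t n) {
        assert(n <= 93);  // Fib(94) no longer fits in 64 bits
        // If the table already covers n, this loop does nothing.
        // Otherwise extend it with Fib(k) = Fib(k-1) + Fib(k-2).
        while (fibs_.size() <= n) {
            const std::size_t k = fibs_.size();
            fibs_.push_back(fibs_[k - 1] + fibs_[k - 2]);
        }
        return fibs_[n];
    }

    // Number of values currently stored.
    std::size_t cached() const { return fibs_.size(); }

private:
    std::vector<unsigned long long> fibs_{0ULL, 1ULL};  // Fib(0), Fib(1)
};
```

After `get(10)` the table holds eleven entries, and a later `get(5)` is a plain lookup. With a real big-number element type the same structure keeps every earlier Fibonacci number in memory, which is the memory growth questioned in reply #20.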

"Walter Roberson" wrote in message news:cv**********@canopus.cc.umanitoba.ca...
> Walter Roberson wrote:
>> In article <37*************@individual.net>, Alex Vinokur wrote:
>>> Consider the following statement:
>>> n+i, where i = 1 or 0.
>>> Is there a faster method for computing n+i than computing that sum directly?
>>
>> It depends on the costs you assign to the various operations -- a matter which is architecture dependent.
>
> There is a possibility that would be slower on any real architecture that I've ever heard of, but which could be faster under very narrow circumstances:
>
> (n&1) ? (n+i) : (n|i)

[snip]

In g++ 3.3.3 (Mingw) on Windows 2000, that doesn't improve the speed.

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #24

"Alex Vinokur" wrote:
> "Richard Tobin" wrote in message news:cv***********@pc-news.cogsci.ed.ac.uk...
>> In article <37*************@individual.net>, Alex Vinokur wrote:
>>> n+i, where i = 1 or 0.
>>> Is there a faster method for computing n+i than computing that sum directly?
>>
>> Assuming n and i are ints, not on a modern general purpose computer. Addition typically takes one cycle, once the operands are in registers.
>>
>> For more details, try a newsgroup for the processor you're interested in, or maybe comp.arch.
>
> I need that in a C/C++ program.

_Need_ that? Are you sure? Have you measured it?

Frankly, I doubt that, even if you _do_ find a highly system-specific measure to perhaps shave a milli-cycle or so off your addition, the effect will be very noticeable. If your program spends a lot of time adding 0 or 1 to an integer, it is highly probable that either you have an inefficient (probably multiple) loop in which this is merely the inner statement, or your problem is just that time-consuming.

Richard

Jul 23 '05 #25
