473,811 Members | 3,256 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Faster for() loops?

Neo
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This
works because it is quicker to process "i--" as the test condition, which
says "is i non-zero? If so, decrement it and continue.". For the original
code, the processor has to calculate "subtract i from 10. Is the result
non-zero? if so, increment i and continue.". In tight loops, this make a
considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???

Thanks,
-Neo
"Do U really think, what U think real is really real?"
Nov 15 '05 #1
109 4289
Hi,

Neo wrote:
[...] In tight loops, this make a
considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???


There is nothing like an experiment to test a theory. I just tried with
AVRGCC

void countDown(void) {
int i;
for(i=10; i!=0; i--) doSomething();
}
void countUp(void){
int i;
for(i=0;i<10;i+ +) doSomething();
}

The generated code is

000000ce <countDown>:
}

void countDown(void) {
ce: cf 93 push r28
d0: df 93 push r29
int i;
for(i=10; i!=0; i--) doSomething();
d2: ca e0 ldi r28, 0x0A ; 10
d4: d0 e0 ldi r29, 0x00 ; 0
d6: 0e 94 5d 00 call 0xba
da: 21 97 sbiw r28, 0x01 ; 1
dc: e1 f7 brne .-8 ; 0xd6
de: df 91 pop r29
e0: cf 91 pop r28
e2: 08 95 ret

000000e4 <countUp>:
}
void countUp(void){
e4: cf 93 push r28
e6: df 93 push r29
e8: c9 e0 ldi r28, 0x09 ; 9
ea: d0 e0 ldi r29, 0x00 ; 0
int i;
for(i=0;i<10;i+ +) doSomething();
ec: 0e 94 5d 00 call 0xba
f0: 21 97 sbiw r28, 0x01 ; 1
f2: d7 ff sbrs r29, 7
f4: fb cf rjmp .-10 ; 0xec
f6: df 91 pop r29
f8: cf 91 pop r28
fa: 08 95 ret

Counting down instead of up saves one whole instruction. It could make a
difference I suppose.

However, the compiler cannot optimise as well if anything in the loop
depends on the value of 'i'.
void countDown(void) {
int i;
for(i=10; i!=0; i--) doSomething(i);
}
void countUp(void){
int i;
for(i=0;i<10;i+ +) doSomething(i);
}

Becomes

void countDown(void) {
ce: cf 93 push r28
d0: df 93 push r29
int i;
for(i=10; i!=0; i--) doSomething(i);
d2: ca e0 ldi r28, 0x0A ; 10
d4: d0 e0 ldi r29, 0x00 ; 0
d6: ce 01 movw r24, r28
d8: 0e 94 5d 00 call 0xba
dc: 21 97 sbiw r28, 0x01 ; 1
de: d9 f7 brne .-10 ; 0xd6
e0: df 91 pop r29
e2: cf 91 pop r28
e4: 08 95 ret

000000e6 <countUp>:
}
void countUp(void){
e6: cf 93 push r28
e8: df 93 push r29
int i;
for(i=0;i<10;i+ +) doSomething(i);
ea: c0 e0 ldi r28, 0x00 ; 0
ec: d0 e0 ldi r29, 0x00 ; 0
ee: ce 01 movw r24, r28
f0: 0e 94 5d 00 call 0xba
f4: 21 96 adiw r28, 0x01 ; 1
f6: ca 30 cpi r28, 0x0A ; 10
f8: d1 05 cpc r29, r1
fa: cc f3 brlt .-14 ; 0xee
fc: df 91 pop r29
fe: cf 91 pop r28
100: 08 95 ret

This time there are a whole 2 extra instructions. I don't think this is
such a big deal. Unrolling the loop would give a better result.

cheers,

Al
Nov 15 '05 #2
Neo wrote:
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster.
This works because it is quicker to process "i--" as the test condition,
which says "is i non-zero? If so, decrement it and continue.". For the
original code, the processor has to calculate "subtract i from 10. Is the
result non-zero? if so, increment i and continue.". In tight loops, this
make a considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???


Many micros have a decrement jmp if zero (or non zero) machine instruction
so a decent optimising compiler should know this and use it in count down
to zero loops. Counting up often needs a compare followed by a jmp zero (or
non zero) which will be a tad slower.

Ian

Nov 15 '05 #3
"Neo" <ti************ *********@yahoo .com> wrote in message
news:43******@n ews.microsoft.c om...
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster.
This works because it is quicker to process "i--" as the test condition,
which says "is i non-zero? If so, decrement it and continue.". For the
original code, the processor has to calculate "subtract i from 10. Is the
result non-zero? if so, increment i and continue.". In tight loops, this
make a considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???

Thanks,
-Neo
"Do U really think, what U think real is really real?"


The answer is "implementa tion-dependent".

A major advantage of writing in C is that you can, if you choose, write
understandable, maintainable code. This kind of hand-optimisation has the
opposite effect. If you really need to care about exactly how many
instruction cycle a loop takes, code it in assembly language. Otherwise, for
the sake of those that come after you, please write your C readably and
leave the compiler to do the optimisation. These days, most compilers can
optimise almost as well as you can, for most "normal" operations.

Regards,
--
Peter Bushell
http://www.software-integrity.com/
Nov 15 '05 #4
Neo wrote:
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This
works because it is quicker to process "i--" as the test condition, which
says "is i non-zero? If so, decrement it and continue.". For the original
code, the processor has to calculate "subtract i from 10. Is the result
non-zero? if so, increment i and continue.". In tight loops, this make a
considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???

Regardless of the performance issue, I'd like to point out that after
for( i=10; i--; ) finishes, i will have the value -1, since the
decrement is performed even if i is zero. This is counterintuitiv e, so
it's worth noting. It also means the following is not equivalent:

for (i = 10; i != 0; --i)

Since here one less decrement is performed. Incidentally, my
compiler/platform generates better code with this version -- it compares
i to -1 in the other, which is no better than comparing it to 10! If you
want to count down, I suggest writing what you mean and separating the
test and decrement parts -- it has the added bonus of making things more
readable. The rest is best left to the compiler.

S.
Nov 15 '05 #5
Neo wrote On 09/25/05 23:41,:
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This
works because it is quicker to process "i--" as the test condition, which
says "is i non-zero? If so, decrement it and continue.". For the original
code, the processor has to calculate "subtract i from 10. Is the result
non-zero? if so, increment i and continue.". In tight loops, this make a
considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???

Thanks,
-Neo
"Do U really think, what U think real is really real?"


Unroll it completely.

Nov 15 '05 #6
"Neo" wrote:
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6, 7,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster.
....
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???


It may or not save a couple of assembly language instructions, (of
course depending on the compiler and processor used,) but I doubt this
"noptimizat ion" will make any noticeable change in the performance of
a program, unless your code consist mainly of empty for() loops.

What impact can a minuscule reduction in the time required to decide
if the loop has ended or not have, if the body of the loop, for
example, call functions that format a CAN message, deliver it, wait
for a response, retry if there were errors or timeouts, decode the
response, store the values in a serial EEPROM, and based on them start
a few motors, open pneumatic valves, optionally sending an email
message to Katmandu.

That is not an optimization, but a total waste of time. Read the first
example in "Elements of programming style" and learn...

Roberto Waltman

[ Please reply to the group, ]
[ return address is invalid. ]
Nov 15 '05 #7
>
That is not an optimization, but a total waste of time. Read the first
example in "Elements of programming style" and learn...


What if the difference is between fitting into memory and not?


Nov 15 '05 #8
On Mon, 26 Sep 2005 12:11:23 +0530, in comp.lang.c , "Neo"
<ti************ *********@yahoo .com> wrote:
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states
(that reversing loop order is faster)

The page is talking rot. It *may* be faster. It *may* be slower. The
only way to know is to benchmark your particular implementation in the
specific case you're examining.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???


Benchmark.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt >

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
Nov 15 '05 #9
Depends what you're doing. If you're accessing a large chunk of memory on a system with
cache, you want to go through incrementing addresses to maximize the use of cache.
Decrementing through memory is generally pessimal.

--
#include <standard.discl aimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
Nov 15 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

12
4064
by: Kamilche | last post by:
I was looking for a way to speed up detecting invalid characters in my TCP string, and thought of yet another use for the translate function! If you were to 'translate out' the bad characters, and compare string lengths afterwards, you would know whether or not the line contained invalid characters. The new method is more than 10x faster than the standard 'if char in string' test! So - here's the code plus sample timings: ''' Translate...
1
1898
by: Rudy Koento | last post by:
Hi, I've created an index but it's not being used by postgresql when doing a query. But doing an "explain analyze" shows that with index, it's faster. Here's the output: ------------------------ SET enable_seqscan = off; EXPLAIN ANALYZE SELECT S.* FROM sales S, staff ST WHERE S.staff_no=ST.staff_no AND ST.name='Rudy';
34
3452
by: sushant | last post by:
hi all, suppose i have 2 loops one inside the other, like this 1) for(i=0;i<=100;i++) { for(j=0;j<=10;j++) { some code; }
11
1266
by: bill | last post by:
I am trying to figure out if I can use sse to help execute arithmetic operations faster. I have 900 values that must each be scaled with a divide and multiply. This happens repeatedly. Any examples I can be pointed to would be greatly appreciatted. I realize I could do just one multiply (instead of multiply and divide) but I still want to do 900 (or as many as I can) at once. Any ideas would be appreciatted. Bill
2
1586
by: Jeffrey Melloy | last post by:
I have been using tsearch2 for quite a while with a fair amount of success. The other day I was playiing around with a query, and randomly changed a few things. I noticed a 10 times speedup and didn't know why. Both queries return identical results. The idea was to do a proximity search, where one word appears within 10 minutes of the other.
11
1551
by: Farel | last post by:
Which is Faster in Python and Why? jc = {}; m = x = ,,.......] # upwards of 10000 entries def mcountb(): for item in x: b = item; b.sort(); bc = 0 for bitem in b: bc += int(bitem) try: m = jc
5
3544
by: sololoquist | last post by:
#define COUNT_UP #include <stdio.h> #define N 10 int main() { int i; #ifdef COUNT_UP for (i = 0; i < N; i++)
23
13337
by: AndersWang | last post by:
Hi, dose anybody here explain to me why memset would be faster than a simple loop. I doubt about it! In an int array scenario: int array; for(int i=0;i<10;i++) //ten loops
17
1977
by: onkar | last post by:
which one runs faster ?? for(i=0;i<100;i++) for(j=0;j<10;j++) a=0; OR for(j=0;j<10;j++) for(i=0;i<100;i++)
0
9731
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10393
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10405
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10136
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7671
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5697
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4342
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3871
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
3020
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.