Hi Folks, http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This
works because it is quicker to process "i--" as the test condition, which
says "is i non-zero? If so, decrement it and continue.". For the original
code, the processor has to calculate "subtract i from 10. Is the result
non-zero? if so, increment i and continue.". In tight loops, this make a
considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???
Thanks,
-Neo
"Do U really think, what U think real is really real?" 109 4289
Hi,
Neo wrote:
[...] In tight loops, this make a considerable difference. How far it holds true.. in the light of modern optimizing compilers? and will it make a significant difference in case of embedded systems???
There is nothing like an experiment to test a theory. I just tried with
AVRGCC
void countDown(void) {
int i;
for(i=10; i!=0; i--) doSomething();
}
void countUp(void){
int i;
for(i=0;i<10;i+ +) doSomething();
}
The generated code is
000000ce <countDown>:
}
void countDown(void) {
ce: cf 93 push r28
d0: df 93 push r29
int i;
for(i=10; i!=0; i--) doSomething();
d2: ca e0 ldi r28, 0x0A ; 10
d4: d0 e0 ldi r29, 0x00 ; 0
d6: 0e 94 5d 00 call 0xba
da: 21 97 sbiw r28, 0x01 ; 1
dc: e1 f7 brne .-8 ; 0xd6
de: df 91 pop r29
e0: cf 91 pop r28
e2: 08 95 ret
000000e4 <countUp>:
}
void countUp(void){
e4: cf 93 push r28
e6: df 93 push r29
e8: c9 e0 ldi r28, 0x09 ; 9
ea: d0 e0 ldi r29, 0x00 ; 0
int i;
for(i=0;i<10;i+ +) doSomething();
ec: 0e 94 5d 00 call 0xba
f0: 21 97 sbiw r28, 0x01 ; 1
f2: d7 ff sbrs r29, 7
f4: fb cf rjmp .-10 ; 0xec
f6: df 91 pop r29
f8: cf 91 pop r28
fa: 08 95 ret
Counting down instead of up saves one whole instruction. It could make a
difference I suppose.
However, the compiler cannot optimise as well if anything in the loop
depends on the value of 'i'.
void countDown(void) {
int i;
for(i=10; i!=0; i--) doSomething(i);
}
void countUp(void){
int i;
for(i=0;i<10;i+ +) doSomething(i);
}
Becomes
void countDown(void) {
ce: cf 93 push r28
d0: df 93 push r29
int i;
for(i=10; i!=0; i--) doSomething(i);
d2: ca e0 ldi r28, 0x0A ; 10
d4: d0 e0 ldi r29, 0x00 ; 0
d6: ce 01 movw r24, r28
d8: 0e 94 5d 00 call 0xba
dc: 21 97 sbiw r28, 0x01 ; 1
de: d9 f7 brne .-10 ; 0xd6
e0: df 91 pop r29
e2: cf 91 pop r28
e4: 08 95 ret
000000e6 <countUp>:
}
void countUp(void){
e6: cf 93 push r28
e8: df 93 push r29
int i;
for(i=0;i<10;i+ +) doSomething(i);
ea: c0 e0 ldi r28, 0x00 ; 0
ec: d0 e0 ldi r29, 0x00 ; 0
ee: ce 01 movw r24, r28
f0: 0e 94 5d 00 call 0xba
f4: 21 96 adiw r28, 0x01 ; 1
f6: ca 30 cpi r28, 0x0A ; 10
f8: d1 05 cpc r29, r1
fa: cc f3 brlt .-14 ; 0xee
fc: df 91 pop r29
fe: cf 91 pop r28
100: 08 95 ret
This time there are a whole 2 extra instructions. I don't think this is
such a big deal. Unrolling the loop would give a better result.
cheers,
Al
Neo wrote: Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page states:for( i=0; i<10; i++){ ... }i loops through the values 0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter, you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This works because it is quicker to process "i--" as the test condition, which says "is i non-zero? If so, decrement it and continue.". For the original code, the processor has to calculate "subtract i from 10. Is the result non-zero? if so, increment i and continue.". In tight loops, this make a considerable difference. How far it holds true.. in the light of modern optimizing compilers? and will it make a significant difference in case of embedded systems???
Many micros have a decrement jmp if zero (or non zero) machine instruction
so a decent optimising compiler should know this and use it in count down
to zero loops. Counting up often needs a compare followed by a jmp zero (or
non zero) which will be a tad slower.
Ian
"Neo" <ti************ *********@yahoo .com> wrote in message
news:43******@n ews.microsoft.c om... Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page states:for( i=0; i<10; i++){ ... }i loops through the values 0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter, you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This works because it is quicker to process "i--" as the test condition, which says "is i non-zero? If so, decrement it and continue.". For the original code, the processor has to calculate "subtract i from 10. Is the result non-zero? if so, increment i and continue.". In tight loops, this make a considerable difference. How far it holds true.. in the light of modern optimizing compilers? and will it make a significant difference in case of embedded systems???
Thanks, -Neo "Do U really think, what U think real is really real?"
The answer is "implementa tion-dependent".
A major advantage of writing in C is that you can, if you choose, write
understandable, maintainable code. This kind of hand-optimisation has the
opposite effect. If you really need to care about exactly how many
instruction cycle a loop takes, code it in assembly language. Otherwise, for
the sake of those that come after you, please write your C readably and
leave the compiler to do the optimisation. These days, most compilers can
optimise almost as well as you can, for most "normal" operations.
Regards,
--
Peter Bushell http://www.software-integrity.com/
Neo wrote: Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page states:for( i=0; i<10; i++){ ... }i loops through the values 0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter, you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This works because it is quicker to process "i--" as the test condition, which says "is i non-zero? If so, decrement it and continue.". For the original code, the processor has to calculate "subtract i from 10. Is the result non-zero? if so, increment i and continue.". In tight loops, this make a considerable difference. How far it holds true.. in the light of modern optimizing compilers? and will it make a significant difference in case of embedded systems???
Regardless of the performance issue, I'd like to point out that after
for( i=10; i--; ) finishes, i will have the value -1, since the
decrement is performed even if i is zero. This is counterintuitiv e, so
it's worth noting. It also means the following is not equivalent:
for (i = 10; i != 0; --i)
Since here one less decrement is performed. Incidentally, my
compiler/platform generates better code with this version -- it compares
i to -1 in the other, which is no better than comparing it to 10! If you
want to count down, I suggest writing what you mean and separating the
test and decrement parts -- it has the added bonus of making things more
readable. The rest is best left to the compiler.
S.
Neo wrote On 09/25/05 23:41,: Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page states:for( i=0; i<10; i++){ ... }i loops through the values 0,1,2,3,4,5,6,7 ,8,9 If you don't care about the order of the loop counter, you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. This works because it is quicker to process "i--" as the test condition, which says "is i non-zero? If so, decrement it and continue.". For the original code, the processor has to calculate "subtract i from 10. Is the result non-zero? if so, increment i and continue.". In tight loops, this make a considerable difference. How far it holds true.. in the light of modern optimizing compilers? and will it make a significant difference in case of embedded systems???
Thanks, -Neo "Do U really think, what U think real is really real?"
Unroll it completely.
"Neo" wrote: Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page states:for( i=0; i<10; i++){ ... }i loops through the values 0,1,2,3,4,5,6, 7,8,9 If you don't care about the order of the loop counter, you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops through the values 9,8,7,6,5,4,3,2 ,1,0, and the loop should be faster. .... How far it holds true.. in the light of modern optimizing compilers? and will it make a significant difference in case of embedded systems???
It may or not save a couple of assembly language instructions, (of
course depending on the compiler and processor used,) but I doubt this
"noptimizat ion" will make any noticeable change in the performance of
a program, unless your code consist mainly of empty for() loops.
What impact can a minuscule reduction in the time required to decide
if the loop has ended or not have, if the body of the loop, for
example, call functions that format a CAN message, deliver it, wait
for a response, retry if there were errors or timeouts, decode the
response, store the values in a serial EEPROM, and based on them start
a few motors, open pneumatic valves, optionally sending an email
message to Katmandu.
That is not an optimization, but a total waste of time. Read the first
example in "Elements of programming style" and learn...
Roberto Waltman
[ Please reply to the group, ]
[ return address is invalid. ]
> That is not an optimization, but a total waste of time. Read the first example in "Elements of programming style" and learn...
What if the difference is between fitting into memory and not?
On Mon, 26 Sep 2005 12:11:23 +0530, in comp.lang.c , "Neo"
<ti************ *********@yahoo .com> wrote: Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page states
(that reversing loop order is faster)
The page is talking rot. It *may* be faster. It *may* be slower. The
only way to know is to benchmark your particular implementation in the
specific case you're examining.
How far it holds true.. in the light of modern optimizing compilers? and will it make a significant difference in case of embedded systems???
Benchmark.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt >
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
Depends what you're doing. If you're accessing a large chunk of memory on a system with
cache, you want to go through incrementing addresses to maximize the use of cache.
Decrementing through memory is generally pessimal.
--
#include <standard.discl aimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Kamilche |
last post by:
I was looking for a way to speed up detecting invalid characters in my
TCP string, and thought of yet another use for the translate function!
If you were to 'translate out' the bad characters, and compare string
lengths afterwards, you would know whether or not the line contained
invalid characters. The new method is more than 10x faster than the
standard 'if char in string' test! So - here's the code plus sample
timings:
'''
Translate...
|
by: Rudy Koento |
last post by:
Hi, I've created an index but it's not being used by
postgresql when doing a query. But doing an "explain
analyze" shows that with index, it's faster. Here's
the output:
------------------------
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT S.* FROM sales S, staff ST
WHERE S.staff_no=ST.staff_no AND ST.name='Rudy';
|
by: sushant |
last post by:
hi all,
suppose i have 2 loops one inside the other, like this
1) for(i=0;i<=100;i++)
{
for(j=0;j<=10;j++)
{
some code;
}
|
by: bill |
last post by:
I am trying to figure out if I can use sse to help execute arithmetic
operations faster. I have 900 values that must each be scaled with a divide
and multiply. This happens repeatedly. Any examples I can be pointed to
would be greatly appreciatted. I realize I could do just one multiply
(instead of multiply and divide) but I still want to do 900 (or as many as I
can) at once.
Any ideas would be appreciatted.
Bill
|
by: Jeffrey Melloy |
last post by:
I have been using tsearch2 for quite a while with a fair amount of
success.
The other day I was playiing around with a query, and randomly changed
a few things. I noticed a 10 times speedup and didn't know why. Both
queries return identical results.
The idea was to do a proximity search, where one word appears within 10
minutes of the other.
| |
by: Farel |
last post by:
Which is Faster in Python and Why?
jc = {}; m =
x = ,,.......] # upwards of 10000 entries
def mcountb():
for item in x:
b = item; b.sort(); bc = 0
for bitem in b: bc += int(bitem)
try: m = jc
|
by: sololoquist |
last post by:
#define COUNT_UP
#include <stdio.h>
#define N 10
int main()
{
int i;
#ifdef COUNT_UP
for (i = 0; i < N; i++)
|
by: AndersWang |
last post by:
Hi,
dose anybody here explain to me why memset would be faster than a
simple loop. I doubt about it!
In an int array scenario:
int array;
for(int i=0;i<10;i++) //ten loops
|
by: onkar |
last post by:
which one runs faster ??
for(i=0;i<100;i++)
for(j=0;j<10;j++)
a=0;
OR
for(j=0;j<10;j++)
for(i=0;i<100;i++)
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
| |
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |