I have an array, and an unrolled loop which looks like this:
do_something(A[0]);
do_something(A[1]);
....
do_something(A[255]);
I thought: why should I type so much? I should write a macro.
So I was looking to write something along the lines of:
#define write_256(array) #for(i,0,255,do_something(foo[i]);
But I couldn't find a way to do it...
Is there a way to write a macro that the C pre-processor will expand
to k instructions, and be able to reference the iteration number?
I thought of using recursive macro calls along the lines of:
#define write_more(n,array) \
#if (n>0) do_something(foo[256-n]); \
write_more(n-1,array)
but it seems implementers don't like recursive macro calls :-)
Any insight?
Nudge <de*****@kma.eu.org> scribbled the following: I have an array, and an unrolled loop which looks like this:
do_something(A[0]); do_something(A[1]); ... do_something(A[255]);
I thought: why should I type so much? I should write a macro.
I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.
--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
Joona I Palaste wrote: Nudge <de*****@kma.eu.org> scribbled the following:
I have an array, and an unrolled loop which looks like this:
do_something(A[0]); do_something(A[1]); ... do_something(A[255]);
I thought: why should I type so much? I should write a macro.
I think: Why are you bothering with this? Re-roll your loop into a genuine for loop, set your compiler's optimisation to "pretty darned high", and it might unroll the loop for you when generating code. Remember the rules about optimisation at source code level: (1) Don't do it. (2) (For experts only!) Don't do it yet.
ARGH!
I knew I should have said that I was smarter than my compiler :-)
I am not writing any serious code, only fooling around.
Is there a way to beat the preprocessor into submission?
In 'comp.lang.c', Nudge <de*****@kma.eu.org> wrote: I have an array, and an unrolled loop which looks like this:
do_something(A[0]); do_something(A[1]); ... do_something(A[255]);
I thought: why should I type so much? I should write a macro.
So I was looking to write something along the lines of:
#define write_256(array) #for(i,0,255,do_something(foo[i]);
But I couldn't find a way to do it...
There is no portable way to do this directly, but there is a trick anyway (I
got it from c.l.c, BTW):
Define the list of constant values:
/* values.itm */
ITEM(0)
ITEM(1)
ITEM(2)
ITEM(3)
....
ITEM(254)
ITEM(255)
This file can be automatically created with a trivial C program.
Now, in your application, you do the following:
/* myapp.c */
<...>
#define ITEM(a) \
do_something(A[a]);
#include "values.itm"
#undef ITEM
That's all folks.
I use this trick heavily to generate repetitive code. It's very powerful and
makes for solid code once mastered.
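To try the trick above without creating a second file, the list can be kept in a macro instead of values.itm. This is only a sketch of the same idea: ITEM_LIST, sum_items, count_items and the body of do_something are hypothetical names made up for illustration, and the list is cut down to four items.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for values.itm: in the trick described above this list lives in
   its own file and is pulled in with #include. ITEM_LIST is a made-up name. */
#define ITEM_LIST \
    ITEM(0)       \
    ITEM(1)       \
    ITEM(2)       \
    ITEM(3)

static int total;                               /* illustrative side effect */
static void do_something(int value) { total += value; }

/* Expands to four straight-line calls: do_something(A[0]); ... */
static int sum_items(const int *A, size_t len)
{
    assert(len >= 4);                           /* the list touches A[0]..A[3] */
    total = 0;
#define ITEM(a) do_something(A[a]);
    ITEM_LIST
#undef ITEM
    return total;
}

/* The same list reused with a different ITEM definition. */
static int count_items(void)
{
    int n = 0;
#define ITEM(a) n++;
    ITEM_LIST
#undef ITEM
    return n;
}
```

The point of defining ITEM right before the list and undefining it right after is that one list can drive several different expansions, here a sum and a count.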
--
-ed- em**********@noos.fr [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr... Joona I Palaste wrote:
Nudge <de*****@kma.eu.org> scribbled the following:
I have an array, and an unrolled loop which looks like this:
do_something(A[0]); do_something(A[1]); ... do_something(A[255]);
I thought: why should I type so much? I should write a macro.
I think: Why are you bothering with this? Re-roll your loop into a genuine for loop, set your compiler's optimisation to "pretty darned high", and it might unroll the loop for you when generating code. Remember the rules about optimisation at source code level: (1) Don't do it. (2) (For experts only!) Don't do it yet.
ARGH!
I knew I should have said that I was smarter than my compiler :-)
I am not writing any serious code, only fooling around.
Is there a way to beat the preprocessor into submission?
If you absolutely insist...
#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)
X8(0)
If you ask me, Joona's suggestion is considerably more reasonable.
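For anyone who wants to check that the doubling macros really produce all 256 calls in order, here is a small sketch. The recording version of do_something, the seen array and run_unrolled are assumptions added for the check; only the X0..X8 macros come from the post.

```c
#include <assert.h>

#define N 256
static int A[N];
static int seen[N];     /* values passed to do_something, in call order */
static int calls;

/* Illustrative stand-in: records each value it is given. */
static void do_something(int value) { seen[calls++] = value; }

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

static int run_unrolled(void)
{
    int i;
    for (i = 0; i < N; i++)
        A[i] = i;       /* each element equals its own index */
    calls = 0;
    X8(0)               /* expands to 256 separate statements */
    assert(calls == N);
    return calls;
}
```

The indices come out as sums such as 0+128+64+1 rather than as single literals, which is harmless inside a subscript but would show up if the argument were ever stringized.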
--
poncho
I've never done this kind of stuff, but isn't there the option -funroll-loops
(gcc)?
I've seen a lot of software use this option. Joona also suggested something
similar, I think.
--
Pushkar Pradhan
"Emmanuel Delahaye" <em**********@noos.fr> wrote in message
news:Xn***************************@130.133.1.4... In 'comp.lang.c', Nudge <de*****@kma.eu.org> wrote:
I have an array, and an unrolled loop which looks like this:
do_something(A[0]); do_something(A[1]); ... do_something(A[255]);
I thought: why should I type so much? I should write a macro.
So I was looking to write something along the lines of:
#define write_256(array) #for(i,0,255,do_something(foo[i]);
But I couldn't find a way to do it...
There is no portable way to do this directly, but there is a trick anyway
(I got it from c.l.c, BTW):
Define the list of constant values:
/* values.itm */ ITEM(0) ITEM(1) ITEM(2) ITEM(3) ... ITEM(254) ITEM(255)
This file can be automatically created with a trivial C program.
Now, in your application, you do the following:
/* myapp.c */
<...>
#define ITEM(a) \
do_something(A[a]);
#include "values.itm"
#undef ITEM
That's all folks.
I use massively this trick to define repetitive code. It's very powerful
and makes solid code when mastered.
Pushker Pradhan wrote: I've never done this kind of stuff, but isn't there the option -funroll-loops (gcc)? I've seen a lot of software use this option. Joona also suggested something similar, I think.
In gcc, -funroll-loops causes loops to be unrolled into, well, non-loops.
Example:
int counter = 0;
int i;
for(i = 0; i < 5; i++)
    counter++;
Becomes:
int counter = 0;
int i;
counter++;
counter++;
counter++;
counter++;
counter++;
i = 5;
Of course, the compiler does this on the generated code rather than on the
source, so the actual result will probably look somewhat different. But that's
the idea behind unrolling loops - to eliminate the looping overhead.
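When the trip count is not a small constant, unrolling is usually partial. The following is a hand-written sketch of that idea, not what any particular compiler emits; sum_rolled and sum_unrolled4 are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled version: one add plus one compare-and-branch per element. */
static long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    size_t i;
    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: one compare-and-branch per four elements, followed by a
   cleanup loop for the 0 to 3 elements that are left over. */
static long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    assert(a != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Both functions return the same sum for any length; the second simply spends fewer loop-control instructions per element, at the cost of more code.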
--
Feminists just want the human race to be a tie.
In 'comp.lang.c', "Pushker Pradhan" <pu*****@erc.msstate.edu> wrote: I've never done this kind of stuff, but isn't there the option -funroll-loops (gcc)? I've seen a lot of software use this option. Joona also suggested something similar, I think.
If your compiler has the option, fine, but what if it does not? You do not
always get to choose your compiler. It may be part of a customer's
production process where nothing is allowed to change, or belong to some
peculiar platform... who knows... Better to stay portable. Always (when
possible).
--
-ed- em**********@noos.fr [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
In 'comp.lang.c', "Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> wrote: The PL/I preprocessor can do this.
%DO I=0 TO 255; do_something(A[i]); %END %DEACTIVATE I
(The last is the equivalent of #undef, except that you can %ACTIVATE it and get the previous value back again. The % indicates preprocessor statements.)
I have wondered about the decrease in preprocessor power as computers get faster and computer memory gets larger. Consider the progression from PL/I to C to Java in terms of preprocessor power.
I recall that the Intel macro assembler for the 8051 I used in the 90's was very
powerful, and did support iterative constructs like that.
--
-ed- em**********@noos.fr [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
"Emmanuel Delahaye" <em**********@noos.fr> wrote in message
news:Xn***************************@130.133.1.4... In 'comp.lang.c', "Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> wrote:
The PL/I preprocessor can do this.
%DO I=0 TO 255; do_something(A[i]); %END %DEACTIVATE I
(The last is the equivalent of #undef, except that you can %ACTIVATE it
and get the previous value back again. The % indicates preprocessor statements.)
I have wondered about the decrease in preprocessor power as computers
get faster and computer memory gets larger. Consider the progression from
PL/I to C to Java in terms of preprocessor power.
I recall that the Intel macro assembler for the 8051 I used in the 90's was
very powerful, and did support iterative constructs like that.
The OS/360 assemblers supported a variety of constructs, including if and
goto. The while loop could be, and often was, written using macros. I used
to know people using the OS/360 compilers to compile for most microcomputers
by writing a macro for each possible opcode, and then post-processing the
output files.
I never used the 8051, though.
-- glen
Scott Fluhrer wrote: "Nudge" <de*****@kma.eu.org> wrote in message news:3f***********************@news.free.fr...
Joona I Palaste wrote:
Nudge <de*****@kma.eu.org> scribbled the following:
I have an array, and an unrolled loop which looks like this:
do_something(A[0]); do_something(A[1]); ... do_something(A[255]);
I thought: why should I type so much? I should write a macro.
I think: Why are you bothering with this? Re-roll your loop into a genuine for loop, set your compiler's optimisation to "pretty darned high", and it might unroll the loop for you when generating code. Remember the rules about optimisation at source code level: (1) Don't do it. (2) (For experts only!) Don't do it yet.
ARGH!
I knew I should have said that I was smarter than my compiler :-)
I am not writing any serious code, only fooling around.
Is there a way to beat the preprocessor into submission?
If you absolutely insist...
#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)
X8(0)
If you ask me, Joona's suggestion is considerably more reasonable.
What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?
I don't think there are many optimizing compilers out there which
are THAT smart...
Anyway, thanks for the log2(n) solution. I find it fairly elegant.
"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr... Scott Fluhrer wrote:
"Nudge" <de*****@kma.eu.org> wrote in message news:3f***********************@news.free.fr...
Joona I Palaste wrote:
Nudge <de*****@kma.eu.org> scribbled the following:
>I have an array, and an unrolled loop which looks like this:
>do_something(A[0]); >do_something(A[1]); >... >do_something(A[255]);
>I thought: why should I type so much? I should write a macro.
I think: Why are you bothering with this? Re-roll your loop into a genuine for loop, set your compiler's optimisation to "pretty darned high", and it might unroll the loop for you when generating code. Remember the rules about optimisation at source code level: (1) Don't do it. (2) (For experts only!) Don't do it yet.
ARGH!
I knew I should have said that I was smarter than my compiler :-)
I am not writing any serious code, only fooling around.
Is there a way to beat the preprocessor into submission?
If you absolutely insist...
#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)
X8(0)
If you ask me, Joona's suggestion is considerably more reasonable.
What about SHA-256's inner loop, where unrolling 8 times and symbolically renaming the 8 variables for every iteration allows one to write only 4 assignments?
I don't think there are many optimizing compilers out there which are THAT smart...
Anyway, thanks for the log2(n) solution. I find it fairly elegant.
Actually, elegant it ain't. Some obvious problems:
- A rather annoying amount of changes are required if you change the 256 to,
say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll
compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop
In my opinion, just stick with the explicit for loop
--
poncho
>>What about SHA-256's inner loop, where unrolling 8 times and symbolically renaming the 8 variables for every iteration allows one to write only 4 assignments?
I don't think there are many optimizing compilers out there which are THAT smart...
Anyway, thanks for the log2(n) solution. I find it fairly elegant.
Actually, elegant it ain't. Some obvious problems:
- A rather annoying amount of changes are required if you change the 256 to, say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop
In my opinion, just stick with the explicit for loop
Forgive me if I insist :-)
What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)
With no unrolling, one must turn a-h into an array (which puts
additional pressure on the compiler) and use the iteration number as
a shifting index, modulo 8.
With 8 bodies of the loop, the modulo operations can go away.
I doubt whether even a few optimizing compilers are smart enough to
do that (I will test with gcc).
>> Anyway, thanks for the log2(n) solution. I find it fairly elegant.
Actually, elegant it ain't. Some obvious problems:
- A rather annoying amount of changes are required if you change the 256 to, say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop
In my opinion, just stick with the explicit for loop
Forgive me if I insist.
What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)
I want to write it like this:
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;
In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h
If I unroll 8 times, symbolic renaming comes for free.
"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr... Anyway, thanks for the log2(n) solution. I find it fairly elegant. Actually, elegant it ain't. Some obvious problems:
- A rather annoying amount of changes are required if you change the 256 to, say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop
In my opinion, just stick with the explicit for loop
Forgive me if I insist.
What about SHA-256's inner loop? (You're a cryptographer, you know what I'm talking about.)
If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:
- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.
I want to write it like this:
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;
In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h
If I unroll 8 times, symbolic renaming comes for free.
On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.
--
poncho
Scott Fluhrer wrote: "Nudge" <de*****@kma.eu.org> wrote in message news:3f***********************@news.free.fr...> Anyway, thanks for the log2(n) solution. I find it fairly > elegant.
Actually, elegant it ain't. Some obvious problems:
- A rather annoying amount of changes are required if you change the 256 to, say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop
In my opinion, just stick with the explicit for loop
Forgive me if I insist.
What about SHA-256's inner loop? (You're a cryptographer, you know what I'm talking about.)
If you're asking whether unrolling a loop is ever justified, well, on occasion it is. You just have to remember the costs:
- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to slower performance.
I want to write it like this:
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;
In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h
If I unroll 8 times, symbolic renaming comes for free.
On the other hand, it's nontrivial to use my hack to do the renaming for you. I'd suggest that if you unroll it 8 times, you explicitly list the 8 invocations.
For the case in point, you could write something like
#define STEP(a,b,c,d,e,f,g,h) \
    T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
    T2 = SIGMA0(a) + Maj(a,b,c); \
    d += T1; \
    h = T1 + T2
...
while (whatever) {
    STEP(a,b,c,d,e,f,g,h);
    STEP(h,a,b,c,d,e,f,g);
    ...
    STEP(b,c,d,e,f,g,h,a);
}
(Hmmm: If speed is the objective, couldn't the `W[t] + K[t]'
piece be replaced by a precomputed `WplusK[t]'?)
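A sketch of how the STEP idea can be checked against a plain shifting loop. Several things here are assumptions added for the sketch: the round index is passed as an extra macro argument, the body is wrapped in do { } while (0) so it behaves as one statement, and rounds8_unrolled / rounds8_reference are made-up names. The helper functions follow their definitions in FIPS 180-4. Note the rotation direction: after one step the variable h holds the new "a", so the next call starts with h.

```c
#include <assert.h>
#include <stdint.h>

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define SIGMA0(x)    (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define SIGMA1(x)    (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define Ch(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define Maj(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

/* One round; only d and h are written, everything else is renamed. */
#define STEP(a, b, c, d, e, f, g, h, t)                          \
    do {                                                         \
        uint32_t T1 = SIGMA1(e) + Ch(e, f, g) + W[t] + K[t] + h; \
        uint32_t T2 = SIGMA0(a) + Maj(a, b, c);                  \
        d += T1;                                                 \
        h = T1 + T2;                                             \
    } while (0)

/* Eight rounds with the renaming folded into the argument order. */
static void rounds8_unrolled(uint32_t s[8], const uint32_t *W, const uint32_t *K)
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];

    STEP(a, b, c, d, e, f, g, h, 0);
    STEP(h, a, b, c, d, e, f, g, 1);
    STEP(g, h, a, b, c, d, e, f, 2);
    STEP(f, g, h, a, b, c, d, e, 3);
    STEP(e, f, g, h, a, b, c, d, 4);
    STEP(d, e, f, g, h, a, b, c, 5);
    STEP(c, d, e, f, g, h, a, b, 6);
    STEP(b, c, d, e, f, g, h, a, 7);

    /* After eight steps every role is back in its own variable. */
    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/* Straightforward version that moves the values instead of the names. */
static void rounds8_reference(uint32_t s[8], const uint32_t *W, const uint32_t *K)
{
    int t;
    for (t = 0; t < 8; t++) {
        uint32_t T1 = s[7] + SIGMA1(s[4]) + Ch(s[4], s[5], s[6]) + K[t] + W[t];
        uint32_t T2 = SIGMA0(s[0]) + Maj(s[0], s[1], s[2]);
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4];
        s[4] = s[3] + T1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0];
        s[0] = T1 + T2;
    }
}

/* Runs both versions on arbitrary (non-SHA) test data and compares them. */
static int rounds_agree(void)
{
    uint32_t W[8], K[8], s1[8], s2[8];
    int i;
    for (i = 0; i < 8; i++) {
        W[i] = 0x9e3779b9u * (uint32_t)(i + 1);
        K[i] = 0x85ebca6bu * (uint32_t)(i + 3);
        s1[i] = s2[i] = 0x01234567u * (uint32_t)(i + 1);
    }
    rounds8_unrolled(s1, W, K);
    rounds8_reference(s2, W, K);
    for (i = 0; i < 8; i++)
        assert(s1[i] == s2[i]);
    return 1;
}
```

A full 64-round unroll would repeat the eight STEP lines eight times with the round index advanced by 8 each time.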
-- Er*********@sun.com
>> What about SHA-256's inner loop? (You're a cryptographer, you know what I'm talking about.)
If you're asking whether unrolling a loop is ever justified, well, on occasion it is. You just have to remember the costs:
- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to slower performance.
Point taken.
I want to write it like this:
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;
In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h
If I unroll 8 times, symbolic renaming comes for free.
On the other hand, it's nontrivial to use my hack to do the renaming for you. I'd suggest that if you unroll it 8 times, you explicitly list the 8 invocations.
Well, if I change a-h to V[8] then
a is V[(64-t)%8]
b is V[(65-t)%8]
...
h is V[(71-t)%8]
All the indices can be computed at compile time. A good compiler
should be able to scalarize my array back to a-h.
Yes, 8 is not too bad, but a complete unroll requires 64
(approximately 6 KB of code, it will fit inside the L1 I$ of my
Athlon with room to spare).
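The V[8] indexing described above can be written out as a rolled loop. This is a sketch only: the R(r, t) macro, the function names and the nrounds parameter are made up for illustration, and whether a given compiler turns the array back into scalars after unrolling is not something the code can show. It is checked against a loop that shifts the eight values explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define SIGMA0(x)    (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define SIGMA1(x)    (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define Ch(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define Maj(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

/* Role r (0 = a, 1 = b, ..., 7 = h) at round t lives in V[(64 + r - t) % 8],
   which matches "a is V[(64-t)%8] ... h is V[(71-t)%8]". */
#define R(r, t) V[(64 + (r) - (t)) % 8]

static void rounds_indexed(uint32_t V[8], const uint32_t *W,
                           const uint32_t *K, int nrounds)
{
    int t;
    assert(nrounds >= 0 && nrounds <= 64);  /* keeps 64 + r - t positive */
    for (t = 0; t < nrounds; t++) {
        uint32_t T1 = SIGMA1(R(4, t)) + Ch(R(4, t), R(5, t), R(6, t))
                    + W[t] + K[t] + R(7, t);
        uint32_t T2 = SIGMA0(R(0, t)) + Maj(R(0, t), R(1, t), R(2, t));
        R(3, t) += T1;          /* d += T1 */
        R(7, t) = T1 + T2;      /* h = T1 + T2 */
    }
}

/* Reference: moves the values down one slot per round instead. */
static void rounds_shifting(uint32_t s[8], const uint32_t *W,
                            const uint32_t *K, int nrounds)
{
    int t;
    for (t = 0; t < nrounds; t++) {
        uint32_t T1 = s[7] + SIGMA1(s[4]) + Ch(s[4], s[5], s[6]) + K[t] + W[t];
        uint32_t T2 = SIGMA0(s[0]) + Maj(s[0], s[1], s[2]);
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4];
        s[4] = s[3] + T1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0];
        s[0] = T1 + T2;
    }
}
```

The two functions leave the same contents in the array only when nrounds is a multiple of 8; for other counts the indexed version holds the same values in rotated positions.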
> For the case in point, you could write something like
#define STEP(a,b,c,d,e,f,g,h) \
    T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
    T2 = SIGMA0(a) + Maj(a,b,c); \
    d += T1; \
    h = T1 + T2
...
while (whatever) {
    STEP(a,b,c,d,e,f,g,h);
    STEP(h,a,b,c,d,e,f,g);
    ...
    STEP(b,c,d,e,f,g,h,a);
}
(Hmmm: If speed is the objective, couldn't the `W[t] + K[t]' piece be replaced by a precomputed `WplusK[t]'?)
This is precisely what I am doing. Thanks for the hint :-)
"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr... What about SHA-256's inner loop? (You're a cryptographer, you know what I'm talking about.)
If you're asking whether unrolling a loop is ever justified, well, on occasion it is. You just have to remember the costs:
- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to slower performance.
Point taken.
I want to write it like this:
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;
In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h
If I unroll 8 times, symbolic renaming comes for free.
On the other hand, it's nontrivial to use my hack to do the renaming for you. I'd suggest that if you unroll it 8 times, you explicitly list the
8 invocations.
Well, if I change a-h to V[8] then
a is V[(64-t)%8]
b is V[(65-t)%8]
...
h is V[(71-t)%8]
All the indices can be computed at compile time. A good compiler should be able to scalarize my array back to a-h.
Yes, 8 is not too bad, but a complete unroll requires 64 (approximately 6 KB of code, it will fit inside the L1 I$ of my Athlon with room to spare).
Yes, it'll fit, but it'll push most everything else out. That means that
once you finish the SHA-256, you'll start generating lots of cache misses
:-(
--
poncho
>> Yes, 8 is not too bad, but a complete unroll requires 64 (approximately 6 KB of code, it will fit inside the L1 I$ of my Athlon with room to spare).
Yes, it'll fit, but it'll push most everything else out. That means that once you finish the SHA-256, you'll start generating lots of cache misses :-(
Hi Scott,
The entire routine weighs approximately 8 KB (75% from the unrolled
inner loop).
What do you mean "once [i] finish the SHA-256"? When the OS switches
the context to a different process? I thought the cache was flushed
anyway on a context switch...
P.S. The Athlon, unlike the P4, has large L1 caches:
64 KB L1 I$
64 KB L1 D$
256 KB L2 I+D$ (512 KB for Barton)
Nudge <de*****@kma.eu.org> wrote in message news:<3f***********************@news.free.fr>... Scott Fluhrer wrote: If you absolutely insist...
#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)
X8(0)
If you ask me, Joona's suggestion is considerably more reasonable. What about SHA-256's inner loop, where unrolling 8 times and symbolically renaming the 8 variables for every iteration allows one to write only 4 assignments?
Another important case is, what if do_something() is defined as follows:
#define do_something(x) do_something_else ((x), #x, __LINE__)
I don't think there are many optimizing compilers out there which are THAT smart...
Anyway, thanks for the log2(n) solution. I find it fairly elegant.
Yeah, but log4(n) is even better, if you are really just trying to save typing,
but it's a little annoying to do odd numbers.
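One way the base-4 variant might look, as a sketch: each level expands to four copies of the level below, so four levels reach 256. The Q0..Q4 names, the recording do_something and the two run_* helpers are made up for illustration. A count that is not a power of 4 has to be pieced together by hand, which is shown for 20 = 16 + 4.

```c
#include <assert.h>

static int A[256];
static int visited[256];    /* values passed to do_something, in call order */
static int calls;

/* Illustrative stand-in: records each value it is given. */
static void do_something(int value) { visited[calls++] = value; }

#define Q0(n) do_something(A[n]);
#define Q1(n) Q0(n) Q0(n+1)  Q0(n+2)   Q0(n+3)
#define Q2(n) Q1(n) Q1(n+4)  Q1(n+8)   Q1(n+12)
#define Q3(n) Q2(n) Q2(n+16) Q2(n+32)  Q2(n+48)
#define Q4(n) Q3(n) Q3(n+64) Q3(n+128) Q3(n+192)

static void reset(void)
{
    int i;
    for (i = 0; i < 256; i++)
        A[i] = i;           /* each element equals its own index */
    calls = 0;
}

/* All 256 calls from a single top-level invocation. */
static int run_base4(void)
{
    reset();
    Q4(0)
    assert(calls == 256);
    return calls;
}

/* 20 calls: one block of 16 followed by one block of 4. */
static int run_twenty(void)
{
    reset();
    Q2(0) Q1(16)
    assert(calls == 20);
    return calls;
}
```

Five definitions replace nine, at the price of longer lines and more hand assembly for awkward counts.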
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
"Scott Fluhrer" <sf******@ix.netcom.com> wrote: If you're asking whether unrolling a loop is ever justified, well, on occasion it is. You just have to remember the costs:
- It can be less obvious what's going on, making maintenance harder.
The same can be said of any macro usage.
- If you unroll too much, the loop might not fit in cache. This leads to slower performance.
Humans themselves generally don't unroll to a point where the I-cache
is affected. This is usually only an issue if the *compiler* unrolls
everything, since it might unroll with wild abandon.
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
"Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> writes:
[...] I have wondered about the decrease in preprocessor power as computers get faster and computer memory gets larger. Consider the progression from PL/I to C to Java in terms of preprocessor power.
I once heard about a language with a generics facility (closer to C++
templates than to macros) that could do the Towers of Hanoi problem at
compile time.
There is, of course, no rule against using preprocessors for other than the intended language.
You can run into some nasty problems with tokenization. Try using a C
preprocessor on a language that has a standalone apostrophe token.
Getting back to C, I tend to think that C's preprocessor is powerful
enough, perhaps too powerful. Or maybe it's just not integrated into
the language cleanly enough.
As for the original question (using the preprocessor to manually
unroll a loop), I don't believe there's any clean way to do that.
Some of the suggested solutions work reasonably well for powers of 2,
but not for arbitrary numbers.
If you really want to do this kind of thing, you might consider
writing your own program that generates C source code. Personally,
I'd probably use Perl for the job, but that's just me. If you're on a
Unix-like system (more precisely, if you don't care about portability
to non-Unix-like systems), you might also look into the m4 macro
processor.
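The same generator idea works in C itself, if Perl or m4 is not at hand. The following is a minimal sketch, with made-up names (format_call, emit_unrolled) and a fixed 128-byte line buffer as an arbitrary assumption; it handles any count, not just powers of 2.

```c
#include <assert.h>
#include <stdio.h>

/* Formats one statement such as  do_something(A[42]);  into buf.
   Returns what snprintf returns: the length the full text needs. */
static int format_call(char *buf, size_t size,
                       const char *call, const char *array, int index)
{
    return snprintf(buf, size, "%s(%s[%d]);", call, array, index);
}

/* Writes `count` such statements to `out`, one per line.
   Returns count on success, or -1 on truncation or an output error. */
static int emit_unrolled(FILE *out, const char *call,
                         const char *array, int count)
{
    char line[128];
    int i;
    assert(out != NULL && count >= 0);
    for (i = 0; i < count; i++) {
        int len = format_call(line, sizeof line, call, array, i);
        if (len < 0 || (size_t)len >= sizeof line)
            return -1;
        if (fprintf(out, "%s\n", line) < 0)
            return -1;
    }
    return count;
}
```

A small main that calls emit_unrolled(stdout, "do_something", "A", 300) and has its output redirected to a file gives a fragment that the real source can #include; the build can regenerate it whenever the count changes.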
--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
"Keith Thompson" <ks*@cts.com> wrote in message
news:lz************@cts.com... "Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> writes: [...] I have wondered about the decrease in preprocessor power as computers
get faster and computer memory gets larger. Consider the progression from
PL/I to C to Java in terms of preprocessor power. I once heard about a language with a generics facility (closer to C++ templates than to macros) that could do the Towers of Hanoi problem at compile time.
There is, of course, no rule against using preprocessors for other than
the intended language.
You can run into some nasty problems with tokenization. Try using a C preprocessor on a language that has a standalone apostrophe token.
At least some versions of Fortran have one. For direct access files, IBM
Fortran has traditionally used READ(unit'block).
Many years ago I was working with a generic preprocessor called STEP,
written in Fortran, and trying to use it to preprocess Fortran. I had much
trouble with that one.
Getting back to C, I tend to think that C's preprocessor is powerful enough, perhaps too powerful. Or maybe it's just not integrated into the language cleanly enough.
As for the original question (using the preprocessor to manually unroll a loop), I don't believe there's any clean way to do that. Some of the suggested solutions work reasonably well for powers of 2, but not for arbitrary numbers.
The PL/I preprocessor also has compile time procedure calls, among other
features.
Also, compile time %IF, so one can conditionally unroll a loop based on a
compile time constant. Early PL/I compilers were designed to run on small
machines, so machine size can't be the reason for the C preprocessor being
the way it is.
(snip)
-- glen
On Sun, 14 Sep 2003 23:18:27 +0200, Nudge <de*****@kma.eu.org> wrote:
<snip> So I was looking to write something along the lines of:
#define write_256(array) #for(i,0,255,do_something(foo[i]);
But I couldn't find a way to do it...
Is there a way to write a macro that the C pre-processor will expand to k instructions, and be able to reference the iteration number?
Only the (clumsy) ways already given by Fluhrer and Delahaye.
#if outside_of_C /* especially if you need this often or vitally */
You could consider using another macro processor (like m4) or
text-processing utility (like sed, awk, perl), perhaps driven
automatically by your makefile or equivalent.
#endif
#ifdef __cplusplus
You can try (an invocation of) a template inline function that
"iterates" down to a partial specialization that terminates it; but
the C++ standard doesn't require inlines to actually be inlined, and
allows an implementation limit on nested/recursive invocation which is
only recommended to be at least 17.
#endif
Plus of course compiler features like -funroll-loops.
I thought of using recursive macro calls along the lines of:
#define write_more(n,array) \
#if (n>0) do_something(foo[256-n]); \
write_more(n-1,array)
but it seems implementers don't like recursive macro calls :-)
Not just implementers; the standard requires that recursive macro
invocations (direct or indirect) not be expanded. And even if they
could be and were, there's no format of #if that works like that.
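The no-recursion rule is easy to see in a tiny experiment. This is a sketch with made-up names (depth, expand_once): a macro that mentions its own name expands once, and the inner occurrence is left alone, so it ends up referring to the ordinary function of the same name.

```c
#include <assert.h>

/* An ordinary function, defined before the macro of the same name. */
static int depth(int n) { return n; }

/* While depth(...) is being replaced, the depth inside the replacement
   list is not expanded again; it stays as a call to the function. */
#define depth(n) (1 + depth(n))

static int expand_once(void)
{
    /* A parenthesised name is not a macro invocation, so this calls
       the function directly. */
    assert((depth)(10) == 10);

    /* Becomes (1 + depth(10)): one level of expansion, then it stops. */
    return depth(10);
}
```

So a write_more-style macro would expand exactly once and leave a literal write_more(...) call behind, and a #if placed in the replacement list would never be treated as a directive.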
- David.Thompson1 at worldnet.att.net