For loop equivalent with the preprocessor

Nudge

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
....
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo[i]);

But I couldn't find a way to do it...

Is there a way to write a macro that the C pre-processor will expand
to k instructions, and be able to reference the iteration number?

I thought of using recursive macro calls along the lines of:

#define write_more(n,array) \
#if (n>0) do_something(foo[256-n]); \
write_more(n-1,array)

but it seems implementers don't like recursive macro calls :-)

Any insight?

Nov 13 '05 #1


Joona I Palaste

Nudge <de*****@kma.eu.org> scribbled the following:

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/

Nov 13 '05 #2

Nudge

Joona I Palaste wrote:

Nudge <de*****@kma.eu.org> scribbled the following:
I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

ARGH!

I knew I should have said that I was smarter than my compiler :-)

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

Nov 13 '05 #3

Emmanuel Delahaye

In 'comp.lang.c', Nudge <de*****@kma.eu.org> wrote:

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo[i]);

But I couldn't find a way to do it...

There is no portable way to do this directly, but there is a trick anyway (I
got it from c.l.c, BTW):

Define the list of constant values:

/* values.itm */
ITEM(0)
ITEM(1)
ITEM(2)
ITEM(3)
....
ITEM(254)
ITEM(255)

This file can be automatically created with a trivial C program.

Now, in your application, you do the following:

/* myapp.c */
<...>

#define ITEM(a) \
do_something(A[a]);

#include "values.itm"
#undef ITEM

That's all folks.

I use this trick extensively to generate repetitive code. It's very powerful,
and it makes for solid code once mastered.
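
A minimal sketch of the "trivial C program" mentioned above, assuming the list only needs to be written to a stream that is then saved as values.itm. The function name write_items is made up for illustration and is not part of the original post.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes ITEM(0) ... ITEM(count - 1), one per line.
   Returns the number of lines written, or -1 if a write fails. */
int write_items(FILE *out, int count)
{
    int i;

    assert(out != NULL && count >= 0);
    for (i = 0; i < count; i++) {
        if (fprintf(out, "ITEM(%d)\n", i) < 0)
            return -1;
    }
    return count;
}
```

Calling write_items(stdout, 256) from main() and redirecting the output to values.itm should give the file shown above.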

--
-ed- em**********@noos.fr [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/

Nov 13 '05 #4

Scott Fluhrer

"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr...

Joona I Palaste wrote:
Nudge <de*****@kma.eu.org> scribbled the following:
I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

ARGH!

I knew I should have said that I was smarter than my compiler :-)

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.
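
For anyone who wants to check that the doubling macros really visit every element once and in order, here is a small sketch. It assumes do_something() is an ordinary function that records its argument; the globals seen and calls and the wrapper run_unrolled() are invented for the test and are not part of the post above.

```c
#include <assert.h>

int A[256];
int seen[256];  /* seen[k] is the value passed on the k-th call */
int calls;      /* number of do_something() calls so far */

void do_something(int value)
{
    assert(calls < 256);
    seen[calls++] = value;
}

/* Each level expands the previous level twice, so Xk covers 2^k elements. */
#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

/* Fills A with 0..255, runs the 256 unrolled calls, returns the call count. */
int run_unrolled(void)
{
    int i;

    for (i = 0; i < 256; i++)
        A[i] = i;
    calls = 0;
    X8(0)
    return calls;
}
```

The indices come out as constant expressions such as 0+128+64+1, which the compiler folds, so the generated code should be the same as for the hand-typed version.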

--
poncho

Nov 13 '05 #5

Pushker Pradhan

I've never done this kind of stuff, but isn't there the option -funroll-loops
(gcc)?
I've seen a lot of software use this option. Joona also suggested something
similar, I think.

--
Pushkar Pradhan
"Emmanuel Delahaye" <em**********@noos.fr> wrote in message
news:Xn***************************@130.133.1.4...

In 'comp.lang.c', Nudge <de*****@kma.eu.org> wrote:
I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo[i]);

But I couldn't find a way to do it...

There is no portable way to do this directly, but there is a trick anyway
(I got it from c.l.c, BTW):

Define the list of constant values:

/* values.itm */
ITEM(0)
ITEM(1)
ITEM(2)
ITEM(3)
...
ITEM(254)
ITEM(255)

This file can be automatically created with a trivial C program.

Now, in your application, you do the following:

/* myapp.c */
<...>

#define ITEM(a) \
do_something(A[a]);

#include "values.itm"
#undef ITEM

That's all folks.

I use this trick extensively to generate repetitive code. It's very powerful, and it makes for solid code once mastered.

--
-ed- em**********@noos.fr [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/

Nov 13 '05 #6

Pushker Pradhan wrote:

I've never done this kind of stuff, but isn't there the option
-funroll-loops (gcc)?
I've seen a lot of software use this option. Joona also suggested something
similar, I think.

In gcc, -funroll-loops causes loops to be unrolled into, well, non-loops.
Example:

int counter = 0;
int i;
for(i = 0; i < 5; i++)
    counter++;

Becomes:

int counter = 0;
int i;
counter++;
counter++;
counter++;
counter++;
counter++;
i = 5;

Of course, this is all done in assembler, so the result's probably different
somewhat. But that's the idea behind unrolling loops - to eliminate the
looping overhead.
--
Feminists just want the human race to be a tie.

Nov 13 '05 #7

Emmanuel Delahaye

In 'comp.lang.c', "Pushker Pradhan" <pu*****@erc.msstate.edu> wrote:

I've never done this kind of stuff, but isn't there the option
-funroll-loops (gcc)?
I've seen a lot of software use this option. Joona also suggested
something similar, I think.

If your compiler has the option, fine, but what if it doesn't? You don't
always get to choose your compiler. It can belong to a customer's
production process where nothing is allowed to change, or to some
peculiar platform... who knows... Better to stay portable. Always (when
possible).
--
-ed- em**********@noos.fr [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/

Nov 13 '05 #8

Emmanuel Delahaye

In 'comp.lang.c', "Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> wrote:

The PL/I preprocessor can do this.

%DO I=0 TO 255;
do_something(A[i]);
%END
%DEACTIVATE I

(The last is the equivalent of #undef, except that you can %ACTIVATE it and
get the previous value back again. The % indicates preprocessor
statements.)

I have wondered about the decrease in preprocessor power as computers get
faster and computer memory gets larger. Consider the progression from PL/I
to C to Java in terms of preprocessor power.

I recall that the Intel macro assembler for 8051 I used in the 90's was very
powerful, and did support iterative constructs like that.

--
-ed- em**********@noos.fr [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/

Nov 13 '05 #9

Glen Herrmannsfeldt

"Emmanuel Delahaye" <em**********@noos.fr> wrote in message
news:Xn***************************@130.133.1.4...

In 'comp.lang.c', "Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> wrote:
The PL/I preprocessor can do this.

%DO I=0 TO 255;
do_something(A[i]);
%END
%DEACTIVATE I

(The last is the equivalent of #undef, except that you can %ACTIVATE it and get the previous value back again. The % indicates preprocessor
statements.)

I have wondered about the decrease in preprocessor power as computers get faster and computer memory gets larger. Consider the progression from PL/I to C to Java in terms of preprocessor power.

I recall that the Intel macro assembler for 8051 I used in the 90's was
very powerful, and did support iterative constructs like that.

The OS/360 assemblers supported a variety of constructs, including if and
goto. The while loop could be, and often was, written using macros. I used
to know people using the OS/360 compilers to compile for most microcomputers
by writing a macro for each possible opcode, and then post-processing the
output files.

I never used the 8051, though.

-- glen

Nov 13 '05 #10

Nudge

Scott Fluhrer wrote:

"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr...
Joona I Palaste wrote:

Nudge <de*****@kma.eu.org> scribbled the following:
I have an array, and an unrolled loop which looks like this:
do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);
I thought: why should I type so much? I should write a macro.
I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

ARGH!

I knew I should have said that I was smarter than my compiler :-)

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.

What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?

I don't think there are many optimizing compilers out there which
are THAT smart...

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Nov 13 '05 #11

Scott Fluhrer

"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr...

Scott Fluhrer wrote:
"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr...
Joona I Palaste wrote:
Nudge <de*****@kma.eu.org> scribbled the following:
>I have an array, and an unrolled loop which looks like this:
>do_something(A[0]);
>do_something(A[1]);
>...
>do_something(A[255]);
>I thought: why should I type so much? I should write a macro.
I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

ARGH!

I knew I should have said that I was smarter than my compiler :-)

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.

What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?

I don't think there are many optimizing compilers out there which
are THAT smart...

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change the 256 to,
say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll
compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop
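
To make the first objection concrete: going from 256 to 300 means splitting the count into powers of two by hand (300 = 256 + 32 + 8 + 4) and chaining the matching macros with the right offsets. The sketch below assumes the same X0..X8 definitions as earlier in the thread; UNROLL_300, run_300() and the bookkeeping variables are invented for illustration.

```c
#include <assert.h>

#define COUNT 300

int A[COUNT];
int calls;         /* number of do_something() calls so far */
int in_order = 1;  /* cleared if an element arrives out of sequence */

void do_something(int value)
{
    assert(calls < COUNT);
    if (value != calls)
        in_order = 0;
    calls++;
}

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

/* 256 elements from 0, then 32 from 256, 8 from 288, 4 from 296. */
#define UNROLL_300 X8(0) X5(256) X3(288) X2(296)

/* Fills A with 0..299, runs the unrolled calls, returns the call count. */
int run_300(void)
{
    int i;

    for (i = 0; i < COUNT; i++)
        A[i] = i;
    calls = 0;
    in_order = 1;
    UNROLL_300
    return calls;
}
```

Every offset in UNROLL_300 has to be recomputed whenever the count changes, and a wrong offset still compiles, which is the typo risk described above.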

--
poncho

Nov 13 '05 #12

Nudge

What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?

I don't think there are many optimizing compilers out there which
are THAT smart...

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change the 256 to,
say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll
compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Forgive me if I insist :-)

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

With no unrolling, one must turn a-h into an array (which puts
additional pressure on the compiler) and use the iteration number as
a shifting index, modulo 8.

With 8 bodies of the loop, the modulo operations can go away.

I doubt whether even a few optimizing compilers are smart enough to
do that (I will test with gcc).

Nov 13 '05 #13

Nudge

Anyway, thanks for the log2(n) solution. I find it fairly
elegant.

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change
the 256 to, say, 300

- It's fairly easy to have a typo somewhere in all the #define's.
It'll compile just fine, but perhaps miss do_something(A[153]);

- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Forgive me if I insist.

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

Nov 13 '05 #14

Scott Fluhrer

"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr...

Anyway, thanks for the log2(n) solution. I find it fairly
elegant.
Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change
the 256 to, say, 300

- It's fairly easy to have a typo somewhere in all the #define's.
It'll compile just fine, but perhaps miss do_something(A[153]);

- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Forgive me if I insist.

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.

--
poncho

Nov 13 '05 #15

Eric Sosman

Scott Fluhrer wrote:

"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr...
> Anyway, thanks for the log2(n) solution. I find it fairly
> elegant.

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change
the 256 to, say, 300

- It's fairly easy to have a typo somewhere in all the #define's.
It'll compile just fine, but perhaps miss do_something(A[153]);

- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Forgive me if I insist.

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.

For the case in point, you could write something like

#define STEP(a,b,c,d,e,f,g,h) \
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
T2 = SIGMA0(a) + Maj(a,b,c); \
d += T1; \
h = T1 + T2
...
while (whatever) {
STEP(a,b,c,d,e,f,g,h);
STEP(h,a,b,c,d,e,f,g);
...
STEP(b,c,d,e,f,g,h,a);
}

(Hmmm: If speed is the objective, couldn't the `W[t] + K[t]'
piece be replaced by a precomputed `WplusK[t]'?)
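
A sketch for checking the renaming trick on eight rounds. Each STEP leaves the new "a" in the variable that was passed as h, so the argument list shifts one place to the right on every invocation. The ROTR, SIGMA0, SIGMA1, Ch and Maj definitions follow the SHA-256 specification; the function names, the extra wk_t parameter (standing in for the precomputed W[t] + K[t]) and the eight-round cut-down are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define SIGMA0(x)    (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define SIGMA1(x)    (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define Ch(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define Maj(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

/* One round; wk_t stands for W[t] + K[t]. Only d and h are written. */
#define STEP(a, b, c, d, e, f, g, h, wk_t)                      \
    do {                                                        \
        uint32_t T1 = (h) + SIGMA1(e) + Ch(e, f, g) + (wk_t);   \
        uint32_t T2 = SIGMA0(a) + Maj(a, b, c);                 \
        (d) += T1;                                              \
        (h) = T1 + T2;                                          \
    } while (0)

/* Reference: one round per iteration, shuffling all eight variables. */
void rounds_rolled(uint32_t s[8], const uint32_t wk[8])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
    int t;

    assert(s != 0 && wk != 0);
    for (t = 0; t < 8; t++) {
        uint32_t T1 = h + SIGMA1(e) + Ch(e, f, g) + wk[t];
        uint32_t T2 = SIGMA0(a) + Maj(a, b, c);
        h = g; g = f; f = e; e = d + T1;
        d = c; c = b; b = a; a = T1 + T2;
    }
    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/* Unrolled: no shuffling, the roles rotate through the argument list. */
void rounds_unrolled(uint32_t s[8], const uint32_t wk[8])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];

    assert(s != 0 && wk != 0);
    STEP(a, b, c, d, e, f, g, h, wk[0]);
    STEP(h, a, b, c, d, e, f, g, wk[1]);
    STEP(g, h, a, b, c, d, e, f, wk[2]);
    STEP(f, g, h, a, b, c, d, e, wk[3]);
    STEP(e, f, g, h, a, b, c, d, wk[4]);
    STEP(d, e, f, g, h, a, b, c, wk[5]);
    STEP(c, d, e, f, g, h, a, b, wk[6]);
    STEP(b, c, d, e, f, g, h, a, wk[7]);
    /* After eight rounds every variable is back in its original role. */
    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}
```

Both functions should leave the same state in s for the same inputs, which is an easy way to catch a wrong rotation order.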

--
Er*********@sun.com

Nov 13 '05 #16

Nudge

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

Point taken.

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.

Well, if I change a-h to V[8] then
a is V[(64-t)%8]
b is V[(65-t)%8]
...
h is V[(71-t)%8]

All the indices can be computed at compile time. A good compiler
should be able to scalarize my array back to a-h.

Yes, 8 is not too bad, but a complete unroll requires 64
(approximately 6 KB of code, it will fit inside the L1 I$ of my
Athlon with room to spare).
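
The index mapping can be sanity-checked with a few lines of C. This is only a sketch of the formula above; the function name slot and the numbering of roles (0 for a up to 7 for h) are assumptions made for illustration.

```c
#include <assert.h>

/* role: 0 for a, 1 for b, ... 7 for h; t: round number, 0..63.
   Returns the index into V[8] that plays that role in round t. */
int slot(int role, int t)
{
    assert(role >= 0 && role < 8);
    assert(t >= 0 && t < 64);
    return (64 + role - t) % 8;
}
```

In round 0 this gives a = V[0] through h = V[7]; in round 1, a moves to V[7] (the old h) and b to V[0] (the old a), matching the renaming list. In a fully unrolled loop t is a literal in each body, so every index is a constant expression.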

Nov 13 '05 #17

Nudge

> For the case in point, you could write something like

#define STEP(a,b,c,d,e,f,g,h) \
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
T2 = SIGMA0(a) + Maj(a,b,c); \
d += T1; \
h = T1 + T2
...
while (whatever) {
STEP(a,b,c,d,e,f,g,h);
STEP(h,a,b,c,d,e,f,g);
...
STEP(b,c,d,e,f,g,h,a);
}

(Hmmm: If speed is the objective, couldn't the `W[t] + K[t]'
piece be replaced by a precomputed `WplusK[t]'?)

This is precisely what I am doing. Thanks for the hint :-)

Nov 13 '05 #18

Scott Fluhrer

"Nudge" <de*****@kma.eu.org> wrote in message
news:3f***********************@news.free.fr...

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

Point taken.
I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8 invocations.

Well, if I change a-h to V[8] then
a is V[(64-t)%8]
b is V[(65-t)%8]
...
h is V[(71-t)%8]

All the indices can be computed at compile time. A good compiler
should be able to scalarize my array back to a-h.

Yes, 8 is not too bad, but a complete unroll requires 64
(approximately 6 KB of code, it will fit inside the L1 I$ of my
Athlon with room to spare).

Yes, it'll fit, but it'll push most everything else out. That means that
once you finish the SHA-256, you'll start generating lots of cache misses
:-(

--
poncho

Nov 13 '05 #19

Nudge

Yes, 8 is not too bad, but a complete unroll requires 64
(approximately 6 KB of code, it will fit inside the L1 I$
of my Athlon with room to spare).

Yes, it'll fit, but it'll push most everything else out. That
means that once you finish the SHA-256, you'll start generating
lots of cache misses :-(

Hi Scott,

The entire routine weighs approximately 8 KB (75% from the unrolled
inner loop).

What do you mean "once [I] finish the SHA-256"? When the OS switches
the context to a different process? I thought the cache was flushed
anyway on a context switch...

P.S. The Athlon, unlike the P4, has large L1 caches:
64 KB L1 I$
64 KB L1 D$
256 KB L2 I+D$ (512 KB for Barton)

Nov 13 '05 #20

Paul Hsieh

Nudge <de*****@kma.eu.org> wrote in message news:<3f***********************@news.free.fr>...

Scott Fluhrer wrote:
If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.
What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?

Another important case is, what if do_something() is defined as follows:

#define do_something(x) do_something_else ((x), #x, __LINE__)
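
A cut-down sketch of that case, using four elements instead of 256. With a real for loop, #x could only ever produce "A[i]"; with the macro expansion each call gets its own spelling of the index. The signature of do_something_else() and the recording arrays are assumptions made for illustration.

```c
#include <assert.h>

int A[4];
const char *names[4];  /* stringized argument of each call */
int lines[4];          /* __LINE__ seen by each call */
int n_names;

void do_something_else(int value, const char *text, int line)
{
    (void)value;
    assert(n_names < 4);
    names[n_names] = text;
    lines[n_names] = line;
    n_names++;
}

#define do_something(x) do_something_else((x), #x, __LINE__)
#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)

/* Runs the four unrolled calls and records what each one was given. */
void record_names(void)
{
    n_names = 0;
    X2(0)
}
```

The first recorded name is "A[0]"; the later ones spell out the unevaluated sums (something like "A[0+2+1]" for the last element), because # works on the argument's tokens rather than its value. All four calls report the same __LINE__, that of the X2(0) invocation.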
I don't think there are many optimizing compilers out there which
are THAT smart...

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Yeah, but log4(n) is even better, if you are really just trying to save typing,
but it's a little annoying to do odd numbers.
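
A sketch of what the base-4 variant might look like: five definitions instead of nine for the same 256 calls. The macro names Y0..Y4 and the checking code around them are invented for illustration.

```c
#include <assert.h>

int A[256];
int calls;         /* number of do_something() calls so far */
int in_order = 1;  /* cleared if an element arrives out of sequence */

void do_something(int value)
{
    assert(calls < 256);
    if (value != calls)
        in_order = 0;
    calls++;
}

/* Each level expands the previous level four times: 1, 4, 16, 64, 256. */
#define Y0(n) do_something(A[n]);
#define Y1(n) Y0(n) Y0(n+1) Y0(n+2) Y0(n+3)
#define Y2(n) Y1(n) Y1(n+4) Y1(n+8) Y1(n+12)
#define Y3(n) Y2(n) Y2(n+16) Y2(n+32) Y2(n+48)
#define Y4(n) Y3(n) Y3(n+64) Y3(n+128) Y3(n+192)

/* Fills A with 0..255, runs the unrolled calls, returns the call count. */
int run_base4(void)
{
    int i;

    for (i = 0; i < 256; i++)
        A[i] = i;
    calls = 0;
    in_order = 1;
    Y4(0)
    return calls;
}
```

Counts that are not a power of 4 need a mix of levels, for example Y3(0) Y3(64) for 128 elements.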

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 13 '05 #21

Paul Hsieh

"Scott Fluhrer" <sf******@ix.netcom.com> wrote:

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.

The same can be said of any macro usage.

- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

Humans themselves generally don't unroll to a point where the I-cache
is affected. This is usually only an issue if the *compiler* unrolls
everything, since it might unroll with wild abandon.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 13 '05 #22

Keith Thompson

"Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> writes:
[...]

I have wondered about the decrease in preprocessor power as computers get
faster and computer memory gets larger. Consider the progression from PL/I
to C to Java in terms of preprocessor power.

I once heard about a language with a generics facility (closer to C++
templates than to macros) that could do the Towers of Hanoi problem at
compile time.

There is, of course, no rule against using preprocessors for other than the
intended language.

You can run into some nasty problems with tokenization. Try using a C
preprocessor on a language that has a standalone apostrophe token.

Getting back to C, I tend to think that C's preprocessor is powerful
enough, perhaps too powerful. Or maybe it's just not integrated into
the language cleanly enough.

As for the original question (using the preprocessor to manually
unroll a loop), I don't believe there's any clean way to do that.
Some of the suggested solutions work reasonably well for powers of 2,
but not for arbitrary numbers.

If you really want to do this kind of thing, you might consider
writing your own program that generates C source code. Personally,
I'd probably use Perl for the job, but that's just me. If you're on a
Unix-like system (more precisely, if you don't care about portability
to non-Unix-like systems), you might also look into the m4 macro
processor.

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 13 '05 #23

Glen Herrmannsfeldt

"Keith Thompson" <ks*@cts.com> wrote in message
news:lz************@cts.com...

"Glen Herrmannsfeldt" <ga*@ugcs.caltech.edu> writes:
[...]
I have wondered about the decrease in preprocessor power as computers get faster and computer memory gets larger. Consider the progression from PL/I to C to Java in terms of preprocessor power.
I once heard about a language with a generics facility (closer to C++
templates than to macros) that could do the Towers of Hanoi problem at
compile time.
There is, of course, no rule against using preprocessors for other than the intended language.

You can run into some nasty problems with tokenization. Try using a C
preprocessor on a language that has a standalone apostrophe token.

At least some versions of Fortran have one. For direct access files, IBM
Fortran has traditionally used READ(unit'block).

Many years ago I was working with a generic preprocessor called STEP,
written in Fortran, and trying to use it to preprocess Fortran. I had much
trouble with that one.

Getting back to C, I tend to think that C's preprocessor is powerful
enough, perhaps too powerful. Or maybe it's just not integrated into
the language cleanly enough.

As for the original question (using the preprocessor to manually
unroll a loop), I don't believe there's any clean way to do that.
Some of the suggested solutions work reasonably well for powers of 2,
but not for arbitrary numbers.

The PL/I preprocessor also has compile time procedure calls, among other
features.

Also, compile time %IF, so one can conditionally unroll a loop based on a
compile time constant. Early PL/I compilers were designed to run on small
machines, so machine size can't be the reason for the C preprocessor being
the way it is.

(snip)

-- glen

Nov 13 '05 #24

Dave Thompson

On Sun, 14 Sep 2003 23:18:27 +0200, Nudge <de*****@kma.eu.org> wrote:
<snip>

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo[i]);

But I couldn't find a way to do it...

Is there a way to write a macro that the C pre-processor will expand
to k instructions, and be able to reference the iteration number?

Only the (clumsy) ways already given by Fluhrer and Delahaye.

#if outside_of_C /* especially if you need this often or vitally */
You could consider using another macro processor (like m4) or
text-processing utility (like sed, awk, perl), perhaps driven
automatically by your makefile or equivalent.
#endif

#ifdef __cplusplus
You can try (an invocation of) a template inline function that
"iterates" down to a partial specialization that terminates it; but
the C++ standard doesn't require inlines to actually be inlined, and
allows an implementation limit on nested/recursive invocation which is
only recommended to be at least 17.
#endif

Plus of course compiler features like -funroll-loops.

I thought of using recursive macro calls along the lines of:

#define write_more(n,array) \
#if (n>0) do_something(foo[256-n]); \
write_more(n-1,array)

but it seems implementers don't like recursive macro calls :-)

Not just implementers; the standard requires that recursive macro
invocations (direct or indirect) not be expanded. And even if they
could be and were, there's no format of #if that works like that.
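
The no-recursion rule is easy to see with a self-referential object-like macro. In this sketch (the function name is invented for illustration), the name inside the macro's own replacement list is left alone, so the expansion stops after one step instead of looping.

```c
#include <assert.h>

int self_reference_demo(void)
{
    int count = 10;

/* While `count` is being replaced, the `count` inside its own replacement
   list is not expanded again. */
#define count (1 + count)
    int result = count;   /* preprocessed to: int result = (1 + count); */
#undef count

    assert(result == 11);
    return result;
}
```

The same rule is why write_more(n-1,array) inside write_more would simply be left in the output as an unexpanded call.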

- David.Thompson1 at worldnet.att.net

Nov 13 '05 #25
