Hundreds of cases in a switch, optimization? - Page 2

He Shiming

Hi,

I just wrote a function that has over 200 "cases" wrapped in a "switch"
statement. I'm wondering if there are performance issues in such
implementation. Do I need to optimize it some way?

In terms of generated machine code, how does hundreds of cases in a switch
differ from hundreds of if-elses? Do compilers or processors do any
optimization on such structured code? Do I need to worry about the
performance, usually?

Thanks,
--
He Shiming

Nov 14 '05

Subscribe Post Reply

6588

REH

"Walter Roberson" <ro******@ibd.nrc-cnrc.gc.ca> wrote in message
news:d9**********@canopus.cc.umanitoba.ca...

I just created a simple test program with four non-consequative
case labels. SGI's "cc" compiler and gcc generated different looking
code for the same program, but at key points, they both did multiple
comparisons and not just against the boundary values. [Oddly, for
both, the first pair of comparisions they did was against the
first case label and then against the second-largest case label -- not
against the largest.]

That seems odd and is very suprising to me (though admittingly I am not a
compiler writer--IANACR?). I think I do some research....

REH

Nov 14 '05 #51

Jean-Claude Arbaut

On 21/06/2005 18:43, icub3d wrote:

Unfortunately, we can't determine ahead of time if the compiler will do
this work for us. As Walter suggested, two major compilers don't do
this optimization, so it's hard to assume that many others will.

Just to compare: gcc4 -O4 on OSX

* with 4 consecutive cases: comparisons, and at most 2 jumps
* with 200 consecutive cases: jump table

Nov 14 '05 #52

David Mathog

He Shiming wrote:

Hi,

I just wrote a function that has over 200 "cases" wrapped in a "switch"
statement. I'm wondering if there are performance issues in such
implementation. Do I need to optimize it some way?

There are some applications that can't be done any other way but
frequently those 200 cases have some natural heirarchy to them and your
code may run better if you can make that heirarchy explicit to the
compiler. For instance, if 10% of the cases are "geometry operations"
and you index the cases by an integer, then by having an outer switch
which has a case for "geometry operations" (perhaps represented by
a bit in the index) it narrows down the internal switch to
only 20 cases. And so forth. My point being that it is a rare
program where there is not some fairly obvious way to cluster
the operations. If nothing else you could run the program for
a while, measure usage of each case, and then arrange the cases
such that the most frequently used cases are up at the top of
the switch.

Elsewhere in this thread someone mentioned using an array of pointers to
functions. That can be efficient _IF_ each of those functions runs for
a time significantly longer than the overhead of the function
call. However I've (unfortunately) often encountered code where
the "function" does nothing more than:

int blah(int a,int b){
if(a<b)return a;
return b;
}

or something equally trivial. That is, something which might
as well have been a define or just coded into the case
explicitly. Just to pour salt into the wound these are
invariably 3 levels down inside nested loops.

Regards,

David Mathog
ma****@caltech.edu

Nov 14 '05 #53

Lawrence Kirby

On Tue, 21 Jun 2005 15:49:23 +0000, Walter Roberson wrote:

In article <d9*********@cui1.lmms.lmco.com>, REH <bo***@nowhere.net> wrote:
You are assuming that a single switch statement does multiple checks. I
doubt any implementation does so.

A decent implementation will do so when that is appropriate.
I just created a simple test program with four non-consequative
case labels. SGI's "cc" compiler and gcc generated different looking
code for the same program, but at key points, they both did multiple
comparisons and not just against the boundary values. [Oddly, for
both, the first pair of comparisions they did was against the
first case label and then against the second-largest case label -- not
against the largest.]

There are some circumstances where a jump table isn't the best appraoch.
All you seem to be saying here is that the compilers are intelligent
enough to use a different approach when that is more appropriate.

The specific strategy could make sense if

1. There's an assumption that the first case listed is likely to be more
commonly taken than the others

2. Testing the 2nd to last nicely partitions the remaining cases: equal
and you've found a case, less and it just needs to test the 2nd case,
grerater and it just needs to test the 3rd case. OK I'm making some
assumptions about the ordering of values in the cases.

It looks like a good strategy.

Lawrence

Nov 14 '05 #54

pete

He Shiming wrote:

Oh that's definitely a great idea, I'll try, thanks.

"pete" <pf*****@mindspring.com> wrote in message
news:42***********@mindspring.com...

If the cases are all consecutive integers,
then you could rewrite it as functions
with an array of function pointers
to just index the correct function in one shot.

I worked on an embedded project where several
CPU's were sending messages to each other.
The message handling functions
were called from an array of function pointers.
I think there were somewhere between 200 and 300
different types of messages.
Each message had a type number and that was used as the index.
That's the code that they had when I showed up;
I didn't think it was difficult to work with.

--
pete

Nov 14 '05 #55

Alan Balmer

On Wed, 22 Jun 2005 00:46:09 +0100, Lawrence Kirby
<lk****@netactive.co.uk> wrote:

On Tue, 21 Jun 2005 15:49:23 +0000, Walter Roberson wrote:
In article <d9*********@cui1.lmms.lmco.com>, REH <bo***@nowhere.net> wrote:
You are assuming that a single switch statement does multiple checks. I
doubt any implementation does so.

A decent implementation will do so when that is appropriate.
I just created a simple test program with four non-consequative
case labels. SGI's "cc" compiler and gcc generated different looking
code for the same program, but at key points, they both did multiple
comparisons and not just against the boundary values. [Oddly, for
both, the first pair of comparisions they did was against the
first case label and then against the second-largest case label -- not
against the largest.]

There are some circumstances where a jump table isn't the best appraoch.
All you seem to be saying here is that the compilers are intelligent
enough to use a different approach when that is more appropriate.

The specific strategy could make sense if

1. There's an assumption that the first case listed is likely to be more
commonly taken than the others

2. Testing the 2nd to last nicely partitions the remaining cases: equal
and you've found a case, less and it just needs to test the 2nd case,
grerater and it just needs to test the 3rd case. OK I'm making some
assumptions about the ordering of values in the cases.

I don't think any assumptions are needed - the compiler can reorder
the values any way it wants.
It looks like a good strategy.

Lawrence

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 15 '05 #56

Jean-Claude Arbaut

On 22/06/2005 17:28, Alan Balmer wrote:

I don't think any assumptions are needed - the compiler can reorder
the values any way it wants.

Of course, but it can do a much better job if it knows that one of the 4
possibilities will be chosen in 99% of cases. Then it's best to test for
that one first. Actually the reason why the compiler does not use a jump
table for only 4 values is probably to avoid a load. I you know your
processor very well, you can check with a "pipeline sketch" which solution
is best in worst case.

Nov 15 '05 #57

CBFalconer

Jean-Claude Arbaut wrote:

On 22/06/2005 17:28, Alan Balmer wrote:
I don't think any assumptions are needed - the compiler can
reorder the values any way it wants.

Of course, but it can do a much better job if it knows that one of
the 4 possibilities will be chosen in 99% of cases. Then it's best
to test for that one first. Actually the reason why the compiler
does not use a jump table for only 4 values is probably to avoid a
load. I you know your processor very well, you can check with a
"pipeline sketch" which solution is best in worst case.

Consider the following silly code fragment:

int ix;
....
switch (ix) {
case 1: foo(); break;
case 100: bar(); break;
case 1000: fie(); break;
default: fum();
}

At some point the code generator will probably consider a jump
table with 1000 entries, most of which point to fum(). Then a
smarter code generator will consider transforming the code to
something like:

if (1000 == ix) fie();
else if (100 == ix) bar();
else if (1 == ix) foo();
else fum();

However, if that code is written in the first place, and the
switched indices happen to be 1, 2, and 3, the code generator is
very unlikely to even consider a jump table. The optimum final
code probably lies somewhere between the extremes, and can be
discovered because the language insists that the cases be compile
time constants.

Note that I am talking about a code generator, not a compiler. The
compiler will have transformed the source into something that
allows the equivalence of the two sequences to be detected. The
code generator may well be common to a plethora of language
specific compilers, just as a compiler front end can feed a
multitude of machine specific code generators.

--
Some informative links:
news:news.announce.newusers
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html

Nov 15 '05 #58

lawrence.jones

REH <bo***@nowhere.net> wrote:

You are assuming that a single switch statement does multiple checks. I
doubt any implementation does so.

Any reasonable implementation does multiple checks when it makes sense to do
so. There are lots of ways to handle switch statements:

1) A directly indexed jump table
3) A hash-function indexed jump table
2) A decision tree
4) Some combination of the above

Which is most efficient depends on the range and density of the case
labels and the characteristics of the hardware. The compiler should
know all of those things, portable code cannot know the last. As I
recall, DMR's original PDP-11 compiler used all of them 30 years ago, so
I would be very surprised if any modern compiler did worse.

-Larry Jones

Archaeologists have the most mind-numbing job on the planet. -- Calvin

Nov 15 '05 #59

forayer

David Mathog wrote:

Elsewhere in this thread someone mentioned using an array of pointers to
functions. That can be efficient _IF_ each of those functions runs for
a time significantly longer than the overhead of the function
call. However I've (unfortunately) often encountered code where
the "function" does nothing more than:

int blah(int a,int b){
if(a<b)return a;
return b;
}

or something equally trivial. That is, something which might
as well have been a define or just coded into the case
explicitly. Just to pour salt into the wound these are
invariably 3 levels down inside nested loops.

Although the point David makes is valid, consider a swap function.
Wouldn't it be more readable to use a funtion to do the swapping than
write out the code for the same in a switch-case structure? Also,
consider functions where you rely on previous values of local variables
(static variables) such as sequence generators? Or recursive functions?
(I am thinking of fibonacci as an example in both cases). These
functions are usually small, but are better off as functions. If you
must expand, why not just inline them (though u are at the complier's
mercy here). And again if the switch is something like:
switch(x) {
case 1: f1(); break;
case 2: f2(); break;
case ...
}
then, i think, function pointers serve the purpose better. No
additional jump table look-ups, or (if no jump table is used) multiple
comparisons.

cheers,
forayer

Nov 15 '05 #60

Jean-Claude Arbaut

On 23/06/2005 09:18, forayer wrote:

David Mathog wrote:
Elsewhere in this thread someone mentioned using an array of pointers to
functions. That can be efficient _IF_ each of those functions runs for
a time significantly longer than the overhead of the function
call. However I've (unfortunately) often encountered code where
the "function" does nothing more than:

int blah(int a,int b){
if(a<b)return a;
return b;
}

or something equally trivial. That is, something which might
as well have been a define or just coded into the case
explicitly. Just to pour salt into the wound these are
invariably 3 levels down inside nested loops.
Although the point David makes is valid, consider a swap function.
Wouldn't it be more readable to use a funtion to do the swapping than
write out the code for the same in a switch-case structure? Also,
consider functions where you rely on previous values of local variables
(static variables) such as sequence generators? Or recursive functions?
(I am thinking of fibonacci as an example in both cases). These
functions are usually small, but are better off as functions. If you
must expand, why not just inline them (though u are at the complier's
mercy here). And again if the switch is something like:
switch(x) {
case 1: f1(); break;
case 2: f2(); break;
case ...
}
then, i think, function pointers serve the purpose better. No
additional jump table look-ups,

You need at least one !
or (if no jump table is used) multiple
comparisons.

cheers,
forayer

Nov 15 '05 #61

forayer

Jean-Claude Arbaut wrote:

On 23/06/2005 09:18, forayer wrote:
then, i think, function pointers serve the purpose better. No
additional jump table look-ups,

You need at least one !

Yes, you're right. In case of an array of function pointers, I'll be
explicitly maintaining a "jump table", right? But in that case, there
is no ambiguity as to whether a jump table is used or not.

forayer

Nov 15 '05 #62

Jean-Claude Arbaut

On 23/06/2005 15:56, forayer wrote:

Jean-Claude Arbaut wrote:
On 23/06/2005 09:18, forayer wrote:
then, i think, function pointers serve the purpose better. No
additional jump table look-ups,

You need at least one !

Yes, you're right. In case of an array of function pointers, I'll be
explicitly maintaining a "jump table", right? But in that case, there
is no ambiguity as to whether a jump table is used or not.

Yes. When you only call functions with your switch, an array of pointers
cannot be worse, and when you only "do some things" in your switch, it's
better not to use an array of pointers *if* the switch is correctly
optimized.

Nov 15 '05 #63

Christian Bau

In article <BE******************************@laposte.net>,
Jean-Claude Arbaut <je****************@laposte.net> wrote:

On 23/06/2005 15:56, forayer wrote:
Jean-Claude Arbaut wrote:
On 23/06/2005 09:18, forayer wrote:
then, i think, function pointers serve the purpose better. No
additional jump table look-ups,

You need at least one !

Yes, you're right. In case of an array of function pointers, I'll be
explicitly maintaining a "jump table", right? But in that case, there
is no ambiguity as to whether a jump table is used or not.

Yes. When you only call functions with your switch, an array of pointers
cannot be worse, and when you only "do some things" in your switch, it's
better not to use an array of pointers *if* the switch is correctly
optimized.

An array of pointers can be worse. Functions calls in a switch case
could be inlined, function pointers are much harder to inline. All the
functions must have the same interface, otherwise there is more trouble.
Function calls through a function pointer could be less efficient than
direct function calls.

Nov 15 '05 #64

Jean-Claude Arbaut

On 24/06/2005 00:39, Christian Bau wrote:

In article <BE******************************@laposte.net>,
Jean-Claude Arbaut <je****************@laposte.net> wrote:
On 23/06/2005 15:56, forayer wrote:
Jean-Claude Arbaut wrote:
On 23/06/2005 09:18, forayer wrote:
> then, i think, function pointers serve the purpose better. No
> additional jump table look-ups,

You need at least one !

Yes, you're right. In case of an array of function pointers, I'll be
explicitly maintaining a "jump table", right? But in that case, there
is no ambiguity as to whether a jump table is used or not.
Yes. When you only call functions with your switch, an array of pointers
cannot be worse, and when you only "do some things" in your switch, it's
better not to use an array of pointers *if* the switch is correctly
optimized.

An array of pointers can be worse. Functions calls in a switch case
could be inlined, function pointers are much harder to inline.

I forgot that point. But even if they are inlined, you need to read a value
in the "jump table" and jump there. Actually it's a call, but sometimes it
can be replaced with a jump: if the switch is the last statement, in some
cases a jump to the new function suffices (the return from that function
will return to the caller of the switch function). It's exactly the same
behaviour with an array of pointers: you read, then call (or jump if
enough). I don't see how inlined functions may help. And when the optimal
switch consists in a test chain, then it's already better than the array of
pointers (otherwise the switch would have been optimized another way).
Actually, there is no true inlining possible: the switch will always jump at
some point, so instead of a jump+call when the switch is not optimized, you
avoid the call and the jump remains. With an array of pointers there is only
a call. Not much difference.
All the functions must have the same interface, otherwise there is more trouble.

When doing a switch, not necessary, but very likely to be better optimized.
When using an array of pointers, I don't see how it would be possible to
have different interfaces.
Function calls through a function pointer could be less efficient than
direct function calls.

You only need to load a value, if it's loaded long enough before the call,
the indirect call may be almost as fast as if it were a direct call (not
taking the load into account). But in the context of the switch, that does
make sense only if there are few cases: then the test chain is optimal, and
calls can be direct. If there are more cases, the switch uses a jump table,
so the call is necessarily indirect.

Nov 15 '05 #65

Christian Bau

In article <BE******************************@laposte.net>,
Jean-Claude Arbaut <je****************@laposte.net> wrote:

On 24/06/2005 00:39, Christian Bau wrote:
Function calls through a function pointer could be less efficient than
direct function calls.

You only need to load a value, if it's loaded long enough before the call,
the indirect call may be almost as fast as if it were a direct call (not
taking the load into account). But in the context of the switch, that does
make sense only if there are few cases: then the test chain is optimal, and
calls can be direct. If there are more cases, the switch uses a jump table,
so the call is necessarily indirect.

Have you ever seen a PowerPC/CFMG implementation of an indirect function
call? It's fun. (CFMG is the Code Fragment Manager, used on MacOS 9 and
to some extent on MacOS X).

Nov 15 '05 #66