Bytes | Software Development & Data Engineering Community

Array for speed or...

I have a substantial random stream of (numerical, long int) data (coming
from several SCADA sources) that needs to be passed through a function
that tallies the occurrences of several different bit patterns in each
number (or, in fact, each 32-bit row, of which a few (leftmost) bits are
bogus).

Speediness of execution is high on the priority list.

I thought of a solution along these lines, avoiding relatively slow
switch/case or several if/elseif combinations:

If the data would have been short (or int) I'd be able to do this:

#include <stdio.h>
#include <limits.h>   /* for USHRT_MAX */

/* assume the 3 leftmost bits are bogus in case we use short */
#define BOGUS_BIT_MASK (USHRT_MAX >> 3)

#define BATCH_SIZE 1024

typedef unsigned long long ull;
typedef FILE* fileptr;

ull frequency[(USHRT_MAX >> 3) + 1];
int batch[BATCH_SIZE];
fileptr stream;

void tally (const int batch[], int batch_size, ull freq[]);
void zero_freq (ull freq[], int n);
int getdata (int batch[], int batch_size, fileptr fp);
void process_results (const ull freq[]);

int main (void)
{
    // zero all cells in frequency array (definition omitted)
    zero_freq (frequency, (USHRT_MAX >> 3) + 1);
    while (getdata (batch, BATCH_SIZE, stream)) {
        // getdata definition omitted here
        tally (batch, BATCH_SIZE, frequency);
    }
    process_results (frequency);
    // function definition omitted here
    return 0;
}

void tally (const int batch[], int batch_size, ull freq[])
{
    int i;
    for (i = 0; i < batch_size; i++) {
        freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
        freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;
        freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_03]++;
        freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_04]++;
        // etc. etc.; in reality the bit patterns will be stored in an
        // array and the bogus mask applied once per loop iteration.
    }
}

Unfortunately, with int32 data, using a mostly empty array of the desired
size (the largest bit-pattern value comes pretty close to 2**29) is
challenging, if not impossible, on a given machine/OS.

In total, it's likely up to a few hundred bitpatterns may be used, but I
can't guarantee yet it won't expand later.

I quickly abandoned the if/then approach, as it became obvious it would
result in a messy, slow, hard to debug/adapt spaghetti.

Has anyone got a better idea for the huge array? Is there an easy (to
maintain) way of squeezing out all the unused cells in the array? Some
kind of mapping perhaps? I tried figuring out a list approach, but that
more or less stranded. Besides, I could not see an easy way to target the
right chain in the list in a way similar to the array approach. If
indeed my approach deserves further development, do you expect a big
further speed gain from coding the tally function in an OS-specific ASM
version? I know this is platform specific; I'd just like to get a feel
for your opinions on how successful modern compilers are at optimizing.

I am fully aware I may be completely off on the wrong foot here, in
which case I am glad to be steered in the right direction. Please don't
let a possible typo in the above semi-code put you off; it's not the real
thing, as it is still in the experimental stage, rather more complicated,
partly handed to me by others, and messy and illegible as hell.

Thanks for your help in advance, much appreciated.
Sh.
PS In case you're interested, the final program will be put to use as a
data hub for an (in-house developed) logistics planning package and
feedback to both SCADA devices and an accounting database. All efforts
so far have stranded; now I am asked to burn my fingers... Interesting
yet challenging.
Aug 23 '06 #1


Schraalhans Keukenmeester wrote on 08/23/06 17:44:
[... much snipped ...]
void tally (const int batch[], int batch_size, ull freq[])
{
int i;
for(i=0;i<batch_size;i++) {
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_03]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_04]++;
// etc. etc. in reality bitpatterns will be stored in an array
// and bogus mask is applied once each run of the loop.
}
}

Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.

In total, it's likely up to a few hundred bitpatterns may be used, but I
can't guarantee yet it won't expand later.
Exactly what are you trying to do? You said you were
trying to "tally the occurrences" of bit patterns, but that
is not quite what the code does. For example, suppose there
is just one bit pattern, BIT_PATTERN_01 = 0x1. Then freq[0]
will count all the even data values and freq[1] will count
all the odd values. The freq[1] count will include not only
the data value 0x1 (matching BIT_PATTERN_01), but also all the
values 0x3, 0x51, 0xffff, ...

Things get even more confused if one BIT_PATTERN_xx is a
"subset" of another: BIT_PATTERN_01 = 0x1, BIT_PATTERN_02 = 0x3,
for example.

Please clarify what kind of "tally" you want to compute.

--
Er*********@sun.com

Aug 23 '06 #2
In article <44**********************@news.xs4all.nl>,
Schraalhans Keukenmeester <fi********************@xsfourall.ennel> wrote:
>I have a substantial random stream of (numerical, long int) data (coming
from several SCADA sources) that needs to be passed thru a function that
tallies the occurrences of several different bit patterns in each number
>Speediness of execution is high on the priority list.
>Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.
>In total, it's likely up to a few hundred bitpatterns may be used, but I
can't guarantee yet it won't expand later.
Is it necessary that the bit patterns be easily changeable?

Two approaches come to mind:

1) If it is acceptable to spin a new executable when the patterns
change, then write a program that reads the bit patterns,
and spits out code that matches appropriately. When the
pattern file changes, rebuild that one bit of code and relink.

2) If it is not acceptable to spin a new executable, then you might
be able to find a data table representation suitable for loading
a match function from a data file. This might end up
slower than the above, as you might end up "interpreting" a bit.

freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;
That doesn't seem right -- for example, the all-0 bit pattern
is going to increment freq[0] for every BIT_PATTERN_* .
If you have "a few hundred" bit patterns, you are definitely going
to end up with these kinds of clashes.

Don't you need something closer to:

freq[1] += batch[i] & BIT_MASK_01 == BIT_PATTERN_01;
freq[2] += batch[i] & BIT_MASK_02 == BIT_PATTERN_02;
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson
Aug 23 '06 #3
Eric Sosman wrote:
>
Schraalhans Keukenmeester wrote on 08/23/06 17:44:
>[... much snipped ...]
void tally (const int batch[], int batch_size, ull freq[])
{
int i;
for(i=0;i<batch_size;i++) {
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_03]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_04]++;
// etc. etc. in reality bitpatterns will be stored in an array
// and bogus mask is applied once each run of the loop.
}
}

Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.

In total, it's likely up to a few hundred bitpatterns may be used, but I
can't guarantee yet it won't expand later.

Exactly what are you trying to do? You said you were
trying to "tally the occurrences" of bit patterns, but that
is not quite what the code does. For example, suppose there
is just one bit pattern, BIT_PATTERN_01 = 0x1. Then freq[0]
will count all the even data values and freq[1] will count
all the odd values. The freq[1] count will include not only
the data value 0x1 (matching BIT_PATTERN_01), but also all the
values 0x3, 0x51, 0xffff, ...

Things get even more confused if one BIT_PATTERN_xx is a
"subset" of another: BIT_PATTERN_01 = 0x1, BIT_PATTERN_02 = 0x3,
for example.

Please clarify what kind of "tally" you want to compute.
You are right, yet in practice either the actual bitmasks are mutually
exclusive, or non-conforming data (per some extra requirements) will be
reworked into another form. As I said, it's more complicated in reality.
In the processing stage only the 'cells' indexed by the bitmasks are
used; the rest is discarded. I am fully aware additional cells will be
incremented. I admit my statement that the data is random isn't
completely true.

If the above requirements cannot be met for all data/masks, and some
combinations lead to other valid pattern counts being incremented, the
stream will have to be split into substreams. Actually the data is already
sorted into substreams based on identifiers in the raw data stream (which
comes in the form source_identifier, packet_id, packetsize, data, data,
[...,] data, alignment, crc)

Right now I am mainly interested in the usability, in general, of the
suggested algorithm with speed in mind. I hope you can help me with that!

Thanks for your words of wisdom and your time invested!
Rgds
Sh.

Aug 23 '06 #4
Walter Roberson wrote:
In article <44**********************@news.xs4all.nl>,
Schraalhans Keukenmeester <fi********************@xsfourall.ennel> wrote:
>I have a substantial random stream of (numerical, long int) data (coming
from several SCADA sources) that needs to be passed thru a function that
tallies the occurrences of several different bit patterns in each number
>Speediness of execution is high on the priority list.
>Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.
>In total, it's likely up to a few hundred bitpatterns may be used, but I
can't guarantee yet it won't expand later.

Is it necessary that the bit patterns be easily changeable?

Two approaches come to mind:

1) If it is acceptable to spin a new executable when the patterns
change, then write a program that reads the bit patterns,
and spits out code that matches appropriately. When the
pattern file changes, rebuild that one bit of code and relink.

2) If it is not acceptable to spin a new executable, then you might
be able to find a data table representation suitable for loading
a match function from a data file. This might end up
slower than the above, as you might end up "interpreting" a bit.

> freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;

That doesn't seem right -- for example, the all-0 bit pattern
is going to increment freq[0] for every BIT_PATTERN_* .
If you have "a few hundred" bit patterns, you are definitely going
to end up with these kinds of clashes.

Don't you need something closer to:

freq[1] += batch[i] & BIT_MASK_01 == BIT_PATTERN_01;
freq[2] += batch[i] & BIT_MASK_02 == BIT_PATTERN_02;
I am afraid it may end up being a necessity indeed, which would impact
speed quite seriously. I had not fully thought through the consequences
of just incrementing based on masking given a large(r) set of masks. Too
bad, can't have it all, but this DOES help me limit the array-size. I
think you helped me here. THANKS! Curious how gcc or VisualC are gonna
code/optimize this...
Aug 23 '06 #5
Schraalhans Keukenmeester wrote:
I have a substantial random stream of (numerical, long int) data (coming
from several SCADA sources) that needs to be passed thru a function that
tallies the occurrences of several different bit patterns in each number
(or in fact 32 bit row, of which a few (leftmost) are bogus).
Has anyone got a better idea for the huge array ? Is there an easy (to
maintain) way of squeezing out all the non-used cells in the array? Some
kind of mapping perhaps ? I tried figuring out a list approach, but that
more or less stranded. Besides I could not see an easy way to target the
right chain in the list in a way similar to the array approach. If
I assume you have some idea which bit patterns may be coming. The
following is a slightly more readable (to some) variant of if/else that
may work. It's fairly easy to add a new pattern at the top (or make it
patnum-- and insert at the bottom).

int patnum = 0;
switch (pattern) {
case pat3:
    patnum++;   /* fall through */
case pat2:
    patnum++;   /* fall through */
case pat1:
    patnum++;
}
pattern_counter[patnum]++;

Aug 23 '06 #6


Schraalhans Keukenmeester wrote on 08/23/06 19:05:
Walter Roberson wrote:
>>[...]
Don't you need something closer to:

freq[1] += batch[i] & BIT_MASK_01 == BIT_PATTERN_01;
freq[2] += batch[i] & BIT_MASK_02 == BIT_PATTERN_02;

I am afraid it may end up being a necessity indeed, which would impact
speed quite seriously. I had not fully thought through the consequences
of just incrementing based on masking given a large(r) set of masks. Too
bad, can't have it all, but this DOES help me limit the array-size. I
think you helped me here. THANKS! Curious how gcc or VisualC are gonna
code/optimize this...
Maybe you could just tally the data values as they arrive,
letting the "uninteresting" values increment counters you don't
care about:

for (i = 0; i < value_count; ++i)
    freq[ value[i] & BOGUS_BIT_MASK ]++;

for (i = 0; i < pattern_count; ++i)
    printf ("Pattern %X occurred %d times\n",
            bit_pattern[i], freq[ bit_pattern[i] ]);

Needs a big freq[] array (a big freakin' array?), but it has
the virtue of simplicity.

Take a step back for a moment, though: How much speed do
you actually need? If you've got a large quantity of values
to screen for patterns, they're probably being supplied from
an I/O source of some kind. I/O is usually several orders of
magnitude slower than the CPU (disk service times are in the
milliseconds, CPU cycle times are in the *nano*seconds). If
you squeeze the last drop of performance from your screening
code, you may find that you've done little more than increase
the iteration count of the system idle loop ... It's well to
do things efficiently, but speeding up something that is not
the bottleneck will make only a small overall improvement.

--
Er*********@sun.com

Aug 24 '06 #7

Eric Sosman wrote:
Take a step back for a moment, though: How much speed do
you actually need? If you've got a large quantity of values
to screen for patterns, they're probably being supplied from
an I/O source of some kind. I/O is usually several orders of
magnitude slower than the CPU (disk service times are in the
milliseconds, CPU cycle times are in the *nano*seconds).

Eric's got a good point. CPUs these days are much faster than most
I/O devices.
Disks take milliseconds to find a file, then transfer at anywhere from
10 to 200 megabytes/sec. CPUs can usually keep up with all but the
fastest disks.
But you mentioned SCADA data, which may be coming in from almost any
port.

What is the speed of your SCADA data flow? If it's less than a
megabyte a second, the CPU tallying time is probably not the
bottleneck.

Aug 24 '06 #8
Ancient_Hacker wrote:
Eric Sosman wrote:
> Take a step back for a moment, though: How much speed do
you actually need? If you've got a large quantity of values
to screen for patterns, they're probably being supplied from
an I/O source of some kind. I/O is usually several orders of
magnitude slower than the CPU (disk service times are in the
milliseconds, CPU cycle times are in the *nano*seconds).


Eric's got a good point. CPUs these days are much faster than most
I/O devices.
Disks take milliseconds to find a file, then transfer at anywhere from
10 to 200 megabytes/sec. CPUs can usually keep up with all but the
fastest disks.
But you mentioned SCADA data, which may be coming in from almost any
port.

What is the speed of your SCADA data flow? If it's less than a
megabyte a second, the CPU tallying time is probably not the
bottleneck.
It's not completely clear to me right now what the combined stream
from the data collector will be. I've received some specs of the devices
it gets its data from; I believe the highest POSSIBLE output rate of the
newest single device is 128 kb/sec, but as the pilot plant engineers
assured me, that is never remotely reached. Actual measurements of the
avg/min/max data load on the collector have not been made available to
me yet. Can you spell slow-moving organisations, near-pensioners and
bureaucracy ;-)?
The older machines have been used with ancient 14k4 modems over crappy
POTS lines, so that should not be something to fear.

But it would seem, as far as I can judge from here, the collector never
receives as much as 1 Mbyte/sec. I have yet to see the measurement report,
however, on how much comes OUT of the collector, since it adds overhead
to the data, adding source identifiers and repackaging the mostly antique
protocols it is being talked to in into IP packets. 1 Mbyte/sec seems
quite a huge amount; it would surely show up as a significant load on the
10 Mbit/sec ethernet segment the collector's on, and a quick glance at HP
NetView showed it's not very busy at all.

I may indeed have been overwhelmed initially by the number of probes
throughout the plant, the stack of paper dumped on my desk, and the
urgency management shows in wanting our response (can/will we do it).
Thanks for putting my feet back on earth. Now let me go back to the
floor, oozing coolness and confidence around the place from every pore...

Rgds
Sh
Aug 25 '06 #9


Schraalhans Keukenmeester wrote on 08/25/06 08:06:
[various data rate estimates, all fairly inexact]
Sounds like it's time to get better information about
the data rates. Judging from what's known, though, I don't
think it's likely that your CPU will be a bottleneck and I
doubt that you need to worry about "heroic" performance
tricks.

Of course, the only way to be sure is to measure the
performance you actually get, and this raises something of
a chicken-and-egg problem: You won't know how fast the program
runs until you write it, and you won't know whether you should
write simply or heroically until you know how fast it runs!
You might want to try writing just the "core" of the program
and putting it in a test harness that feeds it a lot of data
(10MB of random numbers 100 times, say) and measures how long
it takes with different pattern counts.

Good luck -- but at the moment, I don't think you need
a lot of C help.

--
Er*********@sun.com

Aug 25 '06 #10
Eric Sosman wrote:
>
Schraalhans Keukenmeester wrote on 08/25/06 08:06:
>[various data rate estimates, all fairly inexact]
Of course, the only way to be sure is to measure the
performance you actually get, and this raises something of
a chicken-and-egg problem: You won't know how fast the program
runs until you write it, and you won't know whether you should
write simply or heroically until you know how fast it runs!
You might want to try writing just the "core" of the program
and putting it in a test harness that feeds it a lot of data
(10MB of random numbers 100 times, say) and measures how long
it takes with different pattern counts.

Good luck -- but at the moment, I don't think you need
a lot of C help.
Point and advice taken! Thx.
Sh.
Aug 25 '06 #11
On Wed, 23 Aug 2006 22:49:14 +0000 (UTC), ro******@ibd.nrc-cnrc.gc.ca
(Walter Roberson) wrote:
In article <44**********************@news.xs4all.nl>,
Schraalhans Keukenmeester <fi********************@xsfourall.ennel> wrote:
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;

That doesn't seem right -- for example, the all-0 bit pattern
is going to increment freq[0] for every BIT_PATTERN_* .
If you have "a few hundred" bit patterns, you are definitely going
to end up with these kinds of clashes.

Don't you need something closer to:

freq[1] += batch[i] & BIT_MASK_01 == BIT_PATTERN_01;
freq[2] += batch[i] & BIT_MASK_02 == BIT_PATTERN_02;
Probably even closer to
freq[1] += ( batch[i] & BIT_MASK_01 ) == BIT_PATTERN_01;
etc. as without the parentheses you always & with constant 1
(which is almost certainly wrong) or 0 (which is useless).

- David.Thompson1 at worldnet.att.net
Sep 4 '06 #12
