On Feb 8, 9:17 am, "Daneel" <michael.ransb...@gmail.comwrote:

Hello!

I'm looking for an algorithm which finds all occurences of a bit

sequence (e.g., "0001") in a file. This sequence can start at any bit

in the file (it is not byte aligned).

I have some ideas of how to approach the problem (1) reading file into

unsigned char buffer, 2) defining bit structure, 3) comparing the

first 4 bits of the buffer with the bit structure, 4) shifting the

char buffer one left, 5) repeat at step 3)) but I'm not sure if this

is a good or the right approach?

It would be great to get some input, possibly with sample source code.

Many thanks,

Michael

(Assuming an 8-bit byte) if your pattern is 8 bits or less, it might

help

you to first fill in these two arrays:

unsigned char unsplit[256], split[256][256];

where unsplit[val] gives the number of occurences of the pattern

in the binary representation of "val", and split[val1][val2] gives the

number of occurences of the pattern in the (16-bit) concatenation

of the binary representations of "val1"and "val2", minus

(unsplit[val1] + unsplit[val2]). If b1, b2, b3 ... are the bytes in

the file, then the solution is:

unsplit[b1] + split[b1][b2] + unsplit[b2] + split[b2][b3]

+ unsplit[b3] + split[b3][b4] + ...

The file might have to be several hundred Kbytes for the overhead

of generating the arrays to be worth it. The rate of cache evictions

of the arrays would also probably have a big influence on

performance.