By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
455,905 Members | 1,283 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 455,905 IT Pros & Developers. It's quick & easy.

Q: Recoding some masking routine

P: n/a
Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.

Thank-you.

May 23 '07 #1
Share this Question
Share on Google+
9 Replies


P: n/a
What is the relationship between x and the mask?

May 23 '07 #2

P: n/a
Quentin Godfroy <qu*************@hotmail.comwrites:
What is the relationship between x and the mask?
Imagine reading the above without having seen the previous article.

Please provide context when you post a followup.
See <http://cfaj.freeshell.org/google/>.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 24 '07 #3

P: n/a
On 23 May 2007 16:35:34 -0700, Quentin Godfroy
<qu*************@hotmail.comwrote:
>What is the relationship between x and the mask?
if that byte is masked, the field can either be it's original value
or one other pre-defined character
so string to compare against = "12345678", mask = 2 (00000010),
pre-defined char = "Z"

Strings to compare:-
"12345678" = match
"123456Z8" = match

"12345Z78" = NO match
"12345578" = NO match

Hope that makes it a little clearer.

May 24 '07 #4

P: n/a
On 24 May, 00:09, f...@off.spammers.org wrote:
Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
As far as I understand the required behaviour, you have a string
("original") of up to 80 bytes which you are examining, and another
string ("pattern") of the same size (?) against which you are
comparing it. In addition to simple character equality, you need to
test bytes at specific locations - bytes at these locations in
"original" can either match the corresponding byte in "pattern" or
match a single specific other byte. Locations for this dual matching
are specified by bit masks in an additional byte array, with one byte
in the array corresponding to 8 bytes in the original string. Is that
correct so far?

My natural instinct here is to split the matching down into 8-byte
"chunks". This was fairly easy to do and IMHO didn't seem hugely ugly
(I'm not going to post my code yet, as it's far from clear that I've
understood the requirement). How ugly is your code? Could you post a
small self-contained example to show what you've currently tried?

I haven't measured speed in my case. Why do you feel it's slow in
yours? What measures have you used? Have you profiled to the code to
see where the time is going?
May 24 '07 #5

P: n/a
fe**@off.spammers.org writes:
Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
It would help more if you posted what *you* have. We could then
decide if we see anything better. Anyway, in case it help, I'd do it
like this in the first instance:

#include <limits.h>

#define BIT_SET(mask, x) ((mask)[(x) / CHAR_BIT] & (1 << (x) % CHAR_BIT))

int quasi_match(const char *s1, const char *s2,
const unsigned char *mask, char optional)
{
int i = 0;
while (s1[i] && (s1[i] == s2[i] || BIT_SET(mask, i) && s2[i] == optional))
i += 1;
return s1[i] == s2[i];
}

The meaning (numbering) of the mask bits can be changed to reflect
your usage simply by altering the arithmetic in the macro.

There is quite a lot of arithmetic in the macro but:
(a) the compiler should make /CHAR_BIT and %CHAR_BIT fast on most
machines;
(b) it is only used when the characters *don't* match.

--
Ben.
May 24 '07 #6

P: n/a
fe**@off.spammers.org wrote:
>
Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
The optimum will probably depend on your data. I suggest an
initial check with strcmp. If that shows equal, no more fuss. You
might want to write your own str_cmp, and have it return a pointer
to the unequal location. You can then proceed accordingly.

--
If you want to post a followup via groups.google.com, ensure
you quote enough for the article to make sense. Google is only
an interface to Usenet; it's not Usenet itself. Don't assume
your readers can, or ever will, see any previous articles.
More details at: <http://cfaj.freeshell.org/google/>

--
Posted via a free Usenet account from http://www.teranews.com

May 24 '07 #7

P: n/a
On 24 May 2007 02:32:24 -0700, ma**********@pobox.com wrote:
>On 24 May, 00:09, f...@off.spammers.org wrote:
>Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.

As far as I understand the required behaviour, you have a string
("original") of up to 80 bytes
No, can be up to 40, but actually shorter (37) (5 mask bytes * 8 bits
= 40 'masks').
which you are examining, and another
string ("pattern") of the same size (?) against which you are
comparing it. In addition to simple character equality, you need to
test bytes at specific locations - bytes at these locations in
"original" can either match the corresponding byte in "pattern" or
match a single specific other byte. Locations for this dual matching
are specified by bit masks in an additional byte array, with one byte
in the array corresponding to 8 bytes in the original string. Is that
correct so far?
100% correct. Either I'm good at explaining (*grin*) or you can read
between the lines (more likely *sigh*).
>
My natural instinct here is to split the matching down into 8-byte
"chunks". This was fairly easy to do and IMHO didn't seem hugely ugly
(I'm not going to post my code yet, as it's far from clear that I've
understood the requirement). How ugly is your code?
So ugly the code wears a mask!
>Could you post a small self-contained example to show what you've currently tried?
Of course I can:-
//str1 is the string to compare
//str2 is the const comparison string,
//it has six extra bytes, 5 for the masks and 1 that is the actual
value we need (Action)
//DEF_STR_LEN is the length of str2
//Z is the 'mask' character so the strings can match, or if mask
applies can match to original or the 'Z'.
//Like I said, loads of if's and just looks a mess to me (And I wrote
it!). I HAVEN'T profiled it yet (dreading it!),
//but this is called frequently so the fastest it can be (within
reason).

if (!strncmp((char *)str1,
(char *)str2,
(DEF_STR_LEN - 6))
) Action = str2[(DEF_STR_LEN - 1)];
else if (groupMask)
{
j = 0;
do
{
//Frig for strs that aren't a multiple of 8.
if (j == 4) l = 4; else l = 7;
{
k = 0;
do
{
if (*(str1+(j*8)+k) == str2[((j*8) +
k)]) match = 1;
else if (str2[(DEF_STR_LEN - 2 + j)]
<< k & 0x80) if(*(str1+(j*8)+k) == 'Z') match = 1;
else match = 0;
} while ((k++ < l) && match);
}
} while ((++j < 5) && match);
if (match) action = str2[(DEF_STR_LEN - 1)];
}
>I haven't measured speed in my case. Why do you feel it's slow in
yours? What measures have you used? Have you profiled to the code to
see where the time is going?
Not profiled (yet), but I can't see how it's going to wizz!

Kind Regards.
May 24 '07 #8

P: n/a
On Thu, 24 May 2007 17:31:42 GMT, fe**@off.spammers.org wrote:

SNIP
>
if (!strncmp((char *)str1,
(char *)str2,
(DEF_STR_LEN - 6))
) Action = str2[(DEF_STR_LEN - 1)];
else if (groupMask)
{
j = 0;
do
{
//Frig for strs that aren't a multiple of 8.
if (j == 4) l = 4; else l = 7;
{
k = 0;
do
{
if (*(str1+(j*8)+k) == str2[((j*8) +
k)]) match = 1;
else if (str2[(DEF_STR_LEN - 2 + j)]
<< k & 0x80) if(*(str1+(j*8)+k) == 'Z') match = 1;
Oooops, cut & paste! That should read:-
else if ((str2[(DEF_STR_LEN - 2 + j)] << k & 0x80) && (*(str1+(j*8)+k)
== 'Z')) match = 1;
> else match = 0;
} while ((k++ < l) && match);
}
} while ((++j < 5) && match);
if (match) action = str2[(DEF_STR_LEN - 1)];
}
SNIP
May 24 '07 #9

P: n/a
Here's my off-the-cuff version. I like Ben's version for terseness,
but I wanted to produce something which expressed my understanding of
the problem - which is that the masks naturally split the original and
pattern strings into chunks.

I'd start with this, or something like it, and then look at profiling
it to find out where time was being eaten and focus on how that could
be trimmed.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int matches(char *original,char *pattern, unsigned char *masks,char
fixed);
int chunkMatches(char *original,char *pattern, unsigned char mask,char
fixed,int chunkSize);

int main(void) {
/* One good and 2 bad strings */
char *original[] = {
"12Z456789ZBCDEF123456789ABCDEF",
"12Z456789ZBCDEF123X56789ABCDEF",
"12Z456789XBCDEF123456789ABCDEF",
NULL
};
char *pattern = "123456789ABCDEF123456789ABCDEF";

/* allow the "fixed" value in char 3 of first 8 byte chunk
char 2 of second 8 byte chunk, but not in the third */
unsigned char masks[3]= { 1 << 5, 1 << 6, 0 };
char fixed = 'Z';
char **o;
for(o=original;*o != NULL;o++) {
printf("[%s] %s match [%s](%c)\n",
*o,matches(*o,pattern,masks,'Z')?"did":"did
not",pattern,fixed);
}
}

int matches(char *original,char *pattern, unsigned char *masks,char
fixed) {
int chunks;
int oLen = strlen(original);
int pLen = strlen(pattern);

if (oLen != pLen) {
return 0;
}

chunks = oLen/CHAR_BIT + ((oLen % CHAR_BIT) != 0);

while(chunks--) {
int chunkSize = strlen(original);
if (chunkSize CHAR_BIT) {
chunkSize = CHAR_BIT;
}
if (!chunkMatches(original,pattern,*masks,fixed,chunk Size)){
return 0;
}
original += chunkSize;
pattern += chunkSize;
masks += 1;
}

return 1;
}

int chunkMatches(char *original,char *pattern, unsigned char mask,char
fixed,int chunkSize) {
unsigned char currentMask = 1<<(CHAR_BIT - 1);

while(chunkSize--) {
if (*original == *pattern) {
;
} else if ((currentMask & mask) && (*original == fixed)) {
;
} else {
return 0;
}
original++;
pattern++;
currentMask >>= 1;
}
}

May 25 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.