473,383 Members | 1,739 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,383 software developers and data experts.

Q: Recoding some masking routine

Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.

Thank-you.

May 23 '07 #1
9 1414
What is the relationship between x and the mask?

May 23 '07 #2
Quentin Godfroy <qu*************@hotmail.comwrites:
What is the relationship between x and the mask?
Imagine reading the above without having seen the previous article.

Please provide context when you post a followup.
See <http://cfaj.freeshell.org/google/>.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 24 '07 #3
On 23 May 2007 16:35:34 -0700, Quentin Godfroy
<qu*************@hotmail.comwrote:
>What is the relationship between x and the mask?
if that byte is masked, the field can either be it's original value
or one other pre-defined character
so string to compare against = "12345678", mask = 2 (00000010),
pre-defined char = "Z"

Strings to compare:-
"12345678" = match
"123456Z8" = match

"12345Z78" = NO match
"12345578" = NO match

Hope that makes it a little clearer.

May 24 '07 #4
On 24 May, 00:09, f...@off.spammers.org wrote:
Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
As far as I understand the required behaviour, you have a string
("original") of up to 80 bytes which you are examining, and another
string ("pattern") of the same size (?) against which you are
comparing it. In addition to simple character equality, you need to
test bytes at specific locations - bytes at these locations in
"original" can either match the corresponding byte in "pattern" or
match a single specific other byte. Locations for this dual matching
are specified by bit masks in an additional byte array, with one byte
in the array corresponding to 8 bytes in the original string. Is that
correct so far?

My natural instinct here is to split the matching down into 8-byte
"chunks". This was fairly easy to do and IMHO didn't seem hugely ugly
(I'm not going to post my code yet, as it's far from clear that I've
understood the requirement). How ugly is your code? Could you post a
small self-contained example to show what you've currently tried?

I haven't measured speed in my case. Why do you feel it's slow in
yours? What measures have you used? Have you profiled to the code to
see where the time is going?
May 24 '07 #5
fe**@off.spammers.org writes:
Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
It would help more if you posted what *you* have. We could then
decide if we see anything better. Anyway, in case it help, I'd do it
like this in the first instance:

#include <limits.h>

#define BIT_SET(mask, x) ((mask)[(x) / CHAR_BIT] & (1 << (x) % CHAR_BIT))

int quasi_match(const char *s1, const char *s2,
const unsigned char *mask, char optional)
{
int i = 0;
while (s1[i] && (s1[i] == s2[i] || BIT_SET(mask, i) && s2[i] == optional))
i += 1;
return s1[i] == s2[i];
}

The meaning (numbering) of the mask bits can be changed to reflect
your usage simply by altering the arithmetic in the macro.

There is quite a lot of arithmetic in the macro but:
(a) the compiler should make /CHAR_BIT and %CHAR_BIT fast on most
machines;
(b) it is only used when the characters *don't* match.

--
Ben.
May 24 '07 #6
fe**@off.spammers.org wrote:
>
Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
The optimum will probably depend on your data. I suggest an
initial check with strcmp. If that shows equal, no more fuss. You
might want to write your own str_cmp, and have it return a pointer
to the unequal location. You can then proceed accordingly.

--
If you want to post a followup via groups.google.com, ensure
you quote enough for the article to make sense. Google is only
an interface to Usenet; it's not Usenet itself. Don't assume
your readers can, or ever will, see any previous articles.
More details at: <http://cfaj.freeshell.org/google/>

--
Posted via a free Usenet account from http://www.teranews.com

May 24 '07 #7
On 24 May 2007 02:32:24 -0700, ma**********@pobox.com wrote:
>On 24 May, 00:09, f...@off.spammers.org wrote:
>Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF123456789ABCDEF",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDEF123456789ABCDEF" == "123456789ABCDEF1234x6789ABCDEx".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.

As far as I understand the required behaviour, you have a string
("original") of up to 80 bytes
No, can be up to 40, but actually shorter (37) (5 mask bytes * 8 bits
= 40 'masks').
which you are examining, and another
string ("pattern") of the same size (?) against which you are
comparing it. In addition to simple character equality, you need to
test bytes at specific locations - bytes at these locations in
"original" can either match the corresponding byte in "pattern" or
match a single specific other byte. Locations for this dual matching
are specified by bit masks in an additional byte array, with one byte
in the array corresponding to 8 bytes in the original string. Is that
correct so far?
100% correct. Either I'm good at explaining (*grin*) or you can read
between the lines (more likely *sigh*).
>
My natural instinct here is to split the matching down into 8-byte
"chunks". This was fairly easy to do and IMHO didn't seem hugely ugly
(I'm not going to post my code yet, as it's far from clear that I've
understood the requirement). How ugly is your code?
So ugly the code wears a mask!
>Could you post a small self-contained example to show what you've currently tried?
Of course I can:-
//str1 is the string to compare
//str2 is the const comparison string,
//it has six extra bytes, 5 for the masks and 1 that is the actual
value we need (Action)
//DEF_STR_LEN is the length of str2
//Z is the 'mask' character so the strings can match, or if mask
applies can match to original or the 'Z'.
//Like I said, loads of if's and just looks a mess to me (And I wrote
it!). I HAVEN'T profiled it yet (dreading it!),
//but this is called frequently so the fastest it can be (within
reason).

if (!strncmp((char *)str1,
(char *)str2,
(DEF_STR_LEN - 6))
) Action = str2[(DEF_STR_LEN - 1)];
else if (groupMask)
{
j = 0;
do
{
//Frig for strs that aren't a multiple of 8.
if (j == 4) l = 4; else l = 7;
{
k = 0;
do
{
if (*(str1+(j*8)+k) == str2[((j*8) +
k)]) match = 1;
else if (str2[(DEF_STR_LEN - 2 + j)]
<< k & 0x80) if(*(str1+(j*8)+k) == 'Z') match = 1;
else match = 0;
} while ((k++ < l) && match);
}
} while ((++j < 5) && match);
if (match) action = str2[(DEF_STR_LEN - 1)];
}
>I haven't measured speed in my case. Why do you feel it's slow in
yours? What measures have you used? Have you profiled to the code to
see where the time is going?
Not profiled (yet), but I can't see how it's going to wizz!

Kind Regards.
May 24 '07 #8
On Thu, 24 May 2007 17:31:42 GMT, fe**@off.spammers.org wrote:

SNIP
>
if (!strncmp((char *)str1,
(char *)str2,
(DEF_STR_LEN - 6))
) Action = str2[(DEF_STR_LEN - 1)];
else if (groupMask)
{
j = 0;
do
{
//Frig for strs that aren't a multiple of 8.
if (j == 4) l = 4; else l = 7;
{
k = 0;
do
{
if (*(str1+(j*8)+k) == str2[((j*8) +
k)]) match = 1;
else if (str2[(DEF_STR_LEN - 2 + j)]
<< k & 0x80) if(*(str1+(j*8)+k) == 'Z') match = 1;
Oooops, cut & paste! That should read:-
else if ((str2[(DEF_STR_LEN - 2 + j)] << k & 0x80) && (*(str1+(j*8)+k)
== 'Z')) match = 1;
> else match = 0;
} while ((k++ < l) && match);
}
} while ((++j < 5) && match);
if (match) action = str2[(DEF_STR_LEN - 1)];
}
SNIP
May 24 '07 #9
Here's my off-the-cuff version. I like Ben's version for terseness,
but I wanted to produce something which expressed my understanding of
the problem - which is that the masks naturally split the original and
pattern strings into chunks.

I'd start with this, or something like it, and then look at profiling
it to find out where time was being eaten and focus on how that could
be trimmed.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int matches(char *original,char *pattern, unsigned char *masks,char
fixed);
int chunkMatches(char *original,char *pattern, unsigned char mask,char
fixed,int chunkSize);

int main(void) {
/* One good and 2 bad strings */
char *original[] = {
"12Z456789ZBCDEF123456789ABCDEF",
"12Z456789ZBCDEF123X56789ABCDEF",
"12Z456789XBCDEF123456789ABCDEF",
NULL
};
char *pattern = "123456789ABCDEF123456789ABCDEF";

/* allow the "fixed" value in char 3 of first 8 byte chunk
char 2 of second 8 byte chunk, but not in the third */
unsigned char masks[3]= { 1 << 5, 1 << 6, 0 };
char fixed = 'Z';
char **o;
for(o=original;*o != NULL;o++) {
printf("[%s] %s match [%s](%c)\n",
*o,matches(*o,pattern,masks,'Z')?"did":"did
not",pattern,fixed);
}
}

int matches(char *original,char *pattern, unsigned char *masks,char
fixed) {
int chunks;
int oLen = strlen(original);
int pLen = strlen(pattern);

if (oLen != pLen) {
return 0;
}

chunks = oLen/CHAR_BIT + ((oLen % CHAR_BIT) != 0);

while(chunks--) {
int chunkSize = strlen(original);
if (chunkSize CHAR_BIT) {
chunkSize = CHAR_BIT;
}
if (!chunkMatches(original,pattern,*masks,fixed,chunk Size)){
return 0;
}
original += chunkSize;
pattern += chunkSize;
masks += 1;
}

return 1;
}

int chunkMatches(char *original,char *pattern, unsigned char mask,char
fixed,int chunkSize) {
unsigned char currentMask = 1<<(CHAR_BIT - 1);

while(chunkSize--) {
if (*original == *pattern) {
;
} else if ((currentMask & mask) && (*original == fixed)) {
;
} else {
return 0;
}
original++;
pattern++;
currentMask >>= 1;
}
}

May 25 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Piotr | last post by:
Please help me, I can't solve this problem: I have domain: www.main_domain.com/index.php?userid=23&parametr1=2&parametr2=3 (and other site files like...
2
by: Ginchy | last post by:
I have uploaded a small 3 page web using MS Publisher 2003 and after uploading I switched on url masking to cloak the url. I am certain that it worked fine. I simply changed the colour scheme...
6
by: Stefan Kowalski | last post by:
Hi Newsgroup Does anyone here care to share their thoughts/experiences with using masking? I am storing categories of books which libraries stock, so that I might have codes: 015-000-000 =...
1
by: john sutor | last post by:
Is there a way in C# to add masking (for a date format) to a textbox?
0
by: Dana Epp | last post by:
I have a ToolBarButton that when I set it to disabled (button.Enabled = false;) causes a really ugly gray masking effect to take place. This is normal and the intended way of the button, but I...
1
by: Brian Keating EI9FXB | last post by:
Hello there, Does anyone know how to to Bitmap masking in .NET, In c++ i used, CreateCompatibleDC GetBitmap Then on the bitmap and the bitmap mask i called. {SelectBitmap StretchBlt}
1
by: NBB | last post by:
I can't figure this one out. Here's the situation, should be pretty for the pros in here. I have a ListBox that is populated with the DataTextField and DataValueField flags of the DataSet....
3
by: Hai Nguyen | last post by:
Hi All I have a textbox want to mask as phone box. How can I do it? Thanks
9
by: Daniel Smedegaard Buus | last post by:
Hey all :) I was wondering about the $error_types (I particularly notice the 's' suffix when reading the manual) parameter for 'set_error_handler()': Can be used to mask the triggering of the...
2
by: meghla | last post by:
I can not understand how to implement bit masking on a c code?? while working with array it is taking too much time.................. so i need help. i need some suggestion and sample code of C...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.