473,834 Members | 1,912 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Q: Recoding some masking routine

Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F123456789ABCDE F",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F1234x6789ABCDE x".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.

Thank-you.

May 23 '07 #1
9 1430
What is the relationship between x and the mask?

May 23 '07 #2
Quentin Godfroy <qu************ *@hotmail.comwr ites:
What is the relationship between x and the mask?
Imagine reading the above without having seen the previous article.

Please provide context when you post a followup.
See <http://cfaj.freeshell. org/google/>.

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 24 '07 #3
On 23 May 2007 16:35:34 -0700, Quentin Godfroy
<qu************ *@hotmail.comwr ote:
>What is the relationship between x and the mask?
if that byte is masked, the field can either be it's original value
or one other pre-defined character
so string to compare against = "12345678", mask = 2 (00000010),
pre-defined char = "Z"

Strings to compare:-
"12345678" = match
"123456Z8" = match

"12345Z78" = NO match
"12345578" = NO match

Hope that makes it a little clearer.

May 24 '07 #4
On 24 May, 00:09, f...@off.spamme rs.org wrote:
Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F123456789ABCDE F",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F1234x6789ABCDE x".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
As far as I understand the required behaviour, you have a string
("original") of up to 80 bytes which you are examining, and another
string ("pattern") of the same size (?) against which you are
comparing it. In addition to simple character equality, you need to
test bytes at specific locations - bytes at these locations in
"original" can either match the corresponding byte in "pattern" or
match a single specific other byte. Locations for this dual matching
are specified by bit masks in an additional byte array, with one byte
in the array corresponding to 8 bytes in the original string. Is that
correct so far?

My natural instinct here is to split the matching down into 8-byte
"chunks". This was fairly easy to do and IMHO didn't seem hugely ugly
(I'm not going to post my code yet, as it's far from clear that I've
understood the requirement). How ugly is your code? Could you post a
small self-contained example to show what you've currently tried?

I haven't measured speed in my case. Why do you feel it's slow in
yours? What measures have you used? Have you profiled to the code to
see where the time is going?
May 24 '07 #5
fe**@off.spamme rs.org writes:
Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F123456789ABCDE F",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F1234x6789ABCDE x".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
It would help more if you posted what *you* have. We could then
decide if we see anything better. Anyway, in case it help, I'd do it
like this in the first instance:

#include <limits.h>

#define BIT_SET(mask, x) ((mask)[(x) / CHAR_BIT] & (1 << (x) % CHAR_BIT))

int quasi_match(con st char *s1, const char *s2,
const unsigned char *mask, char optional)
{
int i = 0;
while (s1[i] && (s1[i] == s2[i] || BIT_SET(mask, i) && s2[i] == optional))
i += 1;
return s1[i] == s2[i];
}

The meaning (numbering) of the mask bits can be changed to reflect
your usage simply by altering the arithmetic in the macro.

There is quite a lot of arithmetic in the macro but:
(a) the compiler should make /CHAR_BIT and %CHAR_BIT fast on most
machines;
(b) it is only used when the characters *don't* match.

--
Ben.
May 24 '07 #6
fe**@off.spamme rs.org wrote:
>
Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F123456789ABCDE F",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCDE F123456789ABCDE F" == "123456789ABCDE F1234x6789ABCDE x".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.
The optimum will probably depend on your data. I suggest an
initial check with strcmp. If that shows equal, no more fuss. You
might want to write your own str_cmp, and have it return a pointer
to the unequal location. You can then proceed accordingly.

--
If you want to post a followup via groups.google.c om, ensure
you quote enough for the article to make sense. Google is only
an interface to Usenet; it's not Usenet itself. Don't assume
your readers can, or ever will, see any previous articles.
More details at: <http://cfaj.freeshell. org/google/>

--
Posted via a free Usenet account from http://www.teranews.com

May 24 '07 #7
On 24 May 2007 02:32:24 -0700, ma**********@po box.com wrote:
>On 24 May, 00:09, f...@off.spamme rs.org wrote:
>Hi,

Got a query, don't know if anyone would be kind enough to help...

I have to either match a string exactly eg
"123456789ABCD EF123456789ABCD EF" == "123456789ABCDE F123456789ABCDE F",
or if a packed byte bit mask is set mask1 = 1, mask2 = 4,
"123456789ABCD EF123456789ABCD EF" == "123456789ABCDE F1234x6789ABCDE x".
The mask means that the character at x can match either the original
number, or a set character ('x'). The original string can be up to 80
chars long with 10 8-bit bytes containg the masks.

I have done this, by a plodding for loop with lots of ifs, alas it's
untidy and slow. Any pointers to tidy & speed would be a great help.

As far as I understand the required behaviour, you have a string
("original") of up to 80 bytes
No, can be up to 40, but actually shorter (37) (5 mask bytes * 8 bits
= 40 'masks').
which you are examining, and another
string ("pattern") of the same size (?) against which you are
comparing it. In addition to simple character equality, you need to
test bytes at specific locations - bytes at these locations in
"original" can either match the corresponding byte in "pattern" or
match a single specific other byte. Locations for this dual matching
are specified by bit masks in an additional byte array, with one byte
in the array corresponding to 8 bytes in the original string. Is that
correct so far?
100% correct. Either I'm good at explaining (*grin*) or you can read
between the lines (more likely *sigh*).
>
My natural instinct here is to split the matching down into 8-byte
"chunks". This was fairly easy to do and IMHO didn't seem hugely ugly
(I'm not going to post my code yet, as it's far from clear that I've
understood the requirement). How ugly is your code?
So ugly the code wears a mask!
>Could you post a small self-contained example to show what you've currently tried?
Of course I can:-
//str1 is the string to compare
//str2 is the const comparison string,
//it has six extra bytes, 5 for the masks and 1 that is the actual
value we need (Action)
//DEF_STR_LEN is the length of str2
//Z is the 'mask' character so the strings can match, or if mask
applies can match to original or the 'Z'.
//Like I said, loads of if's and just looks a mess to me (And I wrote
it!). I HAVEN'T profiled it yet (dreading it!),
//but this is called frequently so the fastest it can be (within
reason).

if (!strncmp((char *)str1,
(char *)str2,
(DEF_STR_LEN - 6))
) Action = str2[(DEF_STR_LEN - 1)];
else if (groupMask)
{
j = 0;
do
{
//Frig for strs that aren't a multiple of 8.
if (j == 4) l = 4; else l = 7;
{
k = 0;
do
{
if (*(str1+(j*8)+k ) == str2[((j*8) +
k)]) match = 1;
else if (str2[(DEF_STR_LEN - 2 + j)]
<< k & 0x80) if(*(str1+(j*8) +k) == 'Z') match = 1;
else match = 0;
} while ((k++ < l) && match);
}
} while ((++j < 5) && match);
if (match) action = str2[(DEF_STR_LEN - 1)];
}
>I haven't measured speed in my case. Why do you feel it's slow in
yours? What measures have you used? Have you profiled to the code to
see where the time is going?
Not profiled (yet), but I can't see how it's going to wizz!

Kind Regards.
May 24 '07 #8
On Thu, 24 May 2007 17:31:42 GMT, fe**@off.spamme rs.org wrote:

SNIP
>
if (!strncmp((char *)str1,
(char *)str2,
(DEF_STR_LEN - 6))
) Action = str2[(DEF_STR_LEN - 1)];
else if (groupMask)
{
j = 0;
do
{
//Frig for strs that aren't a multiple of 8.
if (j == 4) l = 4; else l = 7;
{
k = 0;
do
{
if (*(str1+(j*8)+k ) == str2[((j*8) +
k)]) match = 1;
else if (str2[(DEF_STR_LEN - 2 + j)]
<< k & 0x80) if(*(str1+(j*8) +k) == 'Z') match = 1;
Oooops, cut & paste! That should read:-
else if ((str2[(DEF_STR_LEN - 2 + j)] << k & 0x80) && (*(str1+(j*8)+k )
== 'Z')) match = 1;
> else match = 0;
} while ((k++ < l) && match);
}
} while ((++j < 5) && match);
if (match) action = str2[(DEF_STR_LEN - 1)];
}
SNIP
May 24 '07 #9
Here's my off-the-cuff version. I like Ben's version for terseness,
but I wanted to produce something which expressed my understanding of
the problem - which is that the masks naturally split the original and
pattern strings into chunks.

I'd start with this, or something like it, and then look at profiling
it to find out where time was being eaten and focus on how that could
be trimmed.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int matches(char *original,char *pattern, unsigned char *masks,char
fixed);
int chunkMatches(ch ar *original,char *pattern, unsigned char mask,char
fixed,int chunkSize);

int main(void) {
/* One good and 2 bad strings */
char *original[] = {
"12Z456789ZBCDE F123456789ABCDE F",
"12Z456789ZBCDE F123X56789ABCDE F",
"12Z456789XBCDE F123456789ABCDE F",
NULL
};
char *pattern = "123456789ABCDE F123456789ABCDE F";

/* allow the "fixed" value in char 3 of first 8 byte chunk
char 2 of second 8 byte chunk, but not in the third */
unsigned char masks[3]= { 1 << 5, 1 << 6, 0 };
char fixed = 'Z';
char **o;
for(o=original; *o != NULL;o++) {
printf("[%s] %s match [%s](%c)\n",
*o,matches(*o,p attern,masks,'Z ')?"did":"did
not",pattern,fi xed);
}
}

int matches(char *original,char *pattern, unsigned char *masks,char
fixed) {
int chunks;
int oLen = strlen(original );
int pLen = strlen(pattern) ;

if (oLen != pLen) {
return 0;
}

chunks = oLen/CHAR_BIT + ((oLen % CHAR_BIT) != 0);

while(chunks--) {
int chunkSize = strlen(original );
if (chunkSize CHAR_BIT) {
chunkSize = CHAR_BIT;
}
if (!chunkMatches( original,patter n,*masks,fixed, chunkSize)){
return 0;
}
original += chunkSize;
pattern += chunkSize;
masks += 1;
}

return 1;
}

int chunkMatches(ch ar *original,char *pattern, unsigned char mask,char
fixed,int chunkSize) {
unsigned char currentMask = 1<<(CHAR_BIT - 1);

while(chunkSize--) {
if (*original == *pattern) {
;
} else if ((currentMask & mask) && (*original == fixed)) {
;
} else {
return 0;
}
original++;
pattern++;
currentMask >>= 1;
}
}

May 25 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
2290
by: Piotr | last post by:
Please help me, I can't solve this problem: I have domain: www.main_domain.com/index.php?userid=23&parametr1=2&parametr2=3 (and other site files like www.main_domain.com/page1.php?userid=23&otherparameter=32 www.main_domain.com/page2.php?userid=23&otherparameter=11 etc.) At this address there is an template site for many users. Personal site is identified by userid.
2
3961
by: Ginchy | last post by:
I have uploaded a small 3 page web using MS Publisher 2003 and after uploading I switched on url masking to cloak the url. I am certain that it worked fine. I simply changed the colour scheme in Publisher and reuploaded the page. The links no longer work? When I switch off url masking it works as it should?
6
1998
by: Stefan Kowalski | last post by:
Hi Newsgroup Does anyone here care to share their thoughts/experiences with using masking? I am storing categories of books which libraries stock, so that I might have codes: 015-000-000 = Languages 015-001-000 = European languages 015-001-001 = English 015-001-002 = German 015-001-003 = French
1
3528
by: john sutor | last post by:
Is there a way in C# to add masking (for a date format) to a textbox?
0
1399
by: Dana Epp | last post by:
I have a ToolBarButton that when I set it to disabled (button.Enabled = false;) causes a really ugly gray masking effect to take place. This is normal and the intended way of the button, but I would like to clean it up. Instead of using the gray mask, I want to use my own grayscaled high quality image. I thought by simply setting button.ImageIndex = new_num; (where new_num is the index of the grayscale image) I could do this. But alas, that...
1
5246
by: Brian Keating EI9FXB | last post by:
Hello there, Does anyone know how to to Bitmap masking in .NET, In c++ i used, CreateCompatibleDC GetBitmap Then on the bitmap and the bitmap mask i called. {SelectBitmap StretchBlt}
1
3106
by: NBB | last post by:
I can't figure this one out. Here's the situation, should be pretty for the pros in here. I have a ListBox that is populated with the DataTextField and DataValueField flags of the DataSet. Everything populates just fine and works great. The DataTextField displays text from a database column called Path, which is the absolute URL of a file. What I would like it to display is just the filename, essentially trimming off everything...
3
5228
by: Hai Nguyen | last post by:
Hi All I have a textbox want to mask as phone box. How can I do it? Thanks
9
2276
by: Daniel Smedegaard Buus | last post by:
Hey all :) I was wondering about the $error_types (I particularly notice the 's' suffix when reading the manual) parameter for 'set_error_handler()': Can be used to mask the triggering of the error_handler function just like the error_reporting ini setting controls which errors are shown. Without this mask set the error_handler will be called for every error regardless to the setting of the error_reporting setting.
2
5334
by: meghla | last post by:
I can not understand how to implement bit masking on a c code?? while working with array it is taking too much time.................. so i need help. i need some suggestion and sample code of C dealing with bit masking..............; its argent............. plz help me.
0
9651
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10802
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10516
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10225
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7763
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6961
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5630
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5802
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
3
3085
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.