473,799 Members | 3,052 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Finding a substring in a binary string

Hi All,

I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.

Regards
Tarun

Nov 15 '05 #1
6 7577
> I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.


I assume you don't want to use memmem since it's a gnu extension and thus
would make the resulting code less portable.

If this is case, and if nothing from the standard library fits the bill,
why not just write your own function to do the comparison?

e.g. something like:

/* binstr - searches a block of memory for a given character string
*
* params:
* bin - pointer to region to be searched
* bin_sz - size of region in bytes
* str - null terminated character string to search for
*
* returns: 1 if the string is found, 0 otherwise
*/

int binstr(const void *bin, int bin_sz, const char *str) {
const char *c_bin; int bin_i, str_i; c_bin = bin;
for (bin_i = 0; bin_i < bin_sz; bin_i++) {
for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i]; str_i++);
if (!str[str_i]) return 1;
}
return 0;
}

-Dan

Nov 15 '05 #2
> e.g. something like:
[...]
int binstr(const void *bin, int bin_sz, const char *str) {
const char *c_bin; int bin_i, str_i; c_bin = bin;
for (bin_i = 0; bin_i < bin_sz; bin_i++) {
for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i];
str_i++); if (!str[str_i]) return 1;
}
return 0;
}


that is higly uneffective if the searched pattern is long : complexity of
such an algorithm is O(length(bin) * length(str)) where the KMP algorithm
is O(length(bin)+l ength(str)).

Here is a link I found using google :
http://www-igm.univ-mlv.fr/~lecroq/string/node8.html

The C code seems reasonnable even if the naming of the variables is stupid.
--
·O· Pierre Habouzit
··O ma******@debian .org
OOO http://www.madism.org
Nov 15 '05 #3
"Daniel Cer" <ce*@sdf.lonest ar.org> wrote in message
news:Pi******** *************** ********@otaku. freeshell.org.. .
I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.

I assume you don't want to use memmem since it's a gnu extension and

thus would make the resulting code less portable.

If this is case, and if nothing from the standard library fits the bill, why not just write your own function to do the comparison?

e.g. something like:

/* binstr - searches a block of memory for a given character string
*
* params:
* bin - pointer to region to be searched
* bin_sz - size of region in bytes
* str - null terminated character string to search for
*
* returns: 1 if the string is found, 0 otherwise
*/

int binstr(const void *bin, int bin_sz, const char *str) {
const char *c_bin; int bin_i, str_i; c_bin = bin;
for (bin_i = 0; bin_i < bin_sz; bin_i++) {
for (str_i = 0; c_bin[bin_i+str_i] == str[str_i] && str[str_i]; str_i++); if (!str[str_i]) return 1;
}
return 0;
}


You should just go to the local high school and set up a booth with a
sign "I'll do your homework for free!"
Then you'll be really kewl.

--
Mabden

Nov 15 '05 #4
Tarun wrote:

I need to find a particular substring in a binary string(some text
appended & prepended to binary data).
I cant use strstr since it terminates on receiving '\0'which can be
there in binary data also I cant use memmem. Is there any other
available function to do this.
Here is some code from the archives that I wrote last year.

--------------- cut binfsrch.c ----------------
/*
Leor Zolman wrote: On 25 Feb 2004 07:34:40 -0800, jo**@ljungh.se (spike) wrote:
Im trying to write a program that should read through a binary
file searching for the character sequence "\name\"

Then it should read the characters following the "\name\"
sequence until a NULL character is encountered.

But when my program runs it gets a SIGSEGV (Segmentation
vioalation) signal.

Whats wrong? And is there a better way than mine to solve
this task (most likely)


I think so. Here's a version I just threw together:

*/

/* And heres another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.

This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;

assert(lgh > 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id[i] != id[j])) j = next[j];
i++; j++;
next[i] = j;
}
#if (0)
for (i = 0; i < lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id[i], i, next[i]);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */

assert(lgh > 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
chrcount++;
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */

lgh = strlen(marker);
if (!(next = malloc(lgh * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILU RE);
}
else {
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker , lgh, next, f)) {
/* found, take appropriate action */
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getc(f))) {
chrcount++;
putchar(ch);
}
puts("\"");
if (EOF == ch) break;
else chrcount++;
}
free(next);
return items;
}
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
FILE *f;

f = stdin;
if (3 == argc) {
if (!(f = fopen(argv[2], "rb"))) {
printf("Can't open %s\n", argv[2]);
exit(EXIT_FAILU RE);
}
argc--;
}
if (2 != argc) {
puts("Usage: binfsrch name [binaryfile]");
puts(" (file defaults to stdin text mode)");
}
else if (binfsrch(argv[1], f)) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
printf("%lu chars\n", (unsigned long)chrcount);
return 0;
} /* main binfsrch */

--
"If you want to post a followup via groups.google.c om, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Nov 15 '05 #5
> that is higly uneffective if the searched pattern is long : complexity of
such an algorithm is O(length(bin) * length(str)) where the KMP algorithm
is O(length(bin)+l ength(str)).

Here is a link I found using google :
http://www-igm.univ-mlv.fr/~lecroq/string/node8.html


For many practical string matching problems, doesn't KMP not perform all
that much better than the (previously given) brute force approach?

For something that's typically much better than both of these,
you might want to check out the Boyer-Moore algorithm. links:

http://www.cs.utexas.edu/users/moore...ing/index.html

http://en.wikipedia.org/wiki/Boyer-M...hing_algorithm

-Dan
Nov 15 '05 #6
Daniel Cer wrote:
.... snip ...
For many practical string matching problems, doesn't KMP not
perform all that much better than the (previously given) brute
force approach?


True. However the brute force method requires backtracking in the
searched stream. KMP totally avoids that, so that you can find a
match immediately without using any buffers.

Earlier today I published, in this thread, a demonstration
program. It can, for example, search input from an unbuffered
serial input. Modifications of the technique can find the longest
match, which can be useful in compression software.

The Boyer-Moore technique requires the entire searched area to be
buffered. When that is feasible it can offer serious run-time
improvement.

--
"If you want to post a followup via groups.google.c om, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Nov 15 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

13
15269
by: yaipa | last post by:
What would be the common sense way of finding a binary pattern in a ..bin file, say some 200 bytes, and replacing it with an updated pattern of the same length at the same offset? Also, the pattern can occur on any byte boundary in the file, so chunking through the code at 16 bytes a frame maybe a problem. The file itself isn't so large, maybe 32 kbytes is all and the need for speed is not so great, but the need for accuracy in the...
4
6573
by: Victor Engmark | last post by:
When looking for a method to fetch unique elements and counting the number of occurences of each of them, I found quite a lot of gross examples of complex XSL. But after realizing the subtle difference between "." and "current()", I found a neat way of doing the same without keys or generate-id(): <xsl:template match="/"> <!-- Selects all "new" elements --> <xsl:for-each select="//Name"> <!-- Display the element -->
5
6252
by: giampiero mu | last post by:
hi everyone my target is implement a function controlla(string - a binary string-) that check if there is in a string two equal not overlapping subsequences at least of length limitem: my code: def controlla(test): # print test
3
19221
by: vaughn | last post by:
Is there a function that'll let me find a substring of a fixed size? in other words, I'd like to get a substring in position X and length Y from a string. Left or Right wouldn't work since the substring's usually in the middle. Thanks.
3
1308
by: kingnl | last post by:
I am a new programmer and I have a file that I need to find the position of a certain set of characters? Can anyone help me? I also need to be able to split the file at that point and write each part to a temp file. Any help would be appreciated. Thanks Nick
2
3179
by: Badass Scotsman | last post by:
Hello, Using VB and ASP,NET I would like to be able to search a STRING for a smaller STRING within, based on the characters which appear before and after. For example: String1 = " That was a tasty burger"
2
2002
by: Extremest | last post by:
Here is the code I have so far. It connects to a db and grabs headers. It then sorts them into groups and then puts all the complete ones into another table. Problem I am having is that for some reason now it is not finding ones that are single posts. Here is an example of a header for a single. (Ask the Dust ) - "atd-ftc-repack.nfo" www.ctjes.com (1/1) (1/1) at the end means it is part 1 of a 1 part post. Any help would be...
14
2820
by: prasadjoshi124 | last post by:
Hi All, I am writing a small tool which is supposed to fill the filesystem to a specified percent. For, that I need to read how much the file system is full in percent, like the output given by df -k lopgod10:~/mycrfile # df -k /mnt/mvdg1/vset Filesystem 1K-blocks Used Available Use% Mounted on
2
2179
by: jewel87 | last post by:
Hello, I've got a problem and would appreciate any help. I have to count the number of integers in a string like this "1.1 some text. 1.2 some text", where it should return 0, as there are no integers, but I still get 2 on the output. Here is the code: void IntegerNumber() { int NumberOfIntegers = 0; string SubString; int flag;
0
9687
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9541
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10482
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
10225
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10027
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9072
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7564
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5463
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
3
2938
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.