How to find a string in a stream of binary data?


Hello

My code below opens a Word document in binary mode and reads the data into
a buffer. I then want to search this buffer for a string. I tried using
strstr, but I think it stops looking when it reaches the first null character
(or some control character) in the data. What C function should I use to
search a BYTE data buffer?
Code:

#include <stdio.h>

char szPath[MAX_PATH] = "";
strcpy(szPath, "E:\\MyPath\\ahl.doc");
FILE* stream;

FILE *file = fopen(szPath, "rb"); // Open the file
fseek(file, 0, SEEK_END); // Seek to the end
long file_size = ftell(file); // Get the current position
rewind (file); // rewind to start of file

// allocate memory to contain the whole file.
BYTE* byBuffer = (BYTE*) malloc (file_size);
// if (buffer == NULL) exit (2);

// copy the file into the buffer.
fread (byBuffer,1,file_size, file);

const char* szFind = "selected";
//strstr(StrToLookIn, StrToFind);
char* szResult = strstr((char*)byBuffer, szFind);

fclose(file); // Close the file

free( byBuffer );
Angus Comber
an***@NOSPAMiteloffice.com

Nov 14 '05 #1
Angus Comber wrote:

> My code below opens a Word document in binary mode and places the data into
> a buffer. I then want to search this buffer for a string. I tried using
> strstr but think it stops looking when it reaches first null character or
> some control character in data. What C function should I use to be able to
> search in a BYTE data buffer?
>
> Code:
>
> #include <stdio.h>

You need to include more than this (e.g. stdlib.h for malloc).

> char szPath[MAX_PATH] = "";
> strcpy(szPath, "E:\\MyPath\\ahl.doc");
> FILE* stream;
>
> FILE *file = fopen(szPath, "rb"); // Open the file
> fseek(file, 0, SEEK_END); // Seek to the end
> long file_size = ftell(file); // Get the current position

Hmm, you're assuming C99, where you can use C++-style comments,
and can introduce new variables at any point in the code? For
portability, you're best to stick with C90.

> rewind (file); // rewind to start of file
>
> // allocate memory to contain the whole file.
> BYTE* byBuffer = (BYTE*) malloc (file_size);

Look at the archive of this newsgroup, and you'll see many good
reasons _not_ to cast the return from malloc, if you're writing
in C.

> // if (buffer == NULL) exit (2);

exit(2)?? What sort of code is that?

Well, anyway, here is an ISO/ANSI C program that (I think) will
do what you want, not necessarily with greatest efficiency.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the offset of the first occurrence of s in buf[0..len-1],
   or -1 if it is not found. */
long find_string_in_buf (const unsigned char *buf, size_t len,
                         const char *s)
{
    size_t i, j;
    size_t slen = strlen(s);
    long ret = -1;
    int match;

    if (slen > len) {
        return -1;   /* target is longer than the buffer */
    }

    /* the last possible starting offset is len - slen */
    for (i = 0; i <= len - slen; i++) {
        match = 1;
        for (j = 0; j < slen; j++) {
            if (buf[i+j] != (unsigned char) s[j]) {
                match = 0;
                break;
            }
        }
        if (match) {
            ret = (long) i;
            break;
        }
    }

    return ret;
}

int main (int argc, char **argv)
{
    const char *targ = "selected";
    const char *fname;
    FILE *fp;
    size_t file_size;
    unsigned char *buf;
    long loc;

    if (argc < 2) {
        fputs("Please supply a filename\n", stderr);
        exit(EXIT_FAILURE);
    }

    fname = argv[1];

    fp = fopen(fname, "rb");
    if (fp == NULL) {
        fprintf(stderr, "Couldn't open '%s'\n", fname);
        exit(EXIT_FAILURE);
    }

    /* find the file size by seeking to the end */
    fseek(fp, 0, SEEK_END);
    file_size = ftell(fp);
    rewind(fp);

    buf = malloc(file_size);
    if (buf == NULL) {
        fputs("Out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }

    fread(buf, 1, file_size, fp);

    loc = find_string_in_buf(buf, file_size, targ);
    if (loc < 0) {
        printf("The target string '%s' was not found\n", targ);
    } else {
        printf("The target string '%s' was found at byte %ld\n",
               targ, loc);
    }

    fclose(fp);
    free(buf);

    return 0;
}
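
The main() above takes the results of ftell and fread on trust. A minimal sketch of a more defensive read follows. read_whole_file is a hypothetical helper name, not something from the original post, and it assumes that the file fits in memory and that seeking to SEEK_END works for the stream (the C standard does not strictly guarantee that for binary streams).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole of 'fname' into a malloc'd buffer.
   On success returns the buffer and stores its length in *len_out;
   on any failure returns NULL. The caller frees the buffer. */
unsigned char *read_whole_file(const char *fname, size_t *len_out)
{
    FILE *fp;
    long size;
    unsigned char *buf = NULL;

    assert(fname != NULL && len_out != NULL);

    fp = fopen(fname, "rb");
    if (fp == NULL)
        return NULL;

    /* fseek and ftell can both fail (ftell returns -1L on error) */
    if (fseek(fp, 0L, SEEK_END) == 0 && (size = ftell(fp)) >= 0) {
        rewind(fp);
        /* allocate at least one byte so that an empty file is not
           confused with an allocation failure */
        buf = malloc(size > 0 ? (size_t)size : 1);
        if (buf != NULL) {
            /* a short read counts as failure */
            if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
                free(buf);
                buf = NULL;
            } else {
                *len_out = (size_t)size;
            }
        }
    }

    fclose(fp);
    return buf;
}
```

With a helper like this, main could call read_whole_file(argv[1], &file_size), report an error if it returns NULL, and otherwise pass the buffer and length straight to find_string_in_buf before freeing the buffer.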

--
Allin Cottrell
Department of Economics
Wake Forest University, NC
Nov 14 '05 #2

#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 200 /* adjust to your needs */

/* nz_str()                                         */
/*                                                  */
/* behaves as 'strstr()' but handles input with     */
/* embedded zero characters                         */
/*                                                  */
/* 'data' and 'to_find' must be zero-terminated     */
char *nz_strstr(char *data, const char *to_find)
{
    char *result = 0;

    while(!(result = strstr(data, to_find)))
        data += strlen(data) + 1;

    return result;
}

int main(int argc, char **argv)
{
    char buffer[BUFFER_SIZE] = "Hello world\0 this is\0 a test";
    char what[] = "test";
    char *p = nz_strstr(buffer, what);

    printf("string '%s' ", what);

    if(p)
        printf("found at offset %lu\n", (unsigned long)(p - buffer));
    else
        printf("not found\n");
    return 0;
}

-Mike

Nov 14 '05 #3
Allin Cottrell wrote:
> Hmm, you're assuming C99, where you can use C++-style comments,
> and can introduce new variables at any point in the code?
> For portability, you're best to stick with C90.


When, if ever, would you recommend moving on to C99?

Nov 14 '05 #4
E. Robert Tisdale wrote:
> Allin Cottrell wrote:
> > Hmm, you're assuming C99, where you can use C++-style comments,
> > and can introduce new variables at any point in the code?
> > For portability, you're best to stick with C90.
>
> When, if ever, would you recommend moving on to C99?


When the more commonly used C compilers offer support for C99
that is comparable to the support they currently offer for C90.

At present, the most widely used C compilers can be made C90-
conforming, if you know the right options to use. At present,
no commonly used C compiler can be made C99-conforming, with
any combination of options.

--
Allin Cottrell
Department of Economics
Wake Forest University, NC
Nov 14 '05 #5

"Mike Wahler" <mk******@mkwahler.net> wrote in message
news:p4*******************@newsread1.news.pas.eart hlink.net...

> /* nz_str() */
> /* */
> /* behaves as 'strstr()' but handles input with */
> /* embedded zero characters */
> /* */
> /* 'data' and 'to_find' must be zero-terminated */

Correction: 'data' must be terminated by at least
*two* consecutive zero characters.

> char *nz_strstr(char *data, const char *to_find)
> {
>     char *result = 0;
>
>     while(!(result = strstr(data, to_find)))
>         data += strlen(data) + 1;

... because of the "+ 1" used to step *over* the zeros.
BTW, Angus, referring to your original code, there's no such
type as 'BYTE' in C.

-Mike

Nov 14 '05 #6
Angus Comber wrote:

> Hello
>
> My code below opens a Word document in binary mode and places the data
> into a buffer.

<OT>
You might get a bit more joy out of Word docs if you do some research into
"structured storage" or "compound documents".
</OT>

> I then want to search this buffer for a string. I tried using
> strstr but think it stops looking when it reaches first null character or
> some control character in data. What C function should I use to be able
> to search in a BYTE data buffer?


Look up the Boyer-Moore search algorithm on the Net, and implement it in C.
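
For reference, here is a rough sketch of the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table and drops the good-suffix rule of the full algorithm. The name bmh_search and its signature are assumptions made for illustration. Because it takes explicit lengths, zero bytes in the buffer or in the pattern are treated like any other byte.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search (a simplified Boyer-Moore that uses only
   the bad-character shift). Returns a pointer to the first occurrence
   of pat[0..plen-1] in buf[0..len-1], or NULL if there is none.
   An empty pattern is reported as not found. */
const unsigned char *bmh_search(const unsigned char *buf, size_t len,
                                const unsigned char *pat, size_t plen)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    assert(buf != NULL && pat != NULL);
    if (plen == 0 || plen > len)
        return NULL;

    /* default: a byte that is not in the pattern lets us skip plen bytes */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = plen;
    /* bytes in the pattern (except its last byte) give shorter shifts */
    for (i = 0; i + 1 < plen; i++)
        shift[pat[i]] = plen - 1 - i;

    pos = 0;
    while (pos + plen <= len) {
        /* compare the window against the pattern, right to left */
        i = plen;
        while (i > 0 && buf[pos + i - 1] == pat[i - 1])
            i--;
        if (i == 0)
            return buf + pos;
        /* shift by the table entry for the byte under the pattern's end */
        pos += shift[buf[pos + plen - 1]];
    }
    return NULL;
}
```

For a short target such as "selected" the gain over a plain byte-by-byte scan may be small; the shift table pays off more as the pattern gets longer.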

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 14 '05 #7

"Allin Cottrell" <co******@wfu.edu> wrote in message
news:c0***********@f1n1.spenet.wfu.edu...
> E. Robert Tisdale wrote:
>
> At present, the most widely used C compilers can be made C90-
> conforming, if you know the right options to use. At present,
> no commonly used C compiler can be made C99-conforming, with
> any combination of options.


The freely available lcc-win32 compiler implements most of C99
http://www.cs.virginia.edu/~lcc-win32
Nov 14 '05 #8
Allin Cottrell <co******@wfu.edu> wrote in message news:<c0***********@f1n1.spenet.wfu.edu>...

> Well, anyway, here is an ISO/ANSI C program that (I think) will
> do what you want, not necessarily with greatest efficiency.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> long find_string_in_buf (const unsigned char *buf, size_t len,
>                          const char *s)
> {
>     size_t i, j;
>     size_t slen = strlen(s);
>     long ret = -1;
>     int match;
>
>     if (slen > len) {
>         return -1;   /* target is longer than the buffer */
>     }
>
>     /* the last possible starting offset is len - slen */
>     for (i = 0; i <= len - slen; i++) {
>         match = 1;
>         for (j = 0; j < slen; j++) {
>             if (buf[i+j] != (unsigned char) s[j]) {
>                 match = 0;
>                 break;
>             }
>         }
>         if (match) {
>             ret = (long) i;
>             break;
>         }
>     }
>
>     return ret;
> }

A perhaps simpler implementation is as follows. Seems like using memcmp
is a win here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* look for the query in binary input data of indicated length */
const char *find_str_in_data(const char *data, size_t data_len,
                             const char *query)
{
    const char *p = data;
    size_t query_len = strlen(query);

    if (query_len == 0) {
        return data;   /* an empty query matches at the start, as with strstr */
    }

    while (1) {
        /* stop once fewer than query_len bytes remain */
        if ((size_t)(data + data_len - p) < query_len) {
            break;
        }
        if (*p == query[0]) {
            if (memcmp(p, query, query_len) == 0) {
                return p;
            }
        }
        p++;
    }

    return NULL;
}

int main(void)
{
    const char *query = "foo";
    const char data1[] = { 'a', 'b', '\0', 'f', '\0', 'f', 'o', 'o' };
    const char data2[] = { 'a', 'b', '\0', 'f', '\0', 'f', 'o', 'o', 'q'};
    const char data3[] = { 'f', 'o' };
    const char *result;

    result = find_str_in_data(data1, sizeof data1, query);
    printf("query '%s'%s found in data1\n", query, result ? "" : " NOT");
    result = find_str_in_data(data2, sizeof data2, query);
    printf("query '%s'%s found in data2\n", query, result ? "" : " NOT");
    result = find_str_in_data(data3, sizeof data3, query);
    printf("query '%s'%s found in data3\n", query, result ? "" : " NOT");

    return EXIT_SUCCESS;
}
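
Along the same lines, memchr can do the scan for the first byte, so the loop only stops at candidate positions. This is a sketch under that assumption, not a measured improvement. find_str_memchr is a hypothetical name, and unlike strstr it reports an empty query as not found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: let memchr find each candidate first byte,
   then confirm the rest with memcmp. Returns a pointer to the first
   match in data[0..data_len-1], or NULL. */
const char *find_str_memchr(const char *data, size_t data_len,
                            const char *query)
{
    size_t query_len;
    const char *p;
    const char *last;   /* last position where a match could start */

    assert(data != NULL && query != NULL);
    query_len = strlen(query);
    if (query_len == 0 || query_len > data_len)
        return NULL;

    p = data;
    last = data + (data_len - query_len);
    while (p <= last) {
        /* only the bytes from p up to and including last can start a match */
        p = memchr(p, query[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, query, query_len) == 0)
            return p;
        p++;
    }
    return NULL;
}
```

It takes the same arguments as find_str_in_data, so the three test arrays in main can be passed to it unchanged.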

-David
Nov 14 '05 #9

"Mike Wahler" <mk******@mkwahler.net> wrote in message
news:sI*******************@newsread1.news.pas.eart hlink.net...

> "Mike Wahler" <mk******@mkwahler.net> wrote in message
> news:p4*******************@newsread1.news.pas.eart hlink.net...
>
> > /* nz_str() */
> > /* */
> > /* behaves as 'strstr()' but handles input with */
> > /* embedded zero characters */
> > /* */
> > /* 'data' and 'to_find' must be zero-terminated */
>
> Correction: 'data' must be terminated by at least
> *two* consecutive zero characters.
>
> > char *nz_strstr(char *data, const char *to_find)
> > {
> >     char *result = 0;
> >
> >     while(!(result = strstr(data, to_find)))
> >         data += strlen(data) + 1;
>
> ... because of the "+ 1" used to step *over* the zeros.


And *another* problem! (surprised nobody caught it):

This will have undefined behavior if the searched-for string isn't
found: it'll run off the end of the 'data' array.

It needs a 'size' parameter to check against, or at least
a 'dummy' copy of the searched-for item pasted onto the
end as a 'sentinel'.
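
A minimal sketch of the 'size' parameter idea, assuming the caller can guarantee that the last byte of the array is zero (for a buffer filled by fread, that means allocating one extra byte and setting it to zero). The name nz_strstr_n is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size-checked variant of nz_strstr(). 'data' holds 'size'
   bytes and its last byte must be zero, so every strstr() and strlen()
   call below stops inside the array. Returns NULL when 'to_find' does
   not occur in any of the zero-terminated pieces. */
char *nz_strstr_n(char *data, size_t size, const char *to_find)
{
    char *end;      /* one past the last byte of the array */
    char *result;

    assert(data != NULL && to_find != NULL);
    assert(size > 0 && data[size - 1] == '\0');

    end = data + size;
    while (data < end) {
        result = strstr(data, to_find);
        if (result != NULL)
            return result;
        /* step over this zero-terminated piece and its terminator */
        data += strlen(data) + 1;
    }
    return NULL;
}
```

With the test buffer from the earlier post, nz_strstr_n(buffer, sizeof buffer, what) still finds "test", while a search for a string that is absent now returns a null pointer instead of running past the array.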

-Mike
Nov 14 '05 #10
Angus Comber wrote:
> Hello
>
> My code below opens a Word document in binary mode and places the data into
> a buffer. I then want to search this buffer for a string. I tried using
> strstr but think it stops looking when it reaches first null character or
> some control character in data. What C function should I use to be able to
> search in a BYTE data buffer?


Use the Aho-Corasick algorithm referenced previously in this newsgroup.

Though the discussion doesn't seem to have much to do with the algorithm.

-- glen

Nov 14 '05 #11
