How to find a string in a stream of binary data?


Hello

My code below opens a Word document in binary mode and places the data into
a buffer. I then want to search this buffer for a string. I tried using
strstr but think it stops looking when it reaches first null character or
some control character in data. What C function should I use to be able to
search in a BYTE data buffer?
Code:

#include <stdio.h>

char szPath[MAX_PATH] = "";
strcpy(szPath, "E:\\MyPath\\ah l.doc");
FILE* stream;

FILE *file = fopen(szPath, "rb"); // Open the file
fseek(file, 0, SEEK_END); // Seek to the end
long file_size = ftell(file); // Get the current position
rewind (file); // rewind to start of file

// allocate memory to contain the whole file.
BYTE* byBuffer = (BYTE*) malloc (file_size);
// if (buffer == NULL) exit (2);

// copy the file into the buffer.
fread (byBuffer,1,file_size, file);

const char* szFind = "selected";
//strstr(StrToLookIn, StrToFind);
char* szResult = strstr((char*)byBuffer, szFind);

fclose(file); // Close the file

free( byBuffer );
Angus Comber
an***@NOSPAMiteloffice.com

Nov 14 '05 #1
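
A note on why strstr gives up here: it treats its first argument as a zero-terminated string, so everything after the first zero byte in the buffer is invisible to it. Standard C has no ready-made "strstr with a length", so the usual approach is a small loop around memcmp that is told both lengths explicitly. The sketch below is only an illustration; find_bytes is a hypothetical helper name, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
   pat (pat_len bytes) inside buf (buf_len bytes), or -1 if absent.
   Both lengths are explicit, so zero bytes are ordinary data. */
long find_bytes(const unsigned char *buf, size_t buf_len,
                const void *pat, size_t pat_len)
{
    size_t i;

    /* an empty pattern, or one longer than the buffer, never matches */
    if (pat_len == 0 || pat_len > buf_len)
        return -1;

    /* the last valid start offset is buf_len - pat_len, inclusive */
    for (i = 0; i <= buf_len - pat_len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Called as find_bytes(byBuffer, file_size, "selected", 8), it would scan the whole buffer regardless of embedded zeros.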
Angus Comber wrote:

My code below opens a Word document in binary mode and places the data into
a buffer. I then want to search this buffer for a string. I tried using
strstr but think it stops looking when it reaches first null character or
some control character in data. What C function should I use to be able to
search in a BYTE data buffer?

Code:

#include <stdio.h>
You need to include more than this (e.g. stdlib.h for malloc).
char szPath[MAX_PATH] = "";
strcpy(szPath, "E:\\MyPath\\ah l.doc");
FILE* stream;

FILE *file = fopen(szPath, "rb"); // Open the file
fseek(file, 0, SEEK_END); // Seek to the end
long file_size = ftell(file); // Get the current position
Hmm, you're assuming C99, where you can use C++-style comments,
and can introduce new variables at any point in the code? For
portability, you're best to stick with C90.
rewind (file); // rewind to start of file

// allocate memory to contain the whole file.
BYTE* byBuffer = (BYTE*) malloc (file_size);
Look at the archive of this newsgroup, and you'll see many good
reasons _not_ to cast the return from malloc, if you're writing
in C.
// if (buffer == NULL) exit (2);


exit(2)?? What sort of code is that?

Well, anyway, here is an ISO/ANSI C program that (I think) will
do what you want, not necessarily with greatest efficiency.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

long find_string_in_buf (unsigned char *buf, size_t len,
                         const char *s)
{
    size_t i, j;
    size_t slen = strlen(s);
    long ret = -1;
    int match;

    /* a target longer than the buffer cannot match */
    if (slen > len) return -1;

    /* i + slen <= len keeps buf[i+j] in bounds and still tests
       the final start position, len - slen */
    for (i=0; i+slen<=len; i++) {
        match = 1;
        for (j=0; j<slen; j++) {
            if (buf[i+j] != (unsigned char) s[j]) {
                match = 0;
                break;
            }
        }
        if (match) {
            ret = (long) i;
            break;
        }
    }

    return ret;
}

int main (int argc, char **argv)
{
    const char *targ = "selected";
    const char *fname;
    FILE *fp;
    long fsize;
    size_t file_size;
    unsigned char *buf;
    long loc;

    if (argc < 2) {
        fputs("Please supply a filename\n", stderr);
        exit(EXIT_FAILURE);
    }

    fname = argv[1];

    fp = fopen(fname, "rb");
    if (fp == NULL) {
        fprintf(stderr, "Couldn't open '%s'\n", fname);
        exit(EXIT_FAILURE);
    }

    fseek(fp, 0, SEEK_END);
    fsize = ftell(fp);
    rewind(fp);
    if (fsize < 0) {
        /* ftell reports failure as -1 */
        fprintf(stderr, "Couldn't determine the size of '%s'\n", fname);
        exit(EXIT_FAILURE);
    }
    file_size = (size_t) fsize;

    /* ask for at least one byte: malloc(0) may legitimately return NULL */
    buf = malloc(file_size ? file_size : 1);
    if (buf == NULL) {
        fputs("Out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }

    if (fread(buf, 1, file_size, fp) != file_size) {
        fputs("Read error\n", stderr);
        exit(EXIT_FAILURE);
    }

    loc = find_string_in_buf(buf, file_size, targ);
    if (loc < 0) {
        printf("The target string '%s' was not found\n", targ);
    } else {
        printf("The target string '%s' was found at byte %ld\n",
               targ, loc);
    }

    fclose(fp);
    free(buf);

    return 0;
}

--
Allin Cottrell
Department of Economics
Wake Forest University, NC
Nov 14 '05 #2
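
The fseek/ftell/malloc/fread sequence used in both programs above can be pulled into one helper that checks every step. What follows is a minimal sketch, not a definitive implementation: read_whole_file is a hypothetical name, and it assumes that seeking to SEEK_END on a binary stream reports the file size. The C standard does not promise that, although common desktop platforms behave this way.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole file at 'path' into a malloc'd
   buffer. On success returns the buffer and stores its length in
   *out_len; on any failure returns NULL. The caller frees the buffer. */
unsigned char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp;
    long size;
    unsigned char *buf;
    size_t got;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* assumption: SEEK_END works on this binary stream */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    /* ask for at least one byte so an empty file still gets a valid pointer */
    buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    /* a short read means the buffer is not what the caller expects */
    if (got != (size_t)size) {
        free(buf);
        return NULL;
    }

    *out_len = got;
    return buf;
}
```

Returning the length alongside the pointer matters here because the data may contain zero bytes, so strlen cannot recover it later.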

"Angus Comber" <an***@iteloffi ce.com.PLEASENO SPAM> wrote in message
news:40***********************@mercury.nildram.net...

Hello

My code below opens a Word document in binary mode and places the data into a buffer. I then want to search this buffer for a string. I tried using
strstr but think it stops looking when it reaches first null character or
some control character in data. What C function should I use to be able to search in a BYTE data buffer?
Code:

#include <stdio.h>

char szPath[MAX_PATH] = "";
strcpy(szPath, "E:\\MyPath\\ah l.doc");
FILE* stream;

FILE *file = fopen(szPath, "rb"); // Open the file
fseek(file, 0, SEEK_END); // Seek to the end
long file_size = ftell(file); // Get the current position
rewind (file); // rewind to start of file

// allocate memory to contain the whole file.
BYTE* byBuffer = (BYTE*) malloc (file_size);
// if (buffer == NULL) exit (2);

// copy the file into the buffer.
fread (byBuffer,1,file_size, file);

const char* szFind = "selected";
//strstr(StrToLookIn, StrToFind);
char* szResult = strstr((char*)byBuffer, szFind);

fclose(file); // Close the file

free( byBuffer );
Angus Comber
an***@NOSPAMiteloffice.com

#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 200 /* adjust to your needs */
/* nz_str() */
/* */
/* behaves as 'strstr()' but handles input with */
/* embedded zero characters */
/* */
/* 'data' and 'to_find' must be zero-terminated */
char *nz_strstr(char *data, const char *to_find)
{
char *result = 0;

while(!(result = strstr(data, to_find)))
data += strlen(data) + 1;

return result;
}

int main(int argc, char **argv)
{
char buffer[BUFFER_SIZE] = "Hello world\0 this is\0 a test";
char what[] = "test";
char *p = nz_strstr(buffer, what);

printf("string '%s' ", what);

if(p)
printf("found at offset %lu\n", (unsigned long)(p - buffer));
else
printf("not found\n");
return 0;
}

-Mike

Nov 14 '05 #3
Allin Cottrell wrote:
Hmm, you're assuming C99, where you can use C++-style comments,
and can introduce new variables at any point in the code?
For portability, you're best to stick with C90.


When, if ever, would you recommend moving on to C 99?

Nov 14 '05 #4
E. Robert Tisdale wrote:
Allin Cottrell wrote:
Hmm, you're assuming C99, where you can use C++-style comments,
and can introduce new variables at any point in the code?
For portability, you're best to stick with C90.

When, if ever, would you recommend moving on to C 99?


When the more commonly used C compilers offer support for C99
that is comparable to the support they currently offer for C90.

At present, the most widely used C compilers can be made C90-
conforming, if you know the right options to use. At present,
no commonly used C compiler can be made C99-conforming, with
any combination of options.

--
Allin Cottrell
Department of Economics
Wake Forest University, NC
Nov 14 '05 #5

"Mike Wahler" <mk******@mkwah ler.net> wrote in message
news:p4*******************@newsread1.news.pas.earthlink.net...

/* nz_str() */
/* */
/* behaves as 'strstr()' but handles input with */
/* embedded zero characters */
/* */
/* 'data' and 'to_find' must be zero-terminated */
Correction: 'data' must be terminated by at least
*two* consecutive zero characters.

char *nz_strstr(char *data, const char *to_find)
{
char *result = 0;

while(!(result = strstr(data, to_find))) data += strlen(data) + 1;


... because of the "+ 1" used to step *over* the zeros.
BTW, Angus, referring to your original code, there's no such
type as 'BYTE' in C.

-Mike

Nov 14 '05 #6
Angus Comber wrote:

Hello

My code below opens a Word document in binary mode and places the data
into
a buffer.
<OT>
You might get a bit more joy out of Word docs if you do some research into
"structured storage" or "compound documents".
</OT>
I then want to search this buffer for a string. I tried using
strstr but think it stops looking when it reaches first null character or
some control character in data. What C function should I use to be able
to search in a BYTE data buffer?


Look up the Boyer-Moore search algorithm on the Net, and implement it in C.

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 14 '05 #7
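
For anyone following up on the Boyer-Moore suggestion: the sketch below is the Horspool simplification (bad-character shifts only), not full Boyer-Moore, adapted to take explicit lengths so that zero bytes are harmless. The function name horspool_search is made up for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool over raw bytes. Returns a pointer to the first
   match of needle in hay, or NULL if there is none. */
const unsigned char *horspool_search(const unsigned char *hay, size_t hay_len,
                                     const unsigned char *needle, size_t needle_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (needle_len == 0 || needle_len > hay_len)
        return NULL;

    /* a byte that does not occur in the needle lets us skip its whole length */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = needle_len;

    /* for bytes in the needle (all but its last position), shift just far
       enough to line up their rightmost occurrence */
    for (i = 0; i + 1 < needle_len; i++)
        shift[needle[i]] = needle_len - 1 - i;

    pos = 0;
    while (pos <= hay_len - needle_len) {
        if (memcmp(hay + pos, needle, needle_len) == 0)
            return hay + pos;
        /* the byte under the needle's last position decides the jump */
        pos += shift[hay[pos + needle_len - 1]];
    }
    return NULL;
}
```

The payoff over a byte-by-byte scan grows with the length of the search string, since mismatches on bytes absent from it skip needle_len positions at once. For an eight-byte target in a modest file, the simple loop is likely fast enough.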

"Allin Cottrell" <co******@wfu.e du> wrote in message
news:c0***********@f1n1.spenet.wfu.edu...
E. Robert Tisdale wrote:

At present, the most widely used C compilers can be made C90-
conforming, if you know the right options to use. At present,
no commonly used C compiler can be made C99-conforming, with
any combination of options.


The freely available lcc-win32 compiler implements most of C99
http://www.cs.virginia.edu/~lcc-win32
Nov 14 '05 #8
Allin Cottrell <co******@wfu.edu> wrote in message news:<c0***********@f1n1.spenet.wfu.edu>...
Angus Comber wrote:

My code below opens a Word document in binary mode and places the data into
a buffer. I then want to search this buffer for a string. I tried using
strstr but think it stops looking when it reaches first null character or
some control character in data. What C function should I use to be able to
search in a BYTE data buffer?

Code:

#include <stdio.h>


You need to include more than this (e.g. stdlib.h for malloc).

char szPath[MAX_PATH] = "";
strcpy(szPath, "E:\\MyPath\\ah l.doc");
FILE* stream;

FILE *file = fopen(szPath, "rb"); // Open the file
fseek(file, 0, SEEK_END); // Seek to the end
long file_size = ftell(file); // Get the current position


Hmm, you're assuming C99, where you can use C++-style comments,
and can introduce new variables at any point in the code? For
portability, you're best to stick with C90.
rewind (file); // rewind to start of file

// allocate memory to contain the whole file.
BYTE* byBuffer = (BYTE*) malloc (file_size);


Look at the archive of this newsgroup, and you'll see many good
reasons _not_ to cast the return from malloc, if you're writing
in C.
// if (buffer == NULL) exit (2);


exit(2)?? What sort of code is that?

Well, anyway, here is an ISO/ANSI C program that (I think) will
do what you want, not necessarily with greatest efficiency.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

long find_string_in_buf (unsigned char *buf, size_t len,
                         const char *s)
{
    size_t i, j;
    size_t slen = strlen(s);
    long ret = -1;
    int match;

    /* a target longer than the buffer cannot match */
    if (slen > len) return -1;

    /* i + slen <= len keeps buf[i+j] in bounds and still tests
       the final start position, len - slen */
    for (i=0; i+slen<=len; i++) {
        match = 1;
        for (j=0; j<slen; j++) {
            if (buf[i+j] != (unsigned char) s[j]) {
                match = 0;
                break;
            }
        }
        if (match) {
            ret = (long) i;
            break;
        }
    }

    return ret;
}

A perhaps simpler implementation is as follows. Seems like using memcmp
is a win here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* look for the query in binary input data of indicated length */
const char *find_str_in_data(const char *data, size_t data_len,
                             const char *query)
{
    const char *p = data;
    size_t query_len = strlen(query);

    /* like strstr, treat an empty query as matching at the start;
       this also keeps the loop below from reading past the data */
    if (query_len == 0) {
        return data;
    }

    while (1) {
        /* p never passes data + data_len, so the difference is
           non-negative and safe to compare as a size_t */
        if ((size_t)(data + data_len - p) < query_len) {
            break;
        }
        if (*p == query[0]) {
            if (memcmp(p, query, query_len) == 0) {
                return p;
            }
        }
        p++;
    }

    return NULL;
}

int main(void)
{
    const char *query = "foo";
    const char data1[] = { 'a', 'b', '\0', 'f', '\0', 'f', 'o', 'o' };
    const char data2[] = { 'a', 'b', '\0', 'f', '\0', 'f', 'o', 'o', 'q'};
    const char data3[] = { 'f', 'o' };
    const char *result;

    result = find_str_in_data(data1, sizeof data1, query);
    printf("query '%s'%s found in data1\n", query, result ? "" : " NOT");
    result = find_str_in_data(data2, sizeof data2, query);
    printf("query '%s'%s found in data2\n", query, result ? "" : " NOT");
    result = find_str_in_data(data3, sizeof data3, query);
    printf("query '%s'%s found in data3\n", query, result ? "" : " NOT");

    return EXIT_SUCCESS;
}

-David
Nov 14 '05 #9
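
A possible refinement of the memcmp approach above: the hand-written test on the first byte can be handed to memchr, which many C libraries optimise heavily. This is only a sketch under that assumption; find_with_memchr is a hypothetical name, and unlike the function above it takes the query length as a parameter so that the query itself may contain zero bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: use memchr to jump to each candidate first
   byte, then confirm the rest of the query with memcmp. Returns a
   pointer to the first match, or NULL. */
const char *find_with_memchr(const char *data, size_t data_len,
                             const char *query, size_t query_len)
{
    const char *p = data;
    size_t remaining = data_len;

    if (query_len == 0 || query_len > data_len)
        return NULL;

    while (remaining >= query_len) {
        /* only look at start positions that leave room for the whole query */
        const char *hit = memchr(p, query[0], remaining - query_len + 1);

        if (hit == NULL)
            return NULL;
        if (memcmp(hit, query, query_len) == 0)
            return hit;

        /* resume one byte past the failed candidate */
        remaining -= (size_t)(hit - p) + 1;
        p = hit + 1;
    }
    return NULL;
}
```

Whether this beats the plain loop depends on the data: it helps most when the first byte of the query is rare in the buffer.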

"Mike Wahler" <mk******@mkwah ler.net> wrote in message
news:sI*******************@newsread1.news.pas.earthlink.net...

"Mike Wahler" <mk******@mkwah ler.net> wrote in message
news:p4*******************@newsread1.news.pas.earthlink.net...

/* nz_str() */
/* */
/* behaves as 'strstr()' but handles input with */
/* embedded zero characters */
/* */
/* 'data' and 'to_find' must be zero-terminated */


Correction: 'data' must be terminated by at least
*two* consecutive zero characters.

char *nz_strstr(char *data, const char *to_find)
{
char *result = 0;

while(!(result = strstr(data, to_find)))

data += strlen(data) + 1;


... because of the "+ 1" used to step *over* the zeros.


And *another* problem! (surprised nobody caught it):

This will have undefined behavior if the searched-for string isn't
found, it'll run off the end of the 'data' array.

It needs a 'size' parameter to check against, or at least
a 'dummy' copy of the searched-for item pasted onto the
end as a 'sentinel'.

-Mike
Nov 14 '05 #10
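
The 'size' parameter suggested above might look like the following. It is a sketch rather than a drop-in fix: nz_strstr_n is a hypothetical name, and it assumes the caller guarantees that the last byte of the buffer is zero. That single terminator replaces both the double-zero requirement and the sentinel idea, because every strstr and strlen call is then certain to stop inside the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant of nz_strstr(): searches the 'size'
   bytes at 'data', hopping over embedded zeros. Returns NULL if
   'to_find' is absent or if the buffer does not end in a zero byte. */
char *nz_strstr_n(char *data, size_t size, const char *to_find)
{
    char *end = data + size;

    /* precondition: a final zero keeps strstr/strlen inside the buffer */
    if (size == 0 || data[size - 1] != '\0')
        return NULL;

    while (data < end) {
        char *hit = strstr(data, to_find);

        if (hit != NULL)
            return hit;
        /* step over this zero-terminated segment and its terminator */
        data += strlen(data) + 1;
    }
    return NULL;
}
```

With the earlier test buffer, nz_strstr_n(buffer, sizeof buffer, what) finds "test" as before, and a string that is not present now yields NULL instead of running off the end of the array.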
