Parsing Binary Files

Hello fellow Coders!

OK, I'm trying to write a very simple application in C#. (Yes, it's my first
program.)

What I want to do is :

1) Open a binary file
2) Search this file for a particular string.
3) Close the file

Is there anything special I should do, given that this is a binary file?

Any code examples would be greatly appreciated.

Thank You

Hemang Shah
Nov 16 '05 #1
Hemang Shah <v-*****@microsoft.com> wrote:
> OK, I'm trying to write a very simple application in C#. (Yes, it's my first
> program.)
>
> What I want to do is:
>
> 1) Open a binary file
> 2) Search this file for a particular string.
> 3) Close the file
>
> Is there anything special I should do, given that this is a binary file?

Well, if you're trying to search for a *string*, you'll need to know
the encoding - or by "string" do you mean "sequence of bytes"?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2
Hello Jon

I'm trying to search for occurrences of "OU=" in the binary file, so yes, it's a
sequence of bytes.

If I open the file in a hex viewer, I can see these and search for them. Rather
than opening the file in a hex viewer every time, I want to write a utility
to search for them and display them.

I did find some code online which opens the file in binary mode and displays
it in a text box.
But what you see in the text box is not the same as what you see in the hex viewer.
Moreover, I don't really understand the code.

Here is the code:

void DisplayFile()
{
    int nCols = 16;
    FileStream inStream = new FileStream(chosenfile, FileMode.Open,
        FileAccess.Read);
    long nBytesToRead = inStream.Length;
    if (nBytesToRead > 65536/4)
        nBytesToRead = 65536/4;
    int nLines = (int)(nBytesToRead/nCols) + 1;
    string [] lines = new string[nLines];
    int nBytesRead = 0;
    for (int i=0 ; i<nLines ; i++)
    {
        StringBuilder nextLine = new StringBuilder();
        nextLine.Capacity = 4*nCols;
        for (int j = 0 ; j<nCols ; j++)
        {
            int nextByte = inStream.ReadByte();
            nBytesRead++;
            if (nextByte < 0 || nBytesRead > 65536)
                break;
            char nextChar = (char)nextByte;
            if (nextChar < 16)
                nextLine.Append(" x0" + string.Format("{0,1:X}",
                    (int)nextChar));
            else if
                (char.IsLetterOrDigit(nextChar) ||
                char.IsPunctuation(nextChar))
                nextLine.Append(" " + nextChar + " ");
            else
                nextLine.Append(" x" + string.Format("{0,2:X}",
                    (int)nextChar));
        }
        lines[i] = nextLine.ToString();
    }
    inStream.Close();
    this.textBoxContents.Lines = lines;
}

Thank You

Hemang Shah MCSE A+
Nov 16 '05 #3
Hemang Shah <v-*****@microsoft.com> wrote:
> I'm trying to search for occurrences of "OU=" in the binary file, so yes, it's a
> sequence of bytes.

But OU= is a sequence of *characters*. Do you mean you're looking for
the sequence of bytes which form the ASCII encoding for "OU="? I
suspect that's what you're after.

> If I open the file in a hex viewer, I can see these and search for them. Rather
> than opening the file in a hex viewer every time, I want to write a utility
> to search for them and display them.
>
> I did find some code online which opens the file in binary mode and displays
> it in a text box.
> But what you see in the text box is not the same as what you see in the hex viewer.
> Moreover, I don't really understand the code.

The first thing is to ditch that code. It's bad in many, many ways.

I don't have time to write some sample code for you right now, but I'll
try tomorrow afternoon. Basically, you should read the file in chunks,
and then look through for the correct sequence, knowing that it might
go across a "chunk boundary".

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
Yes, you are right, that is what I'm trying to achieve: a sequence of
*characters*, which I thought comprised a string.

I can send you a sample of the type of files I'm trying to read if you like.

I would really appreciate it if you could write me a sample; that would be
going above and beyond!

You can write it tomorrow or whenever you can. Or you can point me to some
good resources which would teach or explain the logic behind it.

Reading in chunks makes sense. Some of the files that I'll be parsing
will be 16 to 80 GB in size, but I'll only have to parse the first
few hundred MB of data to get the "OU=".

Thanks a lot again in advance.

Hemang
Nov 16 '05 #5
"Hemang Shah" <v-*****@microsoft.com> wrote in
news:eG*************@TK2MSFTNGP14.phx.gbl...
> Yes, you are right, that is what I'm trying to achieve: a sequence of
> *characters*, which I thought comprised a string.

I think what Jon was trying to say is that *bytes* and *characters* are two
different things: in .NET, characters are Unicode (UTF-16) characters, i.e.
each char has a size of 2 bytes. You can convert these to a variety of binary
representations (including plain ASCII) which have a different layout.
Now, in your binary file, do you want to look for occurrences of a string in
*Unicode* (UTF-16) representation or ASCII (or other) representation?
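
To make the difference concrete, here is a minimal sketch (not part of the original posts) that prints the bytes "OU=" turns into under two encodings. The class name EncodingDemo and the helper ToHex are made up for illustration; Encoding.ASCII, Encoding.Unicode (UTF-16, little-endian) and BitConverter.ToString are standard .NET calls.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: formats bytes as space-separated hex, e.g. "4F 55 3D".
    public static string ToHex(byte[] bytes)
    {
        // BitConverter.ToString yields "4F-55-3D"; swap the dashes for spaces.
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        // One byte per character in ASCII.
        Console.WriteLine(ToHex(Encoding.ASCII.GetBytes("OU=")));   // 4F 55 3D
        // Two bytes per character in UTF-16 (little-endian).
        Console.WriteLine(ToHex(Encoding.Unicode.GetBytes("OU="))); // 4F 00 55 00 3D 00
    }
}
```

Whichever of the two patterns shows up in the hex viewer is the one to search for.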

> ...
> I would really appreciate it if you could write me a sample; that would be
> going above and beyond!


Here's a little sample I've come up with:
It reads binary blocks of data from a file, then tests every possible
position. After that, it copies the trailing n-1 bytes of the buffer (n being
the length of the search pattern) to the beginning and continues reading after
them, so it can find matches on "chunk boundaries". (I think it works)
Note that this is not the fastest searching algorithm (google for
"Boyer-Moore" for more info). But I'd guess in your case the HD is the
bottleneck anyway.

using System;
using System.IO;

class BinarySearch
{
    static void Main()
    {
        string stringToLookFor = "7777";
        string filePath = @"C:\SomePath\pi.txt";

        // convert the string to a binary (ASCII) representation
        byte[] bufferToLookFor =
            System.Text.Encoding.ASCII.GetBytes(stringToLookFor);

        int matchCounter = 1; // count matches for nicer output

        // open the file in binary mode
        using (Stream stream = new FileStream(filePath, FileMode.Open,
            FileAccess.Read))
        {
            byte[] readBuffer = new byte[16384]; // our input buffer
            int bytesRead = 0; // number of bytes read
            int offset = 0;    // bytes carried over from the previous chunk
            long filePos = 0;  // position inside the file before read operation
            while ((bytesRead = stream.Read(readBuffer, offset,
                readBuffer.Length - offset)) > 0)
            {
                int available = offset + bytesRead; // valid bytes in the buffer
                for (int i = 0; i <= available - bufferToLookFor.Length; i++)
                {
                    bool match = true;
                    for (int j = 0; j < bufferToLookFor.Length; j++)
                        if (bufferToLookFor[j] != readBuffer[i + j])
                        {
                            match = false;
                            break;
                        }
                    if (match)
                    {
                        Console.WriteLine("{0,5}. \"{1}\" found at {2:x}",
                            matchCounter++, stringToLookFor, filePos + i - offset);
                        //return;
                    }
                }
                // store file position before next read
                filePos = stream.Position;

                // keep the last n-1 bytes to ensure matches on "chunk
                // boundaries" (Read may return fewer bytes than requested,
                // so index from the valid data, not from the buffer end)
                int keep = Math.Min(bufferToLookFor.Length - 1, available);
                for (int i = 0; i < keep; i++)
                    readBuffer[i] = readBuffer[available - keep + i];
                offset = keep;
            }
        }
        if (matchCounter == 1)
            Console.WriteLine("No match found");
    }
}
Niki
Nov 16 '05 #6
Hemang Shah <v-*****@microsoft.com> wrote:
> I would really appreciate it if you could write me a sample; that would be
> going above and beyond!

Is the sample Niki provided okay for you? (I like the idea of copying
the buffer - nice simple way of dealing with boundaries.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #7
Thank you, Niki and Jon.

I took the sample and it worked for me. I was able to get proper matches.

Now I have some questions, if you don't mind me asking:

1) The code right now is case sensitive with respect to the string we want to
search for, is that correct?
2) If I want it to be case insensitive, do I type the string in every
possible combination and search for each of those byte sequences? Or is there a
better solution?
3) After a match is found, I want to read the characters that follow it
and display them. The number of characters after the match is not fixed; it
could be one word or it could be a sentence. I will know where it ends because
it is terminated by another search string.
4) I don't understand the copying of the buffer so that we can check across
boundaries. I understand the concept, but I cannot follow the code from there.
Also, how do I fetch the info if it spans a boundary?
5) Our input buffer is set to 16 bytes. Is there any reason it's 16? Or
could it be any size?

I hope I was able to ask the right questions.

Thank You

Hemang.
Nov 16 '05 #8
Hemang Shah <v-*****@microsoft.com> wrote:
> Thank you, Niki and Jon.
>
> I took the sample and it worked for me. I was able to get proper matches.
>
> Now I have some questions, if you don't mind me asking:
>
> 1) The code right now is case sensitive with respect to the string we want to
> search for, is that correct?

Yes.

> 2) If I want it to be case insensitive, do I type the string in every
> possible combination and search for each of those byte sequences? Or is there a
> better solution?

Well, you could supply multiple byte arrays, and check whether the nth
byte is any of the acceptable ones, rather than just a single
acceptable one. You then just supply a lower case version and an upper
case version - you don't need to come up with every combination.
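
A rough sketch of that idea, assuming an ASCII pattern so that the lower-case and upper-case arrays have the same length. The class name CaseInsensitiveMatch and the method MatchesAt are invented for this illustration and do not come from Niki's sample:

```csharp
using System;
using System.Text;

static class CaseInsensitiveMatch
{
    // Hypothetical helper: true if each byte starting at buffer[pos] equals the
    // byte at the same index in either the lower-case or the upper-case pattern.
    // Assumes lower and upper have the same length (true for ASCII text).
    public static bool MatchesAt(byte[] buffer, int pos, byte[] lower, byte[] upper)
    {
        if (pos < 0 || pos + lower.Length > buffer.Length)
            return false;
        for (int j = 0; j < lower.Length; j++)
        {
            byte b = buffer[pos + j];
            if (b != lower[j] && b != upper[j])
                return false;
        }
        return true;
    }

    public static void Main()
    {
        byte[] lower = Encoding.ASCII.GetBytes("ou=");
        byte[] upper = Encoding.ASCII.GetBytes("OU=");
        byte[] data = Encoding.ASCII.GetBytes("xxOu=Sales");
        Console.WriteLine(MatchesAt(data, 2, lower, upper)); // True
        Console.WriteLine(MatchesAt(data, 0, lower, upper)); // False
    }
}
```

In the search loop, a call like this would stand in for the inner byte-by-byte comparison.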

> 3) After a match is found, I want to read the characters that follow it
> and display them. The number of characters after the match is not fixed; it
> could be one word or it could be a sentence. I will know where it ends because
> it is terminated by another search string.

To what extent is this *really* a binary file? Pretty much everything
you've said has been in terms of text.

> 4) I don't understand the copying of the buffer so that we can check across
> boundaries. I understand the concept, but I cannot follow the code from there.

I haven't actually looked at Niki's code myself.

> Also, how do I fetch the info if it spans a boundary?
> 5) Our input buffer is set to 16 bytes. Is there any reason it's 16? Or
> could it be any size?

It could be set to any size. I'd usually use about 32K myself.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #9
"Hemang Shah" <v-*****@microsoft.com> wrote in
news:eq**************@TK2MSFTNGP09.phx.gbl...
> Thank you, Niki and Jon.
>
> I took the sample and it worked for me. I was able to get proper matches.
>
> Now I have some questions, if you don't mind me asking:
>
> 1) The code right now is case sensitive with respect to the string we want to
> search for, is that correct?

Yes.

> 2) If I want it to be case insensitive, do I type the string in every
> possible combination and search for each of those byte sequences? Or is there a
> better solution?

I'd convert the input string to uppercase, and convert each byte in the
buffer to uppercase too before comparing.

> 3) After a match is found, I want to read the characters that follow it
> and display them. The number of characters after the match is not fixed; it
> could be one word or it could be a sentence. I will know where it ends because
> it is terminated by another search string.

If you have the offset in the file, you can use Stream.Seek & Stream.Read to
do that.
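
As a minimal sketch of that suggestion: the helper below seeks to a known offset and collects ASCII text up to a terminator. The names ReadAfterMatch and ReadUntil, the single-byte terminator, and the maxBytes cap are all assumptions made for illustration (the real terminator is another search string), and it uses Stream.ReadByte rather than Stream.Read to keep the example short.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadAfterMatch
{
    // Hypothetical helper: reads ASCII text starting at 'start' and stops before
    // the first 'terminator' byte, at end of stream, or after maxBytes bytes.
    public static string ReadUntil(Stream stream, long start, byte terminator, int maxBytes)
    {
        stream.Seek(start, SeekOrigin.Begin);
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < maxBytes; n++)
        {
            int b = stream.ReadByte(); // -1 signals end of stream
            if (b < 0 || b == terminator)
                break;
            sb.Append((char)b);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("..OU=Sales/CN=Bob");
        using (Stream s = new MemoryStream(data))
        {
            // "OU=" starts at offset 2, so the value starts at 2 + 3 = 5.
            Console.WriteLine(ReadUntil(s, 5, (byte)'/', 256)); // Sales
        }
    }
}
```

Because the stream is repositioned with Seek, it does not matter whether the text after the match crossed a chunk boundary in the search buffer. If the search stream itself is reused for this, its position has to be restored before the search continues.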

> 4) I don't understand the copying of the buffer so that we can check across
> boundaries. I understand the concept, but I cannot follow the code from there.

Try to use a short buffer (e.g. 20 bytes) and a short file, and step through
the code with the debugger. IMO that's generally the best way to see what a
program does.

> Also, how do I fetch the info if it spans a boundary?

As I said, I'd use a separate Stream.Read call to extract that info.

> 5) Our input buffer is set to 16 bytes. Is there any reason it's 16? Or
> could it be any size?

It's set to 16 kbytes. HD access is typically performed in 4 KB pages, so it
should be at least 4 KB (otherwise the HD will have to read the same page more
than once). I usually make it a little bigger so the overhead for calling
into the OS isn't incurred that often.
If you don't care about performance (e.g. for testing or debugging) you can
make it any size as long as it's bigger than the search string.

> I hope I was able to ask the right questions.


There are no stupid questions. Only stupid answers...

Niki
Nov 16 '05 #10
