Parsing Binary Files

Hello fellow Coders!

OK, I'm trying to write a very simple application in C#. (Yes, it's my first
program.)

What I want to do is :

1) Open a binary file
2) Search this file for a particular string.
3) Close the file

Is there anything special I should do because this is a binary file?

Any code examples would be greatly appreciated.

Thank You

Hemang Shah
Nov 16 '05 #1
Hemang Shah <v-*****@microsoft.com> wrote:
> OK, I'm trying to write a very simple application in C#. (Yes, it's my first
> program.)
>
> What I want to do is:
>
> 1) Open a binary file
> 2) Search this file for a particular string.
> 3) Close the file
>
> Is there anything special I should do because this is a binary file?

Well, if you're trying to search for a *string*, you'll need to know
the encoding - or by "string" do you mean "sequence of bytes"?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2
Hello Jon

I'm trying to search for occurrences of "OU=" in the binary file, so yes, it's a
sequence of bytes.

If I open the file in a hex viewer, I can see these and search for them. Rather
than opening the file in a hex viewer every time, I want to write a utility
to search for them and display them.

I did find some code online which opens the file in binary mode and displays
it in a text box.
But what you see in the text box is not the same as what you see in the hex viewer.
Moreover, I don't really understand the code.

Here is the code:

void DisplayFile()
{
    int nCols = 16;
    FileStream inStream = new FileStream(chosenfile, FileMode.Open,
        FileAccess.Read);
    long nBytesToRead = inStream.Length;
    if (nBytesToRead > 65536/4)
        nBytesToRead = 65536/4;
    int nLines = (int)(nBytesToRead/nCols) + 1;
    string [] lines = new string[nLines];
    int nBytesRead = 0;
    for (int i=0 ; i<nLines ; i++)
    {
        StringBuilder nextLine = new StringBuilder();
        nextLine.Capacity = 4*nCols;
        for (int j = 0 ; j<nCols ; j++)
        {
            int nextByte = inStream.ReadByte();
            nBytesRead++;
            if (nextByte < 0 || nBytesRead > 65536)
                break;
            char nextChar = (char)nextByte;
            if (nextChar < 16)
                nextLine.Append(" x0" + string.Format("{0,1:X}",
                    (int)nextChar));
            else if (char.IsLetterOrDigit(nextChar) ||
                     char.IsPunctuation(nextChar))
                nextLine.Append(" " + nextChar + " ");
            else
                nextLine.Append(" x" + string.Format("{0,2:X}",
                    (int)nextChar));
        }
        lines[i] = nextLine.ToString();
    }
    inStream.Close();
    this.textBoxContents.Lines = lines;
}

Thank You

__________________________________________________ ________________________

Hemang Shah MCSE A+
Enterprise Messaging Support
Direct phone: (905) 568-0434 x 23854

Email: v-*****@microsoft.com

Office hours: Wed to Sat from 19:00-06:00 hrs EST.
Nov 16 '05 #3
Hemang Shah <v-*****@microsoft.com> wrote:
> I'm trying to search for occurrences of "OU=" in the binary file, so yes, it's a
> sequence of bytes.

But OU= is a sequence of *characters*. Do you mean you're looking for
the sequence of bytes which form the ASCII encoding for "OU="? I
suspect that's what you're after.

> If I open the file in a hex viewer, I can see these and search for them. Rather
> than opening the file in a hex viewer every time, I want to write a utility
> to search for them and display them.
>
> I did find some code online which opens the file in binary mode and displays
> it in a text box.
> But what you see in the text box is not the same as what you see in the hex viewer.
> Moreover, I don't really understand the code.

The first thing is to ditch that code. It's bad in many, many ways.

I don't have time to write some sample code for you right now, but I'll
try tomorrow afternoon. Basically, you should read the file in chunks,
and then look through for the correct sequence, knowing that it might
go across a "chunk boundary".

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
Yes, you are right, that is what I'm trying to achieve: a sequence of
*characters*, which I thought comprised a string.

I can send you a sample of the type of files I'm trying to read if you like.

I would really appreciate if you could write me a sample, that would be
going over & beyond!

You can write it tomorrow or whenever you can. Or you can point me to some
good resources which would teach / explain the logic behind it.

Reading in chunks makes sense. Sometimes the files that I'll be parsing
will be 16 to 80 GB in size, but I'll only have to parse the first
few hundred MB of data to get the "OU=".

Thanks a lot again in advance.

Hemang
Nov 16 '05 #5
"Hemang Shah" <v-*****@microsoft.com> wrote
news:eG*************@TK2MSFTNGP14.phx.gbl...
> Yes you are right, that is what I'm trying to achieve.. A sequence of
> *Characters* which I thought comprised a string.

I think what Jon was trying to say is that *bytes* and *characters* are two
different things: In .NET, characters are Unicode (UTF-16) characters, i.e.
each char has a size of 2 bytes. You can convert these to a variety of binary
representations (including plain ASCII) which have a different layout.
Now, in your binary file, do you want to look for occurrences of a string in
*Unicode* representation or ASCII (or other) representation?
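
To make the difference concrete, here is a minimal sketch (not part of the original reply) that prints the bytes produced for "OU=" by two of the encodings mentioned above. The class name EncodingDemo and the helper Hex are made up for illustration; Encoding.ASCII, Encoding.Unicode and BitConverter.ToString are standard .NET calls.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    // Formats a byte array as hex pairs separated by dashes, e.g. "4F-55-3D".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        string text = "OU=";

        // ASCII: one byte per character.
        byte[] ascii = Encoding.ASCII.GetBytes(text);

        // Encoding.Unicode is UTF-16 little-endian: two bytes per character here.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);

        Console.WriteLine("ASCII : " + Hex(ascii)); // 4F-55-3D
        Console.WriteLine("UTF-16: " + Hex(utf16)); // 4F-00-55-00-3D-00
    }
}
```

Whichever of these byte patterns shows up in the hex viewer is the one to search for.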
...
> I would really appreciate if you could write me a sample, that would be
> going over & beyond!

Here's a little sample I've come up with:
It reads binary blocks of data from a file, then tests every possible
position. After that, it copies the trailing n bytes of the buffer to the
beginning and starts reading after byte n, so it can find matches on "chunk
boundaries". (I think it works)
Note that this is not the fastest searching algorithm (search for
"Boyer-Moore" for more info), but I'd guess in your case the hard disk is the
bottleneck anyway.

using System;
using System.IO;

class BinarySearch
{
    static void Main()
    {
        string stringToLookFor = "7777";
        string filePath = @"C:\SomePath\pi.txt";

        // convert the string to a binary (ASCII) representation
        byte[] bufferToLookFor =
            System.Text.Encoding.ASCII.GetBytes(stringToLookFor);

        int matchCounter = 0; // count matches for nicer output

        // open the file in binary mode
        using (Stream stream = new FileStream(filePath, FileMode.Open,
            FileAccess.Read))
        {
            byte[] readBuffer = new byte[16384]; // our input buffer
            int bytesRead = 0;  // number of bytes read
            int offset = 0;     // offset inside read-buffer
            long filePos = 0;   // position inside the file before read operation
            while ((bytesRead = stream.Read(readBuffer, offset,
                readBuffer.Length - offset)) > 0)
            {
                // number of bytes in the buffer that actually hold data
                int valid = offset + bytesRead;

                // "<=" so the last possible start position is tested too
                for (int i = 0; i <= valid - bufferToLookFor.Length; i++)
                {
                    bool match = true;
                    for (int j = 0; j < bufferToLookFor.Length; j++)
                        if (bufferToLookFor[j] != readBuffer[i + j])
                        {
                            match = false;
                            break;
                        }
                    if (match)
                    {
                        Console.WriteLine("{0,5}. \"{1}\" found at {2:x}",
                            ++matchCounter, stringToLookFor, filePos + i - offset);
                        //return;
                    }
                }
                // store file position before next read
                filePos = stream.Position;

                // keep the last (pattern length - 1) bytes that were read, so
                // matches on "chunk boundaries" are found exactly once
                int keep = Math.Min(bufferToLookFor.Length - 1, valid);
                for (int i = 0; i < keep; i++)
                    readBuffer[i] = readBuffer[valid - keep + i];
                offset = keep;
            }
        }
        if (matchCounter == 0)
            Console.WriteLine("No match found");
    }
}
Niki
Nov 16 '05 #6
Hemang Shah <v-*****@microsoft.com> wrote:
> I would really appreciate if you could write me a sample, that would be
> going over & beyond!

Is the sample Niki provided okay for you? (I like the idea of copying
the buffer - nice simple way of dealing with boundaries.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #7
Thank you Niki & Jon

I took the sample and it worked for me. I was able to get proper matches.

Now I have some questions if you don't mind me asking:

1) The code right now is case sensitive with respect to the string we want to
search for, is that correct?
2) If I want it to be case insensitive, do I type the string in every
possible combination and search with each of those byte sequences? Or is there a
better solution?
3) After a match is found, I want to read some number of characters after
the match and display them. The number of characters is not fixed; it could
be one word or it could be a sentence. I would know where it ends because it
will be terminated by another search string.
4) I don't understand the copying of the buffer so that we can check across
boundaries. I understand the concept but I cannot follow the code from there.
Also, how do I fetch the info if it lies across a boundary?
5) Our input buffer is set to 16 bytes. Is there any reason it's 16? Or
could it be any size?

I hope I was able to ask the right questions.

Thank You

Hemang.
Nov 16 '05 #8
Hemang Shah <v-*****@microsoft.com> wrote:
> Thank you Niki & Jon
>
> I took the sample and it worked for me. I was able to get proper matches.
>
> Now I have some questions if you don't mind me asking:
>
> 1) The code right now is Case Sensitive I guess to the string we want to
> search is that correct ?

Yes.

> 2) If I want it to be not case sensitive, do I type the string in every
> possible combination and search with each of those bytes ? or is there a
> better solution

Well, you could supply multiple byte arrays, and check whether the nth
byte is any of the acceptable ones, rather than just a single
acceptable one. You then just supply a lower case version and an upper
case version - you don't need to come up with every combination.

> 3) What I want to do is, after the search is met, I want to read x amount
> of characters after that search and display it. Now the # of characters
> after the search is not fixed, it could be 1 word or it could be a sentence.
> I would know it because it will truncate with another search string.

To what extent is this *really* a binary file? Pretty much everything
you've said has been in terms of text.

> 4) I don't understand the copying of buffer so that we can check across
> boundaries, I understand the concept but I cannot follow the code from there.

I haven't actually looked at Niki's code myself.

> Also, how do I handle my fetching the info if it is across boundaries.
> 5) Our input buffer is set to 16 bytes. Is there any reason its 16 ? or it
> could be any size.

It could be set to any size. I'd usually use about 32K myself.
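
The answer to question 2 above (an upper-case and a lower-case byte array, with each position accepting either byte) could be sketched roughly as follows. This is only an illustration under the assumption that the pattern is plain ASCII; the class name CaseInsensitiveMatch and the method MatchesAt are hypothetical names, not taken from any posted code.

```csharp
using System;
using System.Text;

class CaseInsensitiveMatch
{
    // True if the pattern occurs in buffer at index start, accepting either
    // the upper-case or the lower-case byte at each position.
    public static bool MatchesAt(byte[] buffer, int start, byte[] upper, byte[] lower)
    {
        if (start < 0 || start + upper.Length > buffer.Length)
            return false;
        for (int j = 0; j < upper.Length; j++)
        {
            byte b = buffer[start + j];
            if (b != upper[j] && b != lower[j])
                return false;
        }
        return true;
    }

    static void Main()
    {
        string pattern = "OU=";
        // Two arrays are enough; there is no need to list every combination.
        byte[] upper = Encoding.ASCII.GetBytes(pattern.ToUpperInvariant());
        byte[] lower = Encoding.ASCII.GetBytes(pattern.ToLowerInvariant());

        byte[] data = Encoding.ASCII.GetBytes("xxoU=Sales");
        Console.WriteLine(MatchesAt(data, 2, upper, lower)); // True
        Console.WriteLine(MatchesAt(data, 0, upper, lower)); // False
    }
}
```

A check like MatchesAt would take the place of the inner byte-by-byte comparison loop in a chunked search.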

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #9
"Hemang Shah" <v-*****@microsoft.com> wrote in
news:eq**************@TK2MSFTNGP09.phx.gbl...
> Thank you Niki & Jon
>
> I took the sample and it worked for me. I was able to get proper matches.
>
> Now I have some questions if you don't mind me asking:
>
> 1) The code right now is Case Sensitive I guess to the string we want to
> search is that correct ?

Yes.

> 2) If I want it to be not case sensitive, do I type the string in every
> possible combination and search with each of those bytes ? or is there a
> better solution

I'd convert the input string to uppercase, and convert each byte in the
buffer to uppercase too before comparing.

> 3) What I want to do is, after the search is met, I want to read x amount
> of characters after that search and display it. Now the # of characters
> after the search is not fixed, it could be 1 word or it could be a
> sentence. I would know it because it will truncate with another search
> string.

If you have the offset in the file, you can use Stream.Seek & Stream.Read to
do that.

> 4) I don't understand the copying of buffer so that we can check across
> boundaries, I understand the concept but I cannot follow the code from
> there.

Try to use a short buffer (e.g. 20 bytes) and a short file, and step through
the code with the debugger. IMO that's generally the best way to see what a
program does.

> Also, how do I handle my fetching the info if it is across boundaries.

As I said, I'd use a separate Stream.Read call to extract that info.

> 5) Our input buffer is set to 16 bytes. Is there any reason its 16 ? or
> it could be any size.

It's set to 16 KB (16384 bytes). Disk access is typically performed in pages
of about 4 KB, so the buffer should be at least 4 KB (otherwise the same page
may have to be read more than once). I usually make it a little bigger so the
overhead of calling into the OS isn't incurred that often.
If you don't care about performance (e.g. for testing or debugging) you can
make it any size as long as it's bigger than the search string.

> I hope I was able to ask the right questions.

There are no stupid questions. Only stupid answers...
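
The Stream.Seek and Stream.Read suggestion for question 3 might look something like the sketch below. It is an illustration only: the class name ReadAfterMatch, the method ReadValue, the "/CN=" terminator and the 64-byte limit are all assumptions, and a MemoryStream stands in for the real file so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Text;

class ReadAfterMatch
{
    // Reads up to maxBytes starting at valueOffset and returns the bytes that
    // come before the first occurrence of terminator (or everything that was
    // read if the terminator does not occur).
    public static byte[] ReadValue(Stream stream, long valueOffset,
                                   byte[] terminator, int maxBytes)
    {
        stream.Seek(valueOffset, SeekOrigin.Begin);

        byte[] buffer = new byte[maxBytes];
        int total = 0;
        int n;
        // Read may return fewer bytes than requested, so loop until full or EOF.
        while (total < maxBytes &&
               (n = stream.Read(buffer, total, maxBytes - total)) > 0)
            total += n;

        int end = total;
        for (int i = 0; i <= total - terminator.Length; i++)
        {
            bool hit = true;
            for (int j = 0; j < terminator.Length; j++)
                if (buffer[i + j] != terminator[j]) { hit = false; break; }
            if (hit) { end = i; break; }
        }

        byte[] value = new byte[end];
        Array.Copy(buffer, value, end);
        return value;
    }

    static void Main()
    {
        // In-memory stand-in for the real file (assumed sample data).
        byte[] file = Encoding.ASCII.GetBytes("junk/OU=Sales/CN=Bob");
        using (Stream stream = new MemoryStream(file))
        {
            long matchOffset = 5;                // where "OU=" was found
            long valueOffset = matchOffset + 3;  // skip the 3 bytes of "OU="
            byte[] value = ReadValue(stream, valueOffset,
                                     Encoding.ASCII.GetBytes("/CN="), 64);
            Console.WriteLine(Encoding.ASCII.GetString(value)); // Sales
        }
    }
}
```

Because the value is fetched with its own Seek and Read, it does not matter whether it straddles a chunk boundary of the search loop. The search stream should be repositioned afterwards (or a second stream used) before the search continues.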

Niki
Nov 16 '05 #10
