Bytes | Software Development & Data Engineering Community

Scanning byte arrays

Hello,

If I'm looking for a pattern of bytes in a byte array, what would be the
best approach?

I could convert the array into a string and call IndexOf repeatedly,
remembering the position of the last match, but that seems like the
wrong way to go about it, especially if the array is a couple of MB
in size.
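For reference, the string-based approach described above might look like the sketch below. It assumes the bytes are decoded as Latin-1 (ISO-8859-1), which maps each byte value 0-255 to the char with the same code, so string positions equal byte offsets. `StringSearchSketch` and `StringIndexOfAll` are hypothetical names. It also shows the cost that makes this feel wrong for multi-megabyte arrays: the whole array is first copied into a new UTF-16 string.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringSearchSketch
{
    // Latin-1 decodes each byte to exactly one char with the same numeric value,
    // so a match at string index i is a match at byte offset i.
    private static readonly Encoding s_latin1 = Encoding.GetEncoding("iso-8859-1");

    // Hypothetical helper: every (possibly overlapping) offset of 'pattern' in 'data'.
    public static List<int> StringIndexOfAll(byte[] data, byte[] pattern)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));

        var results = new List<int>();
        if (pattern.Length == 0) return results;

        // Both conversions allocate: the whole input is copied into a new string.
        string text = s_latin1.GetString(data);
        string find = s_latin1.GetString(pattern);

        // Ordinal comparison compares char codes exactly, with no culture rules.
        int pos = text.IndexOf(find, 0, StringComparison.Ordinal);
        while (pos >= 0)
        {
            results.Add(pos);
            // Remember the last match and resume one position after it.
            pos = text.IndexOf(find, pos + 1, StringComparison.Ordinal);
        }
        return results;
    }
}
```

`StringComparison.Ordinal` matters here: the default `string.IndexOf(string)` is culture-sensitive and can treat some characters (for example `'\0'`) as ignorable, so positions would no longer line up with bytes.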

My initial thought is to just use a for loop and check for a byte
matching the first byte of the search pattern.
For example:

asodaiudoqiiqmngm,3n23n4234kj23rasdrwebHelloWorldkajsdiouahsd

My way of doing this would be to look for 'H' and then check every
character after that using a bunch of ifs.

Is there a built-in method or an algorithm I could use? Google has not
helped.

Cheers

May 7 '07 #1
Vince,

If you have the contents in string form, then you can use the IndexOf
method. However, if you need to search by bytes (and you are using
characters here, indicating to me that these are ASCII values), then you
could create a general-purpose method, like so:

public static int ByteIndexOf(byte[] searched, byte[] find, int start)
{
    // Standard argument checking.
    if (searched == null) throw new System.ArgumentNullException(nameof(searched));
    if (find == null) throw new System.ArgumentNullException(nameof(find));
    if (start < 0 || start > searched.Length)
        throw new System.ArgumentOutOfRangeException(nameof(start));

    // Did the values match?
    bool matched = false;

    // Cycle through each byte of the searched array. Do not start a
    // comparison past searched.Length - find.Length, since a match
    // beginning any later would run off the end of the array.
    for (int index = start; index <= searched.Length - find.Length; ++index)
    {
        // Assume the values matched.
        matched = true;

        // Compare each byte of the pattern with the searched array.
        for (int subIndex = 0; subIndex < find.Length; ++subIndex)
        {
            // Check the value in the searched array vs the value
            // in the find array.
            if (find[subIndex] != searched[index + subIndex])
            {
                // The values did not match.
                matched = false;

                // Break out of the loop.
                break;
            }
        }

        // If the values matched, return the index.
        if (matched)
        {
            return index;
        }
    }

    // None of the values matched, return -1.
    return -1;
}

Of course, this could probably be optimized, but this is the general
idea.
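On the optimization point: much later versions of .NET (Core 2.1 and newer, or .NET Framework with the System.Memory package) ship a built-in, optimized subsequence search, `MemoryExtensions.IndexOf` on `ReadOnlySpan<T>`. Below is a minimal sketch that uses it to collect every match offset, resuming one byte after each hit. `SpanSearchSketch.IndexOfAll` is a hypothetical helper; only the `IndexOf` and `Slice` calls are framework APIs.

```csharp
using System;
using System.Collections.Generic;

public static class SpanSearchSketch
{
    // Hypothetical helper: all (possibly overlapping) offsets of 'find' in 'searched'.
    public static List<int> IndexOfAll(byte[] searched, byte[] find)
    {
        if (searched == null) throw new ArgumentNullException(nameof(searched));
        if (find == null) throw new ArgumentNullException(nameof(find));

        var results = new List<int>();
        if (find.Length == 0) return results;

        ReadOnlySpan<byte> haystack = searched; // no copy, just a view over the array
        ReadOnlySpan<byte> needle = find;

        int offset = 0;
        while (offset <= haystack.Length - needle.Length)
        {
            // IndexOf returns a position relative to the slice, or -1.
            int hit = haystack.Slice(offset).IndexOf(needle);
            if (hit < 0) break;
            results.Add(offset + hit);
            offset += hit + 1; // step past this start so overlapping matches are kept
        }
        return results;
    }
}
```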

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]

May 7 '07 #2
Hi,

"Vince Panuccio" <to*********************@hotmail.com> wrote in message
news:11********************@p77g2000hsh.googlegroups.com...
> Hello,
>
> If I'm looking for a pattern of bytes in a byte array, what would be the
> best approach?
>
> I could convert the array into a string and call IndexOf repeatedly,
> remembering the position of the last match, but that seems like the
> wrong way to go about it, especially if the array is a couple of MB
> in size.
I don't like that idea either; in the end you have a byte array, not a
string.
> My initial thought is to just use a for loop and check for a byte
> matching the first byte of the search pattern.
That will work. In fact, whatever method you use, either you or the
framework will loop over the array.
The code should not be too complex anyway: just look for the first byte and
then check whether the following bytes match.
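Ignacio's suggestion (look for the first byte, then check the rest) can be sketched as follows, assuming `Array.IndexOf` is used so that the framework does the scan for the first byte. `FirstByteSearchSketch.FirstByteIndexOf` is a hypothetical name; its `start` parameter mirrors the `ByteIndexOf` signature earlier in the thread.

```csharp
using System;

public static class FirstByteSearchSketch
{
    // Hypothetical helper: first offset >= start where 'find' occurs in 'searched', or -1.
    public static int FirstByteIndexOf(byte[] searched, byte[] find, int start)
    {
        if (searched == null) throw new ArgumentNullException(nameof(searched));
        if (find == null) throw new ArgumentNullException(nameof(find));
        if (start < 0 || start > searched.Length) throw new ArgumentOutOfRangeException(nameof(start));
        if (find.Length == 0) return start;

        int last = searched.Length - find.Length; // last position where a match can begin
        int candidate = start;
        while (candidate <= last)
        {
            // Let the framework find the next occurrence of the first byte.
            candidate = Array.IndexOf(searched, find[0], candidate, last - candidate + 1);
            if (candidate < 0) return -1;

            // Check whether the following bytes match too.
            int i = 1;
            while (i < find.Length && searched[candidate + i] == find[i]) i++;
            if (i == find.Length) return candidate;

            candidate++; // mismatch: look for the next occurrence of the first byte
        }
        return -1;
    }
}
```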

May 7 '07 #3
"Vince Panuccio" <to*********************@hotmail.com> wrote in message
news:11********************@p77g2000hsh.googlegroups.com...
> Hello,
>
> If I'm looking for a pattern of bytes in a byte array,
> what would be the best approach?
By "pattern" you seem to mean a substring? Right?
(Because you could, conceivably, mean something like
a substring that matches a regular expression or something
like that).
Well, efficient solutions to the substring matching
problem are well known: Karp-Rabin or Knuth-Morris-Pratt.
Just search the net for an implementation
(e.g. http://www-igm.univ-mlv.fr/~lecroq/s...ml#SECTION0070)
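To make Knuth-Morris-Pratt concrete, here is a minimal C# sketch over byte arrays, written for this page rather than translated from the linked site. It precomputes a failure table (for each pattern prefix, the length of its longest proper prefix that is also a suffix), so on a mismatch only the pattern position falls back and the scan never moves backwards in the input, giving O(n + m) time. `KmpSketch` is a hypothetical name.

```csharp
using System;

public static class KmpSketch
{
    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    private static int[] BuildFailure(byte[] pattern)
    {
        var failure = new int[pattern.Length];
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
            if (pattern[i] == pattern[k]) k++;
            failure[i] = k;
        }
        return failure;
    }

    // Returns the first offset at which 'pattern' occurs in 'text', or -1.
    public static int IndexOf(byte[] text, byte[] pattern)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));
        if (pattern.Length == 0) return 0;

        int[] failure = BuildFailure(pattern);
        int matched = 0; // number of pattern bytes currently matched
        for (int i = 0; i < text.Length; i++)
        {
            // On mismatch, fall back within the pattern; i itself never decreases.
            while (matched > 0 && text[i] != pattern[matched]) matched = failure[matched - 1];
            if (text[i] == pattern[matched]) matched++;
            if (matched == pattern.Length) return i - pattern.Length + 1;
        }
        return -1;
    }
}
```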

If, however, you mean searching for a pattern described
by a regular expression, you should, ideally, compile a
finite automaton that does the matching. The important
property, as regards efficiency for large master strings,
is that you never have to check the *same* character
(byte in your case) more than once.
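For a single literal pattern, the "compile a finite automaton" idea can be sketched as a table-driven DFA, assuming a 256-symbol byte alphabet: `delta[state, b]` is the next state after reading byte `b`, so each input byte is read exactly once. This sketch only handles a literal pattern, not a general regular expression, and its table costs 256 × (m + 1) ints for a pattern of length m. `DfaSearchSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class DfaSearchSketch
{
    // Builds the string-matching automaton for one literal byte pattern (length >= 1).
    // State j means "the last j bytes read equal pattern[0..j-1]"; state m is a full match.
    private static int[,] Build(byte[] pattern)
    {
        int m = pattern.Length;
        var delta = new int[m + 1, 256]; // every transition defaults to state 0
        delta[0, pattern[0]] = 1;
        int restart = 0; // state reached by running the automaton on pattern[1..j-1]
        for (int j = 1; j < m; j++)
        {
            for (int b = 0; b < 256; b++) delta[j, b] = delta[restart, b]; // mismatch: act like restart
            delta[j, pattern[j]] = j + 1;                                     // match: advance
            restart = delta[restart, pattern[j]];
        }
        // After a full match, continue from the restart state so overlapping matches are found.
        for (int b = 0; b < 256; b++) delta[m, b] = delta[restart, b];
        return delta;
    }

    // Reads each input byte exactly once and reports every match offset.
    public static List<int> FindAll(byte[] text, byte[] pattern)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));
        var results = new List<int>();
        if (pattern.Length == 0) return results;

        int[,] delta = Build(pattern);
        int m = pattern.Length;
        int state = 0;
        for (int i = 0; i < text.Length; i++)
        {
            state = delta[state, text[i]];
            if (state == m) results.Add(i - m + 1);
        }
        return results;
    }
}
```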

Regards,
Christian
May 7 '07 #4
"Christian Stapfer" <ni*@dev.nul> wrote in message
news:6e***************************@news.hispeed.ch...
> Well, efficient solutions to the substring matching
> problem are well known: Karp-Rabin or Knuth-Morris-Pratt.
> Just search the net for an implementation
> (e.g. http://www-igm.univ-mlv.fr/~lecroq/s...ml#SECTION0070)
Sorry, I meant to give you this link:
http://www-igm.univ-mlv.fr/~lecroq/s...l#SECTION00140
and this link:
http://www-igm.univ-mlv.fr/~lecroq/s...ml#SECTION0050
as well.

Translating these C-language implementations to
C# shouldn't be that difficult.
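Since Vince settles on Boyer-Moore further down, here is a sketch of its simplified Horspool variant in C#. This is Boyer-Moore-Horspool (bad-character shifts only, no good-suffix rule), written for this page and not translated from the linked pages. Each window is compared right to left, and after a mismatch the window slides by an amount looked up from the text byte under the pattern's last position, so longer patterns tend to skip over most of the input. `HorspoolSketch` is a hypothetical name.

```csharp
using System;

public static class HorspoolSketch
{
    // Returns the first offset >= start at which 'pattern' occurs in 'text', or -1.
    public static int IndexOf(byte[] text, byte[] pattern, int start = 0)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));
        if (start < 0 || start > text.Length) throw new ArgumentOutOfRangeException(nameof(start));

        int m = pattern.Length;
        if (m == 0) return start;

        // Bad-character table, keyed by the text byte aligned with the pattern's
        // last position. Bytes that do not occur in pattern[0..m-2] shift by m.
        var shift = new int[256];
        for (int b = 0; b < 256; b++) shift[b] = m;
        for (int i = 0; i < m - 1; i++) shift[pattern[i]] = m - 1 - i;

        int pos = start;
        while (pos <= text.Length - m)
        {
            // Compare the current window right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;
            pos += shift[text[pos + m - 1]];
        }
        return -1;
    }
}
```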

Regards,
Christian

May 7 '07 #5
On May 8, 2:10 am, "Christian Stapfer" <n...@dev.nul> wrote:
> By "pattern" you seem to mean a substring? Right?
> (Because you could, conceivably, mean something like
> a substring that matches a regular expression or something
> like that).

I am actually talking about searching for a string made up of ASCII
characters.

Ignacio, I understand what you're saying, but to me a byte array and a
string are the same thing. Is a string not just an array of characters,
only referred to differently? Is a character array not just an
array of bytes represented differently?
I think I'll go with Christian's approach here. The Boyer-Moore
algorithm seems like the way to go.

Thanks for all your insights.

May 8 '07 #6
On May 8, 1:12 pm, Vince Panuccio
<totalharmonicdistort...@hotmail.com> wrote:
> I am actually talking about searching for a string made up of ASCII
> characters.
But is the byte array *also* just an encoded string?
> Ignacio, I understand what you're saying, but to me a byte array and a
> string are the same thing.
They're definitely not.
> Is a string not just an array of characters, only referred to differently?
Yes, although Unicode "funnies" mean that the same text can be
represented in different ways (consider normalised forms, etc.).
> Is a character array not just an array of bytes represented differently?
The "represented differently" is the key here. You shouldn't take
arbitrary binary data and treat it as an ASCII string, for instance.
If the byte array has been generated by taking a string and applying a
particular encoding, that's fine - but otherwise, you're likely to
lose data.
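A small demonstration of Jon's point, using made-up byte values: pushing arbitrary binary data through a UTF-8 string changes it, because invalid sequences are replaced by U+FFFD on decode, whereas bytes that were produced by encoding a string round-trip intact. Latin-1 is included for contrast, since it maps every byte to exactly one char and back. `EncodingRoundTripSketch` and its methods are hypothetical names.

```csharp
using System.Text;

public static class EncodingRoundTripSketch
{
    // Decode bytes as UTF-8, then encode the resulting string back to UTF-8.
    // Invalid sequences become U+FFFD on decode, so arbitrary binary data can change.
    public static byte[] RoundTripUtf8(byte[] data)
    {
        string text = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(text);
    }

    // Same round trip through Latin-1 (ISO-8859-1), which maps every byte 0-255
    // to the char with the same value and back, so no byte is altered.
    public static byte[] RoundTripLatin1(byte[] data)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetBytes(latin1.GetString(data));
    }
}
```

For example, `RoundTripUtf8(new byte[] { 0x48, 0xC3, 0x28 })` returns the five bytes 48 EF BF BD 28: 0xC3 announces a two-byte sequence, but 0x28 is not a continuation byte, so 0xC3 is decoded as U+FFFD (EF BF BD in UTF-8). A byte array produced by `Encoding.UTF8.GetBytes` from a string survives the round trip unchanged.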

Jon

May 8 '07 #7
CLR strings are always Unicode (UTF-16). For strings of ASCII characters,
yes, you can *convert* them uniquely to a byte array, but this requires
an additional step: copying the data using your chosen encoding.
The Boyer-Moore algorithm sounds promising, but if you can work with
the string as a char[] rather than a byte[] you will save that
unnecessary work. The algorithm seems to talk about characters, not
bytes, so I can't see this being a problem. But not my area ;-p
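To illustrate Marc's point that these algorithms only rely on comparing elements, here is a naive matcher written once over `ReadOnlySpan<T>`, so the same code searches a string's chars directly (no encoding step) or a `byte[]`. It assumes .NET Core 2.1 or later for spans; `GenericSearchSketch.IndexOf` is a hypothetical name. The same generalization works for Boyer-Moore, except that its shift table, which has 256 slots for bytes, would need 65,536 slots or a dictionary for `char`.

```csharp
using System;

public static class GenericSearchSketch
{
    // Naive O(n*m) search for any element type with value equality,
    // e.g. char (string / char[]) or byte (byte[]).
    public static int IndexOf<T>(ReadOnlySpan<T> haystack, ReadOnlySpan<T> needle)
        where T : IEquatable<T>
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j].Equals(needle[j])) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

For example, `GenericSearchSketch.IndexOf("xxHelloWorld".AsSpan(), "World".AsSpan())` returns 7, and `GenericSearchSketch.IndexOf<byte>(data, pattern)` works on byte arrays through the implicit array-to-span conversion.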

Marc
May 8 '07 #8
