469,631 Members | 991 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,631 developers. It's quick & easy.

[repost] Most efficient way to make sure that 2 byte[] are identical?

I have to compare 2 byte[] and I must be sure that they are fully identic.
I have to perform this check about 1000 times per minute and on arrays that
are between 100-200K in size.
In that sense, what is the most efficient method?

Sincerely,
José
Nov 15 '05 #1
6 1417
Take a hash?

"José Joye" <jose.joye@__No_SPam__bluewin__maPS_oN__.ch> wrote in message
news:ej**************@TK2MSFTNGP12.phx.gbl...
I have to compare 2 byte[] and I must be sure that they are fully identic.
I have to perform this check about 1000 times per minute and on arrays that are between 100-200K in size.
In that sense, what is the most efficient method?

Sincerely,
José

Nov 15 '05 #2
That depends. Can you spare some time when these byte arrays are initialy
loaded or created to compute hash values? If so, do that and compare the
hashes. If not, a simple loop comparing each byte and returning false on
the first mismatch will probably be your best bet.

However, look for ways to optimize this based on your data: if the array is
normally modified at the end only, start comparing at the end. If certain
offsets are more likely to be different, compare those. If your code is the
one modifying the byte array, and the arrays are changed infrequently, use
hashes and check them on subsequent tries.
As you can see, there's no one most efficient method, if you really want to
perform best you need to know your data and how it's modified.

"José Joye" <jose.joye@__No_SPam__bluewin__maPS_oN__.ch> wrote in message
news:ej**************@TK2MSFTNGP12.phx.gbl...
I have to compare 2 byte[] and I must be sure that they are fully identic.
I have to perform this check about 1000 times per minute and on arrays that are between 100-200K in size.
In that sense, what is the most efficient method?

Sincerely,
José

Nov 15 '05 #3
Thanks for your thoughts!

Since I cannot predict the kind of information I will receive (However, the
probability that the arrays are identicals is really high), I will go for
something like MD5.ComputeHash().
José

"Philip Rieck" <st***@mckraken.com> a écrit dans le message de
news:eA**************@TK2MSFTNGP12.phx.gbl...
That depends. Can you spare some time when these byte arrays are initialy
loaded or created to compute hash values? If so, do that and compare the
hashes. If not, a simple loop comparing each byte and returning false on
the first mismatch will probably be your best bet.

However, look for ways to optimize this based on your data: if the array is normally modified at the end only, start comparing at the end. If certain
offsets are more likely to be different, compare those. If your code is the one modifying the byte array, and the arrays are changed infrequently, use
hashes and check them on subsequent tries.
As you can see, there's no one most efficient method, if you really want to perform best you need to know your data and how it's modified.

"José Joye" <jose.joye@__No_SPam__bluewin__maPS_oN__.ch> wrote in message
news:ej**************@TK2MSFTNGP12.phx.gbl...
I have to compare 2 byte[] and I must be sure that they are fully identic. I have to perform this check about 1000 times per minute and on arrays

that
are between 100-200K in size.
In that sense, what is the most efficient method?

Sincerely,
José


Nov 15 '05 #4
José Joye <jose.joye@__No_SPam__bluewin__maPS_oN__.ch> wrote:
Thanks for your thoughts!

Since I cannot predict the kind of information I will receive (However, the
probability that the arrays are identicals is really high), I will go for
something like MD5.ComputeHash().


That's not going to be any faster than doing it the brute force way.

If the arrays are different every time, the fastest thing to do is
almost certainly just check the lengths to start with and then just
compare each byte of the array in the obvious way:

if (array1.Length != array2.Length)
{
return false;
}
for (int i=0; i < array1.Length; i++)
{
if (array1[i] != array2[i])
{
return false;
}
}
return true;

The only problem I can see with that is in terms of caching, and I
don't think it'll actually make any odds.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #5
You're right Jon.
However, I will stick to the MD5 hashing. This will allow me to attach the
hash to the message and also to calculate it in advance.

Thanks,
José
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
José Joye <jose.joye@__No_SPam__bluewin__maPS_oN__.ch> wrote:
Thanks for your thoughts!

Since I cannot predict the kind of information I will receive (However, the probability that the arrays are identicals is really high), I will go for
something like MD5.ComputeHash().


That's not going to be any faster than doing it the brute force way.

If the arrays are different every time, the fastest thing to do is
almost certainly just check the lengths to start with and then just
compare each byte of the array in the obvious way:

if (array1.Length != array2.Length)
{
return false;
}
for (int i=0; i < array1.Length; i++)
{
if (array1[i] != array2[i])
{
return false;
}
}
return true;

The only problem I can see with that is in terms of caching, and I
don't think it'll actually make any odds.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #6
José Joye <jo*******@KILLTHESPAMSbluewin.ch> wrote:
You're right Jon.
However, I will stick to the MD5 hashing. This will allow me to attach the
hash to the message and also to calculate it in advance.


Don't forget that although the hash will never give any false negatives
(arrays which are actually identical but are deemed unequal), it might
give false positives (arrays which are different but are deemed equal).
There's a tiny chance of it happening, but you need to weigh up what
the consequences would be.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

5 posts views Thread by Aaron_TekRecycle.com | last post: by
12 posts views Thread by Programmer | last post: by
3 posts views Thread by Ted Miller | last post: by
8 posts views Thread by Jami Bradley | last post: by
14 posts views Thread by Steve McLellan | last post: by
reply views Thread by gheharukoh7 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.