Comparing byte arrays

Hi,

How can I compare two byte arrays in VB.NET?

Thanks
Peter

Nov 22 '05 #1

Joe LeVasseur

Hi,
Here's a cool way to do it with a hash value, by one of the
MVPs (Corrado Cavalli):
http://groups.google.com/gr*********...TNGP12.phx.gbl
(Watch for line wrapping.)
Joe

"Peter" <an*******@discussions.microsoft.com> wrote in message
news:18**********************************@microsoft.com...

> Hi,
>
> how can I compare two byte arrays in VB.NET?
>
> Thanks
> Peter

Nov 22 '05 #2

Cor

Hi Peter,

The thread Joe linked to contains samples from David, Herfried, Spam and
Corrado. Although Joe said the one from Corrado was cool, I thought that
the one from Herfried could be faster, because it can stop early. That helps
when the byte array is very large, or when the very first byte is already
different (which is likely when byte arrays are not the same). I made one
myself too.
It serializes the byte array using a MemoryStream. (A problem with mine is
that it uses 2 times the memory, so it is not really usable with arrays of
20 MB and more.)

I am curious which is the fastest. (When the byte arrays' lengths are
unequal, the ones from Herfried and me are of course both the fastest, but
that check could be added to Corrado's procedure too.)

Cor.

\\\
Dim abyt1() As Byte = {12, 55, 88, 32}
Dim abyt2() As Byte = {12, 55, 87, 32}
If abyt1.Length = abyt2.Length Then
    ' Write each array into its own MemoryStream
    Dim mem1 As New IO.MemoryStream
    Dim mem2 As New IO.MemoryStream
    Dim binWriter1 As New IO.BinaryWriter(mem1)
    Dim binWriter2 As New IO.BinaryWriter(mem2)
    binWriter1.Write(abyt1)
    binWriter2.Write(abyt2)
    ' Rewind both streams and read the bytes back as characters
    Dim binReader1 As New IO.BinaryReader(binWriter1.BaseStream)
    Dim binReader2 As New IO.BinaryReader(binWriter2.BaseStream)
    binReader1.BaseStream.Position = 0
    binReader2.BaseStream.Position = 0
    Dim a, b As String
    a = binReader1.ReadChars(abyt1.Length)
    b = binReader2.ReadChars(abyt2.Length)
    ' Compare the two decoded strings
    If a <> b Then
        MessageBox.Show("not equal char")
    End If
Else
    MessageBox.Show("not equal length")
End If
///

Nov 22 '05 #4

Jon Skeet [C# MVP]

Cor <no*@non.com> wrote:

> \\\
> Dim abyt1() As Byte = {12, 55, 88, 32}
> Dim abyt2() As Byte = {12, 55, 87, 32}
> If abyt1.Length = abyt2.Length Then
>     Dim mem1 As New IO.MemoryStream
>     Dim mem2 As New IO.MemoryStream
>     Dim binWriter1 As New IO.BinaryWriter(mem1)
>     Dim binWriter2 As New IO.BinaryWriter(mem2)
>     binWriter1.Write(abyt1)
>     binWriter2.Write(abyt2)
>     Dim binReader1 As New IO.BinaryReader(binWriter1.BaseStream)
>     Dim binReader2 As New IO.BinaryReader(binWriter2.BaseStream)
>     binReader1.BaseStream.Position = 0
>     binReader2.BaseStream.Position = 0
>     Dim a, b As String
>     a = binReader1.ReadChars(abyt1.Length)
>     b = binReader2.ReadChars(abyt2.Length)
>     If a <> b Then
>         MessageBox.Show("not equal char")
>     End If
> Else
>     MessageBox.Show("not equal length")
> End If
> ///

This code confuses bytes with characters, which is never a good idea.
In particular, not every byte array is going to be a valid stream of
UTF-8 encoded characters, at which point ReadChars will throw an
exception.

It also ends up using *4* times as much memory: it first copies all the
data into a stream, and then reads the data again into a character
array, which is going to take twice as much memory as the byte array.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nov 22 '05 #6

Jon Skeet [C# MVP]

Peter <an*******@discussions.microsoft.com> wrote:

> how can I compare two byte arrays in VB.NET?

Other posters have given you ways using hashes or streams. Personally,
I think it's much easier just to compare each value directly.

This is the code I'd use in C#. I don't know VB.NET well enough to give
you the best, most idiomatic code for that environment, but I suspect
you should be able to understand the C# version:

public static bool CompareByteArrays (byte[] data1, byte[] data2)
{
    // If both are null, they're equal
    if (data1==null && data2==null)
    {
        return true;
    }
    // If either but not both are null, they're not equal
    if (data1==null || data2==null)
    {
        return false;
    }
    if (data1.Length != data2.Length)
    {
        return false;
    }
    for (int i=0; i < data1.Length; i++)
    {
        if (data1[i] != data2[i])
        {
            return false;
        }
    }
    return true;
}

That's going to be as efficient as any other algorithm *unless* you
want to compare one byte array to several others, in which case hashing
*might* help you. The above is still likely to be the simplest solution
though.
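Since the question asked for VB.NET, the C# method above might translate roughly as follows. This is only a sketch, not code posted in the thread: the module name ByteArrayCompareDemo and the Sub Main are assumptions added to make it runnable, and the sample arrays are the ones from Cor's earlier post.

```vbnet
Imports System

Module ByteArrayCompareDemo

    ' Returns True when both arrays are Nothing, or when they have the
    ' same length and the same byte at every index.
    Public Function CompareByteArrays(ByVal data1() As Byte, ByVal data2() As Byte) As Boolean
        ' If both are Nothing, they're equal
        If data1 Is Nothing AndAlso data2 Is Nothing Then
            Return True
        End If
        ' If only one of them is Nothing, they're not equal
        If data1 Is Nothing OrElse data2 Is Nothing Then
            Return False
        End If
        ' Different lengths can never be equal, so skip the loop entirely
        If data1.Length <> data2.Length Then
            Return False
        End If
        ' Stop at the first differing byte
        For i As Integer = 0 To data1.Length - 1
            If data1(i) <> data2(i) Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        Dim abyt1() As Byte = {12, 55, 88, 32}
        Dim abyt2() As Byte = {12, 55, 87, 32}
        Console.WriteLine(CompareByteArrays(abyt1, abyt2))                          ' prints False
        Console.WriteLine(CompareByteArrays(abyt1, CType(abyt1.Clone(), Byte())))   ' prints True
    End Sub

End Module
```

The early exits (Nothing check, length check, first differing byte) are what make this cheap in the common case where the arrays differ.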

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nov 22 '05 #8

Cor

Hi Jon,

> This code confuses bytes with characters, which is never a good idea.
> In particular, not every byte array is going to be a valid stream of
> UTF-8 encoded characters, at which point ReadChars will throw an
> exception.

In the other direction I would agree with you, but not in this direction.
An 8-bit byte becomes a 16-bit Unicode character, but the value stays the same.

> It also ends up using *4* times as much memory: it first copies all the
> data into a stream, and then reads the data again into a character
> array, which is going to take twice as much memory as the byte array.

Seeing your message above, I realize it is even 6 times, because the byte
is converted to Unicode as you said. However, that is exactly what I stated
as the weak point of this method (although I first thought it was 4 and
said 2 because I thought I had miscalculated).

I stay with the approach Herfried showed, as I said in my message, which is
by the way the same as yours. But because it was said that others were
better, I showed this as another method, which I will probably never use
myself.

:-)

Cor

Nov 22 '05 #10

Jon Skeet [C# MVP]

Cor <no*@non.com> wrote:

> > This code confuses bytes with characters, which is never a good idea.
> > In particular, not every byte array is going to be a valid stream of
> > UTF-8 encoded characters, at which point ReadChars will throw an
> > exception.
>
> In the other direction I would agree with you, but not in this direction.
> An 8-bit byte becomes a 16-bit Unicode character, but the value stays the same.

No, it really doesn't. If you use BinaryWriter with no encoding
parameter, it will use UTF-8 by default. You're trying to decode a byte
array assuming that it's a valid UTF-8 sequence, which it may not be.
For instance, take:

    Dim abyt1() As Byte = {47, &HC0, &HAF}
    Dim abyt2() As Byte = {&HC0, &HAF, 47}

These are both *actually* invalid UTF-8 sequences, but the .NET decoder
doesn't notice that. However, it *does* decode both into the same
string - so you get a false positive.

I can't actually provoke ReadChars into throwing an exception at the
moment, but it *should*.

As I said, confusing bytes and characters is *always* a bad idea.
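To illustrate how decoding can hide a difference between two byte arrays, here is a small VB.NET sketch. It is not code from the thread: it uses Encoding.UTF8.GetString as a stand-in for the BinaryReader.ReadChars call, and the bytes &HFF and &HFE were picked for this example because neither can occur in valid UTF-8. The assumption is a runtime whose UTF-8 decoder substitutes a replacement character for invalid bytes; on such a runtime both arrays should decode to the same string even though the bytes differ.

```vbnet
Imports System
Imports System.Text

Module DecodeFalsePositiveDemo

    Sub Main()
        ' Two different one-byte arrays; neither byte is legal anywhere in UTF-8.
        Dim abyt1() As Byte = {&HFF}
        Dim abyt2() As Byte = {&HFE}

        ' Decode both as UTF-8, which is what comparing via characters amounts to.
        Dim a As String = Encoding.UTF8.GetString(abyt1)
        Dim b As String = Encoding.UTF8.GetString(abyt2)

        ' True when both invalid bytes decode to the same replacement text
        Console.WriteLine(a = b)

        ' The raw bytes themselves are different, so this prints False
        Console.WriteLine(abyt1(0) = abyt2(0))
    End Sub

End Module
```

If the first line prints True, the string comparison has reported two different arrays as equal, which is the false positive described above.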

> > It also ends up using *4* times as much memory: it first copies all the
> > data into a stream, and then reads the data again into a character
> > array, which is going to take twice as much memory as the byte array.
>
> Seeing your message above, I realize it is even 6 times, because the byte
> is converted to Unicode as you said. However, that is exactly what I stated
> as the weak point of this method (although I first thought it was 4 and
> said 2 because I thought I had miscalculated).

Um, it's still 4 times:

1 for the original
1 for the memory stream
2 for the string

Where else do you think memory is being used?

Consider a simple test case with 1K of bytes in each array, all being
<0x80:

Memory used by original byte arrays: 2K (2*1K)
Memory used by memory streams: 2K (2*1K)
Memory used by strings: 4K (2*2K)

Total memory: 8K = 4*original 2K

> I stay with the approach Herfried showed, as I said in my message, which is
> by the way the same as yours. But because it was said that others were
> better, I showed this as another method, which I will probably never use
> myself.

Herfried's method is indeed correct. The use of a hash *looks* clever,
but for just comparing two byte arrays it will be less efficient than
comparing them directly, particularly if there is a difference early
on, or the lengths are different. It also has the tiny possibility of
giving a false positive, if the byte arrays are different but produce
the same hash. (Highly unlikely, but possible.)
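For comparison, a hash-based check might look something like the VB.NET sketch below. This is not Corrado's code (the link to it is truncated in this archive); the helper name HashesMatch and the choice of SHA-256 are assumptions made purely for illustration. Note that each hash has to read every byte of its array before anything can be compared, and the two hashes still end up being compared in a loop.

```vbnet
Imports System
Imports System.Security.Cryptography

Module HashCompareDemo

    ' Hypothetical helper: hashes both arrays in full, then compares the hashes.
    ' Equal hashes strongly suggest, but do not prove, equal arrays.
    Public Function HashesMatch(ByVal data1() As Byte, ByVal data2() As Byte) As Boolean
        Using sha As SHA256 = SHA256.Create()
            ' Each ComputeHash call reads its entire input, however early a difference occurs
            Dim hash1() As Byte = sha.ComputeHash(data1)
            Dim hash2() As Byte = sha.ComputeHash(data2)
            ' The hashes are byte arrays too, so they are compared byte by byte
            For i As Integer = 0 To hash1.Length - 1
                If hash1(i) <> hash2(i) Then
                    Return False
                End If
            Next
            Return True
        End Using
    End Function

    Sub Main()
        Dim abyt1() As Byte = {12, 55, 88, 32}
        Dim abyt2() As Byte = {12, 55, 87, 32}
        Console.WriteLine(HashesMatch(abyt1, abyt2))
        Console.WriteLine(HashesMatch(abyt1, CType(abyt1.Clone(), Byte())))
    End Sub

End Module
```

Hashing starts to pay off only when a stored hash can be reused across many comparisons, as mentioned earlier in the thread.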

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nov 22 '05 #12

Cor

Hi Jon,

I never investigated whether a byte is converted to a 2-byte
character or not.

If not, it is 4 times; if it is converted to 2 bytes, it is 6 times:
2 streams in memory
2 arrays of 2-byte characters

Actually it is not important. I only made this to show that if you really
want to do streaming, then this would be "a" method.

As I said, I would never think about using this. However, to me it did seem
to show that the normal comparison of a byte array that you, Herfried and I
are used to is probably the most efficient.

All the more because, when two byte arrays differ, you can expect that
probably:
- the length is unequal, or
- it already shows in the first bytes, because that is the nature of a
byte array.

But there seem to be a lot of people who think that looping in your program
is slow. (Although I think that a loop is probably what all the other
methods use behind the scenes to get the same results.)

Cor

Nov 22 '05 #14

Jon Skeet [C# MVP]

Cor <no*@non.com> wrote:

> I never investigated whether a byte is converted to a 2-byte
> character or not. If not, it is 4 times; if it is converted to 2 bytes,
> it is 6 times:
> 2 streams in memory
> 2 arrays of 2-byte characters

But there were 2 arrays to start with. You've shown that for each 1K
*per byte array* you end up with an extra 6K *in total*. That means
that for each 2K of original byte array *in total* you end up with 8K
*in total* - 4 times as much.

Work through the example - you only end up with 4 times as much memory
used.

> Actually it is not important. I only made this to show that if you really
> want to do streaming, then this would be "a" method.

A fatally flawed one, however, due to the char/byte confusion.

If streaming, I'd suggest reading a block at a time, and comparing with
simple byte-by-byte operations. The only tricky bit would be taking
into account that a Read from a stream might not return as much data as
you want it to. You'd either have to loop on each stream to get a full
buffer, then compare the buffers, or manage two partial buffers,
refilling them when necessary.
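The first option (loop on each stream until the buffer is full, then compare the buffers) could be sketched in VB.NET as follows. This is an illustration rather than code from the thread: the names FillBuffer and StreamsEqual and the 4096-byte buffer size are assumptions.

```vbnet
Imports System
Imports System.IO

Module StreamCompareDemo

    ' Keeps calling Read until the buffer is full or the stream ends,
    ' because a single Read may return fewer bytes than requested.
    ' Returns the number of bytes actually placed in the buffer.
    Private Function FillBuffer(ByVal source As Stream, ByVal buffer() As Byte) As Integer
        Dim total As Integer = 0
        While total < buffer.Length
            Dim count As Integer = source.Read(buffer, total, buffer.Length - total)
            If count = 0 Then Exit While   ' end of stream
            total += count
        End While
        Return total
    End Function

    ' Compares two streams block by block with plain byte comparisons.
    Public Function StreamsEqual(ByVal s1 As Stream, ByVal s2 As Stream) As Boolean
        Dim buf1(4095) As Byte
        Dim buf2(4095) As Byte
        Do
            Dim n1 As Integer = FillBuffer(s1, buf1)
            Dim n2 As Integer = FillBuffer(s2, buf2)
            ' One stream ended before the other: lengths differ
            If n1 <> n2 Then Return False
            ' Both streams ended with no difference found
            If n1 = 0 Then Return True
            For i As Integer = 0 To n1 - 1
                If buf1(i) <> buf2(i) Then Return False
            Next
        Loop
    End Function

    Sub Main()
        Dim abyt1() As Byte = {12, 55, 88, 32}
        Dim abyt2() As Byte = {12, 55, 87, 32}
        Console.WriteLine(StreamsEqual(New MemoryStream(abyt1), New MemoryStream(abyt2)))   ' prints False
        Console.WriteLine(StreamsEqual(New MemoryStream(abyt1), New MemoryStream(abyt1)))   ' prints True
    End Sub

End Module
```

Only two fixed-size buffers are held at any time, so memory use does not grow with the size of the data being compared.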

> As I said, I would never think about using this. However, to me it did seem
> to show that the normal comparison of a byte array that you, Herfried and I
> are used to is probably the most efficient.
>
> All the more because, when two byte arrays differ, you can expect that
> probably:
> - the length is unequal, or
> - it already shows in the first bytes, because that is the nature of a
> byte array.
>
> But there seem to be a lot of people who think that looping in your program
> is slow. (Although I think that a loop is probably what all the other
> methods use behind the scenes to get the same results.)

Of course. Sooner or later, all the algorithms have to "touch" all the
memory, otherwise they can't possibly catch all differences.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nov 22 '05 #16

Cor

Hi Jon,

> The only tricky bit would be taking
> into account that a Read from a stream might not return as much data as
> you want it to.

I thought about that one later on, but a 00 byte in a string stays there, so
the string will have that length; it just cannot be displayed. (Although I
have thought about checking whether what I say above is true, I think we
have said enough about this.) I never presented it as an ideal method, only
as a possibility if you really want to do it without a For loop.

Cor

Nov 22 '05 #18

Jon Skeet [C# MVP]

Cor <no*@non.com> wrote:

> > The only tricky bit would be taking
> > into account that a Read from a stream might not return as much data as
> > you want it to.
>
> I thought about that one later on, but a 00 byte in a string stays there, so
> the string will have that length; it just cannot be displayed.

I'm not sure how that's relevant, to be honest...

> (Although I have thought about checking whether what I say above is true,
> I think we have said enough about this.) I never presented it as an ideal
> method, only as a possibility if you really want to do it without a For loop.

If you don't mind it being fundamentally broken :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nov 22 '05 #20

Cor

> If you don't mind it being fundamentally broken :)

No.

Because I find the For loop the best, and I think that, like this one, the
other methods are more or less broken; that was the main reason I made it.

For me this is EOT.

:-)

Cor

Nov 22 '05 #22
