Cor <non@non.com> wrote:
> > This code confuses bytes with characters, which is never a good idea.
> > In particular, not every byte array is going to be a valid stream of
> > UTF-8 encoded characters, at which point ReadChars will throw an
> > exception.
>
> In the other direction I would agree with you, but not in this direction.
> An 8-bit byte becomes a 16-bit Unicode character, but the value stays the same.
No, it really doesn't. If you use BinaryWriter with no encoding
parameter, it will use UTF-8 by default. You're trying to decode a byte
array assuming that it's a valid UTF-8 sequence, which it may not be.

For instance, take:

Dim abyt1() As Byte = {47, &HC0, &HAF}
Dim abyt2() As Byte = {&HC0, &HAF, 47}

These are both *actually* invalid UTF-8 sequences (&HC0 &HAF is an
overlong encoding of "/", which is character 47), but the .NET decoder
doesn't notice that. Instead, it decodes both into the same string,
"//", so you get a false positive: the arrays differ, but the strings
compare as equal.

I can't actually provoke ReadChars into throwing an exception at the
moment, but it *should*.

As I said, confusing bytes and characters is *always* a bad idea.
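
To show what rejecting such input could look like, here's a minimal sketch that decodes the two arrays above with a strict UTF-8 decoder. It assumes the UTF8Encoding(Boolean, Boolean) constructor, whose second argument (throwOnInvalidBytes) asks for an exception on invalid input; TryDecode is a hypothetical helper name, not anything from the code under discussion. On current runtimes I'd expect both arrays to be rejected, but I haven't checked this against the framework version discussed here, so treat it as an illustration only.

```vbnet
Imports System
Imports System.Text

Module StrictUtf8Demo
    ' Hypothetical helper: decode with a strict UTF-8 decoder and report
    ' whether the bytes were accepted or rejected.
    Sub TryDecode(ByVal name As String, ByVal data() As Byte)
        ' First argument: don't emit a byte order mark when encoding.
        ' Second argument (throwOnInvalidBytes): throw on invalid input.
        Dim strict As New UTF8Encoding(False, True)
        Try
            Console.WriteLine(name & " decoded to: " & strict.GetString(data))
        Catch ex As ArgumentException
            ' Decoder exceptions derive from ArgumentException.
            Console.WriteLine(name & " rejected: " & ex.Message)
        End Try
    End Sub

    Sub Main()
        Dim abyt1() As Byte = {47, &HC0, &HAF}
        Dim abyt2() As Byte = {&HC0, &HAF, 47}

        TryDecode("abyt1", abyt1)
        TryDecode("abyt2", abyt2)
    End Sub
End Module
```

Even with a strict decoder, this only tells you the bytes aren't valid UTF-8. It doesn't make a string comparison a sensible way to compare arbitrary binary data.
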
> > It also ends up using *4* times as much memory: it first copies all the
> > data into a stream, and then reads the data again into a character
> > array, which is going to take twice as much memory as the byte array.
>
> When seeing your message above I realize it is even 6 times, because the byte
> is converted to Unicode as you said. However, that is exactly what I stated as
> the bad issue with this method (although I first thought it was 4 and said 2
> because I thought I had miscalculated).
Um, it's still 4 times:

1 for the original
1 for the memory stream
2 for the string

Where else do you think memory is being used?

Consider a simple test case with 1K of bytes in each array, all being
<0x80:

Memory used by original byte arrays: 2K (2*1K)
Memory used by memory streams: 2K (2*1K)
Memory used by strings: 4K (2*2K)
Total memory: 8K = 4*original 2K
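
The same arithmetic as a small sketch, in case it helps. It assumes every byte is below 0x80 (so each byte decodes to exactly one Char) and ignores object headers and any spare capacity in the memory streams. The constant and variable names are just illustrative.

```vbnet
Imports System

Module MemoryEstimate
    Sub Main()
        Const arrayCount As Integer = 2
        Const bytesPerArray As Integer = 1024
        Const bytesPerChar As Integer = 2   ' a .NET Char is 16 bits

        ' One copy in the original arrays, one in the memory streams,
        ' and two bytes per original byte in the resulting strings.
        Dim original As Integer = arrayCount * bytesPerArray
        Dim streams As Integer = arrayCount * bytesPerArray
        Dim strings As Integer = arrayCount * bytesPerArray * bytesPerChar
        Dim total As Integer = original + streams + strings

        ' Prints: Total: 8192 bytes = 4 x original
        Console.WriteLine("Total: {0} bytes = {1} x original", total, total \ original)
    End Sub
End Module
```
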
> I stay with the same method that Herfried showed, as I said in my message,
> which is by the way the same as yours. But because it was said that others
> were better, I showed this as another method, which I will probably never
> use myself.
Herfried's method is indeed correct. The use of a hash *looks* clever,
but for just comparing two byte arrays it will be less efficient than
comparing them directly, particularly if there is a difference early
on, or the lengths are different. It also has the tiny possibility of
giving a false positive, if the byte arrays are different but produce
the same hash. (Highly unlikely, but possible.)
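
For reference, a direct comparison can be sketched along these lines. This is my own minimal version and not necessarily identical to Herfried's code. BytesEqual is a hypothetical name, and treating two Nothing references as equal is an assumption you may want to change.

```vbnet
Imports System

Module ByteArrayCompare
    ' Hypothetical helper: True only if both arrays have the same length
    ' and the same byte at every index.
    Function BytesEqual(ByVal a() As Byte, ByVal b() As Byte) As Boolean
        If a Is b Then Return True                          ' same reference, or both Nothing
        If a Is Nothing OrElse b Is Nothing Then Return False
        If a.Length <> b.Length Then Return False           ' different lengths: done immediately

        For i As Integer = 0 To a.Length - 1
            If a(i) <> b(i) Then Return False               ' stop at the first difference
        Next
        Return True
    End Function

    Sub Main()
        Dim abyt1() As Byte = {47, &HC0, &HAF}
        Dim abyt2() As Byte = {&HC0, &HAF, 47}

        Console.WriteLine(BytesEqual(abyt1, abyt2))                          ' False
        Console.WriteLine(BytesEqual(abyt1, CType(abyt1.Clone(), Byte())))   ' True
    End Sub
End Module
```

The length check and the early exit are what make this cheaper than hashing: a hash has to read every byte of both arrays before it can say anything, and this never gives a false positive.
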
--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too