473,738 Members | 9,555 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Hashing and comparing of arrays

How does the GetHashCode() of an array object behave?

Does it combine the GetHashCode() of its elements, or does
it create a sync block for the object?

I want to use readonly arrays as dictionary keys, based on
their content, not their identity. Is this feasible using the
arrays directly, or do I need to wrap them in a struct that
handles GetHashCode and Equal? If so, is such a wrapper
present in the standard class library?

My app will have hundreds or maybe thousands of these
dictionaries, some of which will contain only a single
arraykey-value entry, while others will have thousands
of entries. The number of entries in a particular dictionary
cannot be predicted or estimated reliably in advance. The
values associated with the arrays will be WeakReference
objects, each pointing to an object with a finalizer that will
remove the entry from the containing dictionary.

Would the Dictionary<Some Class[], WeakReference>
generic collection be suitable for this, or do I need to
spice it up?
Apr 18 '06 #1
19 3833
Ole Nielsby <ol*********@sn ailmail.dk> wrote:
How does the GetHashCode() of an array object behave?

Does it combine the GetHashCode() of its elements, or does
it create a sync block for the object?
Neither, as far as I'm aware. I believe it doesn't override GetHashCode
at all, so the implementation in Object is used.
I want to use readonly arrays as dictionary keys, based on
their content, not their identity. Is this feasible using the
arrays directly, or do I need to wrap them in a struct that
handles GetHashCode and Equal? If so, is such a wrapper
present in the standard class library?
You'll need to wrap them, I believe. (Any reason for using a struct
rather than a class though?)
My app will have hundreds or maybe thousands of these
dictionaries, some of which will contain only a single
arraykey-value entry, while others will have thousands
of entries. The number of entries in a particular dictionary
cannot be predicted or estimated reliably in advance. The
values associated with the arrays will be WeakReference
objects, each pointing to an object with a finalizer that will
remove the entry from the containing dictionary.

Would the Dictionary<Some Class[], WeakReference>
generic collection be suitable for this, or do I need to
spice it up?


I'd consider using a HybridDictionar y - it's likely to be much more
efficient for the smaller ones. Unfortunately, however, I don't believe
there's a generic version :(

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 18 '06 #2
Jon Skeet [C# MVP] <sk***@pobox.co m> wrote:
Ole Nielsby <ol*********@sn ailmail.dk> wrote:
How does the GetHashCode() of an array object behave?

Does it combine the GetHashCode() of its elements, or does
it create a sync block for the object?
Neither, as far as I'm aware. I believe it doesn't override
GetHashCode at all, so the implementation in Object is used.


I may have got this wrong but I think the Object::GetHash Code
implementation will create a sync block if there isn't already
one.
I want to use readonly arrays as dictionary keys, based on
their content, not their identity. Is this feasible using the
arrays directly, or do I need to wrap them in a struct that
handles GetHashCode and Equal? If so, is such a wrapper
present in the standard class library?


You'll need to wrap them, I believe. (Any reason for using a
struct rather than a class though?)


There will be large numbers of these arrays, and wrappers
would be unnecessary memory bloat. A generic collection
used with a struct will, if I understand generics right, be able
to store the array pointers as efficiently as if I used them
directly, while using the overrides defined by my struct when
hashing and comparing.
I'd consider using a HybridDictionar y - it's likely to be much
more efficient for the smaller ones. Unfortunately, however,
I don't believe there's a generic version :(


Well then, I guess I will google up an open source generic
HybridDictionar y-like thing - I'm pretty sure it exists somewhere,
I just hoped I could make do with the standard library.

Regards/Ole N.
Apr 18 '06 #3
Ole Nielsby <ol*********@sn ailmail.dk> wrote:
Does it combine the GetHashCode() of its elements, or does
it create a sync block for the object?


Neither, as far as I'm aware. I believe it doesn't override
GetHashCode at all, so the implementation in Object is used.


I may have got this wrong but I think the Object::GetHash Code
implementation will create a sync block if there isn't already
one.


Sync blocks and GetHashCode are unrelated. The first is to do with
locking; the second is to do with hashing. What makes you think they're
related?
You'll need to wrap them, I believe. (Any reason for using a
struct rather than a class though?)


There will be large numbers of these arrays, and wrappers
would be unnecessary memory bloat. A generic collection
used with a struct will, if I understand generics right, be able
to store the array pointers as efficiently as if I used them
directly, while using the overrides defined by my struct when
hashing and comparing.


Yes, if you use generics you should be okay.
I'd consider using a HybridDictionar y - it's likely to be much
more efficient for the smaller ones. Unfortunately, however,
I don't believe there's a generic version :(


Well then, I guess I will google up an open source generic
HybridDictionar y-like thing - I'm pretty sure it exists somewhere,
I just hoped I could make do with the standard library.


It sounds like you do need generics.

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 18 '06 #4
Ole Nielsby <ol*********@sn ailmail.dk> wrote:
Neither, as far as I'm aware. I believe it doesn't override
GetHashCode at all, so the implementation in Object is used.


I may have got this wrong but I think the Object::GetHash Code
implementation will create a sync block if there isn't already
one.


Having just googled this, it appears that under v1.0 and v1.1,
Object.GetHashC ode does indeed create a sync block. It's incidental,
however - and the blog linked below suggests that v2.0 may not behave
in the same way.

http://blogs.msdn.com/brada/archive/.../30/50396.aspx

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 18 '06 #5

Jon Skeet [C# MVP] <sk***@pobox.co m> wrote:
Having just googled this, it appears that under v1.0 and v1.1,
Object.GetHashC ode does indeed create a sync block. It's incidental,
however - and the blog linked below suggests that v2.0 may not
behave in the same way.

http://blogs.msdn.com/brada/archive/.../30/50396.aspx


I think what the blog suggests is, you cannot assume that different
objects with non-overridden GetHashCode() always return different
hash codes even though this may be the case under v1.x and some
MS developers were caught using it.

I'll bet a beer that Object.GetHashC ode still creates a sync block in v2.
There is really nowhere else to store the hash code, unless they'd take
the trouble of storing it in the sync lock slot in a form distinguishable
from a real sync lock, then, in case a sync block is created later, move
the hash code to a slot in the sync block. This somehow seems bad
style though, and I don't think the clr forgers will (or should) do
this to facilitate some exotic uses of Object.GetHashC ode() which is
supposed to be overridden anyway.

Regards/Ole N.
Apr 18 '06 #6
"Ole Nielsby" <ol*********@sn ailmail.dk> wrote:
I want to use readonly arrays as dictionary keys, based on
their content, not their identity. Is this feasible using the
arrays directly, or do I need to wrap them in a struct that
handles GetHashCode and Equal? If so, is such a wrapper
present in the standard class library?


Incidentally, I was also looking to hash a list based on contents
recently. I found "lookup2" by Bob Jenkins, and translated it into C#.
Maybe this will be helpful. (or maybe there already exist better C#
hash implementations elsewhere that I hadn't found...)
/// <summary>
/// Produces a hash out of a list of hash-values.
/// This is a C# implementation of the "lookup2" hash of Bob Jenkins
(1996).
/// Licensing conditions for Lookup2: Bob Jenkins writes, "You may
use this
/// code any way you wish, private, educational, or commercial. It's
free."
/// </summary>
/// <param name="keys">An enumerable list of items; we will
GetHashCode()
/// on each of them</param>
/// <param name="previousH ash">If you call hash more than once, feed
the previous
/// hash result into the next call to hash. Otherwise, if you only
call it once,
/// previousHash can be an arbitrary value.</param>
/// <returns>A 32-bit signed integer hash code.</returns>
private static int lookup2_hash(Sy stem.Collection s.IEnumerable keys,
int previousHash)
{ System.Collecti ons.IEnumerator keyi = keys.GetEnumera tor();
uint a = 0x9e3779b9; uint b=a; uint c = (uint)previousH ash;
uint[] k=new uint[12]; uint length=0; uint sublen=12;
while (true)
{ sublen=0; while (keyi.MoveNext( ) && sublen<12)
{ k[sublen] = (uint)keyi.Curr ent.GetHashCode ();
sublen++; length++;
}
if (sublen==12)
{ a += k[0] +(k[1]<<8) +(k[ 2]<<16) +(k[3]<<24);
b += k[4] +(k[5]<<8) +(k[ 6]<<16) +(k[7]<<24);
c += k[8] +(k[9]<<8) +(k[10]<<16) +(k[11]<<24);

a-=b;a-=c;a^=(c>>13);b-=c;b-=a;b^=(a<<8);c-=a;c-=b;c^=(b>>13);a-=b;a-=c;a^=(c>>12);

b-=c;b-=a;b^=(a<<16);c-=a;c-=b;c^=(b>>5);a-=b;a-=c;a^=(c>>3);b-=c;b-=a;b^=(a<<10);
c-=a;c-=b;c^=(b>>15);
}
else
{ c += length;
if (sublen>=11) c+= (k[10]<<24);
if (sublen>=10) c+= (k[9]<<16);
if (sublen>=9 ) c+= (k[8]<<8);
// the first byte of c is reserved for the sublength
if (sublen>=8) b+= (k[7]<<24);
if (sublen>=7) b+= (k[6]<<16);
if (sublen>=6) b+= (k[5]<<8);
if (sublen>=5) b+= (k[4]);
if (sublen>=4) a+= (k[3]<<24);
if (sublen>=3) a+= (k[2]<<16);
if (sublen>=2) a+= (k[1]<<8);
if (sublen>=1) a+= (k[0]);

a-=b;a-=c;a^=(c>>13);b-=c;b-=a;b^=(a<<8);c-=a;c-=b;c^=(b>>13);a-=b;a-=c;a^=(c>>12);

b-=c;b-=a;b^=(a<<16);c-=a;c-=b;c^=(b>>5);a-=b;a-=c;a^=(c>>3);b-=c;b-=a;b^=(a<<10);
c-=a;c-=b;c^=(b>>15);
return (int)c;
}
}
}

--
Lucian
Apr 19 '06 #7


Ole Nielsby wrote:
How does the GetHashCode() of an array object behave?
In .NET1 atleast it returns an identity of the array, so:

byte[] b1 = {};
byte[] b2 = {};

b1.GetHashCode( ) != b2.GetHashCode( )
I want to use readonly arrays as dictionary keys, based on
their content, not their identity. Is this feasible using the
arrays directly, or do I need to wrap them in a struct that
handles GetHashCode and Equal? If so, is such a wrapper
present in the standard class library?
I would suggest using an IDictionary with keys something like:

public class ComparableArray <T>:
IEquatable<Comp arableArray<T>> , // if you decide to use hashing
IComparable<Com parableArray<T> >
{
int? hash; // if you decide to do hashing
public readonly T[] Array;
public ComparableArray (T[] array, int? hash) {
this.Array = array;
this.hash = hash;
}
public ComparableArray (T[] array): this(array, null) {}
public static int CalculateHash(T[] array) {
// do your stuff with Knuth, or whatever. perhaps something like:
uint x = 0;
foreach ( T t in array ) {
x ^= (uint)t.GetHash Code(); // combine hashes
// rotate make ordering %32 significant, Knuth can do wonders here
x = (uint)(x >> 1) | (uint)((x & 0x1) << 31);
}
return (int)x;
}
public override GetHashCode() {
if ( hash == null )
hash = CalculateHash(A rray);
return (int)hash;
}
public bool Equals(Comparab leArray<T> other) {
return
Array.Length == other.Array.Len gth
&& GetHashCode() == other.GetHashCo de()
&& compareItemByIt em(Array, other.Array);
}
public int Compare(Compara bleArray<T> other) {
int diff;
T[] a1 = Array, a2 = other.Array;
diff = a2.Length - a1.Length;
IComparer<T> comp = Comparer.Defaul t<T>;
if ( diff != 0 )
return diff;
for ( int i = 0; i < a1.Length; ++i ) {
diff = a2[i] - a1[i];
if ( diff != 0 )
return diff;
}
return 0;
}
public static bool compareItemByIt em(T[] a1, T[] a2) {
if ( a1.Length != a2.Length )
throw new ArgumentExcepti on("Unequal lengths", "a1 and a2");
// you could just compare some small subset of actual valuse
// iff your hash is really good
for ( int i = 0; i < a1.Length; ++i )
if ( !a1[i].Equals(a2[i]) )
return false;
return true;
}
// Make interchanging T[] and ComparableArray <T> easy
public static explicit operator T[](ComparableArra y a)
{ returrn a.Array; }
public static implicit operator ComparableArray (byte[] a)
{ return new ComparableArray (a); }
// decorate with operators == != <, <=, ... as you like
}

And use IDictionary<Com parableArray<T> , V> as the interface for storage.

You can use any implementation of IDictinoary using ComparableArray <T>
as the key-type, since it supports both hashing and alphabetic sorting.

The above Key-type implementation will cache the hashing of keys, so you
won't have to rehash the keys in the dictionary when doing lookup. It's
not that much performance gained, but since you need a wrapper Key-type
for GetHashCode and Equals anyway...

If you choose your hash-function well you might just want to sample a
few items when comparing item-by-item.

The best data-structure for this kind of operation is know as a
dictionary (don't confuse that with the C# Dictionary :) and is related
to prefix-trees. Have a look at those if you are screaming for
performance (they are pretty cute too :)
My app will have hundreds or maybe thousands of these
dictionaries, some of which will contain only a single
arraykey-value entry, while others will have thousands
of entries. The number of entries in a particular dictionary
cannot be predicted or estimated reliably in advance. The
values associated with the arrays will be WeakReference
objects, each pointing to an object with a finalizer that will
remove the entry from the containing dictionary.
Try just using Dictionary<Comp arableArray<T>, WeakReference> or
SortedDictionar y<ComparableArr ay<T>, WeakReference>, it will have pretty
OK performance. You can always complicate things by using the
proxy-pattern to dispatch to different implementations of IDictionary
depending on dictionary-size.
Would the Dictionary<Some Class[], WeakReference>
generic collection be suitable for this, or do I need to
spice it up?


No, SomeClass[].GetHashCode screws it up.

--
Helge Jensen
mailto:he****** ****@slog.dk
sip:he********* *@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-
Apr 19 '06 #8

"Lucian Wischik" <lu***@wischik. com> wrote:
I found "lookup2" by Bob Jenkins, and translated it into C#.


Thanks - this goes into my code snippet collection.
Apr 19 '06 #9

Helge Jensen <he**********@s log.dk> wrote:

I would suggest using an IDictionary with keys something like:

public class ComparableArray <T>:
[...]

The above Key-type implementation will cache the hashing of
keys [...]
I'll leave that part out... the arrays will never be hashed more than
once or twice. (They will be hashed and unified on creation). And
I'll do a struct instead of a class if possible, since large amounts of
them will be created.
[...]

Try just using Dictionary<Comp arableArray<T>, WeakReference>
or SortedDictionar y<ComparableArr ay<T>, WeakReference>, it
will have pretty OK performance.


I'll stick with the Dictionary<> then, and change it only if
profiling identifies it as a hot thing.

(The app is an interpreter for a homebrew programming language
that I use for some rather complex tasks, the meanest of which
is as a parser generator system that can produce parsers for
ill-structured languages like VB6. The data model of my language
implies that constant trees must be unified, like intern'ed strings
but deep structures, rather like attribute based XML. This
'cascading unification' doesn't really fit well with .NET,it's much
faster when objects aren't compacted, allowing hash functions
to operate directly on already unified pointers. Anyway, since
I'm doung a .NET port, I might as well go all the way and use
managed objects, interoperabilit y being more important than
raw speed. It's a pity, though, that the framework doesn't have
some support for cascading unification, like a generalized Intern,
perhaps controlled by an interface, like serialization is controlled
by ISerializable.)

Anyway, thanks for clarifying the array hashing issue.
Apr 19 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

11
461
by: Peter | last post by:
Hi how can I compare two byte arrays in VB.NET Thank Peter
1
495
by: Iain | last post by:
Hi Hopefully I am missing something really simple with this question, but here goes. I have two Bitarrays that I would like to compare. At the moment, I am XORing one with the other and checking to see if the result has any 1s in it (if so, the arrays are different). This seems to be faster than comparing each bit of the two original arrays one at a time. But I still have to iterate over each element of the array, and I'd like to
12
885
by: Elijah Bailey | last post by:
I have two char arrays of size k. I want to know which one is bigger (exactly like for instance I compare two ints/longs/etc.). What is the fastest way to do this? k <= 10 usually for my application. I tried bcmp / a loop for comparison, but it seems they are very slow compared to comparing longs...Any ideas? I tried splitting the array into longs and comparing, but then I face the high endian/low endian problem on some machines.
4
10535
by: agent349 | last post by:
First off, I know arrays can't be compared directly (ie: if (arrary1 == array2)). However, I've been trying to compare two arrays using pointers with no success. Basically, I want to take three sets of character strings from the user. Then I want to run through each element and compare the two strings. If they match I print they match... I'm having a bit of trouble with the actual loop through each array using the pointers and comparing...
1
2241
by: Donald Grove | last post by:
If I have two arrays, what is a good paradigm for comparing what is in them, to determine what elements they share, or don't share? Specifically, each array could potentially contain the integers 1 to 9, but probably not all 9 integers. MyArray1 = 1,2,3,5,6,7,8 MyArray2 = 1,4,6,7 I would need the answers to be arrays too:
11
10376
by: Sheldon | last post by:
Hi, I have two arrays that are identical and contain 1s and zeros. Only the ones are valid and I need to know where both arrays have ones in the same position. I thought logical_and would work but this example proves otherwise: >>> a = >>> b = >>> Numeric.logical_and(a==6,b==6) 0
1
7696
by: psmahesh | last post by:
Hi folks, I am comparing two arrays and removing matches from the second array from the first array. Can someone take a look at this code below and mention if this is okay and perhaps if there is a better way to achieve it for(i=0;i<arrayA.length;i++){ for(j=0;j<arrayB.length;j++){ if(arrayA==arrayB)
6
2259
by: Jayender | last post by:
Hi, What is the difference between Hashing and Encryption ?
5
2151
by: Andrew Robinson | last post by:
I am working on a pretty simple e-commerce web site that will sell our company gift cards online. Our company and merchant policy prohibits us from storing credit card numbers in any way once we clear the transaction using Pay Flow. To help protect against fraud, I would like to know when the same card number is used to make more than one purchase in a given period of time. Would hashing card numbers and then storing and comparing hashes...
0
8969
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8788
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9476
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9335
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9208
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8210
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6053
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
1
3279
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2193
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.