Bytes | Developer Community
Comparing two arrays when element order does not matter

I need to compare two string arrays defined as string[] such that the
two arrays are equal if the contents of the two are the same, where
order doesn't matter and every element must be unique.

E.g. these two arrays would test as equal:

servers[0] = "Admin"
servers[1] = "Finance"
servers[2] = "Payroll"
servers[3] = "Sales"

monitors[0] = "Sales"
monitors[1] = "Payroll"
monitors[2] = "Admin"
monitors[3] = "Finance"

Is there an easy way to do this? The only way I can think of is:
1. check that the number of elements is the same because if not then
they're not equal since every element must be unique.
2. create another array (with one element for each element in the first
array) and run through the second array once for each element in the
first, checking if it exists and marking it in the additional array if
it does.
Frankly, this seems clumsy. Is there a better way?
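For reference, the two-step approach described in the question could be sketched like this (a hedged illustration only; `MarkOffCompare` is a made-up name, and the replies explore whether there is something better):

```csharp
static class ArrayCompare
{
    // Sketch of the question's approach: length check first, then mark off
    // each element of the second array as it is matched.
    public static bool MarkOffCompare(string[] servers, string[] monitors)
    {
        // Step 1: unequal lengths cannot match when every element is unique.
        if (servers.Length != monitors.Length) return false;

        // Step 2: the "additional array" that marks which monitors matched.
        bool[] matched = new bool[monitors.Length];
        foreach (string s in servers)
        {
            bool found = false;
            for (int j = 0; j < monitors.Length; j++)
            {
                if (!matched[j] && s == monitors[j])
                {
                    matched[j] = true;
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }
}
```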

Feb 2 '06 #1
38 replies, 9,378 views
On 2 Feb 2006 05:54:48 -0800, "ssg31415926" <ne**********@gmail.com>
wrote:
<snip original question>

Do the two arrays both have to be arrays? If one can be a Hashtable,
then you can load it and use it to test the array.

Or if they have to be arrays you can use a binary search algorithm to
search one of them but one of them will have to be sorted to do that.

Otis Mukinfus
http://www.otismukinfus.com
http://www.tomchilders.com
Feb 2 '06 #2

"ssg31415926" <ne**********@gmail.com> wrote in message
news:11*********************@g14g2000cwa.googlegro ups.com...
<snip original question>


Do the length check then sort the first array then use binary search to look
for elements of the second array in the first.
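A sketch of that suggestion, assuming it is acceptable to sort the first array in place (copy it first if the original order matters); `SortedCompare` is a made-up name:

```csharp
using System;

static class ArrayCompare
{
    // Length check, sort one array, then binary-search it for each element
    // of the other. Relies on the stated guarantee that elements are unique.
    public static bool SortedCompare(string[] servers, string[] monitors)
    {
        if (servers.Length != monitors.Length) return false;

        Array.Sort(servers); // O(n log n); note this mutates the array
        foreach (string m in monitors)
        {
            // BinarySearch returns a negative value when the item is absent.
            if (Array.BinarySearch(servers, m) < 0) return false;
        }
        return true;
    }
}
```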
Feb 2 '06 #3
Just copy the array elements to a hashtable or sorted list to sort them.
Once sorted, you can iterate through the sorted list and compare.

Again, as I often suggest, consider using a CollectionBase based custom
collection instead of an array and you can sort your collection without
having to copy your array to anything else.

HTH
--
Dale Preston
MCAD C#
MCSE, MCDBA
"ssg31415926" wrote:
<snip original question>

Feb 2 '06 #4

"Dale" <da******@nospam.nospam> wrote in message
news:92**********************************@microsof t.com...
Just copy the array elements to a hashtable or sorted list to sort them.
Once sorted, you can iterate through the sorted list and compare.

Again, as I often suggest, consider using a CollectionBase based custom
collection instead of an array and you can sort your collection without
having to copy your array to anything else.


Why do you think that you need to copy an array to sort it?
Feb 2 '06 #5
Lots of suggestions so far, including some that won't work.

Binary searches won't work, since the OP specifically stated that the
arrays would be in random order.

Hashing one array into a Hashtable then using the Hashtable to test for
presence is probably the fastest method for large arrays, since its
cost increases roughly as a factor of the total number of items, rather
than as n times m or anything like that.

For smaller arrays there is always the simpleminded approach:

if (servers.Length != monitors.Length)
{
return false;
}
for (int i = 0; i < servers.Length; i++)
{
bool found = false;
for (int j = 0; i < monitors.Length j++)
{
if (servers[i] == monitors[j])
{
found = true;
}
}
if (!found)
{
return false;
}
}
return true;

If your arrays are going to be small (a few dozen elements at most,
each) then this isn't a bad solution: it is very simple and anyone will
be able to read and understand it.

The problem is that its cost increases by the _product_ of the array
sizes. So if there are "n" elements in the servers array and "m"
elements in the monitors array then the cost (generally speaking) will
be n times m divided by 2.

Copying one array into a Hashtable and then looking up the elements of
the other array in the Hashtable increases as some factor of n + m,
which is far better as the arrays grow:

Hashtable serverHash = new Hashtable();
foreach (string s in servers)
{
serverHash[s] = s;
}
foreach (string m in monitors)
{
if (serverHash.Contains(m))
{
serverHash.Remove(m);
}
else
{
return false;
}
}
return serverHash.Count == 0;

Feb 2 '06 #6
On 2 Feb 2006 12:09:15 -0800, "Bruce Wood" <br*******@canada.com>
wrote:

All good points...
In Line:
<snip>

For smaller arrays there is always the simpleminded approach:

if (servers.Length != monitors.Length)
{
return false;
}
for (int i = 0; i < servers.Length; i++)
{
bool found = false;
for (int j = 0; i < monitors.Length j++)
{
if (servers[i] == monitors[j])
{
found = true;
// no reason to keep going if you found it
break;
}
}
if (!found)
{
return false;
}
}
return true;

If your arrays are going to be small (a few dozen elements at most,
each) then this isn't a bad solution: it is very simple and anyone will
be able to read and understand it.

The problem is that its cost increases by the _product_ of the array
sizes. So if there are "n" elements in the servers array and "m"
elements in the monitors array then the cost (generally speaking) will
be n times m divided by 2.

Copying one array into a Hashtable and then looking up the elements of
the other array in the Hasthable increases as some factor of n + m,
which is far better as the arrays grow:

Hashtable serverHash = new Hashtable();
foreach (string s in servers)
{
serverHash[s] = s;
}
foreach (string m in monitors)
{
if (serverHash.Contains(m))
{
serverHash.Remove(m);
}
else
{
return false;
}
}
return serverHash.Count == 0;


Otis Mukinfus
http://www.otismukinfus.com
http://www.tomchilders.com
Feb 3 '06 #7
Yes, that was particularly silly of me. I meant to include "&& !found"
in the loop condition, but forgot.

Feb 3 '06 #8

"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
Lots of suggestions so far, including some that won't work.

Binary searches won't work, since the OP specifically stated that the
arrays would be in random order.


He didn't say that you weren't allowed to sort them (or at least one of
them), which is what I suggested.
Feb 3 '06 #9
I don't assume you have to copy an array to sort it. I just assume that the
OP would have done that already if that's what he wanted. I assumed that he
wanted to keep the existing arrays in their original order. How did you
interpret his question in regards to that?

Also, if he copies the arrays to a hashtable he can solve the problem of
uniqueness. Another way to solve the uniqueness problem is to use a custom
collection so that he can implement his own add method which will guarantee
uniqueness.

--
Dale Preston
MCAD C#
MCSE, MCDBA
"Nick Hounsome" wrote:

"Dale" <da******@nospam.nospam> wrote in message
news:92**********************************@microsof t.com...
Just copy the array elements to a hashtable or sorted list to sort them.
Once sorted, you can iterate through the sorted list and compare.

Again, as I often suggest, consider using a CollectionBase based custom
collection instead of an array and you can sort your collection without
having to copy your array to anything else.


Why do you think that you need to copy an array to sort it?

Feb 3 '06 #10
Even better, rather than

found = true;
break;

is to just:

return true;

--
Dale Preston
MCAD C#
MCSE, MCDBA
"Otis Mukinfus" wrote:
On 2 Feb 2006 12:09:15 -0800, "Bruce Wood" <br*******@canada.com>
wrote:

<snip quoted post>
Feb 3 '06 #11
Thank you all for your suggestions.

SSG

Feb 3 '06 #12
> return true;

No, because then you would return true upon finding the first element
of the servers array in the monitors array, without checking the other
servers.

Feb 3 '06 #13
True. Nonetheless, your hash suggestion was much more efficient. Even a
quicksort becomes expensive as the number of elements grows.

When it comes to sorting versus hashing, I always think of it this way:
Sorting costs more because it's putting the elements _in order_, not
just creating a way to find them quickly. So, I always ask myself
whether I need the elements to be _in order_ for any reason. If the
answer is no, then a sort is usually my last resort, as I don't want to
incur the extra cost to establish an order that I don't need.

In this case the OP was testing for presence. So a sort would incur a
bunch of extra work to order the elements when that ordering wouldn't
really contribute to the solution. Hashing (which you also suggested)
is a much better idea.

Feb 3 '06 #14
On 2 Feb 2006 22:57:27 -0800, "Bruce Wood" <br*******@canada.com>
wrote:
Yes, that was particularly silly of me. I meant to include "&& !found"
in the loop condition, but forgot.

I wasn't throwing rocks at you Bruce. Both ways are fine;o)

Otis Mukinfus
http://www.otismukinfus.com
http://www.tomchilders.com
Feb 4 '06 #15
Simple solution that involves no sorting. Just add all the hash codes
from both arrays and compare the two sums. This O(n) solution has an
extremely slight possibility of a false positive.

// first check length of two arrays, if not equal then done
if( servers.Length != monitors.Length )
return false;

// add all hashcode values of each element in first array
int serversSumOfHashCodes = 0;
for( int index=0; index<servers.Length; index++ )
{
serversSumOfHashCodes += servers[index].GetHashCode();
}

// add all hashcode values of each element in second array
int monitorsSumOfHashCodes = 0;
for( int index=0; index<monitors.Length; index++ )
{
monitorsSumOfHashCodes += monitors[index].GetHashCode();
}

// compare first sumOfHashCodes with second sumOfHashCodes
// if sums are == then the two arrays contain the same set of elements
if( serversSumOfHashCodes == monitorsSumOfHashCodes )
return true;

return false;
*** Sent via Developersdex http://www.developersdex.com ***
Feb 5 '06 #16
David Larson <it***************@hotmail.com> wrote:
Simple solution that involves no sorting. Just add all the hash codes
from both arrays and compare the two sums. This O(n) solution has an
extremely slight possibility of a false positive.


But in the case where there *is* a false positive, it'll be a real pain
to track down. Hash codes really *shouldn't* be used as definite
equality checks.

Of course, it's more reasonable to do this as a quick way of returning
a negative, and then testing more exhaustively in the positive case.

(I'd use foreach loops instead of the two for loops, by the way - or
preferably make each one a call to a single common routine that
contained the loop. Of course, this was demonstrating the idea more
than giving the actual code you'd use.)
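One hedged way to combine the two ideas from this exchange: use the hash-code sums only as a cheap early-out for the negative case, and confirm a positive with an exhaustive Hashtable pass (`HashSumCompare` is a made-up name, and the `unchecked` block reflects that the sums are allowed to overflow harmlessly):

```csharp
using System.Collections;

static class ArrayCompare
{
    public static bool HashSumCompare(string[] servers, string[] monitors)
    {
        if (servers.Length != monitors.Length) return false;

        // Quick negative: unequal hash-code sums prove the contents differ.
        int sumA = 0, sumB = 0;
        unchecked // wraparound is fine; we only compare the two sums
        {
            foreach (string s in servers) sumA += s.GetHashCode();
            foreach (string m in monitors) sumB += m.GetHashCode();
        }
        if (sumA != sumB) return false;

        // Equal sums can be a false positive, so confirm exhaustively.
        Hashtable seen = new Hashtable();
        foreach (string s in servers) seen[s] = s;
        foreach (string m in monitors)
        {
            if (!seen.Contains(m)) return false;
            seen.Remove(m);
        }
        return seen.Count == 0;
    }
}
```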

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Feb 5 '06 #17
On Fri, 3 Feb 2006 03:24:18 -0800, "Dale" <da******@nospam.nospam>
wrote:
I don't assume you have to copy an array to sort it. I just assume that the
OP would have done that already if that's what he wanted. I assumed that he
wanted to keep the existing arrays in their original order. How did you
interpret his question in regards to that?

Also, if he copies the arrays to a hashtable he can solve the problem of
uniqueness. Another way to solve the uniqueness problem is to use a custom
collection so that he can implement his own add method which will guarantee
uniqueness.


Is there really anything wrong, given the arrays are the same length,
with just doing a brute-force compare? You are going to need to make a
one-to-one match to pass anyway, so I figure you take one element at a
time, walk the other array, and bail out of the loop with a fail the
moment you don't find a match. Is there a more elegant way? Can the
more elegant way also be guaranteed to outperform a brute-force attempt
in all but the most remote circumstances?

That said, I agree with one of the earlier posters. If I were going to
need this kind of capacity I would be more inclined to use an Array or
some other appropriate class from the Collections namespace, simply for
convenience.

Ken Wilson
Seeking viable employment in Victoria, BC
Feb 7 '06 #18
Ken Wilson wrote:
Also, if he copies the arrays to a hashtable he can solve the problem of
uniqueness. Another way to solve the uniqueness problem is to use a custom
collection so that he can implement his own add method which will guarantee
uniqueness.


Is there really anything wrong, given the arrays are the same length,
with just doing a brute force compare. You are going to need to make
a one-to-one match to pass anyway so I figure taking one element at a
time and walking the other array and bailing out of the loop with a
fail the moment you don't find a match. Is there a more elegant way?
Can the more elegant way also be guaranteed to outperform a brute
force attempt in all but the most remote circumstance?


A sort and then compare would generally be O(n log n). Inserting values
into a hashtable and then walking through checking for existence is
likely to be O(n) (assuming reasonably distributed hash codes). Brute
force will be O(n^2). It's only likely to make a difference when you
get to large arrays, but then the difference could be very significant.

(To check the hashtable stuff, I had a look at the docs for Hashtable
which claim that retrieving a value is an O(1) operation. I find that
hard to believe - if every key in the table has the same hashcode, it
has to walk through all of them, making it an O(n) operation. That's
not mentioned anywhere...)
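That degenerate case can be forced with a deliberately bad key type: results stay correct, but every probe has to walk the single collision chain, which is where the O(n) worst case comes from (`BadKey` and `CollisionDemo` are contrived names for illustration):

```csharp
using System.Collections;

// A key whose hash code is constant, so every entry collides.
class BadKey
{
    private readonly string value;
    public BadKey(string value) { this.value = value; }
    public override int GetHashCode() { return 42; } // worst possible hash
    public override bool Equals(object obj)
    {
        BadKey other = obj as BadKey;
        return other != null && other.value == this.value;
    }
}

static class CollisionDemo
{
    // Lookups still succeed, but each one is effectively a linear scan,
    // because the Hashtable must fall back on Equals to tell keys apart.
    public static bool ContainsAll(string[] keys)
    {
        Hashtable table = new Hashtable();
        foreach (string k in keys) table[new BadKey(k)] = k;
        foreach (string k in keys)
        {
            if (!table.Contains(new BadKey(k))) return false;
        }
        return true;
    }
}
```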

Jon

Feb 7 '06 #19
On 2 Feb 2006 12:09:15 -0800, "Bruce Wood" <br*******@canada.com>
wrote:
<snip>

For smaller arrays there is always the simpleminded approach:

I think there is a bug in this code. If the very first item passes
won't the found variable be irreversibly changed to true?
if (servers.Length != monitors.Length)
{
return false;
}
for (int i = 0; i < servers.Length; i++)
{
bool found = false;
for (int j = 0; i < monitors.Length j++)
{
if (servers[i] == monitors[j])
{
found = true;
}
}
if (!found)
{
return false;
}
}
return true;


Following your initial code I have derived this. Excuse me for
wrapping it in a method but I just love re-usable code, [;-)

bool IsEquivalent(string[] servers, string[] monitors)
{
bool equivalent = false; // assume the worst

if (servers.Length == monitors.Length) // same length?
{
bool found = true; // assume the best

for (int i = 0; i < servers.Length; i++)
{
for (int j = 0; j < monitors.Length; j++)
{
if (servers[i] == monitors[j]
{
continue;
}
else
{
found = false;
break;
}
}

if (!found)
{
break;
}
}

equivalent = found;
}

return equivalent;
}

If I am wrong in this analysis please feel free to correct me. It
wouldn't be my first time and I suspect it wouldn't be my last anyway.
[;-)

Ken Wilson
Seeking viable employment in Victoria, BC
Feb 7 '06 #20
On Sun, 05 Feb 2006 12:14:23 -0800, David Larson
<it***************@hotmail.com> wrote:
Simple solution that involves no sorting. Just add all the hash codes
from both arrays and compare the two sums. This O(n) solution has an
extremely slight possibility of a false positive.
What if you're coding a mission critical routine? Is extremely slight
a good safety margin? I would hate to have to chase this down if it
was one of those intermittent bugs you can get in a system. (Think
shuttle o-rings, it can happen with software too).
<snip quoted code>


Ken Wilson
Seeking viable employment in Victoria, BC
Feb 7 '06 #21
"Ken Wilson" <kw*********@NOshaw.SPAMca> wrote in message
news:f4********************************@4ax.com...
On 2 Feb 2006 12:09:15 -0800, "Bruce Wood" <br*******@canada.com>
wrote:
<snip>


I think there is a bug in this code. If the very first item passes
won't the found variable be irreversibly changed to true?


found is reset to false at the start of each outside loop-through.
Feb 7 '06 #22

Jon Skeet [C# MVP] wrote:
(To check the hashtable stuff, I had a look at the docs for Hashtable
which claim that retrieving a value is an O(1) operation. I find that
hard to believe - if every key in the table has the same hashcode, it
has to walk through all of them, making it an O(n) operation. That's
not mentioned anywhere...)


Well, yes, but that would have to be a spectacularly badly designed
Hashtable, a spectacularly bad hash key algorithm, or very, very bad
luck. Generally speaking, hash storage and retrieval is O(k), which I
suppose using a different notation could be called O(1). Anyway, it's
typically a heck of a lot faster even than binary search.

Feb 7 '06 #23
You're wrong. :)

Your inner loop will continue until it finds a monitor that does _not_
equal the current server. In other words, for arrays of greater than
one item each, it will always return false.

My original code was correct, apart from the silly mistake of not
bailing out of the inner loop when I found a match.
If the very first item passes won't the found variable be irreversibly changed to true?


Well, yes, but that's the point. A match was found; it doesn't matter
if all of the other items in the array _don't_ match... all that
matters is that one _did_. In fact, once "found" is set to true,
according to the OP's description of the problem, it's guaranteed that
none of the other entries will match.

Your inner loop does the inverse: it returns true only if _every_
monitor matches the current server, which will never be true if there
is more than one monitor in the array.

Feb 7 '06 #24
On 7 Feb 2006 10:12:18 -0800, "Bruce Wood" <br*******@canada.com>
wrote:
<snip>


Respectfully I beg to differ. Here is the opening paragraph from the
OP's post.

---

I need to compare two string arrays defined as string[] such that the
two arrays are equal if the contents of the two are the same, where
order doesn't matter and every element must be unique.

E.g. these two arrays would test as equal:

servers[0] = "Admin"
servers[1] = "Finance"
servers[2] = "Payroll"
servers[3] = "Sales"

monitors[0] = "Sales"
monitors[1] = "Payroll"
monitors[2] = "Admin"
monitors[3] = "Finance"

---

As you can see, according to the original problem statement, there has
to be a one-to-one match for each element in both arrays. Otherwise,
the length of the second array would be irrelevant: if all you needed
were one match, you might find it in a shorter or a longer array just
as easily.

Ken Wilson
Seeking viable employment in Victoria, BC
Feb 7 '06 #25
Yes, but that's why there are two loops. The inner loop is only asking
the question, "Is there any element in the second array that matches
the current element in the first array?" The outer loop asks the
question, "Does the inner loop test pass for every element in the first
array?"

Let's take the OP's example and run it through the code you posted:

servers[0] = "Admin"
servers[1] = "Finance"
servers[2] = "Payroll"
servers[3] = "Sales"

monitors[0] = "Sales"
monitors[1] = "Payroll"
monitors[2] = "Admin"
monitors[3] = "Finance"

For i = 0 and j = 0, "Sales" does not equal "Admin", so found will be
set to false and we break out of the "j" loop. Since found is now
false, we execute the contents of the "if (!found)" test, and break out
of the outer loop as well. The method returns false.

Now, the same example using my posted code, with Otis' adjustment to
improve the efficiency by breaking out of the inner loop when a match
is found.

i = 0
found is set to false
j = 0
"Sales" does not equal "Admin", so found remains false
j = 1
"Sales" does not equal "Payroll", so found remains false
j = 2
"Sales" equals "Sales", so found is set to true, and we exit the inner
loop
found is true, so we do not return yet
i = 1
found is set to false
j = 0
"Finance" does not equal "Sales", so found remains false
j = 1
"Finance" does not equal "Payroll", so found remains false
j = 2
"Finance" does not equal "Admin", so found remains false
j = 3
"Finance" equals "Finance", so found is set to true and we exit the
inner loop
found is true, so we do not return yet
i = 2
found is set to false
j = 0
"Payroll" does not equal "Sales", so found remains false
j = 1
"Payroll" equals "Payroll", so found is set to true and we exit the
inner loop
found is true, so we do not return yet
i = 3
found is set to false
j = 0
"Sales" equals "Sales", so found is set to true and we exit the inner
loop
found is true, so we do not return yet
The outer loop is done, the method returns true

Feb 7 '06 #26
Bruce Wood <br*******@canada.com> wrote:
Jon Skeet [C# MVP] wrote:
(To check the hashtable stuff, I had a look at the docs for Hashtable
which claim that retrieving a value is an O(1) operation. I find that
hard to believe - if every key in the table has the same hashcode, it
has to walk through all of them, making it an O(n) operation. That's
not mentioned anywhere...)


Well, yes, but that would have to be a spectacularly badly designed
Hashtable, a spectacularly bad hash key algorithm, or very, very bad
luck. Generally speaking, hash storage and retrieval is O(k), which I
suppose using a different notation could be called O(1). Anyway, it's
typically a heck of a lot faster even than binary search.


*Typically* is fine - but I thought the point of the O notation was
that it was an upper bound... Maybe I'm misremembering though :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Feb 7 '06 #27
Computing all the hash codes is the slowest part of storing all the
strings in a dictionary and then looking them up, so I don't think this
will save much over just copying one list to a dictionary and then
looking up each element of the other list.

Feb 7 '06 #28
This is a perfect illustration of why the faster and simpler dictionary
code is preferable. Maybe the nested loop solution is easier for a
freshman programming student to understand, but it's about 100 times
more likely to be wrong. Anyone who knows about the Dictionary
collection could understand and verify the dictionary solution in a
minute or two.

Feb 7 '06 #29
Yes, and no.

Notice that the dictionary solution has the subtle distinction that you
have to remove the dictionary entries as you find them, and then ensure
that the dictionary is empty at the end to guard against duplicates in
the second list giving a false positive.

Nonetheless, overall I agree with your assessment. I would choose the
dictionary solution, myself.

Feb 8 '06 #30
On 7 Feb 2006 11:18:32 -0800, "Bruce Wood" <br*******@canada.com>
wrote:
Yes, but that's why there are two loops. The inner loop is only asking
the question, "Is there any element in the second array that matches
the current element in the first array?" The outer loop asks the
question, "Does the inner loop test pass for every element in the first
array?"


<snip long analysis>

With respect, I'm a bit of a dog with a bone on issues sometimes so I
will apologize right now. [8-)

I ran both of our algorithms. The error I thought I spotted in your
algorithm still exists in that as long as one item tests right you
will never report a failure of the whole because you never set found
back to false. I did add the break in the inner loop if server and
monitor match. On the upside, mine is buggy both ways. Don't know
where I went wrong but it's a good lesson on why you don't start
winging code in the early AM when you should be in bed (referring to
myself of course.) I'm going to chase this further in my own code
because, as another poster said, a college entrant should be able to
code a brute force one.

Ken Wilson
Seeking viable employment in Victoria, BC
Feb 8 '06 #31
Just for fun, I cut and pasted my code into VS and ran it. I did find
one bug, but not the one you reported. The inner loop contains two
typos. I wrote:

for (int j = 0; i < monitors.Length j++)

which is, of course, at least missing a semicolon. However, I also
mistyped "i <" instead of "j <". The correct line is

for (int j = 0; j < monitors.Length; j++)

to which I added Otis' test to get:

for (int j = 0; j < monitors.Length && !found; j++)

With that change, it works just fine.

Anyway, I think we've beaten this one to death. :)

Feb 8 '06 #32
On 8 Feb 2006 09:46:37 -0800, "Bruce Wood" <br*******@canada.com>
wrote:
<snip>


I believe so. At least now I am more sure I would rather just use a
collection of some sort where all the headaches have already been had
by someone other than me. :-)

Ken Wilson
Seeking viable employment in Victoria, BC
Feb 8 '06 #33
I don't think all that much subtlety is required. Really, beginning
programmers should be taught about associative stores in about the
second week of class. It would save the world millions of coding
hours, and billions of billions of wasted computing cycles.

bool UnorderedListContentsAreEqual<T>(List<T> a, List<T> b)
{
    if (a.Count != b.Count) return false;
    Dictionary<T, int> d = new Dictionary<T, int>();
    foreach (T t in a)
    {
        // The indexer throws on a missing key, so fetch-or-default first.
        int count;
        d.TryGetValue(t, out count); // count stays 0 if t is not present yet
        d[t] = count + 1;
    }
    foreach (T t in b)
    {
        int count;
        if (!d.TryGetValue(t, out count) || count == 0) return false;
        d[t] = count - 1;
    }
    return true;
}
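Wired up to the arrays from the original post, the dictionary approach looks like this (a self-contained sketch; the method body follows the one above, with a TryGetValue guard since the C# dictionary indexer throws on a missing key):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Counts occurrences from list a, then decrements for list b;
    // any miss or negative count means the lists are not permutations.
    static bool UnorderedListContentsAreEqual<T>(List<T> a, List<T> b)
    {
        if (a.Count != b.Count) return false;
        Dictionary<T, int> d = new Dictionary<T, int>();
        foreach (T t in a)
        {
            int count;
            d.TryGetValue(t, out count); // count stays 0 if t is not present yet
            d[t] = count + 1;
        }
        foreach (T t in b)
        {
            int count;
            if (!d.TryGetValue(t, out count) || count == 0) return false;
            d[t] = count - 1;
        }
        return true;
    }

    static void Main()
    {
        List<string> servers = new List<string> { "Admin", "Finance", "Payroll", "Sales" };
        List<string> monitors = new List<string> { "Sales", "Payroll", "Admin", "Finance" };
        Console.WriteLine(UnorderedListContentsAreEqual(servers, monitors)); // True
    }
}
```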

Feb 12 '06 #34
Step 1. Check the Length of both arrays; if not equal then return false.

Step 2. Calculate a hash for each element in Array 1 and add them (for
string values you can use a hashing function like:
SIGMA( ascii value of A[i] * i)) for i = 0 to len of array - 1

Step 3. Do the same for Array 2.

If the sums are the same then both arrays are equal as per the defn, else false.
Mar 7 '06 #35


Rana wrote:
Step 2. Calculate a hash for each element in Array 1 and add them (for
string values you can use a hashing function like:
SIGMA( ascii value of A[i] * i)) for i = 0 to len of array - 1

Step 3. Do the same for Array 2.

If the sums are the same then both arrays are equal as per the defn, else false.


Not really: hashes are not guaranteed to be unique. What hashes should
fulfill is the implication:

a.Equals(b) => a.GetHashCode() == b.GetHashCode()

that's not the same as the implication:

a.GetHashCode() == b.GetHashCode() => a.Equals(b)

which is required for your implementation to be correct. *And* your
solution has overflow problems: even *if*

a.Equals(b) <=> a.GetHashCode() == b.GetHashCode()

the sum will most likely produce overflows (it almost certainly will
if a good hash function is used) and invalidly declare lists equal.

Actually, when I combine hashes I usually prefer xor'ing (^=) them
together: it doesn't produce any overflow (won't fail in "checked" code),
it is associative and commutative (so ordering doesn't matter), and it
also allows removal of elements by the same operation. It doesn't work
well on multisets though :)

What you *do* however know after hashing all elements is:

hashmap(lista) != hashmap(listb) => lista != listb

and if the lists are almost never equal (if for example this is an
assertion that the lists are different) this may be faster than using
other comparisons. But I'd take a look at it with a
performance/memory-profiler before writing code that complex for the
situation.

Probably the simplest ways to compare the lists are:
1. Use the dictionary solution proposed
2. Sort them and compare item by item.
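The sort-and-compare approach (option 2) might be sketched like this (assuming the elements are comparable; the inputs are cloned before sorting so the caller's arrays keep their order, and SortedEquals is a name made up here):

```csharp
using System;

class Program
{
    // Canonicalize both arrays by sorting copies, then compare item by item.
    static bool SortedEquals(string[] a, string[] b)
    {
        if (a.Length != b.Length) return false;
        string[] x = (string[])a.Clone(); // sort copies, not the originals
        string[] y = (string[])b.Clone();
        Array.Sort(x);
        Array.Sort(y);
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(SortedEquals(
            new string[] { "Admin", "Finance", "Payroll", "Sales" },
            new string[] { "Sales", "Payroll", "Admin", "Finance" })); // True
    }
}
```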

If this is too inefficient for the concrete scenario perhaps some
requirement on the data structures should be made?

--
Helge Jensen
mailto:he**********@slog.dk
sip:he**********@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-
Mar 7 '06 #36
I was not talking of the hashcode given by object.GetHashCode().
I agree as you said, a.Equals(b) does not imply a.GetHashCode() ==
b.GetHashCode()

Rather we need to calculate hash values using some function on the
elements (values) of the array.

If the elements are strings (considering them to be char arrays) we can
use the function
say HASH(arr) -> sigma( ascii value of arr[i] * (i + 1) ) for i = 0 to
arr.Length - 1

for example
"AB" => 1*65 + 2*66
"BC" => 1*66 + 2*67 etc.

unless the strings are extremely large the summation will not overflow.

and also HASH(strA) == HASH(StrB) => StrA==StrB (the fn above works for
ascii strings)

Please correct me if I am wrong

Mar 8 '06 #37
Rana <ch************@gmail.com> wrote:
I was not talking of the hashcode given by object.GetHashCode().
I agree as you said, a.Equals(b) does not imply a.GetHashCode() ==
b.GetHashCode()

Rather we need to calculate hash values using some functions on the
elements(values) of the array
But the point of *hash* values is that they're still not unique. They
manage to give an absolute "these values *aren't* equal" answer in most
cases while taking *less information size* than the original. If the
information size is the same as the original, you're no better off than
just comparing for equality the long way - and if the information size
*is* less than the original, then there must be possible duplicates by
the pigeon-hole principle.
if the elements are strings (considering them to be char arrays) we can
use the function
say HASH(arr) -> sigma( ascii value of arr[i] * (i + 1) ) for i = 0 to
arr.Length - 1

for example
"AB" => 1*65 + 2*66
"BC" => 1*66 + 2*67 etc.

unless the strings are extremely large the summation will not overflow.
The values won't be unique though. Two spaces (1*32+2*32) would end up
with the same hash code as '`' (1*96) for example.
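The collision is easy to check mechanically (a sketch of the proposed hash, using the (i + 1) weighting from the worked examples):

```csharp
using System;

class Program
{
    // The proposed positional hash: ascii value times 1-based position, summed.
    static int Hash(string s)
    {
        int sum = 0;
        for (int i = 0; i < s.Length; i++)
        {
            sum += (int)s[i] * (i + 1);
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(Hash("  ")); // 1*32 + 2*32 = 96
        Console.WriteLine(Hash("`"));  // 1*96       = 96 -- same hash, different strings
    }
}
```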
and also HASH(strA) == HASH(StrB) => StrA==StrB (the fn above works for
ascii strings)


No, as shown by the counterexample above.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 8 '06 #38


Rana wrote:
I was not talking of the hashcode given by object.GetHashCode().
OK. The same reasoning applies to any hash-code, it's not really
important whether it's object.GetHashCode().
I agree as you said, a.Equals(b) does not imply a.GetHashCode() ==
b.GetHashCode()
Is there a typing error here? I said the opposite: "does imply" :)

For a function, h, to be a hash-function (w.r.t. two specific
equality-relations of the domain and codomain of h). it must fullfill:

Domain equality preservation
a == b => h(a) == h(b)

Codomain inequality preservation
h(a) != h(b) => a != b

Although this definition allows e.g. the constant function

c(x) = 0

as a hashing function for strings to integers, it is usually implicit
that h should map elements in its domain into its codomain "with a
good amount of spreading" -- which determines the "quality" of the
hashing done by h. For a better definition of that, look at
"perfect-hashing" and "cryptographic-hashing".

If the codomain of h is smaller than the domain of h, the pigeon-hole
principle demonstrates that h cannot be one-to-one, and in that case:

h(a) == h(b) =/=> a == b
if the elements are strings (considering them to be char arrays) we can
use the function
say HASH(arr) -> sigma( ascii value of arr[i] * (i + 1) ) for i = 0 to
arr.Length - 1

for example
"AB" => 1*65 + 2*66
"BC" => 1*66 + 2*67 etc.

unless the strings are extremely large the summation will not overflow.


It does have other problems though, most notably that it breaks the
codomain inequality preservation from above, as demonstrated by Jon
Skeet, which you *need* to correctly decide permutational equality.

Also, the hash-function you propose does not help you to decide
equality-by-permutation, since it explicitly encodes the position of
elements.

Let's get back to the original problem, comparing arrays unsorted -- the
permutation-equality problem: can one array be permuted into the other?
Let's assume that the arrays have equal length, N, otherwise the problem
is simple :)

Both the dictionary and the sorting approach transform the input into a
canonical form, which can then be more efficiently compared.

The dictionary approach canonically represents the set of unordered
permutations of a sequence as a counter for each occurrence.
It requires the elements to be effectively hashable, and allows
permutational equality to be decided in O(N) memory and O(N) time by
comparing the sizes of the key-sets and traversing one hash-table
looking up counts in the other.

The sorting solution canonically represents the set of unordered
permutations of a sequence by ordering the elements. This places the
stronger demand (compared to hashability) of an ordering on the
elements. After bringing both arrays to a canonical form, the arrays can
be compared item by item, deciding permutational equality in O(N) memory
and O(N log N) time, depending on the choice of sorting algorithm.

What *hashing* can do for you (in this case) is that it can, with O(1)
memory and O(N) time, decide "roughly" whether it is possible for the
two arrays to be equal. That can be done by using a (preferably
surjective) function, s, on each element (possibly identity, as in your
example), and combining these via any associative, commutative function,
c, for example: xor, times, sum or whatever. Associativity and
commutativity *exactly* remove the ordering. If the hashes of two arrays
match you still have to do a real comparison, but if they don't, you
have proved they cannot be permuted into each other.
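As a sketch of that order-insensitive quick check (xor-combining each element's hash, as suggested earlier in the thread; Fingerprint is a name made up here, and a matching fingerprint still requires a full comparison):

```csharp
using System;

class Program
{
    // Order-insensitive fingerprint: xor is associative and commutative,
    // so element order drops out. Equal fingerprints do NOT prove equality.
    static int Fingerprint(string[] items)
    {
        int h = 0;
        foreach (string s in items)
        {
            h ^= s.GetHashCode();
        }
        return h;
    }

    static void Main()
    {
        string[] a = { "Admin", "Finance", "Payroll", "Sales" };
        string[] b = { "Sales", "Payroll", "Admin", "Finance" };
        // Equal fingerprints: the arrays *may* be permutations; unequal: they cannot be.
        Console.WriteLine(Fingerprint(a) == Fingerprint(b)); // True
    }
}
```

Note the multiset caveat from earlier in the thread: xor cancels pairs, so { "X", "X" } fingerprints the same as { } even though the arrays differ.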

If you expect to do many comparisons of arrays to other arrays, it *may*
be a good idea to use this approach before doing a full comparison. You
can amortize the effort spent calculating the hash over the number of
times you do comparisons.

Depending on the way you hash you may get quite good results with this
technique, but I doubt it would really be practical. I would look into
hashing one of the canonicalized representations instead and storing
that along with the array. I suspect the canonical representation will
allow you to achieve a better utilization of the codomain of the hashing.

--
Helge Jensen
mailto:he**********@slog.dk
sip:he**********@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-
Mar 8 '06 #39

This discussion thread is closed

Replies have been disabled for this discussion.