469,306 Members | 2,121 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,306 developers. It's quick & easy.

Quickly finding duplicates in an ArrayList

Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns a
new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.

Nov 16 '05 #1
11 33004
I know of two routes that you can follow.

The first - look at the ArrayList.IndexOf method, which is probably the
quickest way to find a certain item in the underlying array. I believe there
is an overload to specify at which index to start the search.

Secondly, if you create the arraylist yourself (not some other 3rd-party
code) you could use ArrayList.Contains() when populating the list to ensure
the duplicates don't get into the list in the first place.

My "two" euro cents. :)

"paradox" <de*@demiurgeinc.com> schrieb im Newsbeitrag
news:11**********************@o13g2000cwo.googlegr oups.com...
Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns a
new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.

Nov 16 '05 #2
Though it is written in VB.NET, the demo project at the link below is easy
to follow and should be helpful.

http://getdotnetco.web101.discountas...wnloadPage.htm
"paradox" <de*@demiurgeinc.com> wrote in message
news:11**********************@o13g2000cwo.googlegr oups.com...
Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns a
new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.

Nov 16 '05 #3
Well... actually the point of the code is to find the duplicates and
then display them to users. To give you a much better understanding,
the program I am creating is for determining conflicts in the pk3 (zip)
files used in Quake 3. As people develop new maps, they create and use
existing map elements (textures, shaders, etc) and they are all stored
in these pk3 files. The problem that occurs is that they create their
own version of certain shaders or textures without renaming or changing
the filename directory structure; and this causes the game to fail to
load properly. Specifically the problem seems to be associated to
occasions when there are 3 pk3 files with a shader element that has the
same name but different file size.

I've gone through the pk3 (zip) files and populated an arraylist called
conlficts with a object of type element (a class I've defined).

I'm trying to provide a report that will go thru this list and return
these elements (and their associated pk3 files). So that the end user
can easily determine which files that they need to remove in order to
be able to connect successfully.

Using IndexOf isn't much different than the slow 3 nested loop approach
I'm trying to avoid.

Nov 16 '05 #4
I would iterate through the ArrayList and for each item, add it to a
HashTable if it does not already exist in the HashTable and set its
value to 1. If it already exists, increment the value of that item in
the HashTable.

Once you've gone through the ArrayList, iterate through the HashTable
to see which items have a value greater than 1.

It's heavy on memory, but runs in linear time. Should be much better
than 3 nested loops.

Nov 16 '05 #5
Walk through the array list and add each item to a hashtable that
stores the number of occurrences of that item. Since hashtable inserts
and searches have O(1) time complexity the entire operation would be
O(n), which is good.

Hashtable table = new Hashtable();
foreach (string item in elems)
{
if (!table.Contains(item))
{
table.Add(item, 0);
}

int count = (int)table[item] + 1;
table[item] = count;
if (count == 2)
{
// Duplicate items are identified here.
}
}

Brian

paradox wrote:
Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns
a new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.


Nov 16 '05 #6
This is almost what I was going to suggest, but with the difference that
since you need to track the original ArrayList indexes of the duplicate
items, you should create new ArrayLists and store them in the hashtable.
Something like:

Hashtable h = new Hashtable();
for(i=0; i < MainArrayList.Count; i++) {
// get a key (name, whatever) in some way that's appropriate for your
objects:
someKeyType key = GetKeyFromObject(MainArrayList[i]);
// make a new arraylist if necessary
if(h[key] == null) {
h[key] = new ArrayList();
}
// store the index of this item in the arraylist
((ArrayList)h[key]).Add(i);
}
foreach(someKeyType key in h.Keys) {
if( ((ArrayList)h[key]).Count > 1) {
// do whatever you need to do to show the duplicates to the user.
// ((ArrayList)h[key]) contains the indexes of the duplicate items.
}
}

Again, harsh on memory, but fast.
"Kenny" <kv*@thereturnexchange.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
I would iterate through the ArrayList and for each item, add it to a
HashTable if it does not already exist in the HashTable and set its
value to 1. If it already exists, increment the value of that item in
the HashTable.

Once you've gone through the ArrayList, iterate through the HashTable
to see which items have a value greater than 1.

It's heavy on memory, but runs in linear time. Should be much better
than 3 nested loops.

Nov 16 '05 #7
Very nice Jason, that is exactly what I needed.

Thank you all very much!

Nov 16 '05 #8
Out of curiousity.. how exactly does a Hashtable work? Why is it so
fast?

Nov 16 '05 #9
Google for it first. There's a lot of information available that does
a better job of explaining it than I could do. If there is still
something you don't quite understand go ahead and post back and we'll
be more than happy to clarify something.

Brian

paradox wrote:
Out of curiousity.. how exactly does a Hashtable work? Why is it so
fast?


Nov 16 '05 #10
Yeah, I did google it immediately after posting the question and was
able to find sufficient information. If I could of deleted the post, I
would have. ;)

I do have a small question though. Does the .NET Hashtable work in
exactly the same manner as a general Hashtable? Also, I tried using
reflector to track down the GetHashCode function to see how it worked;
I ended up at the String class (taking into consideration the type of
key I was using was a string), and the code I found for the GetHashCode
function was:

[MethodImpl(MethodImplOptions.InternalCall)]
public override extern int GetHashCode();

I couldn't figure out where to find anymore info...

Nov 16 '05 #11
The GetHashCode method is a virtual member of an Object so everything
has a default implementation for it. The string's version of the
method is internal to the .NET framework. That's why you can't see it.
One of the most frequently used ways of getting the hash code of an
object is to XOR all of the object's fields. In the case of a string
you might XOR all characters together. I seriously doubt that's what
the framework is doing though. XOR works well because it is fast and
produces a very random distribution of values. That's important when
it comes time to map the hash code to a slot in the memory array. A
good distribution of hash codes lowers the probability that two keys
will get mapped to the same slot in memory. This is called a
collision. There are many strategies for handling collisions. I
believe the Hashtable in dotnet uses quadratic probing. This strategy
involves number theory and the use of prime numbers.

paradox wrote:
Yeah, I did google it immediately after posting the question and was
able to find sufficient information. If I could of deleted the post,
I would have. ;)

I do have a small question though. Does the .NET Hashtable work in
exactly the same manner as a general Hashtable? Also, I tried using
reflector to track down the GetHashCode function to see how it
worked; I ended up at the String class (taking into consideration
the type of key I was using was a string), and the code I found for
the GetHashCode function was:

[MethodImpl(MethodImplOptions.InternalCall)]
public override extern int GetHashCode();

I couldn't figure out where to find anymore info...


Nov 16 '05 #12

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Erich | last post: by
reply views Thread by Timo Nentwig | last post: by
6 posts views Thread by Maxi | last post: by
Thekid
3 posts views Thread by Thekid | last post: by
reply views Thread by suresh191 | last post: by
reply views Thread by harlem98 | last post: by
1 post views Thread by Geralt96 | last post: by
reply views Thread by harlem98 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.