473,563 Members | 2,747 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Quickly finding duplicates in an ArrayList

Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns a
new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.

Nov 16 '05 #1
11 33477
I know of two routes that you can follow.

The first - look at the ArrayList.Index Of method, which is probably the
quickest way to find a certain item in the underlying array. I believe there
is an overload to specify at which index to start the search.

Secondly, if you create the arraylist yourself (not some other 3rd-party
code) you could use ArrayList.Conta ins() when populating the list to ensure
the duplicates don't get into the list in the first place.

My "two" euro cents. :)

"paradox" <de*@demiurgein c.com> schrieb im Newsbeitrag
news:11******** **************@ o13g2000cwo.goo glegroups.com.. .
Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns a
new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.

Nov 16 '05 #2
Though it is written in VB.NET, the demo project at the link below is easy
to follow and should be helpful.

http://getdotnetco.web101.discountas...wnloadPage.htm
"paradox" <de*@demiurgein c.com> wrote in message
news:11******** **************@ o13g2000cwo.goo glegroups.com.. .
Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns a
new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.

Nov 16 '05 #3
Well... actually the point of the code is to find the duplicates and
then display them to users. To give you a much better understanding,
the program I am creating is for determining conflicts in the pk3 (zip)
files used in Quake 3. As people develop new maps, they create and use
existing map elements (textures, shaders, etc) and they are all stored
in these pk3 files. The problem that occurs is that they create their
own version of certain shaders or textures without renaming or changing
the filename directory structure; and this causes the game to fail to
load properly. Specifically the problem seems to be associated to
occasions when there are 3 pk3 files with a shader element that has the
same name but different file size.

I've gone through the pk3 (zip) files and populated an arraylist called
conlficts with a object of type element (a class I've defined).

I'm trying to provide a report that will go thru this list and return
these elements (and their associated pk3 files). So that the end user
can easily determine which files that they need to remove in order to
be able to connect successfully.

Using IndexOf isn't much different than the slow 3 nested loop approach
I'm trying to avoid.

Nov 16 '05 #4
I would iterate through the ArrayList and for each item, add it to a
HashTable if it does not already exist in the HashTable and set its
value to 1. If it already exists, increment the value of that item in
the HashTable.

Once you've gone through the ArrayList, iterate through the HashTable
to see which items have a value greater than 1.

It's heavy on memory, but runs in linear time. Should be much better
than 3 nested loops.

Nov 16 '05 #5
Walk through the array list and add each item to a hashtable that
stores the number of occurrences of that item. Since hashtable inserts
and searches have O(1) time complexity the entire operation would be
O(n), which is good.

Hashtable table = new Hashtable();
foreach (string item in elems)
{
if (!table.Contain s(item))
{
table.Add(item, 0);
}

int count = (int)table[item] + 1;
table[item] = count;
if (count == 2)
{
// Duplicate items are identified here.
}
}

Brian

paradox wrote:
Basically I have an ArrayList of strings; I need a fast way of
searching through and comparing the elements of the ArrayList and the
return an ArrayList of items that have 3 Duplicates.

For example, if the ArrayList has the following elements:

elems[0] = "blue"
elems[1] = "red"
elems[2] = "blue"
elems[3] = "green"
elems[4] = "red"
elems[5] = "red"

I want to have a function that goes through the list and then returns
a new arraylist like:

newElems[0] = "red"
newElems[1] = "red"
newElems[2] = "red"

In actuality my arraylist is an array of objects, which have a string
property and I'm referencing like: elems[0].filename = "red"; but the
above example is easier to follow.

I would like to be able to perform this search as fast as possible. I
personally don't have any real experience or familiarity with topics
such as binary trees, etc; would something like that be the route to
take? Thanks.


Nov 16 '05 #6
This is almost what I was going to suggest, but with the difference that
since you need to track the original ArrayList indexes of the duplicate
items, you should create new ArrayLists and store them in the hashtable.
Something like:

Hashtable h = new Hashtable();
for(i=0; i < MainArrayList.C ount; i++) {
// get a key (name, whatever) in some way that's appropriate for your
objects:
someKeyType key = GetKeyFromObjec t(MainArrayList[i]);
// make a new arraylist if necessary
if(h[key] == null) {
h[key] = new ArrayList();
}
// store the index of this item in the arraylist
((ArrayList)h[key]).Add(i);
}
foreach(someKey Type key in h.Keys) {
if( ((ArrayList)h[key]).Count > 1) {
// do whatever you need to do to show the duplicates to the user.
// ((ArrayList)h[key]) contains the indexes of the duplicate items.
}
}

Again, harsh on memory, but fast.
"Kenny" <kv*@thereturne xchange.com> wrote in message
news:11******** **************@ f14g2000cwb.goo glegroups.com.. .
I would iterate through the ArrayList and for each item, add it to a
HashTable if it does not already exist in the HashTable and set its
value to 1. If it already exists, increment the value of that item in
the HashTable.

Once you've gone through the ArrayList, iterate through the HashTable
to see which items have a value greater than 1.

It's heavy on memory, but runs in linear time. Should be much better
than 3 nested loops.

Nov 16 '05 #7
Very nice Jason, that is exactly what I needed.

Thank you all very much!

Nov 16 '05 #8
Out of curiousity.. how exactly does a Hashtable work? Why is it so
fast?

Nov 16 '05 #9
Google for it first. There's a lot of information available that does
a better job of explaining it than I could do. If there is still
something you don't quite understand go ahead and post back and we'll
be more than happy to clarify something.

Brian

paradox wrote:
Out of curiousity.. how exactly does a Hashtable work? Why is it so
fast?


Nov 16 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
6972
by: Eric Linders | last post by:
Hi, I'm trying to figure out the most efficient method for taking the first character in a string (which will be a number), and use it as a variable to check to see if the other numbers in the string match that first number. I'm using this code for form validation of a telephone number. Previous records from the past few months show that...
3
4197
by: Erich | last post by:
I have a company table and I would like to write a query that will return to me any duplicate companies. However, it is a little more complicated then just matching on exact company names. I would like it to give me duplicates where x number of letters at the beginning of the company name match AND x number of letters of the address match AND...
0
1476
by: Timo Nentwig | last post by:
Hi! <node> <name>foo</name> <id>1</id> <age>7</age> </node> <node.....> <node> <name>foo</name>
1
1360
by: AlanAylett | last post by:
Hi all, i am having a problems with a SQL select query. I have to find all the records in a ONE table where TWO fields, field1 and field2 for example, are duplicated. eg field1 field2 a b
6
2877
by: Maxi | last post by:
I have 100 tabes in an Access database, every table has 1 filed with 100 names (records), no primary key assigned. I would like to find duplicates. Here is the criteria: The computer should pick up the first name of Table1 and check that name in that table (Table1) as well as the remaining 99 tables. Continue this till we reach the last...
5
8259
by: limperger | last post by:
Hello everyone! Is out there any way to search for duplicate entries without using the "Find duplicates" option? In the Access 97 installed in my workplace, the Find duplicates option is disabled (don't ask me why) and I think that there are little chances of having it installed. Any wonder of how to overcome this situation without the...
17
2985
by: Gayatree | last post by:
I have to concatenate 2 colimns in a table and find duplicates in them. I already used the method select * from tableA a where (select count(*) from TableA b where acol1+ +col2 = b.col1+ +col2)>1 But the performance is very bad. Data is also huge Can you help me Thanks in advance
3
25056
Thekid
by: Thekid | last post by:
I'm trying to figure out a way to find if there are duplicates in an array. My idea was to take the array as 'a' and make a second array as 'b' and remove the duplicates from 'b' using 'set' and then compare a to b. If they're different then it will print out 'duplicates found'. The problem is that even after trying different arrays, some with...
0
7664
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7583
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
7885
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
7948
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
6250
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
0
5213
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3642
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3626
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
923
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.