
# Finding duplicates in an array

Thekid
145 100+
I'm trying to figure out a way to find out whether an array contains duplicates. My idea was to take the array as 'a', make a second array 'b' by removing the duplicates with 'set', and then compare a to b. If they're different, it prints 'duplicates found'. The problem is that whichever array I try, some with duplicates and some without, 'b' comes back with the numbers rearranged. Here's an example:


    a='1934, 2311, 1001, 4056, 1001, 3459, 9078'
    b=list(set(a))
    if a != b:
        print "duplicates found"
    else:
        print "nothing found"

Is there a simpler way to find if there are duplicates?
Thanks
Oct 21 '09 #1
bvdet
2,851 Expert Mod 2GB
In your code, you have assigned a string to the variable 'a', so set(a) collects the individual characters of that string:

    >>> list(set(a))
    [' ', ',', '1', '0', '3', '2', '5', '4', '7', '6', '9', '8']

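If the numbers really do arrive as one comma-separated string, they can be split into a list of integers first. This is only a rough sketch, not something from the original reply: it assumes every field is a valid integer, and `numbers` is just an illustrative name.

```python
a = '1934, 2311, 1001, 4056, 1001, 3459, 9078'

# Split on commas; int() tolerates the spaces around each field.
numbers = [int(field) for field in a.split(',')]

print(numbers)
```

After this step `numbers` is a real list, so the set-based checks in the rest of the thread apply to it directly.
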
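To see if there are any duplicates, start with a list instead. Sets are unordered, so comparing a to list(set(a)) is unreliable, but you can compare the length of the list to the length of the set: if the set is shorter, the list contains at least one duplicate.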

    >>> a=[1934, 2311, 1001, 4056, 1001, 3459, 9078]
    >>> b = set(a)
    >>> len(b)
    6
    >>> len(a)
    7

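The length comparison can be wrapped in a small helper. This is a minimal sketch rather than part of the original reply: the name `has_duplicates` is made up for illustration, it assumes the items are hashable (numbers, strings, tuples), and it uses `print()` as a function so it runs on Python 3.

```python
def has_duplicates(items):
    """Return True if any value appears more than once in items."""
    items = list(items)  # materialize so len() works on any iterable
    # A set keeps one copy of each value, so it is shorter than the
    # list exactly when at least one value is repeated.
    return len(set(items)) != len(items)


a = [1934, 2311, 1001, 4056, 1001, 3459, 9078]
if has_duplicates(a):
    print("duplicates found")
else:
    print("nothing found")
```

Because only the lengths are compared, the order in which the set stores its elements no longer matters.
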
Oct 21 '09 #2
bvdet
2,851 Expert Mod 2GB
To find the items that have duplicates:

    >>> dd = {}  # maps each item to the number of times it occurs
    >>> for item in a:
    ...     dd[item] = dd.get(item, 0) + 1
    ...
    >>> dd
    {3459: 1, 2311: 1, 1001: 2, 1934: 1, 9078: 1, 4056: 1}

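On newer Python versions the same tally can be built with `collections.Counter` (added in Python 2.7 and 3.1, so it was not yet available in every interpreter when this thread was written). The sketch below swaps that class in for the hand-written loop; the names `counts` and `duplicates` are illustrative only.

```python
from collections import Counter

a = [1934, 2311, 1001, 4056, 1001, 3459, 9078]

# Counter builds the same item -> count mapping as the dict loop.
counts = Counter(a)

# Keep only the items that occur more than once.
duplicates = [item for item, n in counts.items() if n > 1]

print(duplicates)
```
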
Or, less efficiently, since a.count() rescans the whole list for every distinct item:

    >>> for item in set(a):
    ...     if a.count(item) > 1:
    ...         print "Duplicate found: %s" % (item)
    ...
    Duplicate found: 1001

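If the repeated scans of `a.count()` become a concern on long lists, one alternative is to walk the list once and remember what has already been seen. This is a hedged sketch, not the method from the reply above: `find_duplicates` is a hypothetical helper name, and it assumes hashable items.

```python
def find_duplicates(items):
    """Return the repeated values, in the order they first repeat."""
    seen = set()      # values met at least once
    reported = set()  # duplicates already added to the result
    duplicates = []
    for item in items:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates


a = [1934, 2311, 1001, 4056, 1001, 3459, 9078]
for item in find_duplicates(a):
    print("Duplicate found: %s" % item)
```

Unlike iterating over `set(a)`, this reports duplicates in a predictable order, because the result follows the order of the original list.
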
Oct 21 '09 #3
Thekid
145 100+
Thanks! I went with your first suggestion, which was along the lines of what I was thinking, but I hadn't considered comparing the lengths to get around set() being unordered.
Oct 29 '09 #4