473,219 Members | 1,819 Online

Checking for the existence of Duplicates

I have found a lot of material on removing duplicates from a list, but I
am trying to find the most efficient way to just check for the existence
of duplicates in a list. Here is the best I have come up with so far:

CheckList = [x[ValIndex] for x in self.__XRList[z]]
FilteredList = filter((lambda x:x != 0),CheckList)
if len(FilteredList) len(sets.Set(FilteredList)): return
False

The first statement pulls the slice out of a matrix I need to check.
The filter statement gets rid of zeroes (they don't count as duplicates).
The if statement is the actual duplicates check

This is in a program that generates random numbers to do a brute force
solve on a sudoku-like puzzle. Once a certain level of difficulty in
the puzzle is reached, performance goes off a cliff because the
duplicate checking code, although fast, is executed so many times.
Sep 28 '07 #1
1 3297
On Sep 28, 10:27 pm, AndyB <andrewblakes...@gmail.comwrote:
...
This is in a program that generates random numbers to do a brute force
solve on a sudoku-like puzzle. Once a certain level of difficulty in
the puzzle is reached, performance goes off a cliff because the
duplicate checking code, although fast, is executed so many times.
For a sudoku solver, you may be better dodging the problem, and
maintaining a set per row, column and box saying which numbers have
been placed already - and thus avoiding adding duplicates in the first
place. It may be better to use a bitfield instead (ie use bits 1..9 of
an int to represent these sets). You should also check out Donald
Knuth's 'Dancing Links' algorithm as a clever implementation of a
brute-force search that works perfectly for sudoku solving. (It uses
linked lists, but the idea still works in python).

I have found a lot of material on removing duplicates from a list, but I
am trying to find the most efficient way to just check for the existence
of duplicates in a list. Here is the best I have come up with so far:

CheckList = [x[ValIndex] for x in self.__XRList[z]]
FilteredList = filter((lambda x:x != 0),CheckList)
if len(FilteredList) len(sets.Set(FilteredList)): return
In general, the natural way to test for duplicates is as you have it,
except that set is now built-in, so you can write
def has_no_duplicates(x):
return len(x) == len(set(x))

But in your case, you're having to construct the list and filter, so
it's better just to loop and do everything at once:

found = set()
for x in self.__XRList[z]:
cell = x[ValIndex]
if cell != 0 and cell in found:
return False

Since your elements are probably numbers 1 to 9 (or 0), you might use
bitfields. I'd expect this to be faster, but you should check:

found = 0
for x in self.__XRList[z]:
cellbit = 1 << x[ValIndex]
if cellbit != 1 and (cellbit & found):
return False
found |= cellbit

--
Paul Hankin

Sep 28 '07 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

 4 by: GujuBoy | last post by: i want to check to see if a certain program is installed on my windows box using python. how can i do that...(ie, i want to see if "word" is installed) please help 1 by: Spockie | last post by: I do not use linkedlist stdlibrary, vector i make my own, but i have problems with one issue, and that is checking if there are duplicates in the middle of inputing information line 294 ... 1 by: Xeno Campanoli | last post by: I'm having a hard time checking existence of windows. The only thing I found on this is page 224 of Rhino. Why is there no function to check this? Is there a publication on this part of the... 2 by: mike | last post by: I had a form like below that validated that a file was there before it would submit.