Removing the duplicate data from List,Tuples and Dictionary

256MB
Hi,

How do I remove duplicate data from lists, tuples, and dictionaries?

Thanks in advance
PSB
Mar 8 '07 #1
13 Replies
256MB
Hi,

For example:

list1 = [ [1,0,0],[0,1,0],[1,1,0],[0,0,0],[1,1,0]]

I don't want duplicate values in the list.

The output should look like this:

list1 = [ [1,0,0],[0,1,0],[1,1,0],[0,0,0]]

Thanks & Regards
PSB
Mar 8 '07 #2
bvdet
Expert Mod 2GB
Since the items in the list are unhashable, it's a bit more complicated:
    >>> list1
    [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
    >>> set([x for x in list1])

    Traceback (most recent call last):
      File "<pyshell#12>", line 1, in -toplevel-
        set([x for x in list1])
    TypeError: list objects are unhashable
    >>> set([str(x) for x in list1])
    set(['[1, 0, 0]', '[1, 1, 0]', '[0, 1, 0]', '[0, 0, 0]'])
    >>> list(eval(x) for x in set([str(x) for x in list1]))
    [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
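For readers on a current Python, here is a minimal sketch of the same idea that avoids eval by turning each inner list into a hashable tuple. It assumes Python 3.7 or later, where dicts keep insertion order, so the first occurrence of each item stays in its original position. The name unique_rows is only illustrative.

```python
list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]

# Tuples are hashable, so they can serve as dict keys.
# dict.fromkeys keeps one key per distinct tuple, in first-seen order.
unique_rows = [list(row) for row in dict.fromkeys(map(tuple, list1))]

print(unique_rows)
```

Unlike the str/eval round trip, this only needs the inner items to be hashable once they are converted to tuples, and it never evaluates strings as code.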
Mar 9 '07 #3
bartonc
Expert 4TB
Pretty darn clever, there, BV.
Mar 9 '07 #4
Expert 256MB
another way
    >>> list1
    [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
    >>> [ item for item in list1 if item not in locals()['_[1]'] ]
    [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
Mar 9 '07 #5
bartonc
Expert 4TB
You'll have to explain that one:
>>> list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
>>> locals()['_[1]']
File "<console>", line 1, in ?
''' exceptions.KeyError : '_[1]' '''
>>>
Mar 9 '07 #6
bvdet
Expert Mod 2GB
I looked it up. From Python 2.3 docs:

In Python 2.3, a list comprehension "leaks" the control variables of each "for" it contains into the containing scope. However, this behavior is deprecated, and relying on it will not work once this bug is fixed in a future release.


'_[1]' is a temporary name used while the list is being constructed. Additional names '_[2]', '_[3]' are used for nested comprehensions. Totally undocumented, but real cool!
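Because '_[1]' is an interpreter implementation detail, and Python 3 does not expose it at all, here is a sketch of the same one-pass filter with the temporary list made explicit. The names seen and result are made up for illustration, and the append inside the condition is a deliberate side effect, not something I'd call good style.

```python
list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]

seen = []  # plays the role of the hidden '_[1]' list

# list.append returns None, so "not seen.append(item)" is always True;
# it only runs (and records the item) when the item has not been seen yet.
result = [item for item in list1
          if item not in seen and not seen.append(item)]

print(result)
```

This keeps the first occurrence of each item and works for unhashable items, at the cost of a linear scan of seen for every element.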
Mar 9 '07 #7
Expert 256MB
locals()['_[1]'] is used in list comprehension to denote a "temporary" name, as bv has pointed out in the docs.
Mar 9 '07 #8
bartonc
Expert 4TB
Very, very cool. You guys make me feel inadequate.
Mar 9 '07 #9
ilikepython
Expert 512MB
Would this work:

    count = -1
    list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0]]
    list2 = list1[:]
    for item in list1:
        count += 1
        if item in list2:
            del list1[count]
Mar 9 '07 #10
bvdet
Expert Mod 2GB
No. When you delete an item from the list you are iterating over, the remaining items shift left and the loop skips elements. Building a new list instead works basically the same as the list comprehension above:
    >>> list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0], [0,1,0], [1,0,0], [0,0,1], [1,0,1]]
    >>> list2 = []
    >>> for item in list1:
    ...     if item not in list2:
    ...         list2.append(item)
    ...
    >>> list1
    [[0, 1, 1], [0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
    >>> list2
    [[0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
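To see why deleting during iteration goes wrong, here is a small sketch with made-up data (items and visited are illustrative names). The for loop walks the list by position, so every removal moves the next element into the slot that was just visited.

```python
items = [1, 2, 3, 4]
visited = []

for item in items:
    visited.append(item)
    items.remove(item)  # shrinks the list while the loop is still indexing into it

print(visited)  # [1, 3]
print(items)    # [2, 4]
```

Only every other element is ever visited, which is why building a second list (or iterating over a copy) is the safer pattern.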
Mar 9 '07 #11
ilikepython
Expert 512MB
Oh ok, thank you
Mar 9 '07 #12
@ilikepython
Hey, I managed to do this for a dictionary:
    d = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z',3:'a',4:'a',5:'a',6:'a',7:'a',8:'a',9:'a',10:'b'}

    # Count how many keys share each value.
    values = {}
    for key in d:
        if d[key] in values:
            values[d[key]] += 1
        else:
            values[d[key]] = 1

    # Report every value that appears more than once.
    dups = [key for key in values if values[key]>1]
    if len(dups):
        print "Duplicate values:"
        for dup in dups:
            print dup
Sep 13 '11 #13
If the original order of your data matters, the above solutions are the way to go. BUT some assumptions are made in the answers to your question.

You say "remove duplicate data". The answers assume you want to remove only the -extra- copies and keep one copy of each item, as opposed to removing every item that is duplicated, including its first occurrence.

They also assume that the copy to keep is the first occurrence, and that the duplicates to remove are the ones that come after it.

This presumes order is important. It wasn't clear whether you wanted to remove the "left"-hand duplicates or the "right"-hand duplicates.

There are times when you want to remove the "old" duplicate data and keep the new data that was just appended (to the proverbial end).
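A rough sketch of that "keep the newest copy" case, assuming the newest copy is the one nearest the end of the list. The function name dedupe_keep_last is hypothetical.

```python
def dedupe_keep_last(items):
    """Return a new list keeping only the last occurrence of each item."""
    kept = []
    # Walk from the end so the first copy we meet is the newest one.
    for item in reversed(items):
        if item not in kept:
            kept.append(item)
    kept.reverse()  # restore the original left-to-right order
    return kept

list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
print(dedupe_keep_last(list1))
```

With this data the surviving [1, 1, 0] is the one at the end, so the result differs in order from the keep-first solutions earlier in the thread.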

It's also presumed that the original type() of the data is important, i.e. that a list remains a list.

But because you are removing duplicates at arbitrary locations in your data (you don't know where the duplicates are going to occur), the -need- for order becomes a little more vague.

Additionally, you mention removing items with duplicate values from a dict(). Because the order of dict items is not (historically) guaranteed, this also raises the question of whether you need "ordered" items: removing duplicate-valued items from a dict is arbitrary. Which key do you remove?

The question when removing dups from dicts is: which key is more important to keep than the other keys that have the same value? You can't say the leftmost or rightmost, as they are not reliably ordered.
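On Python 3.7 and later, dicts do preserve insertion order, so "keep the first key inserted for each value" becomes a well-defined rule. A minimal sketch under that assumption; dedupe_by_value is a hypothetical helper, the sample dict is shortened, and the values must be hashable.

```python
def dedupe_by_value(d):
    """Keep only the first key seen for each distinct value."""
    seen_values = set()
    result = {}
    for key, value in d.items():
        if value not in seen_values:
            seen_values.add(value)
            result[key] = value
    return result

dict1 = {1: 'a', 2: 'a', 'k': 1, 'e': 1, 'p': 'z', 10: 'b'}
print(dedupe_by_value(dict1))
```

Whether the first key is really the one worth keeping still depends on your data; the code only makes the choice explicit.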

So if order doesn't matter:

    list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0]]
    list2 = [list(y) for y in set([tuple(x) for x in list1])]

If the original order is not important but you want the result sorted, use list3 = sorted(list2) (the builtin function) or list2.sort() (the list method).

DICTs:
You don't have to use "for" loops or list comprehensions; you can use dict comprehensions. They exist.

    dict1 = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z',3:'a',4:'a',5:'a',6:'a',7:'a',8:'a',9:'a',10:'b'}

    # Items whose value occurs exactly once.
    nondups = {k:v for k,v in dict1.items() if dict1.values().count(v)==1}
    print nondups
    # prints {'p': 'z', 10: 'b'}

    # Items whose value occurs more than once.
    ddups = {k:v for k,v in dict1.items() if dict1.values().count(v)>1}
    print ddups
    # prints {1: 'a', 2: 'a', 3: 'a', 4: 'a', 5: 'a', 6: 'a', 7: 'a', 8: 'a', 9: 'a', 'k': 1, 'e': 1}
Remember that printing a dict does not sort it. As far as I know, the printout follows the same internal order you get when iterating over the dict, and it only looks sorted here because small integer keys happen to land in ascending order in the hash table. Don't rely on that order; test your real data to see.

    for k in dict1: print k
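The two comprehensions earlier in this post call count() once per item, and dict1.values().count(v) only works on Python 2, where values() returns a list. A sketch of the same split for Python 3, counting the values once with collections.Counter (the name counts is illustrative):

```python
from collections import Counter

dict1 = {1: 'a', 2: 'a', 'k': 1, 'e': 1, 'p': 'z', 3: 'a', 4: 'a', 5: 'a',
         6: 'a', 7: 'a', 8: 'a', 9: 'a', 10: 'b'}

# One pass over the values: how many keys share each value.
counts = Counter(dict1.values())

nondups = {k: v for k, v in dict1.items() if counts[v] == 1}
ddups = {k: v for k, v in dict1.items() if counts[v] > 1}

print(nondups)
print(ddups)
```

This assumes the values are hashable, which Counter requires.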
I don't believe there is such a thing as a tuple comprehension, as a tuple is, like a string, immutable. Using the format:

    (x for x in seq)

returns a generator, not a tuple.
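You can still get a tuple by handing the generator to tuple(). A short sketch, which also covers removing duplicates from a tuple (assuming its items are hashable and Python 3.7+ for ordered dicts); the variable names are made up:

```python
seq = [3, 1, 3, 2, 1]

gen = (x for x in seq)            # a generator object, not a tuple
as_tuple = tuple(x for x in seq)  # tuple() consumes the generator

# Dedupe a tuple, keeping the first occurrence of each item.
t = (1, 2, 2, 3, 1)
unique_t = tuple(dict.fromkeys(t))

print(as_tuple)
print(unique_t)
```

Since tuples are immutable, the result is always a new tuple rather than a modified original.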

If the original order of a list matters: I've seen the data get transformed into something with count and/or order data inserted per item.

For the dict1 above you might see an interim data format like this:

    dict2 = {(1,2,3,4,5,6,7,8,9):'a', (10,):'b', ('e','k'):1, ('p',):'z'}

The above will work because the original keys are already hashable, so they can be grouped into a tuple, which is itself hashable. But with this approach you would have to delete the item from the dict and re-add it with a new key every time a new duplicate is found.

Or use a more dynamic version where new duplicates can easily be added and accounted for: make a new dict in which each value becomes a key and the original keys become its data, provided the value is hashable.

    dict2 = {'a':[1,2,3,4,5,6,7,8,9], 'b':[10,], 1:['e','k'], 'z':['p',]}
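A sketch of how such a value-to-keys dict could be built in one pass with collections.defaultdict, assuming Python 3.7+ and hashable values. The names keys_by_value and by_key_tuple are hypothetical.

```python
from collections import defaultdict

dict1 = {1: 'a', 2: 'a', 'k': 1, 'e': 1, 'p': 'z', 3: 'a', 4: 'a', 5: 'a',
         6: 'a', 7: 'a', 8: 'a', 9: 'a', 10: 'b'}

# Invert the dict: each value maps to the list of keys that carried it.
keys_by_value = defaultdict(list)
for key, value in dict1.items():
    keys_by_value[value].append(key)

# The tuple-keyed form can be derived from the inverted dict if needed.
by_key_tuple = {tuple(keys): value for value, keys in keys_by_value.items()}

print(dict(keys_by_value))
print(by_key_tuple)
```

Any list longer than one entry marks a duplicated value, and a new duplicate is just another append.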
In this inverted form the values are lists, not tuples, so you can just append new duplicates to the list. However, this approach requires that the data can be used as a key in a dict, which often it cannot.

There are other, fancier methods for dealing with duplicate data, like element trees and linked lists; the right one often depends on what your data is and why you are even getting duplicate data.

Perhaps some of these methods will illuminate.
Sep 14 '11 #14
