Removing the duplicate data from List,Tuples and Dictionary

440 256MB
Hi,

How do I remove duplicate data from lists, tuples and dictionaries?

Thanks in advance
PSB
Mar 8 '07 #1
psbasha
440 256MB
Hi ,

For example :

list1 = [ [1,0,0],[0,1,0],[1,1,0],[0,0,0],[1,1,0]]

I don't want duplicate values in the list.

The output has to be as shown below:

list1 = [ [1,0,0],[0,1,0],[1,1,0],[0,0,0]]

Thanks & Regards
PSB
Mar 8 '07 #2
bvdet
2,851 Expert Mod 2GB
Since the items in the list are unhashable, it's a bit more complicated:
>>> list1
[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
>>> set([x for x in list1])

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in -toplevel-
    set([x for x in list1])
TypeError: list objects are unhashable
>>> set([str(x) for x in list1])
set(['[1, 0, 0]', '[1, 1, 0]', '[0, 1, 0]', '[0, 0, 0]'])
>>> list(eval(x) for x in set([str(x) for x in list1]))
[[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
>>>
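A variation on the same idea, for what it's worth: converting each inner list to a tuple makes it hashable without the round trip through str() and eval(), and a "seen" set lets the first occurrences keep their original order. This is only a sketch in Python 3 syntax; the helper name dedupe_keep_first is made up for illustration.

```python
def dedupe_keep_first(rows):
    """Return rows without later duplicates, keeping first-seen order."""
    seen = set()       # tuple versions of the rows already emitted
    result = []
    for row in rows:
        key = tuple(row)   # lists are unhashable; a tuple of ints is fine
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
print(dedupe_keep_first(list1))
# prints [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
```

Unlike the set-based one-liner, this keeps the order the original poster asked for, at the cost of a few more lines.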
Mar 9 '07 #3
bartonc
6,596 Expert 4TB
Pretty darn clever, there, BV.
Mar 9 '07 #4
ghostdog74
511 Expert 256MB
Another way:
>>> list1
[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
>>> [ item for item in list1 if item not in locals()['_[1]'] ]
[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
Mar 9 '07 #5
bartonc
6,596 Expert 4TB
You'll have to explain that one:
>>> list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
>>> locals()['_[1]']
File "<console>", line 1, in ?
''' exceptions.KeyError : '_[1]' '''
>>>
Mar 9 '07 #6
bvdet
2,851 Expert Mod 2GB
I looked it up. From Python 2.3 docs:

In Python 2.3, a list comprehension "leaks" the control variables of each "for" it contains into the containing scope. However, this behavior is deprecated, and relying on it will not work once this bug is fixed in a future release.


'_[1]' is a temporary name used while the list is being constructed. Additional names '_[2]', '_[3]' are used for nested comprehensions. Totally undocumented, but real cool!
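For anyone trying this on a newer interpreter: in Python 3 the loop variable of a comprehension no longer leaks into the enclosing scope, and as far as I can tell there is no '_[1]' entry to look up either, so the trick should be treated as specific to older Python 2 releases. A quick check, written for Python 3:

```python
x = "outer"
squares = [x * x for x in range(4)]

# The comprehension has its own x, so the outer one is untouched.
print(x)         # prints outer
print(squares)   # prints [0, 1, 4, 9]

# No hidden '_[1]' name is visible from inside the comprehension.
hidden = ['_[1]' in locals() for _ in range(1)]
print(hidden)    # prints [False]
```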
Mar 9 '07 #7
ghostdog74
511 Expert 256MB
locals()['_[1]'] is used in list comprehension to denote a "temporary" name, as bv has pointed out in the docs.
Mar 9 '07 #8
bartonc
6,596 Expert 4TB
Very, very cool. You guys make me feel inadequate.
Mar 9 '07 #9
ilikepython
844 Expert 512MB
Would this work:

count = -1
list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0]]
list2 = list1[:]
for item in list1:
    count+= 1
    if item in list2:
        del list1[count]
Mar 9 '07 #10
bvdet
2,851 Expert Mod 2GB
No. When you delete an item from the list you are iterating over, the remaining items shift left and the loop skips the one that slides into the vacated position. This works basically the same as the list comprehension above:
>>> list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0], [0,1,0], [1,0,0], [0,0,1], [1,0,1]]
>>> list2 = []
>>> for item in list1:
...     if item not in list2:
...         list2.append(item)
...
>>> list1
[[0, 1, 1], [0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
>>> list2
[[0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
>>>
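To see what goes wrong with the delete-while-iterating version, it may help to simply run it and look at what is left. The sketch below is the loop from the question in Python 3 syntax, with nothing changed except the comments and the final print:

```python
list1 = [[0, 1, 1], [0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0]]
list2 = list1[:]          # a copy, so every item of list1 is "in list2"

count = -1
for item in list1:
    count += 1
    if item in list2:     # always true here, because list2 is a full copy
        del list1[count]  # later items shift left; the loop then skips one

print(list1)
# prints [[0, 1, 1], [1, 0, 0]]
```

Two unique items, [1, 0, 1] and [0, 0, 0], are gone, so the test "item in list2" is not doing what was intended and the deletion disturbs the iteration as well.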
Mar 9 '07 #11
ilikepython
844 Expert 512MB
Oh ok, thank you
Mar 9 '07 #12
pes456
1
@ilikepython
Hey, I managed to do this for a dictionary:
d = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z',3:'a',4:'a',5:'a',6:'a',7:'a',8:'a',9:'a',10:'b'}

values = {}
for key in d:
    if d[key] in values:
        values[d[key]] += 1
    else:
        values[d[key]] = 1

dups = [key for key in values if values[key]>1]
if len(dups):
    print "Duplicate values:"
    for dup in dups:
        print dup
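The same tally can probably be written more compactly with collections.Counter from the standard library. A sketch in Python 3 syntax, reusing the dictionary d from above (the name value_counts is just illustrative):

```python
from collections import Counter

d = {1: 'a', 2: 'a', 'k': 1, 'e': 1, 'p': 'z', 3: 'a', 4: 'a',
     5: 'a', 6: 'a', 7: 'a', 8: 'a', 9: 'a', 10: 'b'}

# Count how many keys share each value.
value_counts = Counter(d.values())

# Values that occur under more than one key.
dups = [value for value, n in value_counts.items() if n > 1]

if dups:
    print("Duplicate values:")
    for dup in dups:
        print(dup)
```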
Sep 13 '11 #13
If the original order of your data matters, the above solutions are the way to go. BUT some assumptions are made in the answers to your question.

You say "remove duplicate data". It's assumed that you mean to remove the -extra- duplicate items from your data while retaining one copy of each item, as opposed to removing every item that is duplicated, including the first occurrence.

It's also assumed that the duplicates to remove are the ones after the first occurrence of an item, so that only the first one is left.

This therefore presumes order is important. It wasn't clear whether you wanted to remove "left"-hand duplicates or "right"-hand duplicates.

There are times when you want to remove the "old" duplicate data and keep the new data that was just appended (to the proverbial end).
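A rough sketch of that "keep the newest copy" case, in Python 3 syntax. The helper name dedupe_keep_last is made up for illustration, and it assumes the inner lists hold hashable values so they can be turned into tuples:

```python
def dedupe_keep_last(rows):
    """Return rows with earlier duplicates dropped, keeping the last copy."""
    seen = set()
    kept = []
    for row in reversed(rows):      # walk from the newest item backwards
        key = tuple(row)            # hashable stand-in for the inner list
        if key not in seen:
            seen.add(key)
            kept.append(row)
    kept.reverse()                  # restore left-to-right order
    return kept


list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
print(dedupe_keep_last(list1))
# prints [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0]]
```

Here the first [1, 1, 0] is the one that gets dropped, so the surviving copy sits at the end.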

It's also presumed that the original type() of the data is important; i.e. a list remains a list.

But because you are removing duplicates at arbitrary locations in your data (as you don't know when the duplicates are going to occur), the -need- for order becomes a little more vague.

Additionally, you mention removing items with duplicate values in a dict(). Because the order of dict items is not (historically) guaranteed, this also raises the question of whether "ordered" items are needed at all, since removing duplicate-valued items from a dict is arbitrary: which key do you remove?

The question when removing dups from dicts is: which key is more important to keep than the other keys that have the same value? You can't say the leftmost or rightmost, as they are not reliably ordered.

So if order doesn't matter:

list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0]]
list2 = [list(y) for y in set([tuple(x) for x in list1])]
If the original order is not important but you want the result ordered,
use the built-in function list3 = sorted(list2) or the list method list2.sort().

DICTs:
You don't have to use "for" loops or list comprehensions;
you can use dict comprehensions, they exist.

dict1 = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z',3:'a',4:'a',5:'a',6:'a',7:'a',8:'a',9:'a',10:'b'}
nondups = {k:v for k,v in dict1.items() if dict1.values().count(v)==1}
print nondups
{'p': 'z', 10: 'b'}

ddups = {k:v for k,v in dict1.items() if dict1.values().count(v)>1}
print ddups
{1: 'a', 2: 'a', 3: 'a', 4: 'a', 5: 'a', 6: 'a', 7: 'a', 8: 'a', 9: 'a', 'k': 1, 'e': 1}
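One caveat: dict1.values().count(v) relies on values() returning a list, which holds in Python 2. In Python 3, values() returns a view object without a count() method. A sketch of the same split for Python 3, which counts once with collections.Counter instead of rescanning the values for every item (the name counts is just illustrative):

```python
from collections import Counter

dict1 = {1: 'a', 2: 'a', 'k': 1, 'e': 1, 'p': 'z', 3: 'a', 4: 'a',
         5: 'a', 6: 'a', 7: 'a', 8: 'a', 9: 'a', 10: 'b'}

# One pass to count how often each value occurs.
counts = Counter(dict1.values())

# Items whose value is unique, and items whose value is shared.
nondups = {k: v for k, v in dict1.items() if counts[v] == 1}
ddups = {k: v for k, v in dict1.items() if counts[v] > 1}

print(nondups)      # prints {'p': 'z', 10: 'b'}
print(len(ddups))   # prints 11
```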
Remember that printing a dict shows its items in the dict's own internal order, which can look nicely sorted (as it does for the small integer keys above) without that being guaranteed. Items retrieved by iterating over a dict come back in that same internal order, which may not be the order you expect. Test your real data to see.

for k in dict1: print k
I don't believe there is such a thing as a tuple comprehension, as a tuple, like a string, is immutable. Using the format:

(x for x in seq)
returns a generator, not a tuple.
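You can still get a tuple by handing the generator to tuple(). A small sketch in Python 3 syntax, which also shows one way to drop duplicates from a flat tuple of hashable items while keeping first-seen order (this relies on dict keys preserving insertion order, which is guaranteed from Python 3.7 on):

```python
seq = [3, 1, 3, 2, 1]

gen = (x for x in seq)            # generator expression, evaluated lazily
print(type(gen).__name__)         # prints generator

as_tuple = tuple(x for x in seq)  # tuple() consumes the generator
print(as_tuple)                   # prints (3, 1, 3, 2, 1)

# dict.fromkeys keeps one key per distinct item, in first-seen order.
unique = tuple(dict.fromkeys(seq))
print(unique)                     # prints (3, 1, 2)
```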

If the original order of a list matters: I've seen the data get transformed into something with count and/or order information inserted per item.

For the dict1 above you might see an intermediate data format like this:

dict2 = {(1,2,3,4,5,6,7,8,9):'a', (10,):'b', ('e','k'):1, ('p',):'z'}
The above will work because the original keys are already hashable, so a tuple of them is hashable too. But with this approach you would have to delete the item from the dict and re-add it under a new key every time a new duplicate is found.

Or use a more dynamic version, where new duplicates can easily be added and accounted for: make a new dict in which the data becomes the key and the key becomes the data, provided the data is hashable.

dict2 = {'a':[1,2,3,4,5,6,7,8,9], 'b':[10,], 1:['e','k'], 'z':['p',]}
Here the values are lists, not tuples, so you can just append new duplicates to the value's list. However, this approach requires that the data can be used as a key in a dict, which often it cannot.

There are other, fancier methods for dealing with duplicate data, like elementtrees and linked lists; which one fits often depends on what your data is, and why you are even getting duplicate data.

Perhaps some of these methods will illuminate.
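For what it's worth, the value-to-keys dict shown earlier can be built in one pass with collections.defaultdict. A sketch in Python 3 syntax; the name keys_by_value is made up, and it assumes every value in dict1 is hashable:

```python
from collections import defaultdict

dict1 = {1: 'a', 2: 'a', 'k': 1, 'e': 1, 'p': 'z', 3: 'a', 4: 'a',
         5: 'a', 6: 'a', 7: 'a', 8: 'a', 9: 'a', 10: 'b'}

# Swap roles: each value becomes a key, mapped to the list of original keys.
keys_by_value = defaultdict(list)
for key, value in dict1.items():
    keys_by_value[value].append(key)

print(keys_by_value['a'])   # prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(keys_by_value[1])     # prints ['k', 'e']

# Values that appear under more than one key are the duplicates.
dups = [value for value, keys in keys_by_value.items() if len(keys) > 1]
```

A new duplicate is then just another append to the matching list, with no need to delete and re-add an entry.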
Sep 14 '11 #14
