Removing duplicate data from Lists, Tuples and Dictionaries

psbasha

Hi,

How do I remove duplicate data from a list, tuple, or dictionary?

Thanks in advance
PSB

Mar 8 '07 #1

psbasha

Hi,

For example:

list1 = [[1,0,0],[0,1,0],[1,1,0],[0,0,0],[1,1,0]]

I don't want duplicate values in the list.

The output should be:

list1 = [[1,0,0],[0,1,0],[1,1,0],[0,0,0]]

Thanks & Regards
PSB

Mar 8 '07 #2

bvdet

Since the items in the list are unhashable, it's a bit more complicated:

>>> list1
[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
>>> set([x for x in list1])
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in -toplevel-
    set([x for x in list1])
TypeError: list objects are unhashable
>>> set([str(x) for x in list1])
set(['[1, 0, 0]', '[1, 1, 0]', '[0, 1, 0]', '[0, 0, 0]'])
>>> list(eval(x) for x in set([str(x) for x in list1]))
[[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
>>>
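The same string round trip can probably be done without eval. This is a minimal sketch, assuming Python 3 and assuming the sublists contain only plain literals (numbers, strings and so on). It swaps in ast.literal_eval, which parses literals only, and sorts the result because a set has no defined order. The names unique_strings and unique_lists are just for illustration.

```python
import ast

list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]

# Use each sublist's string form as a hashable stand-in, as in the session above.
unique_strings = set(str(x) for x in list1)

# ast.literal_eval only parses Python literals, so it cannot run arbitrary code.
# A set has no defined order, so sort to get a repeatable result.
unique_lists = sorted(ast.literal_eval(s) for s in unique_strings)

print(unique_lists)  # [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
```

Note that sorting discards the original order of list1; it only makes the output repeatable.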

Mar 9 '07 #3

bartonc

Pretty darn clever, there, BV.

Mar 9 '07 #4

ghostdog74

Another way:

>>> list1
[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
>>> [ item for item in list1 if item not in locals()['_[1]'] ]
[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]

Mar 9 '07 #5

bartonc

You'll have to explain that one:
>>> list1 = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
>>> locals()['_[1]']
File "<console>", line 1, in ?
''' exceptions.KeyError : '_[1]' '''
>>>

Mar 9 '07 #6

bvdet

I looked it up. From Python 2.3 docs:

In Python 2.3, a list comprehension "leaks" the control variables of each "for" it contains into the containing scope. However, this behavior is deprecated, and relying on it will not work once this bug is fixed in a future release.

'_[1]' is a temporary name used while the list is being constructed. Additional names '_[2]', '_[3]' are used for nested comprehensions. Totally undocumented, but real cool!

Mar 9 '07 #7

ghostdog74

Inside a list comprehension, locals()['_[1]'] is the "temporary" name for the list being built, as bv has pointed out.

Mar 9 '07 #8

bartonc

Very, very cool. You guys make me feel inadequate.

Mar 9 '07 #9

ilikepython

Would this work:

count = -1
list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0]]
list2 = list1[:]
for item in list1:
    count += 1
    if item in list2:
        del list1[count]

Mar 9 '07 #10

bvdet

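No. When you delete an item from the list you are iterating over, the remaining items shift and the loop skips elements. (Also, because list2 is a copy of list1, "item in list2" is always true.) This works basically the same as the list comprehension above: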

>>> list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0], [0,1,0], [1,0,0], [0,0,1], [1,0,1]]
>>> list2 = []
>>> for item in list1:
...     if item not in list2:
...         list2.append(item)
...
>>> list1
[[0, 1, 1], [0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
>>> list2
[[0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
>>>
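The loop above searches list2 once per item, which can get slow on long lists. This is a rough sketch, assuming Python 3 and assuming the sublists hold hashable items such as ints. It tracks what has been seen in a set of tuples. The function name dedupe_keep_first is made up for this example.

```python
def dedupe_keep_first(rows):
    """Return rows without repeats, keeping the first occurrence of each."""
    seen = set()    # tuple forms of the rows already kept
    result = []
    for row in rows:
        key = tuple(row)    # lists are unhashable; tuples of ints are fine
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0], [0,1,0], [1,0,0], [0,0,1], [1,0,1]]
list2 = dedupe_keep_first(list1)
print(list2)  # [[0, 1, 1], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
```

list1 is left untouched, and the result keeps the original order, just like the loop above.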

Mar 9 '07 #11

ilikepython

Oh ok, thank you

Mar 9 '07 #12

pes456

@ilikepython
Hey, I managed to do this for a dictionary:

d = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z',3:'a',4:'a',5:'a',6:'a',7:'a',8:'a',9:'a',10:'b'}

values = {}
for key in d:
    if d[key] in values:
        values[d[key]] += 1
    else:
        values[d[key]] = 1

dups = [key for key in values if values[key]>1]
if len(dups):
    print "Duplicate values:"
    for dup in dups:
        print dup
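The code above reports which values are duplicated. To actually drop the extra entries, one option is to keep only the first key seen for each value. That rule is an assumption about what "remove" should mean here. This is a sketch only: it assumes Python 3.7+ (where dicts keep insertion order) and hashable values. The function name drop_duplicate_values is hypothetical.

```python
def drop_duplicate_values(d):
    """Return a new dict keeping only the first key seen for each value."""
    seen = set()    # values already kept
    result = {}
    for key, value in d.items():    # insertion order in Python 3.7+
        if value not in seen:
            seen.add(value)
            result[key] = value
    return result


d = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z', 3:'a', 4:'a', 5:'a', 6:'a', 7:'a', 8:'a', 9:'a', 10:'b'}
unique = drop_duplicate_values(d)
print(unique)  # {1: 'a', 'k': 1, 'p': 'z', 10: 'b'}
```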

Sep 13 '11 #13

Dev Player

If the original order of your data matters, the above solutions are the way to go. BUT some assumptions are made in the answers to your question.

You say "remove duplicate data". It's assumed that you mean to remove the extra copies of an item while retaining one copy, as opposed to removing every item that is duplicated, including the first occurrence.

It's also assumed that the duplicates to remove are the ones after the first occurrence of an item, leaving only the first one.

This therefore presumes order is important. It wasn't clear whether you wanted to remove "left"-hand duplicates or "right"-hand duplicates.

There are times when you want to remove the "old" duplicate data and keep the new data that was just appended (to the proverbial end).
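For the "keep the newest copy" case, a possible sketch is to walk the list from the end. This assumes Python 3 and sublists of hashable items. The name dedupe_keep_last and the sample data are made up for illustration.

```python
def dedupe_keep_last(rows):
    """Return rows without repeats, keeping the last occurrence of each."""
    seen = set()
    kept = []
    for row in reversed(rows):    # walk from the newest item backwards
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    kept.reverse()                # restore left-to-right order
    return kept


data = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
print(dedupe_keep_last(data))  # [[0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
```

Here the first [1, 0, 0] is dropped and the one appended at the end survives.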

It's also presumed that the original type() of the data is important, i.e. a list remains a list.

But because you are removing duplicates at arbitrary locations in your data (as you don't know when the duplicates are going to occur), the need for order becomes a little more vague.

Additionally, you mention removing items with duplicate values in a dict(). Because the order of dict items is not (historically) guaranteed, this also raises the question of whether "ordered" items are needed, as removing duplicate-valued items from a dict is arbitrary: which key should be removed?

The question when removing dups from dicts is: which key is more important to keep than the other keys that have the same value? You can't say the leftmost or rightmost, as they are not reliably ordered.

So if order doesn't matter:

list1 = [[0,1,1], [0,1,1], [1,0,1], [1,0,0], [0,0,0]]
list2 = [list(y) for y in set([tuple(x) for x in list1])]

If the original order is not important but you want the result ordered,
use list3 = sorted(list2) (the builtin function) or list2.sort() (the list method).

DICTs:
You don't have to use "for" loops or list comprehensions;
you can use dict comprehensions. They exist.

dict1 = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z',3:'a',4:'a',5:'a',6:'a',7:'a',8:'a',9:'a',10:'b'}

nondups = {k:v for k,v in dict1.items() if dict1.values().count(v)==1}
print nondups
{'p': 'z', 10: 'b'}

ddups = {k:v for k,v in dict1.items() if dict1.values().count(v)>1}
print ddups
{1: 'a', 2: 'a', 3: 'a', 4: 'a', 5: 'a', 6: 'a', 7: 'a', 8: 'a', 9: 'a', 'k': 1, 'e': 1}
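In Python 3, dict.values() returns a view that has no count() method, so the comprehensions above would need a small change there. One way to sketch it, assuming Python 3 and hashable values, is to count the values once up front with collections.Counter. The name counts is just for this example.

```python
from collections import Counter

dict1 = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z', 3:'a', 4:'a', 5:'a', 6:'a', 7:'a', 8:'a', 9:'a', 10:'b'}

# Count each value once instead of calling count() for every item.
counts = Counter(dict1.values())

nondups = {k: v for k, v in dict1.items() if counts[v] == 1}
ddups = {k: v for k, v in dict1.items() if counts[v] > 1}

print(nondups)  # {'p': 'z', 10: 'b'}
print(ddups)
```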

Remember that printing a dict shows its items in the dict's internal order, which is not necessarily the order in which they were inserted, and iterating over the dict follows that same internal order. Test your real data to see.

for k in dict1: print k

I don't believe there are such things as tuple comprehensions, as a tuple is immutable, like a string. The form:

(x for x in seq)

returns a generator, not a tuple.
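To get a tuple back, the generator can be passed to tuple(). That also suggests a way to handle the tuple part of the original question. This is a small sketch, assuming Python 3.7+ (for ordered dict keys) and hashable items. The sample seq is invented for illustration.

```python
seq = (3, 1, 3, 2, 1)

gen = (x for x in seq)              # a generator object, not a tuple
as_tuple = tuple(x for x in seq)    # passing the generator to tuple() builds a tuple

# dict.fromkeys keeps one key per distinct item; in Python 3.7+ the keys stay
# in first-seen order, so this removes duplicates from a tuple of hashable items.
unique = tuple(dict.fromkeys(seq))

print(type(gen).__name__)  # generator
print(as_tuple)            # (3, 1, 3, 2, 1)
print(unique)              # (3, 1, 2)
```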

If the original order for lists matters: I've seen the data get transformed into something with count and/or order data inserted per item.

For the dict1 above you might see an interim data format like this:

dict2 = {(1,2,3,4,5,6,7,8,9):'a', (10,):'b', ('e','k'):1, ('p',):'z'}

The above will work because the original keys are already hashable, so tuples of those keys are hashable too. But with this approach you would have to delete the item from the dict and re-add it with a new key every time a new duplicate is found.

Or use a more dynamic version where new duplicates can easily be added and accounted for: make a new dict where the data becomes the key and the key becomes the data, if the data is hashable.

dict2 = {'a':[1,2,3,4,5,6,7,8,9], 'b':[10,], 1:['e','k'], 'z':['p',]}

Here the values are lists, not tuples, as you can just append new duplicates to the values list. However, this approach requires that the data can be a key in a dict, which often it cannot.

There are other, fancier methods for dealing with duplicate data, like element trees and linked lists; which one fits often depends on what your data is and why you are even getting duplicate data.

Perhaps some of these methods will illuminate.
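The inverted dict2 shown above (value mapped to a list of keys) could be built along these lines. This is only a sketch, assuming Python 3 and hashable values. It uses collections.defaultdict, and the names by_value and duplicated are hypothetical.

```python
from collections import defaultdict

dict1 = {1:'a', 2:'a', 'k':1, 'e':1, 'p':'z', 3:'a', 4:'a', 5:'a', 6:'a', 7:'a', 8:'a', 9:'a', 10:'b'}

# Map each value to the list of keys that carry it.
by_value = defaultdict(list)
for key, value in dict1.items():
    by_value[value].append(key)

# Values that belong to more than one key are the duplicates.
duplicated = {value: keys for value, keys in by_value.items() if len(keys) > 1}

print(dict(by_value))
print(duplicated)
```

A later duplicate is then handled by a plain by_value[value].append(key), with no need to delete and re-add an entry.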

Sep 14 '11 #14
