I m a beginner to python. Could you tell me how should i proceed to remove duplicate rows in a csv file
10 40238
I m a beginner to python. Could you tell me how should i proceed to remove duplicate rows in a csv file
If the order of the information in your csv file doesn't matter, you could put each line of the file into a list, convert the list into a set, and then write the list back into the file. When you convert the list to a set, all duplicate elements disappear. - reader = open("file.csv", "r")
-
lines = reader.read().split("\n")
-
reader.close()
-
-
writer = open("file.csv", "w")
-
for line in set(lines):
-
writer.write(line + "\n")
-
writer.close()
bvdet 2,851
Expert Mod 2GB
This code maintains the order of the data: - >>> rows = open('data.txt').read().split('\n')
-
>>> newrows = []
-
>>> for row in rows:
-
... if row not in newrows:
-
... newrows.append(row)
-
...
-
>>> f = open('data1.txt', 'w')
-
>>> f.write('\n'.join(newrows))
-
>>> f.close()
Here is another way to solve your problem using bvdet's method and the csv module. - import csv
-
rows = csv.reader(open("file.csv", "rb"))
-
newrows = []
-
for row in rows:
-
if row not in newrows:
-
newrows.append(row)
-
writer = csv.writer(open("file.csv", "wb"))
-
writer.writerows(newrows)
Here is another way to solve your problem using bvdet's method and the csv module. - import csv
-
rows = csv.reader(open("file.csv", "rb"))
-
newrows = []
-
for row in rows:
-
if row not in newrows:
-
newrows.append(row)
-
writer = csv.writer(open("file.csv", "wb"))
-
writer.writerows(newrows)
from above code,when i am using set(rows),am getting error' list objects are unhashable'..i think list is hashble (by hashq module)...then y am i getting this error??pls explain
hi...
from above code,when i am using set(rows),am getting error' list objects are unhashable'..i think list is hashble (by hashq module)...then y am i getting this error??pls explain
bvdet 2,851
Expert Mod 2GB
hi...
from above code,when i am using set(rows),am getting error' list objects are unhashable'..i think list is hashble (by hashq module)...then y am i getting this error??pls explain
This error indicates you are attempting to create a set from objects that are mutable. By definition, a set is a group of unique immutable objects. The csv.reader() function returns a list of lists. A list is a mutable object. KaezarRex's earlier example in this thread was applying set() to a list of strings. A string is an immutable object.
Here is another way to solve your problem using bvdet's method and the csv module. - import csv
-
rows = csv.reader(open("file.csv", "rb"))
-
newrows = []
-
for row in rows:
-
if row not in newrows:
-
newrows.append(row)
-
writer = csv.writer(open("file.csv", "wb"))
-
writer.writerows(newrows)
Hey i used this code and i was able to remove the duplicate entries. thanks. actually this csv file is generated by a java code. if the code is modified, the output should remain the same. to acheive this i found that the files should be sorted in some order to compare(since the rows are selected by the java code randomly). could you tell me how to sort the contents for ex. priority: Column 5, Column 8, Column1. is it possible to sort the newrows list before writing.
This code maintains the order of the data: - >>> rows = open('data.txt').read().split('\n')
-
>>> newrows = []
-
>>> for row in rows:
-
... if row not in newrows:
-
... newrows.append(row)
-
...
-
>>> f = open('data1.txt', 'w')
-
>>> f.write('\n'.join(newrows))
-
>>> f.close()
hey thanks for ur reply. i used the logic which KaezarRex said. could you see prev post and tell me your suggestion
bvdet 2,851
Expert Mod 2GB
Hey i used this code and i was able to remove the duplicate entries. thanks. actually this csv file is generated by a java code. if the code is modified, the output should remain the same. to acheive this i found that the files should be sorted in some order to compare(since the rows are selected by the java code randomly). could you tell me how to sort the contents for ex. priority: Column 5, Column 8, Column1. is it possible to sort the newrows list before writing.
I am in Python 2.3. Define a comparison function to pass to the list sort method: - def comp581(a, b):
-
x = cmp(a[5], b[5])
-
if not x:
-
y = cmp(a[8], b[8])
-
if not y:
-
return cmp(a[1], b[1])
-
return y
-
return x
-
-
yourList.sort(comp581)
In Python 2.4: - yourList.sort(key=lambda i: (i[5], i[8], i[1]))
I am in Python 2.3. Define a comparison function to pass to the list sort method: - def comp581(a, b):
-
x = cmp(a[5], b[5])
-
if not x:
-
y = cmp(a[8], b[8])
-
if not y:
-
return cmp(a[1], b[1])
-
return y
-
return x
-
-
yourList.sort(comp581)
In Python 2.4: - yourList.sort(key=lambda i: (i[5], i[8], i[1]))
thanks a lot. im able to sort the list now. (the version im using is 2.5.1 - i used the 'lambda' functionality)
Sign in to post your reply or Sign up for a free account.
Similar topics
by: SeeBelow |
last post by:
I have not been able to find a simple way to have a python script
execute an MS-DOS program repeatedly. execv(), for example, won't run
it more than once. I have been using many lines of a .bat...
|
by: sri2097 |
last post by:
Hi all, I'm storing number of dictionary values into a file using the
'cPickle' module and then am retrieving it. The following is the code
for it -
# Code for storing the values in the file...
|
by: learnerofpython |
last post by:
I have a cygwin application which i have invoked using python script like:
os.system("path of the exe file")
How can i enter text in the cygwin( which has appeared )using the python script itself???
|
by: psbasha |
last post by:
Hi,
Is there any module available to write and read a XML file using Python?
Links for the material will help me.
Thanks in advance
PSB
|
by: joestevens232 |
last post by:
Hello Im having problems figuring out how to remove the duplicate entries in an array...Write a program that accepts a sequence of integers (some of which may repeat) as input
into an array. Write...
|
by: AshishMishra16 |
last post by:
HI friends,
I am using the Flex to upload files to server. I m getting all the details about the file, but I m not able to upload it to Server. Here is the code i m using for both flex & for...
|
by: Viewer T. |
last post by:
I am trying to write a script that deletes certain files based on
certain criteria.
What I am trying to do is to automate the process of deleting certain
malware files that disguise themselves...
|
by: deardp |
last post by:
hi :
i need to convert an image (ex: jpeg or gif) into binary form and generating the image back to the original one using python script(it was some thing like decoding and encoding). pls pass me...
|
by: Arulmanoj |
last post by:
Hi,
I have to write the content from one csv file to another csv file using VB Script. The scource content file contains 5 columns and the destination file contains six column as below.
Source...
|
by: Tatiana Louisa |
last post by:
Hi,
I am a bit new in python. Would be nice to get some help. I would like to remove the first line of 100 files in a folder. How can i proceed to write a python script. I can do it for a single...
|
by: CloudSolutions |
last post by:
Introduction:
For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome former...
|
by: ryjfgjl |
last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
|
by: aa123db |
last post by:
Variable and constants
Use var or let for variables and const fror constants.
Var foo ='bar';
Let foo ='bar';const baz ='bar';
Functions
function $name$ ($parameters$) {
}
...
|
by: ryjfgjl |
last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
|
by: ryjfgjl |
last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
| |