Removing Duplicate entries in a file...

Hi all, I'm storing a number of dictionary values in a file using the
'cPickle' module and then retrieving them. The following is the code
for it -

# Code for storing the values in the file
import cPickle

book = {raw_input("Name: "): [int(raw_input("Phone: ")),
                              raw_input("Address: ")]}
file_object = file(database, 'w+')
cPickle.dump(book, file_object)
file_object.close()

# Code for retrieving values and modifying them.
tobe_modified_name = raw_input("Enter name to be modified: ")
file_object = file(database)

while file_object.tell() != EOFError:
    try:
        stored_dict = cPickle.load(file_object)
        if stored_dict.has_key(tobe_modified_name):
            print ("Entry found !")
            # I want to modify the values retrieved from the file and
            # then put it back to the file without duplicate entry.
            file_object = file(database, 'a+')
    except EOFError:
        break
file_object.close()

Now, my problem is that after finding the entry in the file, I want to
make changes to the 'values' under the searched 'key' and then insert it
back into the file. But in doing so I end up with duplicate entries for
the same key. I want to remove the previous key and value entry in the
file and keep the latest one. How do I solve this problem?

I actually thought of 2 ways -

1) In Java there is something called the 'file_pointer' concept: after
you find the entry you are looking for, you move up all the entries
below it, so that the searched entry ends up at the bottom of the file.
You then truncate the file by a certain number of bytes to remove the
old entry. Can we do this in Python using the file.truncate([size])
method?

2) This is a really crappy way, but I'll put it across anyway. After
finding the entry you are looking for in the file, make a copy of the
file without that entry. Make the changes to the 'values' under this
key and insert the result into the second file you have created.
Before exiting, delete the first file.

Are there any more ways to solve my problem ? Any criticisms are
welcome....

Jan 6 '06 #1

sri2097 wrote:
Hi all, I'm storing number of dictionary values into a file using the
'cPickle' module and then am retrieving it. The following is the code
for it -

# Code for storing the values in the file
import cPickle

book = {raw_input("Name: "): [int(raw_input("Phone: ")),
                              raw_input("Address: ")]}
file_object = file(database, 'w+')
cPickle.dump(book, file_object)
file_object.close()


I may be misunderstanding you - but it seems you just want to read a
pickle, modify it, and then write it back ?

What you're doing is appending the modified pickle to the original one
- which is more complicated than what you want to achieve.

file_object = open(filename, 'rb')
stored_dict = cPickle.load(file_object)
file_object.close()

# ... code that modifies stored_dict ...

file_object = open(filename, 'wb')
cPickle.dump(stored_dict, file_object)
file_object.close()

Any reason why that shouldn't do what you want ?
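
A minimal sketch of this read-modify-write cycle, assuming Python 3 (where cPickle is simply `pickle`) and assuming that all entries live in one pickled dictionary rather than in several pickles appended to the same file. The function name `update_entry` and the sample data are made up for illustration.

```python
import os
import pickle
import tempfile


def update_entry(path, name, new_values):
    """Load the whole pickled dict, change one key, and rewrite the file."""
    with open(path, "rb") as f:
        book = pickle.load(f)
    found = name in book
    if found:
        book[name] = new_values
    # Mode "wb" truncates the file first, so each key is stored only once.
    with open(path, "wb") as f:
        pickle.dump(book, f)
    return found


# Usage with a throwaway file and made-up data.
path = os.path.join(tempfile.mkdtemp(), "book.pkl")
with open(path, "wb") as f:
    pickle.dump({"alice": [1234, "1 Main St"], "bob": [5678, "2 Oak Ave"]}, f)

update_entry(path, "alice", [9999, "3 Elm Rd"])
with open(path, "rb") as f:
    result = pickle.load(f)
```

Because the file is reopened in write mode rather than append mode, the other entries survive only if they were part of the dictionary that was loaded.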

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Jan 6 '06 #2
"sri2097" <sr********@gmail.com> writes:
Hi all, I'm storing number of dictionary values into a file using the
'cPickle' module and then am retrieving it. The following is the code
for it -

# Code for storing the values in the file
import cPickle

book = {raw_input("Name: "): [int(raw_input("Phone: ")),
                              raw_input("Address: ")]}
file_object = file(database, 'w+')
cPickle.dump(book, file_object)
file_object.close()

# Code for retrieving values and modifying them.
tobe_modified_name = raw_input("Enter name to be modified: ")
file_object = file(database)

while file_object.tell() != EOFError:
    try:
        stored_dict = cPickle.load(file_object)
        if stored_dict.has_key(tobe_modified_name):
            print ("Entry found !")
            # I want to modify the values retrieved from the file and
            # then put it back to the file without duplicate entry.
            file_object = file(database, 'a+')
    except EOFError:
        break
file_object.close()
Now, my problem is after finding the entry in the file, I want to make
changes to the 'values' under the searched 'key' and then insert it
back to the file. But in doing so I'm having duplicate entries for the
same key. I want to remove the previous key and value entry in the file
and keep the latest one. How to solve this problem ?

First, file_object.tell won't return EOFError. Nothing should return
EOFError - it's an exception. It should be raised.

As you noticed, cPickle.load will raise EOFError when called on a file
that you've reached the end of. However, you want to narrow the
try clause as much as possible:

try:
    stored_dict = cPickle.load(file_object)
except EOFError:
    break

# Work with stored dict here.

If you weren't doing a break in the except clause, you'd work with the
dictionary in an else clause.
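
To illustrate the else-clause variant, here is a small sketch assuming Python 3 (`pickle` instead of cPickle). The helper `find_entry` and the in-memory buffer are hypothetical; the point is only that the try block guards the load call alone, and the work on the dictionary happens in the else clause.

```python
import io
import pickle


def find_entry(stream, name):
    """Return the first pickled dict that has name as a key, or None."""
    at_end = False
    while not at_end:
        try:
            record = pickle.load(stream)
        except EOFError:
            at_end = True  # no break here: the loop condition ends the loop
        else:
            # Runs only when the load succeeded.
            if name in record:
                return record
    return None


# Two dictionaries pickled back to back, as in the original post.
buffer = io.BytesIO()
for entry in ({"alice": [1234, "1 Main St"]}, {"bob": [5678, "2 Oak Ave"]}):
    pickle.dump(entry, buffer)

buffer.seek(0)
hit = find_entry(buffer, "bob")
buffer.seek(0)
miss = find_entry(buffer, "nobody")
```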
I actually thought of 2 ways -

1) In Java there is something called 'file_pointer' concept where in
after you find the entry you are looking for you move all the entries
below this entry. Then you get the searched entry at the bottom of the
file. After this truncate the file by a certain bytes to remove the old
entry. Can we do this in Python using the file.truncate([size]) method
?

Yup, this would work. You'd have to save the value from
file_object.tell() before calling cPickle.load, so you could go back
to that point to write the next object. You'd either have to load all
the following objects into memory, or shuttle back and forth between
the read and write positions. The latter sounds "really crappy" to me.
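
One possible reading of the in-place approach, sketched in Python 3 under the assumption that everything after the matching record fits in memory. The name `replace_in_place` is invented for this example, and unlike the original idea it keeps the modified record at its old position instead of moving it to the end.

```python
import os
import pickle
import tempfile


def replace_in_place(path, name, new_values):
    """Rewrite the first record containing name, keeping the others."""
    with open(path, "r+b") as f:
        while True:
            start = f.tell()  # offset of the record about to be read
            try:
                record = pickle.load(f)
            except EOFError:
                return False  # key not found, file left unchanged
            if name in record:
                break
        # Load every record after the match into memory.
        tail = []
        while True:
            try:
                tail.append(pickle.load(f))
            except EOFError:
                break
        record[name] = new_values
        f.seek(start)  # go back to where the old record began
        for item in [record] + tail:
            pickle.dump(item, f)
        f.truncate()  # drop leftover bytes if the new data is shorter
        return True


# Usage with a throwaway file and made-up data.
path = os.path.join(tempfile.mkdtemp(), "book.pkl")
with open(path, "wb") as f:
    for entry in ({"alice": [1, "a"]}, {"bob": [2, "b"]}, {"carol": [3, "c"]}):
        pickle.dump(entry, f)

replace_in_place(path, "bob", [22])
```

If the process dies between the seek and the truncate, the file is left half rewritten, which is one reason to prefer writing to a separate file.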
2) Although this is a really crappy way but nevertheless I'll put it
across. First after finding the entry you are looking for in the file,
make a copy of this file without the entry found in the previous file.
Make the changes to the 'values' under this key and insert this into
the second file what you have created. Before exiting delete the first
file.


Actually, there's a good reason for doing it that way. But first,
another alternative.

Unless your file is huge (more than a few hundred megabytes), you
might consider loading the entire thing into memory. Instead of
calling cPickle.dump multiple times, put all the dictionaries in a
list, then call cPickle.dump on the list. When you want to update the
list, cPickle.load will load the entire list, so you can use Python to
work on it.
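
A rough sketch of the list-based layout, assuming Python 3 (`pickle` in place of cPickle). The helper names `save_entries`, `load_entries` and `update_entries` are hypothetical, as is the sample data.

```python
import os
import pickle
import tempfile


def save_entries(path, entries):
    """Pickle the whole list of dictionaries with a single dump call."""
    with open(path, "wb") as f:
        pickle.dump(entries, f)


def load_entries(path):
    """Load the whole list back with a single load call."""
    with open(path, "rb") as f:
        return pickle.load(f)


def update_entries(entries, name, new_values):
    """Replace the values under name in every dict that has that key."""
    changed = 0
    for entry in entries:
        if name in entry:
            entry[name] = new_values
            changed += 1
    return changed


# Usage with a throwaway file and made-up data.
path = os.path.join(tempfile.mkdtemp(), "book.pkl")
save_entries(path, [{"alice": [1234, "1 Main St"]}, {"bob": [5678, "2 Oak Ave"]}])

entries = load_entries(path)
changed = update_entries(entries, "bob", [4321, "9 Pine Ct"])
save_entries(path, entries)
```

Since the file holds exactly one pickle, rewriting it replaces the old entry rather than appending a second copy.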

As for saving the file, best practice for updating a file is to write
it to a temporary file, and then rename the new file to the old name
after the write has successfully finished. This way, if the write
fails for some reason, your working file isn't corrupted. Doing it
this way also makes it easy to deal with the case of the list being too
big to load into memory:

# Warning, untested code
# Assumes "import os" and "import cPickle" earlier, with input_file open
# on database for reading and output_file open on database_temp for writing.

while 1:
    try:
        stored_dict = cPickle.load(input_file)
    except EOFError:
        break
    if stored_dict.has_key(tobe_modified_name):
        print "Entry found !"
        # Modify stored_dict here
    cPickle.dump(stored_dict, output_file)

output_file.close()
os.unlink(database) # May not be required; depends on your os
os.rename(database_temp, database)

You'll probably want to handle exceptions from cPickle.dump and
output_file.close cleanly as well.
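
For what it's worth, here is how the same idea might look in Python 3 with the error handling filled in. This is a sketch under assumptions: `rewrite_with_temp` is a made-up name, `pickle` stands in for cPickle, and `os.replace` is used in place of the unlink/rename pair because it overwrites the target in one step.

```python
import os
import pickle
import tempfile


def rewrite_with_temp(path, name, new_values):
    """Stream records from path into a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the rename stays on one
    # filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    found = False
    try:
        with os.fdopen(fd, "wb") as output_file, open(path, "rb") as input_file:
            while True:
                try:
                    record = pickle.load(input_file)
                except EOFError:
                    break
                if name in record:
                    record[name] = new_values
                    found = True
                pickle.dump(record, output_file)
        os.replace(temp_path, path)  # only reached if every write succeeded
    except BaseException:
        # Leave the original file untouched and discard the partial copy.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return found


# Usage with a throwaway file and made-up data.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "book.pkl")
with open(path, "wb") as f:
    for entry in ({"alice": [1, "a"]}, {"bob": [2, "b"]}):
        pickle.dump(entry, f)

found = rewrite_with_temp(path, "bob", [22, "new address"])
```

Only one record is held in memory at a time, so this variant also works when the file is too large to load whole.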

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jan 6 '06 #3
Hi there, I'm just curious to know as to how the changes you have
suggested will solve the problem. Instead of appending (what I was
doing), now we are opening and storing the files in 'binary' format.
All the other entries in my file will be gone when I write into the
file again.

What I actually need is this -

I have some dictionary values stored in a file. I retrieve these
entries based on the key value specified by the user. Now if I want to
modify the values under a particular key, I first search if that key
exists in the file and, if yes, retrieve the values associated with the
key and modify them. Now when I re-insert this modified key-value pair
back in the file, I have 2 entries (one is the old entry and the
second is the new modified one). So if I search for that key the next
time I'll have 2 entries for it. That's not what we want. So how do I
remove the old entry without the other values getting deleted ? In
other words, keeping the other entries as they are, I want to update a
particular key-value pair.

Let me know in case any bright idea strikes...

Jan 7 '06 #4
Thanx Mike, my problem is solved !! I loaded the entire file contents
into a list and my job was a piece of cake after that.

Srikar

Jan 10 '06 #5
