using python to parse md5sum list

Hi

I'm new to programming and I'd like to write a program that will parse
a list produced by md5summer and give me a report in a text file on
which md5 sums appear more than once and where they are located.

The end goal is to have a way of finding duplicate files that are
scattered across a LAN of 4 Windows computers.

I've dabbled with different languages over the years and I think
Python is a good language for this, but I have had a lot of trouble
sifting through manuals and tutorials to find out which commands I need
and their syntax.

Can someone please help me?

Thanks.

Ben
Jul 18 '05 #1
Among many other things:

First, you might want to look at os.path.walk()
Second, look at the string data type.

Third, get the Python essential reference.

Also, Programming Python (O'Reilly) actually has a lot in it about stuff like
this. It's a tedious read, but in the end it will help a lot for administrative
stuff like you are doing here.
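
On the directory-walking hint above: os.path.walk() was removed in Python 3, and os.walk() is the usual replacement. Below is a minimal sketch, assuming Python 3, of how one might compute md5sum-style checksums for every file under a folder. The helper names md5_of_file and checksum_tree are hypothetical and not from this thread.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Yield (md5, path) for every file found below `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            yield md5_of_file(path), path
```

Looping over checksum_tree(some_folder) and printing each pair gives roughly the same information as an md5sum list, without the external tool.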

So, with the understanding that you will look at these references, I will
foolishly save you a little time...

If you are using md5sum, you can grab the md5 and the filename like this:

myfile = open(filename)
md5sums = []
for aline in myfile.readlines():
    # drop the trailing newline, then split on md5sum's two-space separator
    md5sums.append(aline[:-1].split("  ", 1))
myfile.close()

The md5 sum will be in element 0 of each list in the md5sums list, and
the path to the file will be in element 1.
James

--
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
Jul 18 '05 #2
Ben Rf wrote:
I'm new to programming and I'd like to write a program that will parse
a list produced by md5summer and give me a report in a text file on
which md5 sums appear more than once and where they are located.


This should do the trick:

"""
import fileinput

md5s = {}
for line in fileinput.input():
    # split only once so that filenames containing spaces stay intact
    md5, filename = line.rstrip().split(None, 1)
    md5s.setdefault(md5, []).append(filename)

for md5, filenames in md5s.iteritems():
    if len(filenames) > 1:
        print "\t".join(filenames)
"""

Put this in md5dups.py and you can then use
md5dups.py [FILE]... to find duplicates in any of the files you
specify. They'll then be printed out as a tab-delimited list.

Key things you might want to look up to understand this:

* the dict datatype
* dict.setdefault()
* dict.iteritems()
* the fileinput module
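
For readers on Python 3, where the print statement and dict.iteritems() no longer exist, here is a rough equivalent of the script above. It is a sketch and not the poster's code. The function names are hypothetical. Skipping lines that start with '#' assumes the list may contain comment or header lines, which is a guess about the input format. The report goes to a text file, as the original question asked.

```python
import fileinput
from collections import defaultdict


def group_by_md5(lines):
    """Map each checksum to the list of filenames that share it."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines carry no checksum.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split once; filenames may contain spaces
        if len(parts) != 2:
            continue  # no filename on this line, so ignore it
        md5, filename = parts
        groups[md5].append(filename)
    return groups


def write_report(groups, out):
    """Write one tab-delimited line per checksum that occurs more than once."""
    for md5, filenames in sorted(groups.items()):
        if len(filenames) > 1:
            out.write(md5 + "\t" + "\t".join(filenames) + "\n")


def main(list_paths, report_path):
    """Read the md5 lists in `list_paths` and write duplicates to `report_path`."""
    with fileinput.input(files=list_paths) as lines, open(report_path, "w") as out:
        write_report(group_by_md5(lines), out)
```

Calling main(["pc1.md5", "pc2.md5"], "report.txt") with your own file names would merge the lists from several machines into one report.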
--
Michael Hoffman
Jul 18 '05 #3
In <ma***************************************@python.org>, James Stroud
wrote:
If you are using md5sum, you can grab the md5 and the filename like such:

myfile = open(filename)
md5sums = []
for aline in myfile.readlines():
    md5sums.append(aline[:-1].split("  ", 1))
myfile.close()

Replace the append line with:

    md5sums.append(aline[:-1].split(None, 1))

That works too if md5sum opened the files in binary mode, which is the
default on Windows. The filename is then prefixed with a '*', leaving
just one space between checksum and filename.
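
To make the binary-mode point concrete, here is a small sketch of a line parser that accepts both layouts and drops the '*' marker. The function name parse_md5_line is hypothetical. Treating a leading '*' as the marker rather than part of the name is an assumption. It is safe on Windows, where '*' is not allowed in file names.

```python
def parse_md5_line(line):
    """Split one md5sum-style line into (checksum, filename).

    Returns None for blank lines or lines without a filename.
    """
    # split(None, 1) splits on the first run of whitespace only, so it
    # handles both "sum  name" (text mode) and "sum *name" (binary mode).
    parts = line.rstrip("\r\n").split(None, 1)
    if len(parts) != 2:
        return None
    checksum, filename = parts
    # Assumption: a leading '*' is the binary-mode marker, not part of the name.
    if filename.startswith("*"):
        filename = filename[1:]
    # Lower-case the checksum so that differently cased hex digits still match.
    return checksum.lower(), filename
```

Normalising both forms to the same (checksum, filename) pair means lists produced in either mode can be compared directly.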


Ciao,
Marc 'BlackJack' Rintsch
Jul 18 '05 #4
On 5 Mar 2005 19:54:34 -0800, rumours say that be********@gmail.com (Ben Rf)
might have written:

[snip]
the end goal is to have a way of finding duplicate files that are
scattered across a lan of 4 windows computers.


Just in case you want to go directly to that goal, check this:

http://groups-beta.google.com/group/...8e292ec9adb82d

It doesn't read a file at all, unless there is a need to do that. For example,
if you have ten small files and one large one, the large one will not be read
(since no other files would be found with the same size).
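
The size-first idea can be sketched as follows. This is not the code behind the link above. It is a minimal illustration, assuming Python 3, with hypothetical names (file_md5, find_duplicates). Files are grouped by size first, and only files that share a size with at least one other file are ever opened and hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(*roots):
    """Return lists of paths that share both size and MD5 checksum."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # file vanished or is inaccessible: skip it

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, so the file is never read
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_md5(path)].append(path)
        duplicates.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Each root can be a local folder or a network share path, and the result is one list of paths per set of matching files.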

In your case, you can use the find_duplicate_files function with arguments like:
r"\\COMPUTER1\SHARE1", r"\\COMPUTER2\SHARE2" etc.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #5
