Question regarding checksumming of a file

Good evening,

I need to generate checksums of a file, store the value in a variable,
and pass it along for later comparison.

The MD5 module would seem to do the trick but I'm sketchy on implementation.
The nearest I can see would be

import md5

m=md5.new()
contents = open(self.file_name,"rb").read()
check=md5.update(contents)

However this does not appear to be actually returning the checksum.

Does anyone have insight into where I am going wrong?

Any help you can provide would be greatly appreciated.

Thanks
May 14 '06 #1
Andrew Robert wrote:
m=md5.new()
contents = open(self.file_name,"rb").read()
check=md5.update(contents)

However this does not appear to be actually returning the checksum.


the docs are your friend, use them. hint: first you eat, then you...
http://docs.python.org/lib/module-md5.html

--
Edward Elliott
UC Berkeley School of Law (Boalt Hall)
complangpython at eddeye dot net
May 14 '06 #2
Actually, I think I got it but would like to confirm this looks right.

import md5
checksum = md5.new()
mfn = open(self.file_name, 'r')
for line in mfn.readlines():
    checksum.update(line)
mfn.close()
cs = checksum.hexdigest()
print cs

The value cs should contain the MD5 checksum or did I miss something?

Any help you can provide would be greatly appreciated.

Thanks
May 14 '06 #3
In article <12*************@corp.supernews.com>,
Andrew Robert <an************@gmail.com> wrote:
Good evening,

I need to generate checksums of a file, store the value in a variable,
and pass it along for later comparison.

The MD5 module would seem to do the trick but I'm sketchy on implementation.
The nearest I can see would be

import md5

m=md5.new()
contents = open(self.file_name,"rb").read()
check=md5.update(contents)

However this does not appear to be actually returning the checksum.

Does anyone have insight into where I am going wrong?


After calling update(), you need to call digest(). update() only updates
the internal state of the md5 object; digest() returns the hash.
Also, for the code above, it's m.update(), not md5.update(): update() is a
method of an md5 instance object, not of the md5 module itself.

Lastly, the md5 algorithm is known to be weak. If you're using md5 to
maintain compatibility with some pre-existing implementation, that's one
thing. But if you're starting something new from scratch, I would suggest
using SHA-1 instead (see the sha module). SHA-1 is much stronger
cryptographically than md5. The Python API is virtually identical, so it's
no added work to switch to the stronger algorithm.
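
A minimal sketch of the update-then-digest sequence described above. It is not from the original post: it assumes a current Python, where hashlib.md5() and hashlib.sha1() stand in for the old md5.new() and sha.new() calls, and the input bytes are made up for illustration.

```python
import hashlib

# Assumption: hashlib.md5() is the modern stand-in for md5.new().
m = hashlib.md5()

# update() is called on the hash object (not the module) and returns None;
# it only feeds data into the object's internal state.
result = m.update(b"hello world")

# digest() returns the hash as raw bytes; hexdigest() returns it as hex text.
raw = m.digest()
checksum = m.hexdigest()

# The SHA-1 object exposes the same update()/digest()/hexdigest() methods.
s = hashlib.sha1()
s.update(b"hello world")
sha1_checksum = s.hexdigest()

print(result, len(raw), checksum, sha1_checksum)
```

The value to store and compare later is the string in checksum (or sha1_checksum), not the return value of update().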
May 14 '06 #4
Roy Smith wrote:

However this does not appear to be actually returning the checksum.

Does anyone have insight into where I am going wrong?


After calling update(), you need to call digest(). update() only updates
the internal state of the md5 object; digest() returns the hash.
Also, for the code above, it's m.update(), not md5.update(): update() is a
method of an md5 instance object, not of the md5 module itself.

Lastly, the md5 algorithm is known to be weak. If you're using md5 to
maintain compatibility with some pre-existing implementation, that's one
thing. But if you're starting something new from scratch, I would suggest
using SHA-1 instead (see the sha module). SHA-1 is much stronger
cryptographically than md5. The Python API is virtually identical, so it's
no added work to switch to the stronger algorithm.


Hi Roy,

This is strictly for checking if a file was corrupted during transit
over an MQSeries channel.

The check is not intended to be used for crypto purposes.
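
A rough sketch of that kind of transit check, assuming a current Python with hashlib. The payload and the helper name checksum are hypothetical, and nothing here touches MQSeries itself; it only shows the store-then-compare step.

```python
import hashlib

def checksum(data):
    """Return the MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

# Hypothetical message body; in practice this would be the file contents.
sent = b"example payload sent over the channel"
sent_sum = checksum(sent)  # stored in a variable and passed along with the data

received_intact = sent
received_damaged = sent[:-1] + b"?"  # simulate one corrupted byte

# Recompute on the receiving side and compare with the stored value.
intact_ok = checksum(received_intact) == sent_sum
damaged_ok = checksum(received_damaged) == sent_sum
print(intact_ok, damaged_ok)
```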
May 14 '06 #5
Ant
A script I use for comparing files by MD5 sum uses the following
function, which you may find helps:

def getSum(self):
    md5Sum = md5.new()

    f = open(self.filename, 'rb')

    for line in f:
        md5Sum.update(line)

    f.close()

    return md5Sum.hexdigest()

May 14 '06 #6
"Ant" <an****@gmail.com> writes:
def getSum(self):
    md5Sum = md5.new()
    f = open(self.filename, 'rb')
    for line in f:
        md5Sum.update(line)
    f.close()
    return md5Sum.hexdigest()


This should work, but there is one hazard if the file is very large
and is not a text file. "for line in f" reads one line at a time,
which means a contiguous run of bytes up to the next newline.
Depending on the file contents, a single "line" could be gigabytes that
get read into memory at once. So it's best to read a fixed-size block in
each operation, e.g. (untested):

def getblocks(f, blocksize=1024):
    while True:
        s = f.read(blocksize)
        if not s: return
        yield s

then change "for line in f" to "for line in f.getblocks()".

I actually think an iterator like the above should be added to the
stdlib, since the "for line in f" idiom is widely used and sometimes
inadvisable, much like the fixed-size buffers in those old C programs
that led to buffer overflow bugs.
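
To illustrate the hazard, here is a small sketch (not from the original post) that assumes a current Python and uses io.BytesIO as a stand-in for a binary file. The payload is a hypothetical run of bytes containing no newline at all.

```python
import io

# Hypothetical binary payload with no newline bytes anywhere.
data = b"\x00" * 10000

# Line iteration has nowhere to split, so the whole payload arrives
# as a single "line" held in memory at once.
lines = list(io.BytesIO(data))

# read(n) never returns more than n bytes, whatever the contents are.
first_block = io.BytesIO(data).read(1024)

print(len(lines), len(lines[0]), len(first_block))
```

With a multi-gigabyte file the same effect would apply to the real read, which is why a bounded read size is the safer choice for arbitrary binary data.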
May 14 '06 #7

When I run the script, I get an error that the file object does not have
the attribute getblocks.

Did you mean this instead?

def getblocks(f, blocksize=1024):
    while True:
        s = f.read(blocksize)
        if not s: return
        yield s

def getsum(self):
    md5sum = md5.new()
    f = open(self.file_name, 'rb')
    for line in getblocks(f):
        md5sum.update(line)
    f.close()
    return md5sum.hexdigest()
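
For reference, a self-contained sketch of the same block-wise approach for a current Python. The assumptions are mine, not the thread's: hashlib.md5() replaces md5.new(), getsum takes a file name instead of being a method, and a temporary file with random contents stands in for the real file.

```python
import hashlib
import os
import tempfile

def getblocks(f, blocksize=1024):
    """Yield successive chunks of at most blocksize bytes from file object f."""
    while True:
        s = f.read(blocksize)
        if not s:
            return
        yield s

def getsum(file_name):
    """Return the MD5 hex digest of a file, reading it block by block."""
    md5sum = hashlib.md5()
    with open(file_name, "rb") as f:  # binary mode, closed automatically
        for block in getblocks(f):
            md5sum.update(block)
    return md5sum.hexdigest()

# Demo on a throwaway file: the block-wise digest should match the digest
# computed over the whole payload in one call.
payload = os.urandom(5000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name
try:
    chunked = getsum(path)
finally:
    os.remove(path)

whole = hashlib.md5(payload).hexdigest()
print(chunked == whole)
```

Feeding the data to update() in pieces gives the same digest as hashing it in one go, so the block size only affects memory use and speed.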

May 14 '06 #8
On Sunday 14 May 2006 20:51, Andrew Robert wrote:
def getblocks(f, blocksize=1024):
    while True:
        s = f.read(blocksize)
        if not s: return
        yield s


This won't work. The following will:

def getblocks(f,blocksize=1024):
    while True:
        s = f.read(blocksize)
        if not s: break
        yield s

--- Heiko.
May 14 '06 #9
Andrew Robert <an************@gmail.com> writes:
When I run the script, I get an error that the file object does not have
the attribute getblocks.


Woops, yes, you have to call getblocks(f). Also, Heiko says you can't
use "return" to break out of the generator; I thought you could but
maybe I got confused.
May 14 '06 #10
On Sunday 14 May 2006 22:29, Paul Rubin wrote:
Andrew Robert <an************@gmail.com> writes:
When I run the script, I get an error that the file object does not have
the attribute getblocks.


Woops, yes, you have to call getblocks(f). Also, Heiko says you can't
use "return" to break out of the generator; I thought you could but
maybe I got confused.


Yeah, you can. You can't return <arg> in a generator (of course, this raises a
SyntaxError), but you can use a bare return, which ends the generator just as
raising StopIteration would. So, it wasn't you who was confused... ;-)
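
A quick sketch to check this, assuming a current Python; the function names and the tiny block size are made up for the demonstration. (In Python 3.3 and later, return with a value is also permitted inside a generator, so the SyntaxError applies only to the older versions discussed here.)

```python
import io

def blocks_with_return(f, blocksize=4):
    while True:
        s = f.read(blocksize)
        if not s:
            return  # a bare return simply ends the generator
        yield s

def blocks_with_break(f, blocksize=4):
    while True:
        s = f.read(blocksize)
        if not s:
            break  # leaving the loop ends the generator the same way
        yield s

a = list(blocks_with_return(io.BytesIO(b"abcdefghij")))
b = list(blocks_with_break(io.BytesIO(b"abcdefghij")))
print(a == b, a)
```

Both variants yield the same blocks, so the choice between return and break here is a matter of style.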

--- Heiko.
May 14 '06 #11
