
MD5 module Pythonicity

Hi folks

Recently I have been discussing Python's ease of use with a friend,
and it really is good at this. This friend needed to calculate the
MD5 hash of some files and was telling me about the md5 module.
The way he described it, and the way it is described in the Python
docs, the method for calculating hashes did not seem very Pythonic to
me, although it was certainly very simple and easy:

The method is (taken from the official Python documentation):
>>> import md5
>>> m = md5.new()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'

The idea for files is: open the file, read it in little chunks, call
update for each one, and when you are done reading the file, call
digest. Well, OK, it is very simple and easy.
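The chunked procedure just described might look like this today. This is a sketch assuming Python 3, where the md5 module used in this thread (superseded by hashlib in Python 2.5) no longer exists and hashlib.md5 takes its place; the function name md5_of_file and the 64 KiB chunk size are illustrative choices, not anything from the documentation.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of the file at `path`, read in chunks.

    Hypothetical helper for illustration; chunk_size is an arbitrary choice.
    """
    m = hashlib.md5()  # Python 3 replacement for md5.new()
    with open(path, "rb") as f:  # binary mode: hash the raw bytes
        while True:
            chunk = f.read(chunk_size)  # take a little chunk of the file
            if not chunk:  # empty bytes means end of file
                break
            m.update(chunk)  # call update for each chunk
    # hexdigest() is digest() rendered as a hex string, easier to compare/print
    return m.hexdigest()
```

Only one chunk is held in memory at a time, so the file size does not matter.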
But wouldn't it be more Pythonic if there were some kind of
md5.calculate_from_file("file") ?!
This way, you wouldn't have to split the file yourself (the proposed
function would do this for you) and the code would be a lot
more readable:
import md5
md5.calculate_from_file("/home/foo/bar.bz2")
or something like this. (Maybe passing the open file object to
calculate_from_file instead of the filename string.)
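A rough sketch of the proposed function, accepting either form suggested above (a filename or an already-open file object). It assumes Python 3's hashlib; calculate_from_file is the hypothetical name from this post and exists neither in the old md5 module nor in hashlib.

```python
import hashlib
import os


def calculate_from_file(source, chunk_size=65536):
    """Return the MD5 hex digest of a filename or an open binary file object.

    Hypothetical sketch of the proposed md5.calculate_from_file; neither the
    old md5 module nor hashlib provides a function by this name.
    """
    if isinstance(source, (str, bytes, os.PathLike)):
        # Given a filename: open it in binary mode and recurse on the file object.
        with open(source, "rb") as f:
            return calculate_from_file(f, chunk_size)
    m = hashlib.md5()
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: source.read(chunk_size), b""):
        m.update(chunk)
    return m.hexdigest()


# Usage, mirroring the proposal (the path is the example from this post):
# calculate_from_file("/home/foo/bar.bz2")
```

For what it's worth, Python 3.11 later added hashlib.file_digest(), which takes an open binary file object and an algorithm name, e.g. hashlib.file_digest(f, "md5").hexdigest().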

One alternative also shown in the documentation is to do everything at once:
import md5
md5.new("Nobody inspects the spammish repetition").digest()


Well, OK, this one is a bit more readable (though not as good as I
think it could be), but it has the disadvantage of having to load the
WHOLE file into memory.
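The trade-off between the two styles is only about memory: feeding the data to update() in pieces yields exactly the same digest as hashing it in one call. A quick check, assuming Python 3's hashlib (the successor to the md5 module) and reusing the string from the documentation example:

```python
import hashlib

data = b"Nobody inspects the spammish repetition"

# One shot: the entire input has to be in memory before hashing.
one_shot = hashlib.md5(data).hexdigest()

# Incremental: the same bytes fed through update() in pieces.
incremental = hashlib.md5()
incremental.update(b"Nobody inspects")
incremental.update(b" the spammish repetition")

print(one_shot)                             # bb649c83dd1ea5c9d9dec9a18df0ffe9
print(one_shot == incremental.hexdigest())  # True

# Any way of slicing the input gives the same digest.
for size in (1, 5, 16, 64):
    h = hashlib.md5()
    for start in range(0, len(data), size):
        h.update(data[start:start + size])
    assert h.hexdigest() == one_shot
```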

What's wrong with having a function like the one I described, one that
would split files for you, feed md5.update and, when it is done, return
the digest?
It is easier, doesn't require creating MD5 objects, works well on
small and big files, and makes the code more readable and simple. Also,
calculating the MD5 of files seems to be a common enough task to be put in
the library (at least on GNU/Linux we have a command just for
this: md5sum).

"Although practicality beats purity."
"Readability counts."
"Beautiful is better than ugly."

Have I got the wrong definition of "Pythonic"?

--
Thanks in advance
Regards
Leandro Lameiro
Oct 15 '05 #1


Leandro Lameiro <la*****@gmail.com> writes:
> What's wrong in having a function like the one I said, that would
> split files for you, feed md5.update and, when it is over, return the
> digest?
Nothing in particular; it's just a trivial thing to write. If you add
every useful utility function to the standard library, you wind up
with a multi-thousand-page library documentation. The line has to be
drawn somewhere.
> It is easier, doesn't require MD5 objects creation, works well on
> small and big files, makes the code more readable and simple. Also,
> calculating MD5 of files seems to be a common enough task to be put in
> the library (well, at least on GNU/Linux we have one command just for
> this - md5sum)


Wanting to sum a file at the command line isn't that uncommon, so a
utility makes some sense. But how often does a program need the md5
sum of a file? Especially compared to how often it wants to take the
md5 sum of some string that isn't in a file?

It might be worth adding, except that md5 isn't really trusted
anymore; you really want to be using sha1 (and you presumably have a
sha1sum utility; FreeBSD has md5 and sha1 commands). But the sha
module doesn't have a file handler either.
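Making the algorithm a parameter would cover the sha1 case without writing a second helper. As a sketch assuming Python 3, where hashlib gives every algorithm the same update()/hexdigest() interface and hashlib.new() looks one up by name, a hypothetical file_hexdigest could look like this:

```python
import hashlib


def file_hexdigest(fileobj, algorithm="sha1", chunk_size=65536):
    """Hash an open binary file object with any algorithm hashlib knows by name.

    Hypothetical helper for illustration; not part of hashlib itself.
    """
    h = hashlib.new(algorithm)  # e.g. "md5", "sha1", "sha256"
    # Read fixed-size chunks until read() returns b"" at end of file.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()
```

Passing "md5" reproduces the MD5 behaviour discussed above. SHA-1 has since lost trust as well (a practical collision was published in 2017), so "sha256" is the more usual choice today, and it works the same way here.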

You might try posting a patch to SourceForge, and see if it gets
accepted.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Oct 15 '05 #2
