Bytes | Software Development & Data Engineering Community

Calculate sha1 hash of a binary file

Hi -

I'm trying to calculate unique hash values for binary files,
independent of their location and filename, and I was wondering
whether I'm going in the right direction.

Basically, the hash values are calculated thusly:

f = open('binaryfile.bin')
import hashlib
h = hashlib.sha1()
h.update(f.read())
hash = h.hexdigest()
f.close()

A quick try-out shows that effectively, after renaming a file, its
hash remains the same as it was before.

I have my doubts, however, as to the usefulness of this. As f.read()
does not seem to read until the end of the file (for a 3.3 MB file only
a string of 639 bytes is being returned; perhaps a 00-byte counts as
EOF?), is there a high danger of collisions?

Are there better ways of calculating hash values of binary files?

Thanks in advance,

Mathieu
Aug 6 '08 #1
LaundroMat wrote:
> I have my doubts, however, as to the usefulness of this. As f.read()
> does not seem to read until the end of the file (for a 3.3 MB file only
> a string of 639 bytes is being returned; perhaps a 00-byte counts as
> EOF?), is there a high danger of collisions?
Guess: you're running on Windows?

You need to open binary files with open("filename", "rb")
to indicate that Windows shouldn't treat certain characters --
specifically character 26 (Ctrl-Z, which text mode takes as
end of file) -- as special.
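
A minimal sketch of that fix, written for current Python 3 (an assumption; this thread dates from the Python 2 era). The helper name sha1_of_file and the temporary file are hypothetical and only there to make the example self-contained. The sample data deliberately contains a NUL byte and byte 26 to show that binary mode reads past both.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path):
    """Return the SHA-1 hex digest of the file at `path`, read in binary mode."""
    # "rb" means raw bytes: no newline translation, no byte treated as EOF.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


# Hypothetical sample data containing NUL, byte 26 (Ctrl-Z) and CR LF.
data = b"abc\x00def\x1aghi\r\njkl"

# Write the sample to a temporary file so the example needs no existing file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

digest = sha1_of_file(path)
os.remove(path)
```

The digest depends only on the file's bytes, so renaming or moving the file does not change it.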

TJG
Aug 6 '08 #2

On Wed, 2008-08-06 at 12:31 -0700, LaundroMat wrote:
> I have my doubts, however, as to the usefulness of this. As f.read()
> does not seem to read until the end of the file (for a 3.3 MB file only
> a string of 639 bytes is being returned; perhaps a 00-byte counts as
> EOF?), is there a high danger of collisions?
Looks like you're doing the right thing from here. file.read() with no
size parameter will always return the whole file (for completeness, I'll
mention that the documentation warns this is not the case if the file is
in non-blocking mode, which yours is not).

Python never treats null bytes as special in strings, so no, you're not
getting an early EOF due to that.

I wouldn't worry about your hashing code; that looks fine. If I were you,
I'd try to figure out what's going wrong with your file handles. I
would suspect that wherever you saw your short read, you were
likely not opening the file in the correct mode or did not rewind the
file (with file.seek(0)) after having previously read data from it.
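
To illustrate the rewind point, here is a small sketch in Python 3 (an assumption, as above). io.BytesIO is used as a stand-in for a file opened in binary mode, and the variable names are purely illustrative.

```python
import io

# io.BytesIO stands in for a file object returned by open(path, "rb").
f = io.BytesIO(b"some binary payload")

first = f.read()    # reads everything up to the end of the data
second = f.read()   # the position is already at the end, so this is empty
f.seek(0)           # rewind to the start
third = f.read()    # the full content is available again
```

A read that comes back short or empty after an earlier read on the same handle is therefore expected behaviour, not a sign of a truncated file.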

You'll be fine if you use the code above as is; there are no problems I
can see with it.
--
John Krukoff <jk******@ltgc.com>
Land Title Guarantee Company

Aug 6 '08 #3
LaundroMat <La*****@gmail.com> writes:
> Are there better ways of calculating hash values of binary files?

Apart from opening the file in binary mode, I would consider reading
and updating the hash in chunks of e.g. 512 KB. The above code is
probably going to perform horribly for sufficiently large files, since
it tries to read the entire file into memory.
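
A rough sketch of chunked hashing in Python 3 (an assumption, since the thread predates it). The function name sha1_of_stream and the in-memory test payload are hypothetical; the 512 KB chunk size is the figure suggested above, not a tuned value.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=512 * 1024):
    """Hash a binary stream chunk by chunk so memory use stays bounded."""
    h = hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object means end of stream
            break
        h.update(chunk)
    return h.hexdigest()


# Hypothetical payload of about 1.4 MB, so it spans several chunks.
payload = b"\x00\x1a" * 700_000

chunked = sha1_of_stream(io.BytesIO(payload))
whole = hashlib.sha1(payload).hexdigest()  # same data hashed in one call
```

Feeding the data to update() in pieces gives the same digest as hashing it in one call. For a real file, the stream would be something like open(path, "rb").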
Best,

-Nikolaus
Aug 7 '08 #4
Thanks all!
Aug 7 '08 #5
I did some testing, and calculating the hash value of a 1 GB file does
take some time using this method.
Would it be wise to calculate the hash value based on, say, only
the first MB? Is there a much larger chance of collision this way (I
suppose not)? If it's helpful, the files would primarily be media
(video) files.

Thanks,

Mathieu
Aug 7 '08 #6
LaundroMat <La*****@gmail.com> writes:
> Would it be wise to calculate the hash value based on, say, only
> the first MB? Is there a much larger chance of collision this way (I
> suppose not)? If it's helpful, the files would primarily be media
> (video) files.
The usual purpose of using this type of hash is to detect corruption
and/or tampering. So you want to hash the whole file, not just part
of it. If you're not worried about intentional tampering, md5 should
be somewhat faster than sha, but there are some attacks against it
and you shouldn't use it for high security applications where you
want security against forgery. It should still have almost no chance
of accidental collisions.
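
To make the point about hashing the whole file concrete, here is a small Python 3 sketch (an assumption, as elsewhere in this thread). The two byte strings are made-up stand-ins for files that share their first MB, as files with a common header might.

```python
import hashlib

PREFIX = 1024 * 1024  # hash only the first 1 MB

# Two hypothetical "files": identical first MB, different content after it.
header = b"\x00" * PREFIX
file_a = header + b"video A"
file_b = header + b"video B"

# Hashing only the prefix cannot tell the two apart.
prefix_a = hashlib.sha1(file_a[:PREFIX]).hexdigest()
prefix_b = hashlib.sha1(file_b[:PREFIX]).hexdigest()

# Hashing everything does.
full_a = hashlib.sha1(file_a).hexdigest()
full_b = hashlib.sha1(file_b).hexdigest()
```

Here prefix_a equals prefix_b while full_a and full_b differ, so a prefix-only hash misses any change or corruption past the first MB.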
Aug 7 '08 #7
