MD5CryptoServiceProvider Hashing a split file

Hi,

I am very new to C# and the .NET Framework. I am trying to hash (using
MD5CryptoServiceProvider) a source that is split into several files.

When the source is in one file, I can produce the correct MD5 hash.

My question is how I can reproduce the same hash when the source is split
across several files.

Thanks :)

Jun 27 '08 #1
John Smith wrote:
> My issue is how can I reproduce the correct hash when the file is split
> into different files.

A hash is calculated from the byte content alone.

Why should it make a difference whether those bytes are read
from a single file or from multiple files?

Arne
Jun 27 '08 #2
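Arne's point can be made concrete without a temporary file. HashAlgorithm.ComputeHash also accepts a Stream, so if the parts are presented to it as one continuous stream, MD5 sees exactly the bytes it would see from a single combined file. This is a minimal sketch, assuming the parts are files to be read in order: ConcatenatedStream and SplitFileHash are hypothetical names written for this illustration (not .NET types), and MD5.Create() stands in for new MD5CryptoServiceProvider().

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper, not part of .NET: a read-only, forward-only stream that
// returns the bytes of several underlying streams one after another.
public sealed class ConcatenatedStream : Stream
{
    private readonly IEnumerator<Stream> _parts;
    private Stream _current;

    public ConcatenatedStream(IEnumerable<Stream> parts)
    {
        _parts = parts.GetEnumerator();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        while (true)
        {
            if (_current == null)
            {
                if (!_parts.MoveNext()) return 0; // every part consumed
                _current = _parts.Current;
            }
            int n = _current.Read(buffer, offset, count);
            if (n > 0) return n;
            _current.Dispose(); // this part is exhausted; move on to the next one
            _current = null;
        }
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _current?.Dispose();
            _parts.Dispose();
        }
        base.Dispose(disposing);
    }
}

public static class SplitFileHash
{
    // Hashes the files, in order, exactly as if they were one file; no temporary copy.
    public static string Md5OfFiles(IEnumerable<string> paths)
    {
        using (MD5 md5 = MD5.Create())
        using (var all = new ConcatenatedStream(paths.Select(File.OpenRead)))
        {
            return BitConverter.ToString(md5.ComputeHash(all)).Replace("-", "");
        }
    }
}
```

Because the files are opened lazily through Select, only one of them is open at any time, and the result should match hashing a single file that contains all the parts back to back.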
Arne Vajhøj wrote:
> Why does it make the difference whether those bytes are read
> from a single file or from multiple files ?

Thanks Arne.

I don't think I explained myself well. Let me rephrase: I have no
clue how to do it. :?

I think the best way is to show you my problem with some quick example code:

------------------------------------------------------------
// Requires: using System; using System.Security.Cryptography;
MD5CryptoServiceProvider oMD5 = new MD5CryptoServiceProvider();
string sRet;

string s1 = "First String Sample";
string s2 = "Second String Sample";
string s3 = s1 + s2;

// Hash s1 on its own
byte[] bBytes = System.Text.Encoding.ASCII.GetBytes(s1);
sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
System.Diagnostics.Debug.WriteLine(sRet);

// Hash s2 on its own
bBytes = System.Text.Encoding.ASCII.GetBytes(s2);
sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
System.Diagnostics.Debug.WriteLine(sRet);

// Hash the concatenation s1 + s2
bBytes = System.Text.Encoding.ASCII.GetBytes(s3);
sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
System.Diagnostics.Debug.WriteLine(sRet);
-----------------------------------------------------------------

The output hashes are as follows:
s1 = 1EC25881AD012D4CA6E73D1986AE93FB
s2 = D8D46AC432C7251F863C2D5B91FE48FC
s3 = 9E158DDEE697EBAEC2A036F459B02448

What I want is to hash s1, get that result, and then continue hashing
with s2 to get the final s3 result.

Right now the only way I know to get the s3 hash is to concatenate the
strings first and then run the result through ComputeHash.

That isn't much of an issue when the input is a small string, but hashing
several files is a different matter. These files can be large, and the only
way I know of doing it is to combine all the files into a single temporary
file and then pass that stream to ComputeHash.

Surely there has to be a better method.

Any advice?

Thanks
Jun 27 '08 #3
John Smith wrote:
> I think best way is to show you my problem with quick example code:

Example code is always good.

> Now what I want is basically to be able to hash s1 get the
> result and then continue hashing s2 and get the final s3 result.

You cannot "add" MD5 checksums: the hash of s1 + s2 can not be
derived from the hash of s1 and the hash of s2.

But if you use TransformBlock and TransformFinalBlock instead
of ComputeHash, you should be able to process small chunks
(say 1 MB or 10 MB) at a time, even when they come from
multiple files.

Arne
Jun 27 '08 #4
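Applied to files on disk, that suggestion could look roughly like the sketch below. It assumes the parts should be hashed in the order given; ChunkedMd5, HashFiles and the 1 MB default buffer are illustrative choices, and MD5.Create() stands in for new MD5CryptoServiceProvider(). TransformBlock accepts null as its output buffer when no copy of the input is needed, and an empty TransformFinalBlock closes the hash once every chunk has been fed in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class ChunkedMd5
{
    // Hashes the files in the given order as if they were one file,
    // holding at most bufferSize bytes of file data in memory at a time.
    public static string HashFiles(IEnumerable<string> paths, int bufferSize = 1024 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        using (MD5 md5 = MD5.Create())
        {
            foreach (string path in paths)
            {
                using (FileStream fs = File.OpenRead(path))
                {
                    int read;
                    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        // Feed this chunk into the running hash; null means "no copy of the input".
                        md5.TransformBlock(buffer, 0, read, null, 0);
                    }
                }
            }
            // All data has already been fed in, so finish with an empty final block.
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "");
        }
    }
}
```

The chunk size only affects memory use, not the result: hashing with a 4-byte buffer or a 1 MB buffer should give the same digest as ComputeHash over the combined content.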
Arne Vajhøj wrote:
> But if you use TransformBlock and TransformFinalBlock instead
> of ComputeHash, then you should be able to process small
> chunks (like 1 MB or 10 MB) at a time - even coming from
> multiple files.

Example:

using System;
using System.Text;
using System.Security.Cryptography;

namespace E
{
    public class Program
    {
        public static void Main(string[] args)
        {
            MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();

            // One-shot hashes of each string and of the concatenation
            string s1 = "First String Sample";
            Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s1))).Replace("-", ""));
            string s2 = "Second String Sample";
            Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s2))).Replace("-", ""));
            string s3 = s1 + s2;
            Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s3))).Replace("-", ""));

            // Incremental hash: feed s1, then s2, and read the result from md5.Hash
            md5.Initialize();
            byte[] garbage = new Byte[1000000]; // receives a copy of the input; not needed for the hash
            md5.TransformBlock(Encoding.UTF8.GetBytes(s1), 0,
                               Encoding.UTF8.GetByteCount(s1), garbage, 0);
            md5.TransformFinalBlock(Encoding.UTF8.GetBytes(s2), 0,
                                    Encoding.UTF8.GetByteCount(s2));

            // Prints the same value as the one-shot hash of s3
            Console.WriteLine(BitConverter.ToString(md5.Hash).Replace("-", ""));
            Console.ReadKey();
        }
    }
}

(it may be possible to optimize it a bit, but it should
show the concept)

Arne
Jun 27 '08 #5
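One possible tidy-up of the example above, in the spirit of the "optimize it a bit" remark: encode each string only once, pass null instead of a 1 MB scratch buffer, and finish with an empty final block so that every part goes through the same TransformBlock call. MD5.Create() replaces new MD5CryptoServiceProvider() here because, to my knowledge, the latter is marked obsolete from .NET 6 onward; IncrementalMd5Demo, HashParts and HashWhole are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class IncrementalMd5Demo
{
    // Hashes the concatenation of the parts without building the concatenated string.
    public static string HashParts(params string[] parts)
    {
        using (MD5 md5 = MD5.Create())
        {
            foreach (string part in parts)
            {
                byte[] bytes = Encoding.UTF8.GetBytes(part); // encode once, reuse for the length
                md5.TransformBlock(bytes, 0, bytes.Length, null, 0);
            }
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "");
        }
    }

    // One-shot hash of a single string, for comparison.
    public static string HashWhole(string s)
    {
        using (MD5 md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s))).Replace("-", "");
        }
    }
}
```

Usage: HashParts("First String Sample", "Second String Sample") should return the same string as HashWhole("First String SampleSecond String Sample").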
Ahhhh. I wish I had seen the code earlier. I actually figured it out after you pointed me to TransformBlock.
Thanks Arne, you've been a great help. Saved me a lot of time.

I still have one final issue, and I don't think it can be solved (easily): working out the hash at each stage.

That is:
hash for s1
hash for s1 + s2
hash for s1 + s2 + s3
etc...

It seems that I can use TransformBlock, but I am unable to get the current "total" hash of the chunks processed so far.

The only way I can think of doing it is to make a copy of the md5 object, which to my understanding is a pain in the butt in C#.

Any suggestions?

Thx for all the help
Jun 27 '08 #6
John Smith wrote:
> It seems that I can use the TransformBlock but I am unable to get the
> current "total" hash of processed chunks.

I don't think that is easily possible.

What I would do is have two MD5 hashers: one that I reset for
each file and one for the total. Then feed the data to both of them.

I know that MD5(individual) and MD5(total) is not the same as
MD5(accumulate(individual)) and MD5(total), but it may be OK.

Arne

Jun 27 '08 #7
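The two-hasher idea might look like this in code. It is a sketch under a few assumptions: TwoHasherDemo and HashFiles are hypothetical names, MD5.Create() stands in for new MD5CryptoServiceProvider(), and a fresh MD5 object is created for each file instead of resetting one, which sidesteps any question about reusing an instance after TransformFinalBlock.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class TwoHasherDemo
{
    // Returns the MD5 of each file on its own; totalHash receives the MD5 of all files together.
    public static List<string> HashFiles(IEnumerable<string> paths, out string totalHash)
    {
        var perFile = new List<string>();
        byte[] buffer = new byte[1024 * 1024];
        using (MD5 total = MD5.Create())
        {
            foreach (string path in paths)
            {
                using (MD5 single = MD5.Create()) // fresh hasher for each file
                using (FileStream fs = File.OpenRead(path))
                {
                    int read;
                    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        // The same chunk goes into both hashers.
                        single.TransformBlock(buffer, 0, read, null, 0);
                        total.TransformBlock(buffer, 0, read, null, 0);
                    }
                    single.TransformFinalBlock(new byte[0], 0, 0);
                    perFile.Add(ToHex(single.Hash));
                }
            }
            total.TransformFinalBlock(new byte[0], 0, 0);
            totalHash = ToHex(total.Hash);
        }
        return perFile;
    }

    private static string ToHex(byte[] hash) => BitConverter.ToString(hash).Replace("-", "");
}
```

Each byte is read from disk once and fed to both hashers, so the extra cost is the second hash computation, not a second pass over the files.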
Thanks. I think it would have to be separate hashers like you said.
Jun 27 '08 #8
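For later readers: newer .NET versions make the per-stage hashes John asked about (s1, then s1 + s2, and so on) straightforward. IncrementalHash.GetCurrentHash() returns the hash of everything appended so far without resetting the object. To my knowledge it is only available from .NET 5 onward, so treat that version boundary as an assumption and check it against your target framework; it certainly did not exist when this thread was written. RunningMd5 and HashStages are hypothetical names used for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class RunningMd5
{
    // Returns MD5(parts[0]), MD5(parts[0] + parts[1]), MD5(parts[0] + parts[1] + parts[2]), ...
    public static List<string> HashStages(IEnumerable<string> parts)
    {
        var stages = new List<string>();
        using (IncrementalHash md5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            foreach (string part in parts)
            {
                md5.AppendData(Encoding.UTF8.GetBytes(part));
                // Hash of everything appended so far; the object keeps accumulating afterwards.
                stages.Add(BitConverter.ToString(md5.GetCurrentHash()).Replace("-", ""));
            }
        }
        return stages;
    }
}
```

With this API there is no need to copy the hasher or keep a second one just to read intermediate results.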
