468,133 Members | 1,221 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,133 developers. It's quick & easy.

Help wanted with md2 hash algorithm

hi all,

below you find my simple python version of MD2 algorithm
as described in RFC1319 (http://rfc1319.x42.com/MD2).
It produces correct results for strings shorter than 16 Bytes and wrong
results for longer strings.

I can't find what's wrong.

Can anybody help?

Regards
Wolfgang

-------------------------------------

#--- MD2 validation data
md2_test = [
('', '8350e5a3e24c153df2275c9f80692773'),
("a", '32ec01ec4a6dac72c0ab96fb34c0b5d1'),
("abc", 'da853b0d3f88d99b30283a69e6ded6bb'),
("message digest", 'ab4f496bfb2a530b219ff33031fe06b0'),
("abcdefghijklmnopqrstuvwxyz",
'4e8ddff3650292ab5a4108c3aa47940b'),
("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz0123456789",
'da33def2a42df13975352846c30338cd'),

("123456789012345678901234567890123456789012345678 90123456789012345678901234567890",

'd5976f79d83d3a0dc9806c3c66f3efd8' )
]
#--- 256-byte "random" permutation constructed from the digits of pi
PI_SUBST = [41, 46, 67, 201, 162, 216, 124, 1, 61, 54, 84, 161, 236,
240, 6,
19, 98, 167, 5, 243, 192, 199, 115, 140, 152, 147, 43, 217, 188,
76, 130, 202, 30, 155, 87, 60, 253, 212, 224, 22, 103, 66, 111, 24,
138, 23, 229, 18, 190, 78, 196, 214, 218, 158, 222, 73, 160, 251,
245, 142, 187, 47, 238, 122, 169, 104, 121, 145, 21, 178, 7, 63,
148, 194, 16, 137, 11, 34, 95, 33, 128, 127, 93, 154, 90, 144, 50,
39, 53, 62, 204, 231, 191, 247, 151, 3, 255, 25, 48, 179, 72, 165,
181, 209, 215, 94, 146, 42, 172, 86, 170, 198, 79, 184, 56, 210,
150, 164, 125, 182, 118, 252, 107, 226, 156, 116, 4, 241, 69, 157,
112, 89, 100, 113, 135, 32, 134, 91, 207, 101, 230, 45, 168, 2, 27,
96, 37, 173, 174, 176, 185, 246, 28, 70, 97, 105, 52, 64, 126, 15,
85, 71, 163, 35, 221, 81, 175, 58, 195, 92, 249, 206, 186, 197,
234, 38, 44, 83, 13, 110, 133, 40, 132, 9, 211, 223, 205, 244, 65,
129, 77, 82, 106, 220, 55, 200, 108, 193, 171, 250, 36, 225, 123,
8, 12, 189, 177, 74, 120, 136, 149, 139, 227, 99, 232, 109, 233,
203, 213, 254, 59, 0, 29, 57, 242, 239, 183, 14, 102, 88, 208, 228,
166, 119, 114, 248, 235, 117, 75, 10, 49, 68, 80, 180, 143, 237,
31, 26, 219, 153, 141, 51, 159, 17, 131, 20]
PADDING = ["".join(map(chr, [i]*i)) for i in range(17)]
SIZE = 16

#----------------------------------------------------------
def md2(m):

## (1) prepare message

#--- append to m i byte with value i, len(m) % 16 == 0
padLen = SIZE - len(m) % SIZE
m += PADDING[padLen]

#--- compute checksum C of m and append it to m
C = [0] * SIZE
L = 0
for i in range(len(m) / SIZE):
m16 = m[i*SIZE : (i+1)*SIZE]
for j in range(SIZE):
c = ord(m16[j])
C[j] = PI_SUBST[ c ^ L ]
L = C[j]
C = "".join( map(chr, C) )
m += C

## (2) compress message

X = [0] * 48 # 'compressor'

for i in range(len(m) / SIZE):

# fill X
m16 = m[i*SIZE : (i+1)*SIZE]
X[16:32] = map(ord, m16)
X[32:48] = [ a^b for (a,b) in zip(X[16:48], X[:16]) ]

# compress m
t = 0
for j in range(18):
for k in range(48):
t = X[k] ^ PI_SUBST[t]
X[k] = t
t = (t+j) % 256

X = "".join(map(lambda d: "%02x" % d, X[:SIZE]))
return X
def test():
for (i, j) in md2_test:
md = md2(i)
print "Message: %s" % i
print "My MD:%s" % md
print "Test MD:%s" % j
print "%s" % (md== j)
print

if __name__ == "__main__":
test()

Jan 7 '06 #1
12 3882
On Fri, 6 Jan 2006 wj****@web.de wrote:
below you find my simple python version of MD2 algorithm
as described in RFC1319 (http://rfc1319.x42.com/MD2).
It produces correct results for strings shorter than 16 Bytes and wrong
results for longer strings.

I can't find what's wrong.

Can anybody help?


Okay, i've reimplemented the code from scratch, based on the RFC, without
even looking at your code, as a basis for comparison.

The trouble is, i get exactly the same results as you!

Here's mine:

http://urchin.earth.li/~twic/md2.py

I guess the thing to do is extract the C code from the RFC and compile it,
verify that it works, then stick loads of print statements in the C and
the python, to see where the states of the checksum engines diverge.

tom

--
Death to all vowels! The Ministry of Truth says vowels are plus
undoublethink. Vowels are a Eurasian plot! Big Brother, leading us proles
to victory!
Jan 8 '06 #2
wj****@web.de writes:
below you find my simple python version of MD2 algorithm
as described in RFC1319 (http://rfc1319.x42.com/MD2).
It produces correct results for strings shorter than 16 Bytes and wrong
results for longer strings.


Why do you want to use MD2? It's very slow and it's also been
deprecated for security reasons. Use SHA1 (built into Python library)
or SHA256/384/512 (implementations are circulating) instead.
Jan 8 '06 #3

Paul Rubin wrote:
wj****@web.de writes:
below you find my simple python version of MD2 algorithm
as described in RFC1319 (http://rfc1319.x42.com/MD2).
It produces correct results for strings shorter than 16 Bytes and wrong
results for longer strings.


Why do you want to use MD2? It's very slow and it's also been
deprecated for security reasons. Use SHA1 (built into Python library)
or SHA256/384/512 (implementations are circulating) instead.


I want to understand it, and -- therefor ;-) -- I want to implement it
in pure Pyhton.

Jan 9 '06 #4
wj****@web.de writes:
I want to understand it, and -- therefor ;-) -- I want to implement it
in pure Pyhton.


OK. It should be pretty easy to implement. You should find the
official rfc at ietf.org. I remember there was some minor erratum in
the original version that may or may not have been corrected, but it
shouldn't cause any serious confusion. I implemented it in Javascript
a long time ago and as I remember, all the test vectors passed.
Jan 9 '06 #5

Paul Rubin wrote:
wj****@web.de writes:
I want to understand it, and -- therefor ;-) -- I want to implement it
in pure Pyhton.


OK. It should be pretty easy to implement. You should find the
official rfc at ietf.org. I remember there was some minor erratum in
the original version that may or may not have been corrected, but it
shouldn't cause any serious confusion. I implemented it in Javascript
a long time ago and as I remember, all the test vectors passed.


I thought I had build a proper implementation in Python. The error you
mention can be avoided by studying the C implementation in RFC 1319.
BUT: Some of the test vectors failed. That's my problem ;-(
And therefore I asked for help.
Wolfgang

Jan 9 '06 #6
wj****@web.de writes:
I thought I had build a proper implementation in Python. The error you
mention can be avoided by studying the C implementation in RFC 1319.
BUT: Some of the test vectors failed. That's my problem ;-(
And therefore I asked for help.


You might check PyCrypt against the test vectors.

http://www.amk.ca/python/writing/pycrypt/
Jan 10 '06 #7

Paul Rubin wrote:
wj****@web.de writes:
I thought I had build a proper implementation in Python. The error you
mention can be avoided by studying the C implementation in RFC 1319.
BUT: Some of the test vectors failed. That's my problem ;-(
And therefore I asked for help.


You might check PyCrypt against the test vectors.

http://www.amk.ca/python/writing/pycrypt/


Already done before my first posting. But the problem was there. I
studied the C sources of MD2 of that package, too. But all test cases
failed.

Jan 10 '06 #8

Paul Rubin wrote:
wj****@web.de writes:
I thought I had build a proper implementation in Python. The error you
mention can be avoided by studying the C implementation in RFC 1319.
BUT: Some of the test vectors failed. That's my problem ;-(
And therefore I asked for help.


You might check PyCrypt against the test vectors.

http://www.amk.ca/python/writing/pycrypt/


Already done before my first posting. But the problem was there. I
studied the C sources of MD2 of that package, too. But all test cases
with more than 16 bytes failed.
Wolfgang

Jan 10 '06 #9
wj****@web.de writes:
Already done before my first posting. But the problem was there. I
studied the C sources of MD2 of that package, too. But all test cases
with more than 16 bytes failed.


Hmm, did the test cases work for the RFC 1319 reference code? What
about OpenSSL?

I thought when I did my JS implementation, I checked all the test
vectors, but it was a long time ago and I can't absolutely be sure.
Jan 10 '06 #10
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
Hmm, did the test cases work for the RFC 1319 reference code? What
about OpenSSL?


I just checked OpenSSL and all the test values it computes match the RFC.
Jan 10 '06 #11
On Sun, 8 Jan 2006, Tom Anderson wrote:
On Fri, 6 Jan 2006 wj****@web.de wrote:
below you find my simple python version of MD2 algorithm as described
in RFC1319 (http://rfc1319.x42.com/MD2). It produces correct results
for strings shorter than 16 Bytes and wrong results for longer strings.


I guess the thing to do is extract the C code from the RFC and compile
it, verify that it works, then stick loads of print statements in the C
and the python, to see where the states of the checksum engines diverge.


Okay, i've done this. I had to fiddle with the source a bit - added a
#include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and
took the corresponding includes out of md2c.c and mddriver.c (to avoid
duplicate definitions) - but after that, it built cleanly with:

gcc -DMD=2 *.c *.h -o mddriver

A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it
even built cleanly with -Wall.

Running the test suite with mddriver -x gives results matching the test
vectors in the RFC - a good start!

Patching the code to dump the checksums immediately after updating with
the pad, and before updating with the checksum:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3
MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b
*** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz0123456789") = da33def2a42df13975352846c30338cd
*** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01
MD2 ("123456789012345678901234567890123456789012345678 90123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8

And here's my python code with the same modification, running the test
suite:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 539ba695f264f365bcabc5c8b10913c7
MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56
*** checksum after padding = 365fe0617f5f56a56090af1cfd6caac3
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813
*** checksum after padding = 9acf39425d22c4e3b4ddbdc563d23716
MD2 ("123456789012345678901234567890123456789012345678 90123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf

As you can see, the checksums start to go wrong when we hit 16 bytes.

So, let us turn our attention to the checksum function.

Here's the python i wrote:

def checksum_old(c, buf): # c is checksum array, buf is input block
l = c[-1]
for i in xrange(digest_size):
l = S[(buf[i] ^ l)]
c[i] = l

Here's the C from the RFC:

unsigned int i, j, t;
t = checksum[15];
for (i = 0; i < 16; i++)
t = checksum[i] ^= PI_SUBST[block[i] ^ t];

Spot the difference. Yes, the assignment into the checksum array is a ^=,
not a straight = - checksum bytes get set to
current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor
accumulator). Translating that into python, we get:

def checksum(c, buf):
l = c[-1]
for i in xrange(digest_size):
l = S[(buf[i] ^ l)] ^ c[i]
c[i] = l

And when we put that back into the code, we get the right digests out.
Victory!

However, here's what the pseudocode in the RFC says:

For j = 0 to 15 do
Set c to M[i*16+j].
Set C[j] to S[c xor L].
Set L to C[j].
end /* of loop on j */

I certainly don't see any sign of a xor with the
current-value-of-checksum-byte in there - it looks like the C and
pseudocode in the RFC don't match up.

And, yes, googling for "RFC 1319 errata" brings up a report correcting
this. They really ought to amend RFCs to mention errata!

Correct code here:

http://urchin.earth.li/~twic/md2.py

tom

--
Mathematics is the door and the key to the sciences. -- Roger Bacon
Jan 10 '06 #12

Tom Anderson wrote:
On Sun, 8 Jan 2006, Tom Anderson wrote:
On Fri, 6 Jan 2006 wj****@web.de wrote:
below you find my simple python version of MD2 algorithm as described
in RFC1319 (http://rfc1319.x42.com/MD2). It produces correct results
for strings shorter than 16 Bytes and wrong results for longer strings.


I guess the thing to do is extract the C code from the RFC and compile
it, verify that it works, then stick loads of print statements in the C
and the python, to see where the states of the checksum engines diverge.


Okay, i've done this. I had to fiddle with the source a bit - added a
#include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and
took the corresponding includes out of md2c.c and mddriver.c (to avoid
duplicate definitions) - but after that, it built cleanly with:

gcc -DMD=2 *.c *.h -o mddriver

A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it
even built cleanly with -Wall.

Running the test suite with mddriver -x gives results matching the test
vectors in the RFC - a good start!

Patching the code to dump the checksums immediately after updating with
the pad, and before updating with the checksum:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3
MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b
*** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz0123456789") = da33def2a42df13975352846c30338cd
*** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01
MD2 ("123456789012345678901234567890123456789012345678 90123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8

And here's my python code with the same modification, running the test
suite:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 539ba695f264f365bcabc5c8b10913c7
MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56
*** checksum after padding = 365fe0617f5f56a56090af1cfd6caac3
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813
*** checksum after padding = 9acf39425d22c4e3b4ddbdc563d23716
MD2 ("123456789012345678901234567890123456789012345678 90123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf

As you can see, the checksums start to go wrong when we hit 16 bytes.

So, let us turn our attention to the checksum function.

Here's the python i wrote:

def checksum_old(c, buf): # c is checksum array, buf is input block
l = c[-1]
for i in xrange(digest_size):
l = S[(buf[i] ^ l)]
c[i] = l

Here's the C from the RFC:

unsigned int i, j, t;
t = checksum[15];
for (i = 0; i < 16; i++)
t = checksum[i] ^= PI_SUBST[block[i] ^ t];

Spot the difference. Yes, the assignment into the checksum array is a ^=,
not a straight = - checksum bytes get set to
current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor
accumulator). Translating that into python, we get:

def checksum(c, buf):
l = c[-1]
for i in xrange(digest_size):
l = S[(buf[i] ^ l)] ^ c[i]
c[i] = l

And when we put that back into the code, we get the right digests out.
Victory!

However, here's what the pseudocode in the RFC says:

For j = 0 to 15 do
Set c to M[i*16+j].
Set C[j] to S[c xor L].
Set L to C[j].
end /* of loop on j */

I certainly don't see any sign of a xor with the
current-value-of-checksum-byte in there - it looks like the C and
pseudocode in the RFC don't match up.

And, yes, googling for "RFC 1319 errata" brings up a report correcting
this. They really ought to amend RFCs to mention errata!

Correct code here:

http://urchin.earth.li/~twic/md2.py

tom

--
Mathematics is the door and the key to the sciences. -- Roger Bacon

Hi Tom,

thank you very much for your analysis and your solution. Great.
(My knowledge of C language is not that good.)
Wolfgang

Jan 12 '06 #13

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

6 posts views Thread by Weiguang Shi | last post: by
7 posts views Thread by JN | last post: by
34 posts views Thread by pembed2003 | last post: by
31 posts views Thread by Extremest | last post: by
6 posts views Thread by naknak | last post: by
5 posts views Thread by lavu | last post: by
9 posts views Thread by smartbei | last post: by
4 posts views Thread by macm | last post: by
27 posts views Thread by didacticone | last post: by
1 post views Thread by gcdp | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.