473,581 Members | 2,314 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

SHA-based subclass for random module

http://www.nightsong.com/phr/python/sharandom.c

This is intended to be less predicable/have fewer correlations than
the default Mersenne Twister or Wichmann-Hill generators. Comments
are appreciated.
Jul 18 '05 #1
28 3671
[Paul Rubin]
This is intended to be less predicable/have fewer correlations than
the default Mersenne Twister or Wichmann-Hill generators. Comments
are appreciated.


Because SHA-1 is a digest function, additional input can be tacked on
without impacting invertibility. That gives you an opportunity to
incorporate the Mersenne Twister to take advantage of its provably
long period.

Here is a rough cut that extracts a 53 bit float on every pass and
leaves the remaining 107 bits of state hidden:

import struct
import sha
import random

def srandom(seed):
gen = sha.new(seed)
digest = gen.digest
update = gen.update
random.seed(see d)
mt = random.random
unpack = struct.unpack
tofloat = 1.0 / 2 ** 53
while 1:
state = digest()
randint = unpack('Q', state[:8])[0] >> 11 # grab 53 bits
yield randint * tofloat
update(state)
update(str(mt() )) # twister never hurts, but can help period

r = srandom('the quick brown fox jumped')
for i in xrange(10):
print r.next()
Raymond Hettinger
Jul 18 '05 #2
py****@rcn.com (Raymond Hettinger) writes:
Because SHA-1 is a digest function, additional input can be tacked on
without impacting invertibility. That gives you an opportunity to
incorporate the Mersenne Twister to take advantage of its provably
long period.


I don't see any point to including Mersenne Twister output in the SHA
update. If you want to make sure of a long period, it's easier to
just mix in a simple counter. Note that the generator I posted should
have period about 2**160 since in order to repeat, two different SHA
states would have to repeat simultaneously.
Jul 18 '05 #3
[Paul Rubin]
This is intended to be less predicable/have fewer correlations than
the default Mersenne Twister or Wichmann-Hill generators. Comments
are appreciated.


I offer this as an alternative:
from random import Random
from struct import unpack
import md5

class MD5Random(Rando m):
def newrandom(self, tofloat = 1.0 / 2 ** 53):
plaintxt = str(Random.rand om(self))
ciphertxt = md5.new(plaintx t).digest()
randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
return randint * tofloat
Advantages over the original:

* Much faster

* Much shorter code

* More readable code

* Threadsafe

* Takes advantage of MT's proven equidistributio n, of its passing
known tests for randomness, and of its known period (in contrast,
SHA-1 was designed as a digest that makes it computationally
intractable to find a different message giving the same signature --
that does *not* imply the non-existence of short cycles for some
keys).

* It uses both MD5 and the Mersenne Twister as they were designed (in
contrast, my google searches show *no* research on using SHA-1 in OFB
and reapplying SHA-1 again to conceal the output).
Raymond Hettinger
Jul 18 '05 #4
Correction to the last posting.
The method name should have been random() instead of newrandom().
Raymond
Jul 18 '05 #5
py****@rcn.com (Raymond Hettinger) writes:
I offer this as an alternative:

from random import Random
from struct import unpack
import md5
Why md5 instead of sha?
class MD5Random(Rando m):
def newrandom(self, tofloat = 1.0 / 2 ** 53):
plaintxt = str(Random.rand om(self))
What's Random.random(s elf) supposed to do?
ciphertxt = md5.new(plaintx t).digest()
I think you mean update.
randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
return randint * tofloat

Advantages over the original:

* Much faster
Yes, other version wasn't optimized. Both call the hash function
approx once per output though.
* Much shorter code
But doesn't support all the operations specified in the Random API.
What's up with that?
* More readable code
Perhaps so. It took me a little while to figure out in the previous
version that the security came from revealing only part of the output
of each hash (107 secret bits for SHA). In the md5 version there's
only 75 secret bits, which is cutting things a little close. It may
be harder to do a concrete security proof for this construction too
(I'm not sure, I've never done one).
* Threadsafe
Always a plus.
* Takes advantage of MT's proven equidistributio n, of its passing
known tests for randomness, and of its known period (in contrast,
SHA-1 was designed as a digest that makes it computationally
intractable to find a different message giving the same signature --
that does *not* imply the non-existence of short cycles for some
keys).
I'm missing something here--if you're just trying to avoid possible
cycles, why do you want to use MT instead of a sequential counter?
The counter is guaranteed to never cycle (unless you overflow the 512
bit compression function input, which won't happen in the life of the
universe).

Note that while there's probably some short cycles in the double-SHA
construction, the chance of finding one is infinitesimally small (it
means finding a 320 bit birthday collision). Hitting one by accident
is comparable to guessing an AES key by accident. Finding one by
brute force or by cryptanalysis is again comparable to breaking AES.
* It uses both MD5 and the Mersenne Twister as they were designed (in
contrast, my google searches show *no* research on using SHA-1 in OFB
and reapplying SHA-1 again to conceal the output).


Applying a collision resistant hash function to itself is a standard
construction for making a PRF and is the basis of HMAC. I believe
the relevant paper is

Bellare, M., Canetti, R., and Krawczyk, H., "Pseudorand om Functions
Revisited: The Cascade Construction". Proc. of the 37th IEEE Symp. on
Foundation of Computer Science, 1996, webbed (PostScript) at:

http://www.research.ibm.com/security/bck1.ps

but I haven't slogged all the way through it yet. It has a concrete
security proof and I've always found those things incredibly fussy and
pedantic compared to normal math papers. I haven't yet figured out
the reason why they're like that, or if they're really need to be.
I'd like to attempt to do one for this generator though, once we
settle on an algorithm.

See also the other references to RFC 2104, notably Krawczyk et al's
original HMAC paper from Crypto 96.

My guess is the reason you haven't found any publications specific to
SHA1 with this construction is that nobody has found any results
against SHA1. As far as anyone can tell, the output is
undistinguishab le from random.
Jul 18 '05 #6
Since the size of plaintext is only 2^53, can't I just calculate
all 2^53 md5 values in advance, and invert the output of MD5Random to
get MT outputs, then attack MT just like any LFSR?

A time-space tradeoff could be made between the number of precalculated
hashes and the number of outputs from the MD5Random PRNG needed, too.
A quick search on the internet indicates that about 2n bits are needed
to find the initial state of an n-bit LSFR[1], so something like 40000 bits
would be needed to find the internal state of MT, or 380 53-bit outputs.

So let's say you can precalculate 2^30 MD5 inverses, the chance of any
particular output being in the table is 2^-23. So you'd need something
like 2^31 outputs from MD5Random(). I'm sure the analysis is also
complicated by the fact that your 2n bits won't be consecutive outputs,
but will be widely separated, but it's likely to remain possible.

2^53 is a big number, but it's well less than the 2^128 range for
all MD5 signatures or the 2^64 signatures needed to find a collision.
2^30 storage and 2^31 calls to random is something many of us could at
work, if not at home.

IANAC,
Jeff
[1] http://www.ciphersbyritter.com/RES/COMBCORR.HTM
"(The Berlekamp-Massey algorithm will recover the unknown state of a
simple n-bit LFSR, and its feedback polynomial, with just 2n known
bits.)"

Jul 18 '05 #7
> What's Random.random(s elf) supposed to do?

Generate a random number using the Mersenne Twister random number generator.
ciphertxt = md5.new(plaintx t).digest()
I think you mean update.


Perhaps yes, perhaps no. Certainly the digest of both are dependant on
two inputs (the current internal state of MD5 and the random number
generated by MT). However, unless you can store the series of updates
to MD5, then getstate() followed by setstate() would not be sufficient
to get the same series of random numbers. This is also a "possible
issue" shared by the double-sha algorithm.
But doesn't support all the operations specified in the Random API.
What's up with that?
That is what the subclass was for: class MD5Random(Rando m):


- Josiah
Jul 18 '05 #8
Josiah Carlson <jc******@nospa m.uci.edu> writes:
What's Random.random(s elf) supposed to do?


Generate a random number using the Mersenne Twister random number
generator.


According to the Random doc, Random.random() doesn't take an argument,
so I'm asking what the "self" is for.
ciphertxt = md5.new(plaintx t).digest()

I think you mean update.


Perhaps yes, perhaps no. Certainly the digest of both are dependant
on two inputs (the current internal state of MD5 and the random number
generated by MT). However, unless you can store the series of updates
to MD5, then getstate() followed by setstate() would not be sufficient
to get the same series of random numbers. This is also a "possible
issue" shared by the double-sha algorithm.


Without the update, all that's happening is you're reading an MT output
and hashing it, so there's no secret state without some security assumption
on MT. And the only security assumption I'm willing to make about MT
is that it's completely insecure ;-).
But doesn't support all the operations specified in the Random API.
What's up with that?


That is what the subclass was for:
> class MD5Random(Rando m):


But the methods in the superclass don't do anything to manage the state
of subclass instances.
Jul 18 '05 #9
>>>What's Random.random(s elf) supposed to do?

Generate a random number using the Mersenne Twister random number
generator.


According to the Random doc, Random.random() doesn't take an argument,
so I'm asking what the "self" is for.

But doesn't support all the operations specified in the Random API.
What's up with that?


That is what the subclass was for:
> class MD5Random(Rando m):
But the methods in the superclass don't do anything to manage the state
of subclass instances.


Perhaps you should spend some more time learning how Python classes work.

Random is a class.
Random.random is an unbound class method.
Random.random(s elf) calls the unbound class method 'Random.random' with
the first argument being the MD5Random instance.

You should satisfy yourself that the below is correct behavior for
Python classes.
class A: .... def blah(self):
.... print self.__class__
.... class B(A): .... def blah(self):
.... print "I'm in B, calling A"
.... A.blah(self)
.... a = A()
a.blah() __main__.A b = B()
b.blah() I'm in B, calling A
__main__.B

Can you figure out why the following is also correct?
class C: .... def blah(self):
.... print "I'm in C, calling A"
.... A.blah(self)
.... c = C()
c.blah()

I'm in C, calling A
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 4, in blah
TypeError: unbound method blah() must be called with A instance as first
argument (got C instance instead)
Perhaps now you understand why he can use Random.random(s elf) in the
earlier version, and Random.getrandb its(self, 128) in the current version.

- Josiah
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
1937
by: Florian Lindner | last post by:
Hello, I try to compute SHA hashes for different files: for root, dirs, files in os.walk(sys.argv): for file in files: path = os.path.join(root, file) print path f = open(path) sha = sha.new(f.read())
4
6763
by: Ravi Singh (UCSD) | last post by:
Hello all I am trying to compare the SHA 256 algorithm as implemented by Christophe Devine and on using the .NET "SHA256Managed" but they do not give me similar hashes. Here is the Christophes implementation .
2
8475
by: Augusto Rocha | last post by:
I have stored a password in a data base using the function HashPasswordForStoringInConfigFile, with the SHA method. How do I decrypt it? Augusto
26
5473
by: David Garamond | last post by:
I read that the password hash in pg_shadow is salted with username. Is this still the case? If so, since probably 99% of all PostgreSQL has "postgres" as the superuser name, wouldn't it be better to use standard Unix/Apache MD5 hash instead? -- dave ---------------------------(end of broadcast)---------------------------
5
2648
by: EP | last post by:
This inquiry may either turn out to be about the suitability of the SHA-1 (160 bit digest) for file identification, the sha function in Python ... or about some error in my script. Any insight appreciated in advance. I am trying to reduce duplicate files in storage at home - I have a large number files (e.g. MP3s) which have been stored on...
1
6497
Atli
by: Atli | last post by:
Hi. I've been looking for a function similar to MySQL's sha() function in Postgre and I can't seem to find anything. The reason is that I have a system that is written for MySQL and I've been asked to make it compatible with Postgre. Thanks.
25
2628
by: tooru honda | last post by:
Hi, I have read the source code of the built-in random module, random.py. After also reading Wiki article on Knuth Shuffle algorithm, I wonder if the shuffle method implemented in random.py produces results with modulo bias. The reasoning is as follows: Because the method random() only produces finitely many possible results, we get...
18
2419
by: Tomás Ó hÉilidhe | last post by:
(SHA-1 is a cryptographic hash function. For info on what SHA-1 is: http://en.wikipedia.org/wiki/SHA-1) I'm writing fullportable C89 code that needs to make use of the SHA-1 algorithm. Does anyone know if there's been a fully-portable C89 implementation of the SHA-1 algorithm? If not, could someone please point me to a very good...
3
5502
by: benaroia | last post by:
I'm creating a simple hashing program and wish to use the SHA hash in it. I was wondering if there was an easy way to do that. In Ruby you can just say: require "Digest" sha_hex = Digest::SHA512.hexdigest("some string") print sha_hex Is there a comparable way to do this in c++? I would like to have the speed it offers instead of having to...
6
1748
by: localp | last post by:
i am creating a shopping cart for a project.. i have come across few problems where i couldn't find a satisfiable solution . the problem in detail;; 1. in this web site there are several pages , that contains products. and when a user clicks on a product (clicks on a checkbox) the product should go to a shopping cart which is on another...
0
8144
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
8301
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7894
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
0
8169
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
1
5670
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
3803
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3820
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
1400
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
1132
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.