Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.auwrites:

unique_id = itertools.count(1234567890)

Sweet!

I really must make itertools second-nature. I always forget it.

Beware that count() output (like enumerate) is always <= sys.maxint.

It raises an exception if it overflows. IMO this is a bug.

def unique_id():

return os.urandom(10).encode('hex')

Any time I see something using a random number to generate IDs, I worry

about collisions. Am I being paranoid? (But even paranoids write code

with bugs...)

Well, the idea is to make the string long enough (I shouldn't have

picked 10 bytes) to make the probability of collisions acceptably low.

The probability is about exp(-k**2/(2*N)) where k is the number of

id's you use and N is the number of possible ID's. So with

os.urandom(20), if you use 1 billion (= approx 2**30) id's,

the probability is about exp(-(2**60 / 2*2**160)) = 1/exp(2**101)

which is extremely small. Using random strings is probably safer

from collisions than maintain keep distinct initial counters

across multiple runs of a program, on multiple computers, etc.

The classic example is ethernet cards. Each one is supposed to have a

unique hardware 48-bit MAC address, with routers etc. relying on the

uniqueness. Some organization is supposed to assign a unique code to

each manufacturer, and then the manufacturer uses that code as a

prefix and assigns addresses in sequence within that space. That

works fine until sales go up, manufacturers start opening multiple

factories operating out of the same space, low-rent manufacturers

start making up their own codes, etc. So companies that buy large

numbers of cards often find multiple cards with the same address,

causing various hassles. If they just assigned 128-bit MAC addresses

at random it's very unlikely that there would ever be a collision.

Here's something which is a little less predictable than a straight

counter:

It's still very easy to generate valid id's from that, or guess the

next one with non-negligible probability. IMO it's almost never worth

messing with a "little less predictable". If you're trying to prevent

an actual opponent from guessing something, use full-strength

cryptography.

You could try something like this:

import sha, os, itertools

radix = 2**32 # keep this == 2**(some even number <= 80)

secret_key = os.urandom(20)

def prp(key, n):

# pseudo-random permutation (8-round Feistel network)

# transform n to a unique number < radix**2

assert 0 <= n < radix**2

def F(i,x):

return int(sha.new('%s,%x,%x'%(key,i,x)).hexdigest(), 16) % radix

a,b = divmod(n, radix)

for i in xrange(8):

a ^= F(i,b)

a,b = b,a

return radix*a + b

unique_id = (prp(secret_key, i) for i in itertools.count())

It should be pretty secure and (unless I made an error) is a

permutation from [0:radix**2] to [0:radix**2], similar to DES.

It's invertible if you know the secret key (I'll leave that as an

exercise). 8 rounds is probably overkill for radix 2**32.