Bytes | Software Development & Data Engineering Community
Generating a unique identifier

I have an application that will be producing many instances, using them
for a while, then tossing them away, and I want each one to have a unique
identifier that won't be re-used for the lifetime of the Python session.

I can't use the id() of the object, because that is only guaranteed to be
unique during the lifetime of the object.

For my application, it doesn't matter if the ids are predictable, so I
could do something as simple as this:

def unique_id():
    n = 1234567890
    while True:
        yield n
        n += 1

unique_id = unique_id()

while Application_Is_Running:
    make_an_object(id=unique_id())
    do_stuff_with_objects()
    delete_some_of_them()

which is easy enough, but I thought I'd check if there was an existing
solution in the standard library that I missed. Also, for other
applications, I might want them to be rather less predictable.

--
Steven.
Sep 7 '07 #1
14 replies, 7411 views
On Fri, Sep 07, 2007 at 12:03:23PM -0000, Steven D'Aprano wrote:
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
2.5 includes the uuid module for RFC 4122 universally-unique IDs:

http://docs.python.org/lib/module-uuid.html
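Usage is minimal; uuid4() draws from OS-level randomness, so the ids are
both unique (to overwhelming probability) and unpredictable:

```python
import uuid

# Each call returns a fresh random 128-bit id.
ident = uuid.uuid4()
print(str(ident))   # canonical dashed form
print(ident.hex)    # 32 hex digits, handy as a compact key
```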

--

[Will Maier]-----------------[wi*******@ml1.net|http://www.lfod.us/]
Sep 7 '07 #2
On Fri, 07 Sep 2007 12:03:23 +0000, Steven D'Aprano wrote:
[...]
> def unique_id():
>     n = 1234567890
>     while True:
>         yield n
>         n += 1
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed.
For that easy solution you can use `itertools.count()`.
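E.g., the hand-written generator collapses to one line (shown here with
the next() builtin):

```python
import itertools

# count() yields 1234567890, 1234567891, ... and never repeats a value
# within the session -- the same behaviour as the generator above.
unique_id = itertools.count(1234567890)

first = next(unique_id)
second = next(unique_id)
```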

Ciao,
Marc 'BlackJack' Rintsch
Sep 7 '07 #3
On Sep 7, 7:03 am, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
You could always use the md5 module along with time.time() to generate
a unique id.
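A sketch of that idea (using hashlib, which supersedes the old md5
module; note that time.time() alone can repeat within one clock tick, so
mixing in a serial number is safer):

```python
import hashlib
import itertools
import time

_serial = itertools.count()

def unique_id():
    # Hash the current time plus a serial number; the counter guards
    # against two calls landing on the same time.time() value.
    data = '%r-%d' % (time.time(), next(_serial))
    return hashlib.md5(data.encode()).hexdigest()
```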

Mike

Sep 7 '07 #4
Will Maier wrote:
> On Fri, Sep 07, 2007 at 12:03:23PM -0000, Steven D'Aprano wrote:
> [...]
> > which is easy enough, but I thought I'd check if there was an existing
> > solution in the standard library that I missed. Also, for other
> > applications, I might want them to be rather less predictable.
>
> 2.5 includes the uuid module for RFC 4122 universally-unique IDs:
>
> http://docs.python.org/lib/module-uuid.html
Jesus Christ! What in all the world *doesn't* Python do already?!

Can't we now just write up a module for the standard lib that takes care
of writing all the other code as well and just forget about getting our
hands dirty with programming altogether?

/W
(just in case: ;)!)
Sep 7 '07 #5
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> def unique_id():
>     n = 1234567890
>     while True:
>         yield n
>         n += 1

unique_id = itertools.count(1234567890)

> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
def unique_id():
    return os.urandom(10).encode('hex')
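(The 'hex' codec there is the Python 2 spelling; on Python 3 the same
idea would read:)

```python
import os

def unique_id():
    # 10 random bytes -> 20 hex characters; bytes.hex() replaces
    # the old .encode('hex') string codec.
    return os.urandom(10).hex()
```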
Sep 7 '07 #6
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
> def unique_id():
>     return os.urandom(10).encode('hex')
Sorry, make that 32 or 40 instead of 10, if the number of id's is large,
to make birthday collisions unlikely.

If you don't want the id's to be that large, you can implement a
Feistel cipher using md5 or sha as the round function pretty
straightforwardly, then just feed successive integers through it.
That also guarantees uniqueness, at least within one run of the
program. I have some sample code around for that, let me know if you
need it.
Sep 7 '07 #7
Will Maier <wi*******@ml1.net> writes:
> On Fri, Sep 07, 2007 at 12:03:23PM -0000, Steven D'Aprano wrote:
> [...]
> > which is easy enough, but I thought I'd check if there was an
> > existing solution in the standard library that I missed. Also, for
> > other applications, I might want them to be rather less
> > predictable.
>
> 2.5 includes the uuid module for RFC 4122 universally-unique IDs:
>
> http://docs.python.org/lib/module-uuid.html
I second this recommendation. If you want arbitrary unique IDs that
are not a function of the data they identify, the simplest solution is
a monotonically-increasing serial number. If you want more than that,
you might as well go straight to the standard UUIDs.

Even if you're not using Python 2.5, grab the external 'python-uuid'
package for your operating system and use that. If you use the options
for including random elements, the generated UUIDs will even satisfy
your "unpredictable" requirement.

--
\ "This sentence contradicts itself -- no actually it doesn't." |
`\ -- Douglas Hofstadter |
_o__) |
Ben Finney
Sep 7 '07 #8
On Fri, 07 Sep 2007 08:42:45 -0700, Paul Rubin wrote:
> Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> > def unique_id():
> >     n = 1234567890
> >     while True:
> >         yield n
> >         n += 1
>
> unique_id = itertools.count(1234567890)
Sweet!

I really must make itertools second-nature. I always forget it.

> > which is easy enough, but I thought I'd check if there was an existing
> > solution in the standard library that I missed. Also, for other
> > applications, I might want them to be rather less predictable.
>
> def unique_id():
>     return os.urandom(10).encode('hex')
Any time I see something using a random number to generate IDs, I worry
about collisions. Am I being paranoid? (But even paranoids write code
with bugs...)
Here's something which is a little less predictable than a straight
counter:

import random

def unpredictable_counter(n=12345678):
    while True:
        n += random.randint(1, 68)
        yield n

def more_unpredictable_counter(n=1234567):
    uc = unpredictable_counter(n)
    pool = []
    while True:
        if not pool:
            pool = [None]*99
            for i in range(99):
                pool[i] = uc.next()
            random.shuffle(pool)
        yield pool.pop()


--
Steven.
Sep 8 '07 #9
Ben Finney <bi****************@benfinney.id.au> writes:
> > http://docs.python.org/lib/module-uuid.html
>
> I second this recommendation. If you want arbitrary unique IDs that
> are not a function of the data they identify, the simplest solution is
> a monotonically-increasing serial number. If you want more than that,
> you might as well go straight to the standard UUIDs.
The standard UUIDs as described at that URL don't make any serious
attempt to be unguessable. It's better to use a string from os.urandom().

To Steven: double oops, I was thinking of hex digits rather than bytes
when I suggested reading 32 (length of an md5 hash) or 40 (length of
an sha1 hash) from os.urandom. That should have said 16 or 20 bytes.

If k is the number of unique id's you'll actually use, and N is the
number of possible random uid's (so for 16 bytes, N=2**128, since
16 bytes is 128 bits), the probability of a collision occurring is
approximately 1 - exp(-k**2 / (2*N)), which is roughly k**2 / (2*N)
when k is small compared with sqrt(N).
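Plugging in numbers, as a quick arithmetic check (expm1 keeps precision
when the probability is tiny):

```python
from math import expm1, sqrt

def collision_probability(k, n_bytes):
    # Birthday bound: P(collision) ~= 1 - exp(-k**2 / (2*N)),
    # roughly k**2 / (2*N) when k is small compared with sqrt(N).
    N = 2 ** (8 * n_bytes)
    assert k < sqrt(N)
    return -expm1(-k**2 / (2 * N))

# A billion ids (2**30) from 16 random bytes: roughly
# 2**60 / 2**129 = 2**-69 -- effectively zero.
p = collision_probability(2**30, 16)
```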
Sep 8 '07 #10
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> unique_id = itertools.count(1234567890)
>
> Sweet!
>
> I really must make itertools second-nature. I always forget it.
Beware that count() output (like enumerate) is always <= sys.maxint.
It raises an exception if it overflows. IMO this is a bug.
> > def unique_id():
> >     return os.urandom(10).encode('hex')
>
> Any time I see something using a random number to generate IDs, I worry
> about collisions. Am I being paranoid? (But even paranoids write code
> with bugs...)
Well, the idea is to make the string long enough (I shouldn't have
picked 10 bytes) to make the probability of collisions acceptably low.
The probability is about 1 - exp(-k**2/(2*N)), i.e. roughly k**2/(2*N),
where k is the number of id's you use and N is the number of possible
ID's. So with os.urandom(20), if you use 1 billion (= approx 2**30)
id's, the probability is about 2**60 / (2 * 2**160) = 2**-101, which is
extremely small. Using random strings is probably safer from collisions
than trying to keep distinct initial counters across multiple runs of a
program, on multiple computers, etc.

The classic example is ethernet cards. Each one is supposed to have a
unique hardware 48-bit MAC address, with routers etc. relying on the
uniqueness. Some organization is supposed to assign a unique code to
each manufacturer, and then the manufacturer uses that code as a
prefix and assigns addresses in sequence within that space. That
works fine until sales go up, manufacturers start opening multiple
factories operating out of the same space, low-rent manufacturers
start making up their own codes, etc. So companies that buy large
numbers of cards often find multiple cards with the same address,
causing various hassles. If they just assigned 128-bit MAC addresses
at random it's very unlikely that there would ever be a collision.
> Here's something which is a little less predictable than a straight
> counter:
It's still very easy to generate valid id's from that, or guess the
next one with non-negligible probability. IMO it's almost never worth
messing with a "little less predictable". If you're trying to prevent
an actual opponent from guessing something, use full-strength
cryptography.

You could try something like this:

import sha, os, itertools

radix = 2**32   # keep this == 2**(some even number <= 80)
secret_key = os.urandom(20)

def prp(key, n):
    # pseudo-random permutation (8-round Feistel network)
    # transform n to a unique number < radix**2
    assert 0 <= n < radix**2

    def F(i, x):
        return int(sha.new('%s,%x,%x' % (key, i, x)).hexdigest(), 16) % radix

    a, b = divmod(n, radix)
    for i in xrange(8):
        a ^= F(i, b)
        a, b = b, a
    return radix*a + b

unique_id = (prp(secret_key, i) for i in itertools.count())

It should be pretty secure and (unless I made an error) is a
permutation from [0:radix**2] to [0:radix**2], similar to DES.
It's invertible if you know the secret key (I'll leave that as an
exercise). 8 rounds is probably overkill for radix 2**32.
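For what it's worth, the permutation claim is easy to sanity-check: here
is the same network in Python 3 syntax (hashlib.sha1 in place of the old
sha module), exhaustively verified on a toy radix:

```python
import hashlib
import os

def prp(key, n, radix, rounds=8):
    # Feistel network: split n into two halves, mix with a keyed hash.
    # radix must be a power of two so that XOR stays in range.
    assert 0 <= n < radix**2

    def F(i, x):
        data = b'%s,%x,%x' % (key, i, x)
        return int(hashlib.sha1(data).hexdigest(), 16) % radix

    a, b = divmod(n, radix)
    for i in range(rounds):
        a ^= F(i, b)
        a, b = b, a
    return radix * a + b

key = os.urandom(20)
radix = 16  # tiny, so every input can be checked
outputs = {prp(key, n, radix) for n in range(radix**2)}
assert outputs == set(range(radix**2))  # a true permutation
```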
Sep 8 '07 #11
On Fri, 07 Sep 2007 08:47:58 -0700, Paul Rubin wrote:
> Paul Rubin <http://ph****@NOSPAM.invalid> writes:
> > def unique_id():
> >     return os.urandom(10).encode('hex')
>
> Sorry, make that 32 or 40 instead of 10, if the number of id's is large,
> to make birthday collisions unlikely.
I did a small empirical test, and with 16 million ids, I found no
collisions.

However, I did find that trying to dispose of a set of 16 million short
strings caused my Python session to lock up for twenty minutes until I
got fed up and killed the process. Should garbage-collecting 16 million
strings really take 20+ minutes?

> If you don't want the id's to be that large, you can implement a Feistel
> cipher using md5 or sha as the round function pretty straightforwardly,
> then just feed successive integers through it. That also guarantees
> uniqueness, at least within one run of the program. I have some sample
> code around for that, let me know if you need it.
I'm not sure that I need it, but I would certainly be curious to see it.
Thanks,
--
Steven.
Sep 8 '07 #12
On Fri, 07 Sep 2007 19:17:44 -0700, Paul Rubin wrote:
> > Here's something which is a little less predictable than a straight
> > counter:
>
> It's still very easy to generate valid id's from that, or guess the next
> one with non-negligible probability.
Well, in my specific application, the only consequences of guessing a
valid id is that you get to congratulate yourself for guessing a valid
id :)

As it turned out, I decided that since the only reason I had for avoiding
consecutive ids was that they didn't look random, and that this would not
really matter because no human being (including myself) was actually
going to be looking at them except during debugging, I used the
itertools.count() solution.
Thanks to all who replied.

--
Steven.
Sep 8 '07 #13
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> Should garbage-collecting 16 million strings really take 20+
> minutes?
It shouldn't. For testing purposes I've created a set of 16 million
strings like this:

s = set()
for n in xrange(16000000):
    s.add('somerandomprefix' + str(n))  # prefix makes the strings a bit larger

It takes maybe about 20 seconds to create the set. Quitting Python
takes 4-5 seconds. This is stock Python 2.5.1.
Sep 8 '07 #14
On Sep 7, 1:03 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
This is very simple. Use a class variable that increments every time a
new object is created. If you need to use it in a number of different
types of object then you could use the code below in a metaclass and
tag all of your classes with the metaclass. Unless you are making
billions of objects, I think this should suffice.

class MyObj:
    id_count = 0
    def __init__(self):
        self.id = self.id_count
        MyObj.id_count += 1
        print self.id, MyObj.id_count

MyObj()
MyObj()
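The metaclass version might look something like this (a Python 3 sketch;
UniqueIdMeta and the example classes are made-up names):

```python
import itertools

class UniqueIdMeta(type):
    # One shared counter for every class built with this metaclass,
    # so ids are unique across all of them for the session.
    _counter = itertools.count(1)

    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        obj.id = next(UniqueIdMeta._counter)
        return obj

class Widget(metaclass=UniqueIdMeta):
    pass

class Gadget(metaclass=UniqueIdMeta):
    pass

w, g = Widget(), Gadget()
```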

Sep 10 '07 #15