Bytes | Software Development & Data Engineering Community
Generating a unique identifier

I have an application that will be producing many instances, using them
for a while, then tossing them away, and I want each one to have a unique
identifier that won't be re-used for the lifetime of the Python session.

I can't use the id() of the object, because that is only guaranteed to be
unique during the lifetime of the object.

For my application, it doesn't matter if the ids are predictable, so I
could do something as simple as this:

def unique_id():
    n = 1234567890
    while True:
        yield n
        n += 1

unique_id = unique_id()

while Application_Is_Running:
    make_an_object(id=unique_id())
    do_stuff_with_objects()
    delete_some_of_them()

which is easy enough, but I thought I'd check if there was an existing
solution in the standard library that I missed. Also, for other
applications, I might want them to be rather less predictable.

--
Steven.
Sep 7 '07 #1
14 replies, 7411 views
On Fri, Sep 07, 2007 at 12:03:23PM -0000, Steven D'Aprano wrote:
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
2.5 includes the uuid module for RFC 4122 universally-unique IDs:

http://docs.python.org/lib/module-uuid.html
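Usage is minimal; uuid4() draws from OS-level randomness, so the ids are
both unique (to overwhelming probability) and unpredictable:

```python
import uuid

# Each call returns a fresh random 128-bit id.
ident = uuid.uuid4()
print(str(ident))   # canonical dashed form
print(ident.hex)    # 32 hex digits, handy as a compact key
```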

--

[Will Maier]-----------------[wi*******@ml1.net|http://www.lfod.us/]
Sep 7 '07 #2
On Fri, 07 Sep 2007 12:03:23 +0000, Steven D'Aprano wrote:
[...]
> def unique_id():
>     n = 1234567890
>     while True:
>         yield n
>         n += 1
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed.
For that easy solution you can use `itertools.count()`.
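E.g., the hand-written generator collapses to one line (shown here with
the next() builtin):

```python
import itertools

# count() yields 1234567890, 1234567891, ... and never repeats a value
# within the session -- the same behaviour as the generator above.
unique_id = itertools.count(1234567890)

first = next(unique_id)
second = next(unique_id)
```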

Ciao,
Marc 'BlackJack' Rintsch
Sep 7 '07 #3
On Sep 7, 7:03 am, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
You could always use the md5 module along with time.time() to generate
a unique id.
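A sketch of that idea (using hashlib, which supersedes the old md5
module; note that time.time() alone can repeat within one clock tick, so
mixing in a serial number is safer):

```python
import hashlib
import itertools
import time

_serial = itertools.count()

def unique_id():
    # Hash the current time plus a serial number; the counter guards
    # against two calls landing on the same time.time() value.
    data = '%r-%d' % (time.time(), next(_serial))
    return hashlib.md5(data.encode()).hexdigest()
```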

Mike

Sep 7 '07 #4
Will Maier wrote:
> On Fri, Sep 07, 2007 at 12:03:23PM -0000, Steven D'Aprano wrote:
> [...]
> > which is easy enough, but I thought I'd check if there was an existing
> > solution in the standard library that I missed. Also, for other
> > applications, I might want them to be rather less predictable.
>
> 2.5 includes the uuid module for RFC 4122 universally-unique IDs:
>
> http://docs.python.org/lib/module-uuid.html
Jesus Christ! What in all the world *doesn't* Python do already?!

Can't we now just write up a module for the standard lib that takes care
of writing all the other code as well and just forget about getting our
hands dirty with programming altogether?

/W
(just in case: ;)!)
Sep 7 '07 #5
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> def unique_id():
>     n = 1234567890
>     while True:
>         yield n
>         n += 1

unique_id = itertools.count(1234567890)

> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
def unique_id():
    return os.urandom(10).encode('hex')
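(The 'hex' codec there is the Python 2 spelling; on Python 3 the same
idea would read:)

```python
import os

def unique_id():
    # 10 random bytes -> 20 hex characters; bytes.hex() replaces
    # the old .encode('hex') string codec.
    return os.urandom(10).hex()
```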
Sep 7 '07 #6
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
> def unique_id():
>     return os.urandom(10).encode('hex')
Sorry, make that 32 or 40 instead of 10, if the number of id's is large,
to make birthday collisions unlikely.

If you don't want the id's to be that large, you can implement a
Feistel cipher using md5 or sha as the round function pretty
straightforwardly, then just feed successive integers through it.
That also guarantees uniqueness, at least within one run of the
program. I have some sample code around for that, let me know if you
need it.
Sep 7 '07 #7
Will Maier <wi*******@ml1.net> writes:
> On Fri, Sep 07, 2007 at 12:03:23PM -0000, Steven D'Aprano wrote:
> [...]
> > which is easy enough, but I thought I'd check if there was an
> > existing solution in the standard library that I missed. Also, for
> > other applications, I might want them to be rather less
> > predictable.
>
> 2.5 includes the uuid module for RFC 4122 universally-unique IDs:
>
> http://docs.python.org/lib/module-uuid.html
I second this recommendation. If you want arbitrary unique IDs that
are not a function of the data they identify, the simplest solution is
a monotonically-increasing serial number. If you want more than that,
you might as well go straight to the standard UUIDs.

Even if you're not using Python 2.5, grab the external 'python-uuid'
package for your operating system and use that. If you use the options
for including random elements, the generated UUIDs will even satisfy
your "unpredictable" requirement.

--
\ "This sentence contradicts itself -- no actually it doesn't." |
`\ -- Douglas Hofstadter |
_o__) |
Ben Finney
Sep 7 '07 #8
On Fri, 07 Sep 2007 08:42:45 -0700, Paul Rubin wrote:
> Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> > def unique_id():
> >     n = 1234567890
> >     while True:
> >         yield n
> >         n += 1
>
> unique_id = itertools.count(1234567890)
Sweet!

I really must make itertools second-nature. I always forget it.

> > which is easy enough, but I thought I'd check if there was an existing
> > solution in the standard library that I missed. Also, for other
> > applications, I might want them to be rather less predictable.
>
> def unique_id():
>     return os.urandom(10).encode('hex')
Any time I see something using a random number to generate IDs, I worry
about collisions. Am I being paranoid? (But even paranoids write code
with bugs...)
Here's something which is a little less predictable than a straight
counter:

import random

def unpredictable_counter(n=12345678):
    while True:
        n += random.randint(1, 68)
        yield n

def more_unpredictable_counter(n=1234567):
    uc = unpredictable_counter(n)
    pool = []
    while True:
        if not pool:
            pool = [None]*99
            for i in range(99):
                pool[i] = uc.next()
            random.shuffle(pool)
        yield pool.pop()


--
Steven.
Sep 8 '07 #9
Ben Finney <bi****************@benfinney.id.au> writes:
> > http://docs.python.org/lib/module-uuid.html
>
> I second this recommendation. If you want arbitrary unique IDs that
> are not a function of the data they identify, the simplest solution is
> a monotonically-increasing serial number. If you want more than that,
> you might as well go straight to the standard UUIDs.
The standard UUIDs as described at that URL don't make any serious
attempt to be unguessable. It's better to use a string from os.urandom().

To Steven: double oops, I was thinking of hex digits rather than bytes
when I suggested reading 32 (length of an md5 hash) or 40 (length of
an sha1 hash) from os.urandom. That should have said 16 or 20 bytes.

If k is the number of unique id's you'll actually use, and N is the
number of possible random uid's (so for 16 bytes, N=2**128, since
16 bytes is 128 bits), the probability of a collision occurring is
approximately 1 - exp(-k**2 / (2*N)), which is roughly k**2 / (2*N)
when k is small compared with sqrt(N).
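Plugging in numbers, as a quick arithmetic check (expm1 keeps precision
when the probability is tiny):

```python
from math import expm1, sqrt

def collision_probability(k, n_bytes):
    # Birthday bound: P(collision) ~= 1 - exp(-k**2 / (2*N)),
    # roughly k**2 / (2*N) when k is small compared with sqrt(N).
    N = 2 ** (8 * n_bytes)
    assert k < sqrt(N)
    return -expm1(-k**2 / (2 * N))

# A billion ids (2**30) from 16 random bytes: roughly
# 2**60 / 2**129 = 2**-69 -- effectively zero.
p = collision_probability(2**30, 16)
```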
Sep 8 '07 #10
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> unique_id = itertools.count(1234567890)
>
> Sweet!
>
> I really must make itertools second-nature. I always forget it.
Beware that count() output (like enumerate) is always <= sys.maxint.
It raises an exception if it overflows. IMO this is a bug.
> > def unique_id():
> >     return os.urandom(10).encode('hex')
>
> Any time I see something using a random number to generate IDs, I worry
> about collisions. Am I being paranoid? (But even paranoids write code
> with bugs...)
Well, the idea is to make the string long enough (I shouldn't have
picked 10 bytes) to make the probability of collisions acceptably low.
The probability is about 1 - exp(-k**2/(2*N)), i.e. roughly k**2/(2*N),
where k is the number of id's you use and N is the number of possible
ID's. So with os.urandom(20), if you use 1 billion (= approx 2**30)
id's, the probability is about 2**60 / (2 * 2**160) = 2**-101, which is
extremely small. Using random strings is probably safer from collisions
than trying to keep distinct initial counters across multiple runs of a
program, on multiple computers, etc.

The classic example is ethernet cards. Each one is supposed to have a
unique hardware 48-bit MAC address, with routers etc. relying on the
uniqueness. Some organization is supposed to assign a unique code to
each manufacturer, and then the manufacturer uses that code as a
prefix and assigns addresses in sequence within that space. That
works fine until sales go up, manufacturers start opening multiple
factories operating out of the same space, low-rent manufacturers
start making up their own codes, etc. So companies that buy large
numbers of cards often find multiple cards with the same address,
causing various hassles. If they just assigned 128-bit MAC addresses
at random it's very unlikely that there would ever be a collision.
> Here's something which is a little less predictable than a straight
> counter:
It's still very easy to generate valid id's from that, or guess the
next one with non-negligible probability. IMO it's almost never worth
messing with a "little less predictable". If you're trying to prevent
an actual opponent from guessing something, use full-strength
cryptography.

You could try something like this:

import sha, os, itertools

radix = 2**32   # keep this == 2**(some even number <= 80)
secret_key = os.urandom(20)

def prp(key, n):
    # pseudo-random permutation (8-round Feistel network)
    # transform n to a unique number < radix**2
    assert 0 <= n < radix**2

    def F(i, x):
        return int(sha.new('%s,%x,%x' % (key, i, x)).hexdigest(), 16) % radix

    a, b = divmod(n, radix)
    for i in xrange(8):
        a ^= F(i, b)
        a, b = b, a
    return radix*a + b

unique_id = (prp(secret_key, i) for i in itertools.count())

It should be pretty secure and (unless I made an error) is a
permutation from [0:radix**2] to [0:radix**2], similar to DES.
It's invertible if you know the secret key (I'll leave that as an
exercise). 8 rounds is probably overkill for radix 2**32.
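For what it's worth, the permutation claim is easy to sanity-check: here
is the same network in Python 3 syntax (hashlib.sha1 in place of the old
sha module), exhaustively verified on a toy radix:

```python
import hashlib
import os

def prp(key, n, radix, rounds=8):
    # Feistel network: split n into two halves, mix with a keyed hash.
    # radix must be a power of two so that XOR stays in range.
    assert 0 <= n < radix**2

    def F(i, x):
        data = b'%s,%x,%x' % (key, i, x)
        return int(hashlib.sha1(data).hexdigest(), 16) % radix

    a, b = divmod(n, radix)
    for i in range(rounds):
        a ^= F(i, b)
        a, b = b, a
    return radix * a + b

key = os.urandom(20)
radix = 16  # tiny, so every input can be checked
outputs = {prp(key, n, radix) for n in range(radix**2)}
assert outputs == set(range(radix**2))  # a true permutation
```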
Sep 8 '07 #11
On Fri, 07 Sep 2007 08:47:58 -0700, Paul Rubin wrote:
> Paul Rubin <http://ph****@NOSPAM.invalid> writes:
> > def unique_id():
> >     return os.urandom(10).encode('hex')
>
> Sorry, make that 32 or 40 instead of 10, if the number of id's is large,
> to make birthday collisions unlikely.
I did a small empirical test, and with 16 million ids, I found no
collisions.

However, I did find that trying to dispose of a set of 16 million short
strings caused my Python session to lock up for twenty minutes until I
got fed up and killed the process. Should garbage-collecting 16 million
strings really take 20+ minutes?

> If you don't want the id's to be that large, you can implement a Feistel
> cipher using md5 or sha as the round function pretty straightforwardly,
> then just feed successive integers through it. That also guarantees
> uniqueness, at least within one run of the program. I have some sample
> code around for that, let me know if you need it.
I'm not sure that I need it, but I would certainly be curious to see it.
Thanks,
--
Steven.
Sep 8 '07 #12
On Fri, 07 Sep 2007 19:17:44 -0700, Paul Rubin wrote:
> > Here's something which is a little less predictable than a straight
> > counter:
>
> It's still very easy to generate valid id's from that, or guess the next
> one with non-negligible probability.
Well, in my specific application, the only consequences of guessing a
valid id is that you get to congratulate yourself for guessing a valid
id :)

As it turned out, I decided that since the only reason I had for avoiding
consecutive ids was that they didn't look random, and that this would not
really matter because no human being (including myself) was actually
going to be looking at them except during debugging, I used the
itertools.count() solution.
Thanks to all who replied.

--
Steven.
Sep 8 '07 #13
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
> Should garbage-collecting 16 million strings really take 20+
> minutes?
It shouldn't. For testing purposes I've created a set of 16 million
strings like this:

s = set()
for n in xrange(16000000):
    s.add('somerandomprefix' + str(n))  # prefix makes the strings a bit larger

It takes maybe about 20 seconds to create the set. Quitting Python
takes 4-5 seconds. This is stock Python 2.5.1.
Sep 8 '07 #14
On Sep 7, 1:03 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
[...]
> which is easy enough, but I thought I'd check if there was an existing
> solution in the standard library that I missed. Also, for other
> applications, I might want them to be rather less predictable.
This is very simple. Use a class variable that increments every time a
new object is created. If you need to use it in a number of different
types of object then you could use the code below in a metaclass and
tag all of your classes with the metaclass. Unless you are making
billions of objects, I think this should suffice.

class MyObj:
    id_count = 0
    def __init__(self):
        self.id = self.id_count
        MyObj.id_count += 1
        print self.id, MyObj.id_count

MyObj()
MyObj()
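The metaclass version might look something like this (a Python 3 sketch;
UniqueIdMeta and the example classes are made-up names):

```python
import itertools

class UniqueIdMeta(type):
    # One shared counter for every class built with this metaclass,
    # so ids are unique across all of them for the session.
    _counter = itertools.count(1)

    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        obj.id = next(UniqueIdMeta._counter)
        return obj

class Widget(metaclass=UniqueIdMeta):
    pass

class Gadget(metaclass=UniqueIdMeta):
    pass

w, g = Widget(), Gadget()
```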

Sep 10 '07 #15