On Tue, 21 Oct 2003 06:53:13 -0700, Jovo Mirkovic <yo**@sezampro.yu>

wrote:

Hi,

I have to make a program which will generate 100,000 different ID

numbers (for example 2345-9341-0903-3432-3432 ...) It must be really

different, I mean it can not be similar (like

2345-9341-0903-3432-3431). Does some algorithm exist for that...

Thanks.

Consider the maximum number of IDs - 100000. That's 6 decimal digits.

Your format indicated that you want a string of four-digit numbers

separated by "-" signs. (Even if that's not the case, I'll use it to

demonstrate the principle.)

So let's say the output format is:

DDDD-DDDD-DDDD-DDDD-DDDD

We could convert our integer (the counter going from 1 to 100000) into

an ASCII string with leading zeros, and cut each character and place

it into a particular position in the output string.

e.g.

D1D2-D34D-DDDD-D5DD-D6DD

(The numbers indicate the digit positions on the input string)

The "D" digits could therefore be filled with random data. See that?
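
As a rough sketch of that placement step (Python is my choice here, and

the names TEMPLATE, make_id and extract_counter are made up for

illustration), the counter digits go into the numbered slots and every

"D" gets a random digit:

```python
import random

# Layout from the post: 1-6 are the counter's digit positions,
# "D" marks a filler position, "-" is a literal separator.
TEMPLATE = "D1D2-D34D-DDDD-D5DD-D6DD"

def make_id(counter, rng=random):
    """Scatter the six counter digits into TEMPLATE; fill the rest randomly."""
    if not 0 <= counter < 10**6:
        raise ValueError("counter must fit in 6 decimal digits")
    digits = f"{counter:06d}"  # leading zeros, e.g. 42 -> "000042"
    out = []
    for slot in TEMPLATE:
        if slot == "D":
            out.append(str(rng.randrange(10)))   # random filler digit
        elif slot.isdigit():
            out.append(digits[int(slot) - 1])    # slot "3" takes the 3rd digit
        else:
            out.append(slot)                     # the "-" separators
    return "".join(out)

def extract_counter(id_string):
    """Read the counter back out of the numbered slots."""
    picked = [ch for slot, ch in zip(TEMPLATE, id_string) if slot.isdigit()]
    return int("".join(picked))
```

Note that purely random filler does not by itself stop two IDs from

looking alike, which is why the next step is more useful.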

Ok, you can actually do something more useful than simply using random

data. You can use that extra bandwidth for testing the validity of the

original number.

In the example I gave, there are 14 digit spaces that we can make use

of (the D positions without the numbers in).

Taking our original number, we can then produce a hash (SHA, MD5 etc.)

of that value. Then convert it to an ASCII string (padding or cutting

the end if necessary) so that it has 14 digits. We can then insert

those digits into the empty positions. When we parse the number (and

extract the number digits) we can then apply the same hash and test

that the hash matches.
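
Something along these lines would do it. This is only a sketch: SHA-256

and the "take the digest modulo 10^14" trick are my assumptions for

turning a hash into 14 decimal digits, and the function names are

invented for the example.

```python
import hashlib

TEMPLATE = "D1D2-D34D-DDDD-D5DD-D6DD"
FILLER_COUNT = TEMPLATE.count("D")  # 14 free positions

def check_digits(counter, n=FILLER_COUNT):
    """Hash the zero-padded counter and reduce it to n decimal digits."""
    digest = hashlib.sha256(f"{counter:06d}".encode("ascii")).digest()
    return f"{int.from_bytes(digest, 'big') % 10**n:0{n}d}"

def make_id(counter):
    """Place the counter digits in the numbered slots, hash digits in the rest."""
    if not 0 <= counter < 10**6:
        raise ValueError("counter must fit in 6 decimal digits")
    number = iter(f"{counter:06d}")
    filler = iter(check_digits(counter))
    out = []
    for slot in TEMPLATE:
        if slot == "D":
            out.append(next(filler))
        elif slot.isdigit():
            out.append(next(number))
        else:
            out.append(slot)
    return "".join(out)

def parse_id(id_string):
    """Return the counter, or raise ValueError if the ID is malformed or fails the check."""
    if len(id_string) != len(TEMPLATE):
        raise ValueError("wrong length")
    number, filler = [], []
    for slot, ch in zip(TEMPLATE, id_string):
        if slot == "-":
            if ch != "-":
                raise ValueError("separator expected")
        elif ch not in "0123456789":
            raise ValueError("digit expected")
        elif slot == "D":
            filler.append(ch)
        else:
            number.append(ch)
    counter = int("".join(number))
    if "".join(filler) != check_digits(counter):
        raise ValueError("checksum mismatch")
    return counter
```

Because the filler depends on the whole number, two neighbouring

counters will normally produce IDs that differ in many positions, and

an ID with a mistyped digit will almost always be rejected by parse_id.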

As an additional note: 14 digits for a "checksum" is a little extreme.

If we reduce that number to 6, we are left with 8 digits that are now

redundant. We can reserve those for future expansion (supporting more

than 100000 IDs). The most useful thing we can do is to introduce a

"format version code" - a single-digit number that will appear in the

first position. That uses one of the 8 spare digits, leaving 7 unused.

This leads to a well-known method of allocating IDs:

1) We need to identify the version of the ID format

2) We need to encode the original numeric value

3) We need to encode a checksum of that value

e.g.:

Example: 1162-2341-2763-1532-1623

Version: 1... .... .... .... ....

Number:  .1.2 .34. .... .5.. .6..

Hash:    ..6. 2..1 27.. 1... ....

Unused:  .... .... ..63 ..32 1.23

We could parse that number and immediately see that it is in "format

1". We therefore know where the digits for the number and the digits

for the hash are.

i.e. Format=1 Number=123456 Hash=621271 (The rest are unused - random)
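
To make the parsing concrete, here is a minimal sketch. The layout

string "VNHN-HNNH-HHUU-HNUU-UNUU" is just my transcription of the

table above (V = version, N = number, H = hash, U = unused); the

LAYOUTS table, the function names and the use of SHA-256 reduced to 6

digits are assumptions for illustration.

```python
import hashlib
import random

# One layout per format version; new versions can be added later.
LAYOUTS = {"1": "VNHN-HNNH-HHUU-HNUU-UNUU"}

def split_fields(id_string):
    """Use the leading version digit to pick a layout, then collect each field."""
    layout = LAYOUTS.get(id_string[:1])
    if layout is None or len(id_string) != len(layout):
        raise ValueError("unknown format or wrong length")
    fields = {"V": "", "N": "", "H": "", "U": ""}
    for slot, ch in zip(layout, id_string):
        if slot == "-":
            if ch != "-":
                raise ValueError("separator expected")
        elif ch not in "0123456789":
            raise ValueError("digit expected")
        else:
            fields[slot] += ch
    return fields

def hash_digits(number, n=6):
    """Reduce a SHA-256 digest of the number string to n decimal digits."""
    digest = hashlib.sha256(number.encode("ascii")).digest()
    return f"{int.from_bytes(digest, 'big') % 10**n:0{n}d}"

def build_id(counter, version="1", rng=random):
    """Assemble an ID: version, number and hash digits, random digits in unused slots."""
    layout = LAYOUTS[version]
    if not 0 <= counter < 10 ** layout.count("N"):
        raise ValueError("counter does not fit this format")
    number = f"{counter:0{layout.count('N')}d}"
    sources = {"V": iter(version), "N": iter(number), "H": iter(hash_digits(number))}
    out = []
    for slot in layout:
        if slot == "-":
            out.append("-")
        elif slot == "U":
            out.append(str(rng.randrange(10)))
        else:
            out.append(next(sources[slot]))
    return "".join(out)

def is_valid(id_string):
    """True if the ID parses and its hash field matches its number field."""
    try:
        fields = split_fields(id_string)
    except ValueError:
        return False
    return fields["H"] == hash_digits(fields["N"])
```

Calling split_fields("1162-2341-2763-1532-1623") gives V="1",

N="123456", H="621271" and U="6332123", matching the breakdown above.

The hash in that example ID is only illustrative, so is_valid would not

be expected to accept it; IDs produced by build_id will pass.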

Probably more than you're after, but if you're thinking about the

format, you might as well think about its future too...

Rgds,