Bytes IT Community

encryption with python

Hi!

I was wondering if someone can recommend a good encryption algorithm
written in python. My goal is to combine two different numbers and
encrypt them to create a new number that can't be traced back to the
originals.

It would be great if there exists a library already written to do this,
and if there is, can somebody please point me to it??

Thanks in advance,
J

Sep 7 '05 #1
34 Replies


Aloha,

jl***@fau.edu wrote:
I was wondering if someone can recommend a good encryption algorithm
written in python.
It would be great if there exists a library already written to do this,
and if there is, can somebody please point me to it??


M2Crypto, interface to OpenSSL
http://sandbox.rulemaker.net/ngps/m2

Wishing a happy day
LOBI
Sep 7 '05 #2

In article <11**********************@g44g2000cwa.googlegroups .com>,
jl***@fau.edu wrote:
Hi!

I was wondering if someone can recommend a good encryption algorithm
written in python. My goal is to combine two different numbers and
encrypt them to create a new number that can't be traced back to the
originals.

It would be great if there exists a library already written to do this,
and if there is, can somebody please point me to it??


I recommend you investigate PyCrypto:
http://www.amk.ca/python/code/crypto
http://sourceforge.net/projects/pycrypto

Cheers,
-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
Sep 7 '05 #3

>My goal is to combine two different numbers and
encrypt them to create a new number that can't be traced back to the
originals.

Here's one:
def encrypt(x, y):
    """Return a number that combines x and y but cannot be traced back
    to them."""
    return x + y

Sep 7 '05 #4

ncf
Steve M wrote:
My goal is to combine two different numbers and
encrypt them to create a new number that can't be traced back to the
originals.

Here's one:
def encrypt(x, y):
    """Return a number that combines x and y but cannot be traced back
    to them."""
    return x + y


Or you can use sha1 so that nobody can do basic checks to find out. :)
It seems to me like he's trying to do some DH-like thing, so yeah, he
might rather want a hash.

**** UNTESTED ****

import sha1
def encrypt(x,y):
    ''' Return a number that combines x and y but cannot be traced back
    to them. Number returned is in xrange(2**24). '''
    def _dosha(v): return sha1.new(str(v)).hexdigest()
    return int(_dosha(_dosha(x)+_dosha(y))[5:11],16)

Sep 7 '05 #5

This is either a very simple or a very open-ended question you have asked. Do
you want to be able to recover the original numbers arbitrarily from the
combination? What properties do you want the combination to have? Do you want
to take the combination and a number and see if the number is in the
combination without revealing any other constituent numbers? Do you want to
be able to provide any arbitrary number of the combination and recover all or
some subset of the constituent numbers (depending on the supplied number)?

What do you want to do with the combination and the individual numbers?

James

On Wednesday 07 September 2005 07:00 am, jl***@fau.edu wrote:
Hi!

I was wondering if someone can recommend a good encryption algorithm
written in python. My goal is to combine two different numbers and
encrypt them to create a new number that can't be traced back to the
originals.

It would be great if there exists a library already written to do this,
and if there is, can somebody please point me to it??

Thanks in advance,
J


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Sep 7 '05 #6

Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back, they will be kept in a database anyways. Hope this is a
bit more specific, thanks!!!

Sep 7 '05 #7

jl***@fau.edu writes:
Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back, they will be kept in a database anyways. Hope this is a
bit more specific, thanks!!!


Why do you want to include the birth date, given that the SSN will
already be unique? It won't be a big obstacle to brute forcing the
SSN out of a keyless hash, since knowing the student's year of
graduation will in most cases be enough to narrow his or her DOB down
to a few hundred possibilities.

How many digits can the student number have? What happens if two
different students get assigned the same number?

If you have a secure database where the actual DOB and SSN are held,
why not just have it issue a student ID number at the time the DOB/SSN
row is added?

I'm feeling that you're working in a subtle and tricky area without
really knowing what you're doing, and that people's privacy is at
risk. Most of the good answers to your question are going to begin
with "choose a random string K that you're able to keep secret through
the entire lifetime of the whole system". The security of your system
will rest on being able to keep K secret against determined attackers.
You then have a key management problem, which has to be handled
through careful procedures and possibly special hardware, not by an
algorithm.

Please get a copy of the book "Security Engineering", by Ross
Anderson, to get an idea of what you're getting into.
Sep 7 '05 #8

jl***@fau.edu wrote:
Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back, they will be kept in a database anyways. Hope this is a
bit more specific, thanks!!!


Why don't you assign an arbitrary ID number to each student that is
entirely unrelated to sensitive information (except via the database
which is hopefully secure)?

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Sep 7 '05 #9

On Wednesday 07 September 2005 14:31, jl***@fau.edu wrote:
Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back, they will be kept in a database anyways. Hope this is a
bit more specific, thanks!!!


Then your best bet is to take a reasonable number of bits from an sha hash.
But you do not need pycrypto for this. The previous answer by "ncf" is good,
but use the standard library and take 9 digits to lessen probability for
clashes

import sha
def encrypt(x,y):
    def _dosha(v): return sha.new(str(v)).hexdigest()
    return int(_dosha(_dosha(x)+_dosha(y))[5:13],16)
Example:

py> encrypt(843921299,20050906)
522277004
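The `sha` module used above was removed in Python 3 (its replacement is `hashlib`). A minimal Python 3 port of the same construction might look like this, assuming the decimal strings are UTF-8 encoded before hashing (for digits this matches what Python 2's `str` produced). The name `combine_ids` is hypothetical, and the port inherits every weakness raised later in this thread: there is no secret key, and only 32 bits of output are kept.

```python
import hashlib


def _sha1_hex(value) -> str:
    # Hash the decimal string form of the value, as str(v) did in the original.
    return hashlib.sha1(str(value).encode("utf-8")).hexdigest()


def combine_ids(x: int, y: int) -> int:
    """Hypothetical Python 3 port of the encrypt() function above.

    Hashes each input, hashes the concatenated hex digests, and keeps
    8 hex characters (32 bits) starting at offset 5, as [5:13] did.
    """
    digest = _sha1_hex(_sha1_hex(x) + _sha1_hex(y))
    return int(digest[5:13], 16)


if __name__ == "__main__":
    print(combine_ids(843921299, 20050906))
```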

Each student ID should be unique until you get a really big class. If your
class might grow to several million, consider taking more bits of the hash.

Also, as long as you remember the function, you can get back the student ID
from the birthday and SS, in case they drop out and re-enroll next year.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Sep 7 '05 #10

Also, I should note that the sha function will, to the limits of anyone's
ability to analyze it, decouple the information from the hash. So, to be
careful, you should keep the algorithm to generate the IDs secret. The
advantage of creating an ID from info in this way is that the ID is ("should
be") unique and unchanging. The disadvantage is that you have to keep the
algorithm secret, because by knowing it, people could generate IDs from
birthdays with only 10**9 calculations (the number of possible 9-digit SS numbers) and
match them to the IDs. All they need to do this is to ask someone what their
birthday is and try SS#s until they get the corresponding ID.

You could keep the algorithm encrypted and decrypt it temporarily to generate
a new ID. Or, you could memorize it and type it in at the beginning of the
semester and generate the IDs for that semester. You might also have to do
this if you lose the IDs somehow.

But beware of the "rubber hose cryptanalytic attack". This is where an
adversary beats you with a rubber hose then asks you for the ID generation
algorithm (or key to the encrypted version). They then check your algorithm
against a known birthday-SS#-ID triplet. If you lied, they repeat until they
verify your algorithm. This has historically been a very successful attack.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Sep 7 '05 #11

James Stroud <js*****@mbi.ucla.edu> writes:
Then your best bet is to take a reasonable number of bits from an sha hash.
But you do not need pycrypto for this. The previous answer by "ncf" is good,
but use the standard library and take 9 digits to lessen probability for
clashes

import sha
def encrypt(x,y):
    def _dosha(v): return sha.new(str(v)).hexdigest()
    return int(_dosha(_dosha(x)+_dosha(y))[5:13],16)
...
Each student ID should be unique until you get a really big class. If your
class might grow to several million, consider taking more bits of the hash.


Please don't give advice like this unless you know what you're doing.
You're taking 8 hex digits and turning them into an integer. That
means you'll probably have a collision after around 65,000 id's, not
several million. "Probably" means > 50%. You'll have a significant
chance (say more than 1%) of collision after maybe 10,000.
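These figures follow from the birthday bound. As a rough check, assuming the IDs behave like uniformly random values in a space of N = 16**8 = 2**32, the chance of at least one collision among n IDs is approximately 1 - exp(-n(n-1)/(2N)). The helper name below is hypothetical.

```python
import math


def collision_probability(n: int, space: int) -> float:
    """Approximate chance that n uniformly random IDs drawn from `space`
    possible values contain at least one duplicate (birthday approximation)."""
    # -expm1(-t) computes 1 - exp(-t) accurately even when t is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


SPACE = 16 ** 8  # 8 hex digits, as in the [5:13] slice

for n in (10_000, 65_536, 77_000, 1_000_000):
    print(f"{n:>9,} IDs: {collision_probability(n, SPACE):.3f}")
```

Under this approximation the chance is already a little over 1% at 10,000 IDs and roughly 39% at 65,536 (the square root of N). It crosses 50% at around 77,000, and with a million IDs a collision is practically certain.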

Also, if you know the student's graduation year, in most cases there
are just a few hundred likely birthdates for that student, so by brute
force search you can crunch the output of your function to a fairly
small number of DOB/SSN combinations.

The only approach that makes sense is for the secure database to
assign arbitrary numbers that aren't algorithmically related to any
sensitive data. Answers involving encryption will need to use either
large ID numbers or secret keys, both of which will cause hassles.
Sep 7 '05 #12

Paul Rubin wrote:
James Stroud <js*****@mbi.ucla.edu> writes:
Then your best bet is to take a reasonable number of bits from an sha hash.
But you do not need pycrypto for this. The previous answer by "ncf" is good,
but use the standard library and take 9 digits to lessen probability for
clashes

import sha
def encrypt(x,y):
    def _dosha(v): return sha.new(str(v)).hexdigest()
    return int(_dosha(_dosha(x)+_dosha(y))[5:13],16)
...
Each student ID should be unique until you get a really big class. If your
class might grow to several million, consider taking more bits of the hash.

Please don't give advice like this unless you know what you're doing.
You're taking 8 hex digits and turning them into an integer. That
means you'll probably have a collision after around 65,000 id's, not
several million. "Probably" means > 50%. You'll have a significant
chance (say more than 1%) of collision after maybe 10,000.

Also, if you know the student's graduation year, in most cases there
are just a few hundred likely birthdates for that student, so by brute
force search you can crunch the output of your function to a fairly
small number of DOB/SSN combinations.

The only approach that makes sense is for the secure database to
assign arbitrary numbers that aren't algorithmically related to any
sensitive data. Answers involving encryption will need to use either
large ID numbers or secret keys, both of which will cause hassles.


This is indubitably true. There's absolutely no excuse for making the
primary key a function of the data that record contains, as doing so
will assist any cryptanalytical attacks.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Sep 7 '05 #13

On Wed, 07 Sep 2005 15:52:19 -0700, James Stroud wrote:
Also, I should note that the sha function will, to the limits of anyone's
ability to analyze it, decouple the information from the hash. So, to be
careful, you should keep the algorithm to generate the IDs secret.


Security by obscurity is very little security at all. If there is any
motive at all to reverse-engineer the algorithm, people will reverse
engineer the algorithm. Keeping a weak algorithm secret does not make it
strong.

--
Steven.

Sep 9 '05 #14

On Wed, 07 Sep 2005 14:31:03 -0700, jlocc wrote:
Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back, they will be kept in a database anyways. Hope this is a
bit more specific, thanks!!!

There are "one-way" encryption functions where the result can't easily be
traced back to the input, but why do you need the input anyway? Here is my
quick-and-dirty student ID algorithm:

last_number_used = 123 # or some other appropriate value

def make_studentID():
    global last_number_used
    last_number_used = last_number_used + 1
    return last_number_used

For a real application, I'd check the database to see if the number has
already been used before returning the number. Also, if you need more
than four digits in your IDs, I'd add a checksum to the end so you can
detect many typos and avoid much embarrassment.
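Steven doesn't say which checksum he has in mind. One common choice is the Luhn mod-10 check digit, which catches every single-digit typo and most swaps of adjacent digits. A minimal sketch, assuming IDs are handled as strings of decimal digits; the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit to append to a string of decimal digits."""
    total = 0
    # Walk right to left. Once the check digit is appended it becomes the
    # rightmost position, so the payload's rightmost digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """True if the last digit of `number` is a correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


student_id = "1234567"
full_id = student_id + str(luhn_check_digit(student_id))
print(full_id, luhn_is_valid(full_id))
```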

Since the ID is entirely arbitrary (a function of the order in which the students are
entered into the database), no attacker can regenerate their SSN from their
student ID. At worst, an attacker might be able to work out roughly what
day they were added to the database. Big deal. And if that is a problem,
you might do something like this:

import random

last_number_used = 12345
usable_IDs = []

def make_studentID():
    global last_number_used
    global usable_IDs
    if not usable_IDs:
        # generate another batch of IDs in random order
        usable_IDs = range(last_number_used, last_number_used + 1000)
        usable_IDs.sort(random.random())
        last_number_used += 1000
    return usable_IDs.pop()

In a real application you would need to store the global variables in a
database, otherwise each time you reload the Python script you start
generating the same IDs over and over again.
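In Python 3, `range()` returns a lazy range object rather than a list, and `list.sort()` does not accept a random value as an argument, so the batch version above will not run as written. A minimal Python 3 sketch of the same batching idea, assuming `random.shuffle` is an acceptable way to randomize each batch and keeping state in an object instead of globals (the class name is hypothetical):

```python
import random


class BatchIDAllocator:
    """Hands out IDs from consecutive blocks, shuffled within each block.

    As noted above, a real system must persist `next_start` (and any unused
    IDs) in the database, or a restart will reissue the same IDs.
    """

    def __init__(self, start: int = 12345, batch_size: int = 1000) -> None:
        self.next_start = start
        self.batch_size = batch_size
        self._pool: list[int] = []

    def make_student_id(self) -> int:
        if not self._pool:
            # Generate the next block of IDs and put it in random order.
            self._pool = list(range(self.next_start, self.next_start + self.batch_size))
            random.shuffle(self._pool)
            self.next_start += self.batch_size
        return self._pool.pop()


allocator = BatchIDAllocator()
print([allocator.make_student_id() for _ in range(5)])
```

`random.shuffle` uses a predictable generator; if the order must be hard to guess, `random.SystemRandom().shuffle` draws from the operating system's randomness source instead.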
--
Steven.

Sep 9 '05 #15

Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:
On Wed, 07 Sep 2005 14:31:03 -0700, jlocc wrote:
Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back, they will be kept in a database anyways. Hope this is a
bit more specific, thanks!!!

last_number_used = 123 # or some other appropriate value

def make_studentID():
    global last_number_used
    last_number_used = last_number_used + 1
    return last_number_used

For a real application, I'd check the database to see if the number has
already been used before returning the number. Also, if you need more
than four digits in your IDs, I'd add a checksum to the end so you can
detect many typos and avoid much embarrassment.

[...] In a real application you would need to store the global variables in a
database, otherwise each time you reload the Python script you start
generating the same IDs over and over again.


For real applications (ignoring your theoretical need to generate the
numbers in a random order) I'd not only store the number in the
database - I'd let the database generate it. Most have some form of
counter that does exactly what you want without needing to keep track
of it and check the database for consistency.
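As one concrete illustration of letting the database assign the number, here is a sketch using Python's built-in `sqlite3` module, where a column declared `INTEGER PRIMARY KEY` is filled in automatically on insert. The table and column names are hypothetical; other databases offer sequences or auto-increment columns for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In SQLite an INTEGER PRIMARY KEY column aliases the rowid, so the
# database picks a fresh value whenever none is supplied on insert.
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)


def add_student(name: str) -> int:
    """Insert a student and return the ID the database assigned."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
    return cur.lastrowid


print(add_student("Alice"), add_student("Bob"))
```

Without the `AUTOINCREMENT` keyword, SQLite may hand out the ID of the most recently deleted row again; declaring the column `INTEGER PRIMARY KEY AUTOINCREMENT` prevents that reuse.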

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Sep 9 '05 #16

I agree with Michael: using PyCrypto, I found AES pretty good and fast for me.


In article <1126101629.243503.299310@g44g2000cwa.googlegroups .com>,
jlocc@fau.edu wrote:
> Hi!
>
> I was wondering if someone can recommend a good encryption algorithm
> written in python. My goal is to combine two different numbers and
> encrypt them to create a new number that can't be traced back to the
> originals.
>
> It would be great if there exists a library already written to do this,
> and if there is, can somebody please point me to it??

I recommend you investigate PyCrypto:
http://www.amk.ca/python/code/crypto
http://sourceforge.net/projects/pycrypto

Cheers,
-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
Sep 10 '05 #17

Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:
On Wed, 07 Sep 2005 14:31:03 -0700, jlocc wrote:
Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back, they will be kept in a database anyways. Hope this is a
bit more specific, thanks!!!

There are "one-way" encryption functions where the result can't easily be
traced back to the input, but why do you need the input anyway?


Well, there is a form of security design that involves one-way
encryption of confidential information. You might want to be able to
search on SSN, but not have the actual SSN stored in the database. So,
you are prepared to deal with the inevitable, "I lost my
password/student ID, can you still look up my records?"

Don't think it applies in this case, but might in some other cases.

--
Steven.


--
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust." --Scary Go Round
Sep 10 '05 #18

Kirk Job Sluder <ki**@jobsluder.net> writes:
Well, there is a form of security design that involves one-way
encryption of confidential information. You might want to be able to
search on SSN, but not have the actual SSN stored in the database. So,
you are prepared to deal with the inevitable, "I lost my
password/student ID, can you still look up my records?"


The minute you provide a way to do that without secret keys, you have
a security hole. SSN's are 9 digits which means there are 1 billion
of them. If there are 100,000 hashed SSN's in the database, the
attacker (since this is clpy) can read them all into a Python dict.
S/he then starts generating SSN's at random and hashing them and
checking whether those hashes appear in the dict. Doing stuff like
iterated hashes to slow the attacker down doesn't help that much: the
attacker needs to hash only 10,000 or so SSN's to be likely to hit one
that's in the dict. If the attacker can hash all 10**9 SSN's, which
isn't all that terribly many, every SSN in the database spills.
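A scaled-down illustration of that attack, assuming unkeyed SHA-256 as the one-way function and a toy 5-digit identifier space so it runs in a fraction of a second. Real SSNs have 10**9 values, which only makes the same loop take longer. All names are hypothetical.

```python
import hashlib
import random

DIGITS = 5  # toy stand-in for the 9-digit SSN space


def unkeyed_hash(number: int) -> str:
    # Zero-pad so every identifier hashes as a fixed-width digit string.
    return hashlib.sha256(f"{number:0{DIGITS}d}".encode()).hexdigest()


# What the "protected" table stores: only hashes of the hidden numbers.
hidden_numbers = random.sample(range(10 ** DIGITS), 50)
stored_hashes = {unkeyed_hash(n) for n in hidden_numbers}

# The attacker hashes every possible identifier and looks each one up.
recovered = {n for n in range(10 ** DIGITS) if unkeyed_hash(n) in stored_hashes}

print(f"recovered {len(recovered)} of {len(hidden_numbers)} hidden numbers")
```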

Bottom line: to keep confidential stuff secure, you need actual security.
Sep 10 '05 #19

Paul Rubin <http://ph****@NOSPAM.invalid> writes:
Kirk Job Sluder <ki**@jobsluder.net> writes:
Well, there is a form of security design that involves one-way
encryption of confidential information. You might want to be able to
search on SSN, but not have the actual SSN stored in the database. So,
you are prepared to deal with the inevitable, "I lost my
password/student ID, can you still look up my records?"
The minute you provide a way to do that without secret keys, you have
a security hole.


Providing any kind of access to data involves creating a security hole.
This is the biggest flaw in most discussions of computer security. Too
much of it depends on everyone remembering (and using) unique
cryptographically strong keys.

You have a client on the phone who needs access to information, but has
forgotten or lost the 10-digit unique ID and the PIN you gave them two
years ago. How do you provide that client with the information he or
she needs? This is the kind of dilemma that one-way encryption is
designed to make a tiny bit safer.

SSNs + some other secret (such as mother's maiden name) is certainly
crappy security. However, I don't think we are going to see widespread
adoption of anything better in the near future.

But even if we go with "more secure" authentication tokens, there is usually
no reason to store the authentication token in plaintext.
SSN's are 9 digits which means there are 1 billion
of them. If there are 100,000 hashed SSN's in the database, the
attacker (since this is clpy) can read them all into a Python dict.
S/he then starts generating SSN's at random and hashing them and
checking whether those hashes appear in the dict. Doing stuff like
iterated hashes to slow the attacker down doesn't help that much: the
attacker needs to hash only 10,000 or so SSN's to be likely to hit one
that's in the dict. If the attacker can hash all 10**9 SSN's, which
isn't all that terribly many, every SSN in the database spills.
Of course, an additional step I didn't mention was that in actual
practice the SSNs would be hashed with a strong random secret key. But
from my point of view, the possibility for dictionary attacks is pretty
much unavoidable as long as we are dealing just with memorized tokens.

We've been bitching, whining and moaning about the small keyspace and
poor quality of what users are willing to memorize for 20 years. We can
complain about it for the next 10 which is about how long it will take
for any kind of alternative to be adopted. I still think that one-way
hashing of authentication "secrets" is better than plain-text storage.
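For authentication secrets such as passwords or PINs, the usual way to avoid plaintext storage is a salted, deliberately slow hash. A minimal sketch using the standard library's `hashlib.pbkdf2_hmac`; the iteration count and function names are illustrative assumptions, not a recommendation for any particular deployment. As the quoted numbers above suggest, this only raises the cost of each guess: it cannot make a secret with only 10**9 possible values, such as an SSN, safe against exhaustive search.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware


def hash_secret(secret: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store instead of the plaintext secret."""
    salt = os.urandom(16)  # fresh random salt per record
    key = hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, ITERATIONS)
    return salt, key


def verify_secret(secret: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the derived key and compare it in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)


salt, key = hash_secret("correct horse battery staple")
print(verify_secret("correct horse battery staple", salt, key))
print(verify_secret("wrong guess", salt, key))
```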
Bottom line: to keep confidential stuff secure, you need actual security.


The only way to keep confidential stuff secure is to shred it, burn it,
and grind the ashes.

I think the fundamental problem is that most customers don't want
actual security. They want to be able to get their information by
calling a phone number and saying a few words/phrases they memorized in
childhood. Given the current market, it seems to be cheaper to deal
with breaks after the fact than to expect more from customers.

--
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust." --Scary Go Round
Sep 10 '05 #20

Kirk Job Sluder <ki**@jobsluder.net> writes:
You have a client on the phone who needs access to information, but has
forgotten or lost the 10-digit unique ID and the PIN you gave them two
years ago. How do you provide that client with the information he or
she needs? This is the kind of dilemma that one-way encryption is
designed to make a tiny bit safer.
You need secret keys then, and you need to secure them. If you have a
secure secret key K, you can store something like HMAC(K, SSN) and
that is pretty safe from offline attacks.
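A minimal sketch of the HMAC(K, SSN) idea with the standard library's `hmac` module. Key handling, the hard part, is deliberately not shown: here `K` is generated in memory purely for illustration, whereas a real system would load it from protected storage. The digits-only normalization and the function name are assumptions.

```python
import hashlib
import hmac
import secrets

# Illustration only: in practice K comes from a key store or HSM and must
# stay secret for the whole lifetime of the system.
K = secrets.token_bytes(32)


def ssn_search_key(ssn: str, key: bytes = K) -> str:
    """Keyed one-way lookup value for an SSN (HMAC-SHA256, hex-encoded)."""
    digits = "".join(ch for ch in ssn if ch.isdigit())  # ignore dashes/spaces
    return hmac.new(key, digits.encode(), hashlib.sha256).hexdigest()


# The same SSN always maps to the same search key under the same K,
# so the column can be indexed and queried without storing the SSN itself.
print(ssn_search_key("123-45-6789") == ssn_search_key("123456789"))
```

Without K, someone who copies the table cannot run the enumeration attack described earlier, because every guess needs the key. With K, they can, which is why the secrecy of K carries the whole design.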
Of course, an additional step I didn't mention was that in actual
practice the SSNs would be hashed with a strong random secret key.
But now you have to maintain that secret key and its secrecy, which is
not a trivial task. It's not an unsolvable problem but you can't
handwave it.

We're told there is already a secure database in the picture
somewhere, or at least one that unescapably contains cleartext SSN's,
so that's the system that should assign the ID numbers and handle
SSN-based queries.
I think the fundamental problem is that most customers don't
want actual security. They want to be able to get their information
by calling a phone number and saying a few words/phrases they
memorized in childhood.
A voice exemplar stored at enrollment time plus a question or two like
"what classes did you take last term" could easily give a pretty good
clue that the person saying the words/phrases is the legitimate
student.
Given the current market, it seems to be
cheaper to deal with breaks after the fact than to expect more from
customers.


Customers legitimately want actual security without having to care how
hash functions work, just like they want safe transportation without
having to care about how jet engine turbopumps work. Air travel is
pretty safe because if the airline fails to maintain the turbopumps
and a plane goes down, there is hell to pay. There is huge legal and
financial incentive for travel vendors (airlines) to not cut corners
with airplane safety. But vendors who deploy incompetently designed
IT systems full of confidential data resulting in massive privacy
breaches face no liability at all.

There is no financial incentive for them to do it right, so they
instead spend the money on more marketing or on executive massages or
whatever, and supply lousy security. THAT is the fundamental problem.
Sep 10 '05 #21

Paul Rubin <http://ph****@NOSPAM.invalid> writes:
Kirk Job Sluder <ki**@jobsluder.net> writes:
We're told there is already a secure database in the picture
somewhere, or at least one that unescapably contains cleartext SSN's,
so that's the system that should assign the ID numbers and handle
SSN-based queries.
Well, IMO just having cleartext SSNs is questionable practice unless you
need those SSNs to report to some other agency that takes SSNs. And
even so, you might want to limit access to plaintext SSNs to a limited
group, and give access to the hashed SSNs as a search key to a different
group.
I think the fundamental problem is that most customers don't
want actual security. They want to be able to get their information
by calling a phone number and saying a few words/phrases they
memorized in childhood.


A voice exemplar stored at enrollment time plus a question or two like
"what classes did you take last term" could easily give a pretty good
clue that the person saying the words/phrases is the legitimate
student.


In my experience the typical student has trouble remembering what
happened last week, much less last term. In addition, universities
frequently need to field questions from people who were students years
ago.

Are voice exemplars at that stage yet?
Customers legitimately want actual security without having to care how
hash functions work, just like they want safe transportation without
having to care about how jet engine turbopumps work. Air travel is
pretty safe because if the airline fails to maintain the turbopumps
and a plane goes down, there is hell to pay. There is huge legal and
financial incentive for travel vendors (airlines) to not cut corners
with airplane safety. But vendors who deploy incompetently designed
IT systems full of confidential data resulting in massive privacy
breaches face no liability at all.


I'm more than happy to agree to disagree on this, but I see it
differently. In aviation there certainly is a bit of risk-benefit
analysis going on in thinking about whether the cost of a given safety
measure is justified given the benefits in risk reduction.

Likewise, credit companies are currently making money hand-over-fist.
If an identity is compromised, it's cheaper for them to just close the
account, refund the money, and do their own fraud investigation after
the fact. Meanwhile, for every person who gets stung, there are a
hundred wanting convenience. In addition, the losses due to bad
cryptographic implementation appear to be trivial compared to the losses
due to social engineering.

--
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust." --Scary Go Round
Sep 10 '05 #22

Kirk Job Sluder wrote:
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
Kirk Job Sluder <ki**@jobsluder.net> writes:
We're told there is already a secure database in the picture
somewhere, or at least one that unescapably contains cleartext SSN's,
so that's the system that should assign the ID numbers and handle
SSN-based queries.


Well, IMO just having cleartext SSNs is questionable practice unless you
need those SSNs to report to some other agency that takes SSNs.


Colleges generally do have such needs.

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Sep 10 '05 #23

On Saturday 10 September 2005 14:01, Kirk Job Sluder wrote:
Providing any kind of access to data involves creating a security hole.
This is the biggest flaw in most discussions of computer security.
On 9/9/05 Steven D'Aprano wrote:
There are "one-way" encryption functions where the result can't easily be
traced back to the input, but why do you need the input anyway? Here is my
quick-and-dirty student ID algorithm:


I have invented the perfect security protocol that solves a major problem with
the one-time-pad. The problem with most one-time-pad protocols is that you
still need to have the pad around, creating a major security hole. I have
solved that problem here. It has all of the steps of the usual one-time-pad
plus an extra step.

1. Generate a random number the size of your data.
2. XOR your data with it.
3. Destroy the original data.

Here is the additional step:

4. Destroy the random number.

You can see now that no adversary can reasonably reconstruct the plain text.
This protocol might be terribly inconvenient, though, because it makes the
original data inaccessible. Oh well, just a necessary byproduct of
theoretically perfect security.

I hereby place this algorithm in the public domain. Use it freely.
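Joke aside, steps 1 and 2 are the real one-time pad. A minimal sketch, assuming `secrets.token_bytes` as the source of the pad (the helper name is hypothetical). It also shows why the pad has to be kept: XOR with the same pad is the only way back, which is exactly what step 4 throws away.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("one-time pad must be exactly as long as the data")
    return bytes(x ^ y for x, y in zip(a, b))


plaintext = b"student grades"
pad = secrets.token_bytes(len(plaintext))  # step 1: random pad, same size
ciphertext = xor_bytes(plaintext, pad)     # step 2: XOR data with the pad

# Decryption is the same XOR; it only works while the pad still exists.
assert xor_bytes(ciphertext, pad) == plaintext
```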

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Sep 10 '05 #24

P: n/a
Kirk Job Sluder <ki**@jobsluder.net> writes:
I'm more than happy to agree to disagree on this, but I see it
differently. In aviation there certainly is a bit of risk-benefit
analysis going on in thinking about whether the cost of a given safety
is justified given the benefits in risk reduction.

Likewise, credit companies are currently making money hand-over-fist.
If an identity is compromised, it's cheaper for them to just close the
account, refund the money, and do their own fraud investigation after
the fact.


You don't get it. Refunding the money improperly charged on a single
card doesn't begin to compensate for the hassle of undoing an identity
theft. If airlines worked the way you're suggesting the credit
industry should work, and a plane went down, the airline would be off
the hook by refunding your estate the price of your ticket. It's only
because they face much further-reaching liability than that, that they
pay so much attention to safety.
Sep 10 '05 #25

P: n/a
Kirk Job Sluder wrote:
The only way to keep confidential stuff secure is to shred it, burn it,
and grind the ashes.

I think the fundamental problem is that most customers don't want
actual security. They want to be able to get their information by
calling a phone number and saying a few words/phrases they memorized in
childhood. Given the current market, it seems to be cheaper to deal
with breaks after the fact than to expect more from customers.


Security = Privacy in this context, and most customers do want privacy.

But also in this case, you are referring to two party security
situations, where the data is shared between a service provider and a
service consumer.

I would think that any n digit random number not already in the data
base would work for an id along with a randomly generated password that
the student can change if they want. The service provider has full
access to the data with their own set of id's and passwords, so in the
case of a lost id, they can just look it up using the customers name
and/or ssn, or whatever they decide is appropriate. In the case of a
lost password, they can reset it and get another randomly generated
password.
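Ron's scheme can be sketched with Python's standard `secrets` module (Python 3.6+). This is a minimal illustration under stated assumptions. The in-memory set `existing_ids` stands in for the database. The names `new_student_id` and `new_password`, the 8-digit ID length and the 10-character alphanumeric password are all hypothetical choices for the example.

```python
import secrets
import string

def new_student_id(existing_ids, digits=8):
    """Return a random n-digit ID not already in existing_ids (a stand-in for the database)."""
    low = 10 ** (digits - 1)
    while True:
        # randbelow(9 * low) + low yields a number with exactly `digits` digits
        candidate = secrets.randbelow(9 * low) + low
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate

def new_password(length=10):
    """Return a random alphanumeric password that the student may later change."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

# hypothetical usage: an empty "database" of IDs
ids = set()
student_id = new_student_id(ids)
password = new_password()
```

The membership check is what enforces "not already in the database". In a real database, a UNIQUE constraint on the ID column would do the same job.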

Or am I missing something?

Cheers,
Ron

Sep 10 '05 #26

In <pa****************************@REMOVETHIScyber.co m.au>, Steven
D'Aprano wrote:
import random

last_number_used = 12345
usable_IDs = []

def make_studentID():
    global last_number_used
    global usable_IDs
    if not usable_IDs:
        # generate another batch of IDs in random order
        usable_IDs = list(range(last_number_used, last_number_used + 1000))
        # replaces usable_IDs.sort(random.random()), which does not shuffle
        random.shuffle(usable_IDs)
        last_number_used += 1000
    return usable_IDs.pop()


Ciao,
Marc 'BlackJack' Rintsch
Sep 10 '05 #27

On Saturday 10 September 2005 15:02, Ron Adam wrote:
Kirk Job Sluder wrote:
I would think that any n-digit random number not already in the
database would work for an ID, along with a randomly generated password that
the student can change if they want. The service provider has full
access to the data with their own set of IDs and passwords, so in the
case of a lost ID, they can just look it up using the customer's name
and/or SSN, or whatever they decide is appropriate. In the case of a
lost password, they can reset it and get another randomly generated
password.

Or am I missing something?


Yes and no. Yes, you are theoretically correct. No, I don't think you have the
OP's original needs in mind (though I am mostly guessing here). The OP was
obviously a TA who needed to assign students a number so that they could
"anonymously" check their publicly posted grades and also so that he could do
some internal record keeping.
But, I'm thinking no one remembers college here anymore.

When I was in college (and when I TA'd) security was kind of flimsy. TAs kept
all records of SS#s, etc. (etc. includes birthdays here) in a gradebook (or
the rich ones kept them on a 5 1/4" floppy). Grades were reported publicly by
full SS#s, usually on a centralized cork-board. That was back in the
good-ole-days, before financial fraud was euphemised to "identity theft".

When I TA'd several years later, grades were reported by the last n digits of
the SS#. Some very security conscious TAs--or was it just me? I think it was
just me--solicited pass phrases from each student and grades were reported
based on the student generated pass phrase--and not on SS# or the like. These
phrases usually came in the form of "Buffs1" or "Kitty1979" (the latter
possibly revealing some information about a birthday, perhaps?). Some
students didn't submit pass phrases, for whatever reason. I think I did the
less convenient of the two most reasonable options, which was to withhold
reporting the grade to the student until they gave me a phrase. The other
option was to use a default pass phrase of the last n digits of the SS#.

The idea of combining ID information and encrypting it to create another ID is
a quantum leap beyond the primitive "last n digits of the SS#". Does it beat,
in theoretical terms, assigning random numbers? No. And it certainly doesn't
beat, in theoretical terms, my improved one-time-pad protocol (see my
previous email). I challenge even the most capable cryptographer to beat my
improved one-time-pad protocol for security (Oh wait, here it is: 1. Destroy
Data.) But it is convenient, especially if you discard the original
identifying information and store just the hashes. And as far as collisions
go, even if a class of 10,000 gives a 1% chance of collision, who is going to
TA a class of 10,000 students? If you can promise that kind of enrolment for
any department, much less any single class, there is a job in an Economics
department waiting for you out there, my friend.
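The 1% figure can be checked roughly with the birthday-problem formula. This assumes 32-bit IDs, which is what the 8 hex digits kept by the `[5:13]` slice in the `encrypt` function of this thread give you. The helper name `collision_probability` is illustrative.

```python
import math

def collision_probability(n, bits):
    """Probability that at least two of n uniformly random `bits`-bit IDs coincide."""
    space = 2 ** bits
    if n > space:
        return 1.0  # pigeonhole: more IDs than possible values
    # P(all distinct) = prod_{i=0}^{n-1} (1 - i/space); summing logs avoids underflow
    log_p_unique = sum(math.log1p(-i / space) for i in range(n))
    return 1 - math.exp(log_p_unique)

p = collision_probability(10000, 32)
print(f"{p:.4f}")
```

For 10,000 students this comes out at about 1.2%. When the probability is small, the result is close to n(n-1)/2^(bits+1), so each extra bit of ID roughly halves the collision risk.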

So what would be the alternative to ID information generated IDs? Have a 3xDES
encrypted database with the SS# and birthday stored as plain-text? Better
keep the encryption protocol secret! Oops. Screwed up already. I figured out
the encryption protocol: Encrypt database with 3xDES using a secret key.
Dang, security through obscurity. All they have to do is to get that secret
key and all those records are easily readable.

The point is that *something has to be kept secret* for encryption security to
work. Theoretically best would be a passphrase, or a passphrase to a really
big key. So, perhaps we could modify the algorithm from a few messages back,
in order to address the (assumed) *practical* considerations of the OP's
original query:

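import hashlib
def encrypt(x, y, password):
    # SHA-1 hex digest of a value with the secret password appended
    def _dosha(v): return hashlib.sha1((str(v) + str(password)).encode()).hexdigest()
    # hash the two salted hashes together; keep 8 hex digits (32 bits) as the ID
    return int(_dosha(_dosha(x) + _dosha(y))[5:13], 16)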

So now what is the criticism? That it's still a "secret algorithm" because the
password is "secret"?

James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Sep 10 '05 #28

James Stroud <js*****@mbi.ucla.edu> writes:
Yes and no. Yes, you are theoretically correct. No, I don't think
you have the OP's original needs in mind (though I am mostly
guessing here). The OP was obviously a TA who needed to assign
students a number so that they could "anonymously" check their
publicly posted grades and also so that he could do some internal
record keeping.
If that's all it's about, it's not a big deal. If it's for some central
administrative database that's more of a target, more care is warranted.
The idea of combining ID information and encrypting it to create
The info to be combined was the student's birthdate. Why would the TA
have access to either that or the SSN?
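import hashlib
def encrypt(x, y, password):
    # SHA-1 hex digest of a value with the secret password appended
    def _dosha(v): return hashlib.sha1((str(v) + str(password)).encode()).hexdigest()
    # hash the two salted hashes together; keep 8 hex digits (32 bits) as the ID
    return int(_dosha(_dosha(x) + _dosha(y))[5:13], 16)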

So now what is the criticism? That it's still a "secret algorithm"
because the password is "secret"?


That's sort of reasonable as long as the password really is secret and
you don't mind a small chance of two students getting the same ID
number once in a while. If the password is something that a TA types
into a laptop when entering grades and which goes away after the
course ends, it's not such a big deal. If it's a long-term key that
has to stay resident in a 24/7 server through the students' entire
time at the university and beyond, then the algorithm is the trivial
part and keeping the key secret is a specialized problem in its own
right. For example, financial institutions use special tamper-resistant
hardware modules for the purpose.
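Both of Paul's caveats, a key that really is secret and the occasional collision, can be made explicit in code. The sketch below swaps the password-appended SHA-1 used in `encrypt` for HMAC-SHA256 from the standard library, which is the standard construction for a keyed hash. The truncation to 32 bits mirrors the `[5:13]` slice. `derive_id`, `assign_ids`, the key and the sample data are hypothetical, for illustration only.

```python
import hashlib
import hmac

def derive_id(key, *fields):
    """Keyed one-way ID: HMAC-SHA256 over the fields, truncated to 32 bits."""
    # a separator byte keeps ("12", "3") and ("1", "23") from hashing the same
    message = "\x1f".join(str(f) for f in fields).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def assign_ids(key, students):
    """Return one ID per (x, y) pair, assuming the pairs are distinct students."""
    seen = set()
    result = []
    for x, y in students:
        sid = derive_id(key, x, y)
        if sid in seen:
            # the "small chance" of a shared ID: refuse rather than merge two records
            raise ValueError("ID collision; choose a new key or keep more bits")
        seen.add(sid)
        result.append(sid)
    return result

# hypothetical key and sample data, for illustration only
key = b"a long random secret kept off the grade sheet"
ids = assign_ids(key, [("000-00-0001", "1985-02-01"), ("000-00-0002", "1984-11-30")])
```

The code does nothing to protect `key` itself. As Paul says, that is the hard part.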

Could the OP please say what the exact application is? That might get
more useful responses if the question still matters.
Sep 10 '05 #29

On Saturday 10 September 2005 16:30, Paul Rubin wrote:
The info to be combined was the student's birthdate. Why would the TA
have access to either that or the SSN?


Speaking as a former TA, we had all that and a little more, if I remember
correctly. The "why" aspect is a little beyond me.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Sep 10 '05 #30

James Stroud wrote:
On Saturday 10 September 2005 15:02, Ron Adam wrote:
Kirk Job Sluder wrote:
I would think that any n-digit random number not already in the
database would work for an ID, along with a randomly generated password that
the student can change if they want. The service provider has full
access to the data with their own set of IDs and passwords, so in the
case of a lost ID, they can just look it up using the customer's name
and/or SSN, or whatever they decide is appropriate. In the case of a
lost password, they can reset it and get another randomly generated
password.

Or am I missing something?

Yes and no. Yes, you are theoretically correct. No, I don't think you have the
OP's original needs in mind (though I am mostly guessing here). The OP was
obviously a TA who needed to assign students a number so that they could
"anonymously" check their publicly posted grades and also so that he could do
some internal record keeping.

But, I'm thinking no one remembers college here anymore.


In the last semester I took, I was able to check my grades by logging into a
web page with my student ID and a password. The default password
was my SSN, but we could change it. In any case, students have read-only
access and are not able to change anything. Not a big deal, and very
little personal information was visible. If anyone had bothered
to look, they would simply have found out I had very good grades. <shrug>

The point is that *something has to be kept secret* for encryption security to
work. Theoretically best would be a passphrase, or a passphrase to a really
big key. So, perhaps we could modify the algorithm from a few messages back,
in order to address the (assumed) *practical* considerations of the OP's
original query:


The actual database files should not be directly reachable except by
the appropriate database administrators; the database should send and retrieve
information via a server, based on each user's access rights.

Is this a case where each account is encrypted with a different key in
addition to the access rights given to each user?

Cheers,
Ron

Sep 11 '05 #31

Paul Rubin <http://ph****@NOSPAM.invalid> writes:
Kirk Job Sluder <ki**@jobsluder.net> writes:
Likewise, credit companies are currently making money hand-over-fist.
If an identity is compromised, it's cheaper for them to just close the
account, refund the money, and do their own fraud investigation after
the fact.


You don't get it. Refunding the money improperly charged on a single
card doesn't begin to compensate for the hassle of undoing an identity
theft. If airlines worked the way you're suggesting the credit
industry should work, and a plane went down, the airline would be off
the hook by refunding your estate the price of your ticket. It's only
because they face much further-reaching liability than that, that they
pay so much attention to safety.


Oh, I'm not suggesting the credit industry should work that way. I'm
just saying that's the way they will work as long as they can push off
the costs for dealing with problems onto interest rates and other fees.
--
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust." --Scary Go Round
Sep 11 '05 #32

Ron Adam <rr*@ronadam.com> writes:
Kirk Job Sluder wrote:
They want to be able to get their information by
calling a phone number and saying a few words/phrases they memorized in
childhood. Given the current market, it seems to be cheaper to deal
with breaks after the fact than to expect more from customers.
I would think that any n-digit random number not already in the
database would work for an ID, along with a randomly generated password
that the student can change if they want. The service provider has
full access to the data with their own set of IDs and passwords, so
in the case of a lost ID, they can just look it up using the customer's
name and/or SSN, or whatever they decide is appropriate. In the case
of a lost password, they can reset it and get another randomly
generated password.

Or am I missing something?


Not really. My suggestion is that in many cases, if the data is being
used only as a backup password or authentication token, there is no need
for that data to be stored in plaintext. For example, with the
ubiquitous "mother's maiden name" * there is frequently no need to
actually have "Smith," "Jones," or "Gunderson" in the database.
"bf65d781795bb91ee731d25f9a68a5aeb7172bc7" serves the same purpose.
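Kirk's idea of storing only a hash of a backup answer can be sketched with the standard library. This sketch assumes a random salt per record and PBKDF2 (`hashlib.pbkdf2_hmac`). A plain, unsalted hash of a common surname can be matched simply by hashing a list of surnames, so the salt and the work factor matter. The function names, the iteration count and the normalization step are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor

def _normalize(answer):
    # ignore case and surrounding/extra whitespace, so "smith " matches "Smith"
    return " ".join(answer.split()).casefold().encode("utf-8")

def hash_answer(answer):
    """Return (salt, digest) to store instead of the plaintext answer."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(answer), salt, ITERATIONS)
    return salt, digest

def check_answer(answer, salt, digest):
    """Recompute the hash for a claimed answer and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", _normalize(answer), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_answer("Gunderson")
```

This protects the stored value, not the answer itself. As Kirk's footnote notes, a maiden name is often easy to discover.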

There are other cases where one-way anonymity is better than a table
linking people to randomly generated userIDs. I'd rather use
cryptographic hashes for research databases than keep a table matching
people to random numbers hanging around. But I'm weird that way.

* I think "mother's maiden name" is a really poor method for backup
authentication because for a fair number of people in the U.S., it
will be identical to their current surname, and for the rest, it's
trivial to discover.

Cheers,
Ron


--
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust." --Scary Go Round
Sep 11 '05 #33

Kirk Job Sluder wrote:
Ron Adam <rr*@ronadam.com> writes:
I would think that any n-digit random number not already in the
database would work for an ID, along with a randomly generated password
that the student can change if they want. The service provider has
full access to the data with their own set of IDs and passwords, so
in the case of a lost ID, they can just look it up using the customer's
name and/or SSN, or whatever they decide is appropriate. In the case
of a lost password, they can reset it and get another randomly
generated password.

Or am I missing something?

Not really. My suggestion is that in many cases, if the data is being
used only as a backup password or authentication token, there is no need
for that data to be stored in plaintext. For example, with the
ubiquitous "mother's maiden name" * there is frequently no need to
actually have "Smith," "Jones," or "Gunderson" in the database.
"bf65d781795bb91ee731d25f9a68a5aeb7172bc7" serves the same purpose.


For that matter, if the encrypted data is used as the key, then there is
no need to store the data, period. Oh... let's see, we'll just store the
password, and give them the data instead. Never mind that it's a few thousand
characters or more. ;-) "Oh, and don't lose your account number, BTW."

There are other cases where one-way anonymity is better than a table
linking people to randomly generated userIDs. I'd rather use
cryptographic hashes for research databases than keep a table matching
people to random numbers hanging around. But I'm weird that way.


Why would you need a table hanging around?

Most databases today are relational, so they are made up of lots of
linked tables of records and fields, and each user can have access to
some parts without having access to other parts. So couldn't you
create a separate account with access to names and ID numbers only?

Cheers,
Ron

Sep 11 '05 #34

Thank you to Mike Meyer, Kirk Sluder, and anyone who made constructive
comments and/or corrections to my earlier post about generating student
IDs as random numbers.

Especially thanks to Marc Rintsch who corrected a stupid coding mistake I
made. Serves me right for not testing the code.

Kirk pointed out that there is a good use case for using a one-way
encryption function to encrypt a Social Security Number to the student ID:
you are prepared to deal with the inevitable, "I lost my
password/student ID, can you still look up my records?"


Whether that benefit outweighs the risks is not something we
can decide, but I hope the original poster is considering these issues and
not just blindly going for the technical solution.

For example, this is one possible way of dealing with students who have
lost their student ID:

- ask student for their name, d.o.b. and SSN;
- search the database for students whose name, d.o.b. and SSN match;
- if you have more than one match, there is a serious problem;
- otherwise you may consider that the student has proven their own
identity to you sufficiently, so you can safely tell them the student ID.
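Steven's procedure maps directly onto a parameterized query. Here is a minimal sketch that assumes a hypothetical `students` table with `student_id`, `name`, `dob` and `ssn` columns. It uses an in-memory SQLite database and made-up records purely for illustration.

```python
import sqlite3

def find_student_id(conn, name, dob, ssn):
    """Return the ID of the one student matching all three facts, or None if
    nobody matches; raise if several records match (the 'serious problem')."""
    rows = conn.execute(
        "SELECT student_id FROM students WHERE name = ? AND dob = ? AND ssn = ?",
        (name, dob, ssn),
    ).fetchall()
    if len(rows) > 1:
        raise LookupError(f"{len(rows)} records match; investigate before answering")
    return rows[0][0] if rows else None

# hypothetical schema and sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER, name TEXT, dob TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(40213, "Ada Smith", "1985-02-01", "000-00-0001"),
     (77105, "Bo Jones", "1984-11-30", "000-00-0002")],
)
```

The `?` placeholders pass the caller's input as parameters rather than pasting it into the SQL text.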

No need for a function that calculates the ID from the SSN, with the
associated risk that Black Hats will break the algorithm and use the
student ID to steal students' SSNs.

In effect, this scheme uses the algorithm "look it up in a secure
database" as the one-way function. It is guaranteed to be
mathematically secure, although it is vulnerable to bad guys cracking
into the database.

Thanks also to James Stroud for his amusing extension to the one-time pad
algorithm. If you need to be able to reconstruct the data, then
of course you need some sort of cryptographic function that can encrypt
the data and decrypt it. But that raises the question of whether or not you
actually do need to be able to reconstruct the data. The point of my post
was that you may not need to, in which case a random number is as good as
any other ID.

James also protested that passwords are "security through obscurity",
since "All they have to do is to get that secret key and all those
records are easily readable."

Of course this is technically correct, but that's not what security
through obscurity means to folks in the security business. The difference
between security through obscurity and security through a secret key is
profound: if I reverse-engineer your secret algorithm, I can read
every record you have. But if I discover the secret key belonging to one
person, I can only read that person's messages, not anyone else's.

As James says, "The point is that *something has to be kept secret* for
encryption security to work." Absolutely correct. But now think of the
difference between having keys to your door locks, compared to merely
keeping the principle of the door handle secret.

--
Steven.

Sep 12 '05 #35
