Python for Reverse Engineering

Brad Tilley

A friend of mine wrote an algorithm that generates strings. He says that
it's impossible to figure out exactly how the algorithm works. That may
be true, but I think if I had enough sample strings that I could write a
program to identify patterns in the strings and then produce strings
with similar patterns. He disagrees with me. He gave me 100 strings to
analyze.

I wrote a script to read each string and build a list of characters
present. I've discovered that he only uses 20 alpha/numeric chars to
generate the strings and that each string sums up to a value between
1000 and 1200 when I assign each char in the string its ASCII value.

What else might I do to analyze these things? As it stands now, I can
generate an acceptable string on ever 100th attempt or so, but I'd like
to do better than this.

-b

Jul 18 '05 #1

Subscribe Reply

3547

Mike Meyer

Brad Tilley <rt*****@vt.edu> writes:

A friend of mine wrote an algorithm that generates strings. He says
that it's impossible to figure out exactly how the algorithm
works. That may be true, but I think if I had enough sample strings
that I could write a program to identify patterns in the strings and
then produce strings with similar patterns. He disagrees with me. He
gave me 100 strings to analyze.

I wrote a script to read each string and build a list of characters
present. I've discovered that he only uses 20 alpha/numeric chars to
generate the strings and that each string sums up to a value between
1000 and 1200 when I assign each char in the string its ASCII value.

What else might I do to analyze these things? As it stands now, I can
generate an acceptable string on ever 100th attempt or so, but I'd
like to do better than this.

Build a markov chain generator. Basically, build tuples of characters
of length N, and use those to index a dictionary containing a list (or
string) of characters that follow that list of characters in your
input strings. So, with N equal 2, and d[('a', 'b')] = ['c', 'd', 'e']
you'd know that the strings abc, abd and abe appeared in the text. To
gennerate a string, pull a random key from the dictionary, then choose
random characters from the list that the last N characters treated as
a tuple index in the dictionary. Larger values of N generate strings
that more closely resemble the input text.

Optionally, make the objects stored in the dictionary keep a count of
how many times each character appears, and choose one randomly
weighted by those counts.

I've got a markov chain generator that works on words if you really
want to see one.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jul 18 '05 #2

Michael Fuhr

Brad Tilley <rt*****@vt.edu> writes:

A friend of mine wrote an algorithm that generates strings. He says that
it's impossible to figure out exactly how the algorithm works. That may
be true, but I think if I had enough sample strings that I could write a
program to identify patterns in the strings and then produce strings
with similar patterns. He disagrees with me. He gave me 100 strings to
analyze.
What input, if any, does the algorithm take? What are the strings
used for? Care to post the 100 strings so others can have a look?
What else might I do to analyze these things? As it stands now, I can
generate an acceptable string on ever 100th attempt or so, but I'd like
to do better than this.

Reading about cryptanalysis might prove fruitful. I suspect the
number of "impossible to figure out" algorithms is far less than
the number of algorithms whose developers initially made that claim.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Jul 18 '05 #3

Brad Tilley

Michael Fuhr wrote:

Brad Tilley <rt*****@vt.edu> writes:
A friend of mine wrote an algorithm that generates strings. He says that
it's impossible to figure out exactly how the algorithm works. That may
be true, but I think if I had enough sample strings that I could write a
program to identify patterns in the strings and then produce strings
with similar patterns. He disagrees with me. He gave me 100 strings to
analyze.

What input, if any, does the algorithm take?

I don't know, but I'm guessing that there is none.
What are the strings used for?
To unlock an application he wrote (software key).
Care to post the 100 strings so others can have a look?
I'd rather not do that.
Reading about cryptanalysis might prove fruitful. I suspect the
number of "impossible to figure out" algorithms is far less than
the number of algorithms whose developers initially made that claim.

I suspect you're right.

Jul 18 '05 #4

Jeremy Bowers

On Fri, 05 Nov 2004 23:36:57 -0500, Brad Tilley wrote:

What are the strings used for?

To unlock an application he wrote (software key).

Ah.

Tell your friend the usual order of business when cracking these apps is
to find the conditional that does the final check for validity and
hard-code it to true (or false or whatever). Ask him if he really thinks
he can do better than massive companies like Electronic Arts and the other
large gaming companies, which still routinely experience 0-day (or less!)
warezing of their works.

The only remotely foolproof way to secure an app is to require an online
component, and to check that part. But few apps can really reasonably do
that, and customers *do* notice.

I say this mostly because people should not fool themselves into thinking
they can make this work, when hundreds or thousands of others can try. If
your software can't profit without that kind of protection, it probably
can't profit with it, either.

(I've read several rants from shareware developers bemoaning the piracy
rate and complaining that while people pirate their work, nobody pays for
it. Inevitably, I dig deeper and find they are peddling Yet Another FTP
program or the like (HTML editor, XML editor, text editor, simple image
editor, image viewer, macro program, etc.), *which the market has clearly
set a value of $0 for*. This is off-topic, but I'd check this too; you may
be protecting something of no value on the open market. I don't know, but
based on my readings and a quick browse through Tucows, the odds are
pretty decent.)

Jul 18 '05 #5

James Stroud

Care to post the 100 strings so others can have a look?
I'd rather not do that.

If your friend was worth his salt as an algorithm writer, he'd be more than
willing to let you post a mere 100 strings. For any good generator, this
would be paltry compared to the number of strings it would take to figure out
the algorithm.

But--he is probably making impossible claims here, anyway...
Reading about cryptanalysis might prove fruitful.

Perhaps your friend should pay heed to this as well.
James

--
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
611 Charles E. Young Dr. S.
MBI 205, UCLA 951570
Los Angeles CA 90095-1570
http://www.jamesstroud.com/

Jul 18 '05 #6

Duncan Smith

"James Stroud" <js*****@mbi.ucla.edu> wrote in message
news:ma**************************************@pyth on.org...

Care to post the 100 strings so others can have a look?
I'd rather not do that.

If your friend was worth his salt as an algorithm writer, he'd be more

than willing to let you post a mere 100 strings. For any good generator, this
would be paltry compared to the number of strings it would take to figure out the algorithm.

But--he is probably making impossible claims here, anyway...

[snip]

Doesn't that depend on exactly what is meant by "figure out exactly how the
algorithm works"? If it means identify (with absolute certainty) the
algorithm used to generate the strings, then it surely can't be possible.

Duncan

Jul 18 '05 #7

Brad Tilley

Duncan Smith wrote:

If your friend was worth his salt as an algorithm writer, he'd be more...
But--he is probably making impossible claims here, anyway...

[snip]

Doesn't that depend on exactly what is meant by "figure out exactly how the
algorithm works"? If it means identify (with absolute certainty) the
algorithm used to generate the strings, then it surely can't be possible.

Duncan

That's my thought as well. I don't want to know exactly how the
algorithm generates strings. But, I think that if I analyze enough
strings I should know, on some level, what an acceptable string looks like.

Samba coders never see Microsoft's file and print sharing source code,
yet they are able to emulate an NT server quite well just by observing
packets.

Jul 18 '05 #8

Duncan Smith

"Brad Tilley" <rt*****@vt.edu> wrote in message
news:cm**********@solaris.cc.vt.edu...

Duncan Smith wrote:
If your friend was worth his salt as an algorithm writer, he'd be more...But--he is probably making impossible claims here, anyway...
[snip]

Doesn't that depend on exactly what is meant by "figure out exactly how the algorithm works"? If it means identify (with absolute certainty) the
algorithm used to generate the strings, then it surely can't be possible.
Duncan

That's my thought as well. I don't want to know exactly how the
algorithm generates strings. But, I think that if I analyze enough
strings I should know, on some level, what an acceptable string looks

like.
Samba coders never see Microsoft's file and print sharing source code,
yet they are able to emulate an NT server quite well just by observing
packets.

Right, so you might be able to come up with something that produces similar
output. Do you know if the strings are generated independently? If so,
there must be some stochastic component (or unknown inputs) or the strings
would be identical. How about the frequencies of the characters? Are some
(significantly) more frequent than others? Do some characters follow others
with unusually high frequency? Do characters tend to cluster (more or less
that you'd expect from independently generated characters)? Unless the
strings are very long you probably can't answer these questions too reliably
with only 100 strings.

Markov Chains are a possibility (as already mentioned). I'd probably start
by looking at the simpler things first. The 1000 to 1200 sum might be a
clue, particularly if you're having trouble emulating it. Of course, if
this turns out to be some sort of code and you're looking at some encoded
text, then that's something I know little about, and the above might be next
to useless.

Maybe the basic statistical tests in Gary Strangman's stats.py would be
useful, unless you already have R and RPy?

Duncan

Jul 18 '05 #9

James Sapara

Well for the fun of it I'm gonna post 100 strings that are probably like
the strings referred to earlier. I'll answer a limited set of questions
about the keys if you want, but for the most part they are a set of 100
unqiue keys, with two variable inputs and a few static inputs.

#0 5V6XB-TV6N6-5H7J3-WWTWQ-6H74B
#1 CBPPC-LTJ5X-1S8ZS-5LVBZ-YRFVW
#2 QJCT6-VXYLT-2S9QZ-SQ02G-MJD9S
#3 SF46P-67SK3-6BWFD-BQ9Y0-XJ4J2
#4 ZR4VZ-BS499-TPMDT-Y2MBF-LN2VL
#5 YKR4K-ZNJ41-W2N2D-0G2FL-ZZSFK
#6 4ZBVQ-388NT-H7742-MM7N8-NYRSM
#7 468HK-3DX11-H13CD-MVPQL-N9G5K
#8 0W853-1V66Q-HPMHK-6H105-45LX1
#9 0XSRZ-1XW7N-H3LLK-620JR-4ND0J
#10 JF3XN-C7SCR-QBP0G-VQPJ2-0JG0G
#11 4BKKK-TZW8T-ZV713-80QB6-J8NV7
#12 R5W5D-V1YV8-83DRQ-DXTPX-9P7ZH
#13 C5S1Z-NJQMS-89JP0-45LVP-R7WY4
#14 CS3VY-L4SY9-1MPNM-50P0T-Y8GXX
#15 ZY7JC-P8KRK-K1511-80RZN-J8VQY
#16 KQ5D3-2F4P2-DRT19-347KH-K1R6Z
#17 KZ0ZH-QXC2Z-NH4WS-S9KDD-M0P7D
#18 5LLXY-G0GNY-F0GJW-KD1WH-8LL4Z
#19 5K7C3-G9VBQ-FBP3K-K43M5-81ZL1
#20 4TX4X-TJ8H6-Z5QTL-8F8RR-JSYCJ
#21 47LHR-TH59K-Z2Y8C-8XPL0-JPGD2
#22 DDK4B-YWD46-ZD123-3SGFQ-KX3FB
#23 4T8D6-346BS-HTM9C-MM17Z-NYL8W
#24 YTK41-ZH3H0-WKJTP-0Y3R9-ZWZCT
#25 JRH3Q-CF1HG-QYXMB-VRX4Y-0QTPP
#26 RL645-V06HF-807TF-DDTR1-9L7C8
#27 K66DH-Q3HPB-NJR1J-SH5KQ-M5C6B
#28 B6X7T-7RNKW-3S2MW-752K2-B7S6G
#29 CP30J-Y2SQB-STPBQ-H6PPB-WFGZ0
#30 4GPG5-TJGT4-Z11WX-8PWQP-JBM54
#31 K5Y7N-QD0K3-NV0M6-S70KD-MMD6D
#32 S0J7B-6MDXK-65VCV-BKVW1-XKF48
#33 DYGQK-YK1WP-ZKQNN-3C9L8-KD4DM
#34 5D6F1-G7VPC-F4H7F-K3H5M-8V9WR
#35 YQVX4-Z2YCF-W0607-0T6JF-Z3K0L
#36 6MHYP-50DQG-36NY4-T07GB-T8RM0
#37 C9W27-N2SJ3-8HB1P-4VD0F-R9QXL
#38 S0CXD-6MZ08-65X8Q-BKH6X-XK92H
#39 J5KC0-Q3HJP-GBWSC-5WXJV-YHT03
#40 JPRCN-QK7QP-G04G7-5GQF3-YZNFN
#41 C07RC-NGS0W-83G95-4444H-R15PZ
#42 4MJT3-3C192-HY369-MBJQH-N6H5Z
#43 LSVN2-2SYJC-LW6JM-QP617-7BK3F
#44 J100C-CBQQX-QVWBS-VTWPZ-03MZW
#45 T8PLF-XGTG8-WNSXX-L16VJ-GGKY5
#46 T9T3Y-X5YS9-WB04M-L9LHT-G0WTX
#47 J4VMP-Q79GS-GLY38-56JG5-YFHM1
#48 YG31S-BLSFJ-M8P0Y-BNPY4-X4GJV
#49 J45D0-C14BN-QXT9R-V977C-00R86
#50 JHXZ6-QF21M-GD189-5VCXK-Y9JBC
#51 KST2Q-QHL4T-NC8P2-SB868-M6Y2M
#52 4W3MS-G6ST5-7FPV6-YTPST-L3GKX
#53 SNWGZ-KC91N-Y445K-NY43R-VW5HJ
#54 Y8HWH-B310B-MXX2J-BQXLQ-XJTDB
#55 09H9B-1GDX6-HVNR3-6N73Q-44RHB
#56 6JRBR-59J85-34NP0-TJ216-TCS37
#57 S3B01-KP820-YZ72P-NC709-VDRXT
#58 L5VSF-21LLM-L3GJN-QXWGV-7PMM3
#59 5SFHJ-GHP1B-FCSCQ-KBCQB-86J50
#60 C7WG1-N9D6C-8DZ7F-40LJM-R8W0R
#61 JM04J-QSY3J-GX5GD-5M8Y7-YYYJF
#62 S7XLQ-K39TG-YQBNB-N3R5Y-VVVWP
#63 SV40X-KVSFL-YHWTY-NW9BK-VH4VC
#64 DY6YC-98HCK-Q1R61-F054N-H8CPY
#65 D7TXK-YGY0P-ZG08N-3FL68-KSW2M
#66 TWV4G-XV9VM-WPYKV-LHJ2G-G5H9S
#67 5GSBC-TLLMK-582F1-WNNCN-6481Y
#68 BRC6N-7SM63-3P5P6-725MD-BNCLD
#69 5QT9F-TSL88-5H8HX-WG8FJ-6ZYF5
#70 JSJ7J-C4D6B-QMVWQ-V0V7B-08F80
#71 KJ5V3-29HNQ-D4K4K-3JKN5-KCPS1
#72 BD6S1-77HL0-34RJP-735G9-BVCMT
#73 5DXDD-GW9BX-FDB9Z-KSR7L-8XV8K
#74 ZSDCJ-BHMBP-TCC3G-YBSMN-L61LY
#75 YKW9P-PLZKS-CV008-NH9S5-V54K1
#76 5BB3S-35M4J-PJZWY-7XKT4-BPPRV
#77 488XZ-TD88L-ZFV5B-8202S-JND9Q
#78 DXCG2-9KMDC-QC5XM-FR5F7-HQCFF
#79 J0M4B-CZTVK-QXDKV-VWZ21-0H698
#80 TWBL7-X6XGH-WFHX1-LTXV7-G3TYF
#81 Q99DN-VW5F2-2T64L-SY5PX-MWCZH
#82 ZZRHH-B8WRB-T7DVJ-YMDCQ-LYQ1B
#83 DV9BG-9H8MZ-QR1FL-FKMCS-HK21Q
#84 CJ0RQ-Y92LG-S4MBB-HJ6VY-WCKYP
#85 RWQ9X-66JML-0FG7Y-QTGRK-733CC
#86 CRKJ9-LS33V-1PJS5-5239S-YNZNQ
#87 C06Y5-LZHCF-1XR6F-5W541-YHCP8
#88 0PPLS-12TG5-HTSX6-666VT-4FKYX
#89 SR2MR-6FFG5-6YR30-BRRG6-XQVM7
#90 JDXYQ-Q4WYG-G9C7B-5NGVY-Y43YP
#91 40YL4-3M032-H505H-MK0J3-NKD0N
#92 JZ6K4-QZWG7-GQ3QK-58Z7J-YT685
#93 YKSL9-Z0WTV-WTLN5-0S05S-ZXDWQ
#94 57CL4-T3M32-5Q55H-W35J3-6VC0N
#95 CJ0PK-Y9251-S4MZD-HJ6BL-WCKVK
#96 C6G4K-LD1HP-11QTN-5V9R8-Y94CM
#97 LZSP2-287WC-L7BGM-QMBZ7-7YBQF
#98 YQ70R-PNLBY-C1SZQ-NJ7WT-VCR4X
#99 42LNF-GB358-71QSX-YFQQJ-LSN55

James
Duncan Smith wrote:

Right, so you might be able to come up with something that produces similar
output. Do you know if the strings are generated independently? If so,
there must be some stochastic component (or unknown inputs) or the strings
would be identical. How about the frequencies of the characters? Are some
(significantly) more frequent than others? Do some characters follow others
with unusually high frequency? Do characters tend to cluster (more or less
that you'd expect from independently generated characters)? Unless the
strings are very long you probably can't answer these questions too reliably
with only 100 strings.

Markov Chains are a possibility (as already mentioned). I'd probably start
by looking at the simpler things first. The 1000 to 1200 sum might be a
clue, particularly if you're having trouble emulating it. Of course, if
this turns out to be some sort of code and you're looking at some encoded
text, then that's something I know little about, and the above might be next
to useless.

Maybe the basic statistical tests in Gary Strangman's stats.py would be
useful, unless you already have R and RPy?

Duncan

--
-----------------------------------------------------------------------
James Sapara
Software Architect

Front Logic Inc. Tel: 306.653.2725 x14
Suite 300, Scotia Center Toll Free: 1.800.521.4510
111 Second Ave South Fax: 306.653.0972
Saskatoon, SK S7K 1K6
Canada
http://www.frontlogic.com ja***@frontlogic.com

Find out what TYPENGO(tm) N300 Search Technology can do for your
company: http://www.frontlogic.com/interactiv...ngo/index.html
-----------------------------------------------------------------------

Jul 18 '05 #10

James S

I got a few questions about the strings, so I figured I would share the
answers.

You wrote the algorithm that generated these strings?

Yes

How are the strings verified and proven to be legitimate?

There is a checksum encoded in the string that provides a basic check.
There is a unquice number encoded in each string that represents each
possible key.

There are 28629150 valid keys in the space of 28629150^5

Does the same algorithm that created them verify them?

The same string is feed back through the algo in reverse. Providing the
origional information that generated it.

James

James Sapara wrote:

Well for the fun of it I'm gonna post 100 strings that are probably like
the strings referred to earlier. I'll answer a limited set of questions
about the keys if you want, but for the most part they are a set of 100
unqiue keys, with two variable inputs and a few static inputs.

Jul 18 '05 #11

James Sapara

Here's some statistical data from the set of keys I posted. I didn't do
any pattern matching, but I think this shows that the keys are fairly
random in composition. I have also included stats for a much larger set
of keys. I noticed in the set of 100 that I didn't get any keys in the
fourth deviation, so I wanted to make sure such keys do exist. If you
can think of any other stats from the ord values of character I could do
that as well. Since I know the algo I also know that the sets of 5 are
only for readability.

Pattern matching based on Number/Alpha would probably show that over a
large set of keys it matchs the ratio of numbers to alpha's used in the
keys. You could do binary comparisons of the N,A's and try to find
common patterns.

Min/Max ord values ('0', 48) ('Z', 90)
Min Sum/Max Sum/Average Sum 1200 2250 1725
Average value of byte 69
Number of Keys 100
Average sum of keys 1745
Range of Key Sums 1564 to 1894
STD range -3 to 3
Squard Root of Key Sums 79.1896457878

Min/Max ord values ('0', 48) ('Z', 90)
Min Sum/Max Sum/Average Sum 1200 2250 1725
Average value of byte 69
Number of Keys 10000
Average sum of keys 1753
Range of Key Sums 1460 to 1977
STD range -4 to 4
Squard Root of Key Sums 75.0266619276

James

****** wrote:

I'm not a cryptanalysis guy, is there any value in doing this to the
strings, uploading this data to a DB and then searching for patterns:

;5V6XBTV6N65H7J3WWTWQ6H74B;5V6XB;TV6N6;5H7J3;WWTWQ ;6H74B;NANAAAANANNANANAAAAANANNA;NANAA;AANAN;NANAN ;AAAAA;NANNA;[53,
86, 54, 88, 66, 84, 86, 54, 78, 54, 53, 72, 55, 74, 51, 87, 87, 84,
87, 81, 54, 72, 55, 52, 66];[53, 86, 54, 88, 66];[84, 86, 54, 78,
54];[53, 72, 55, 74, 51];[87, 87, 84, 87, 81];[54, 72, 55, 52,
66];1733;347;356;305;426;299
;CBPPCLTJ5X1S8ZS5LVBZYRFVW;CBPPC;LTJ5X;1S8ZS;5LVBZ ;YRFVW;AAAAAAAANANANAANAAAAAAAAA;AAAAA;AAANA;NANAA ;NAAAA;AAAAA;[67,
66, 80, 80, 67, 76, 84, 74, 53, 88, 49, 83, 56, 90, 83, 53, 76, 86,
66, 90, 89, 82, 70, 86, 87];[67, 66, 80, 80, 67];[76, 84, 74, 53,
88];[49, 83, 56, 90, 83];[53, 76, 86, 66, 90];[89, 82, 70, 86,
87];1881;360;375;361;371;414
;QJCT6VXYLT2S9QZSQ02GMJD9S;QJCT6;VXYLT;2S9QZ;SQ02G ;MJD9S;AAAANAAAAANANAAAANAAAANA;AAAAN;AAAAA;NANAA; AANAA;AANA;[81,
74, 67, 84, 54, 86, 88, 89, 76, 84, 50, 83, 57, 81, 90, 83, 81, 79,
50, 71, 77, 74, 68, 57, 83];[81, 74, 67, 84, 54];[86, 88, 89, 76,
84];[50, 83, 57, 81, 90];[83, 81, 79, 50, 71];[77, 74, 68, 57,
83];1867;360;423;361;364;359

Jul 18 '05 #12

Andrew Dalke

James S wrote:

I got a few questions about the strings, so I figured I would share the
answers.

I decided to play around with it for a bit.
The first digit of any group of 5 has less diversity. Here's a
histogram of the distinct number of characters for each position.

0 0456BCDJKLQRSTYZ
1 0123456789BDFGHJKLMNPQRSTVWXYZ
2 023456789BCDFGHJKLMPQRSTVWXY
3 012345679BCDFGHJKLMNPQRSTVWXYZ
4 012345679BCDFGHJKNPQRSTXYZ
5 1235679BCGKLNPQTVXYZ
6 0123456789BCDFGHJKLMNPRSTVWXZ
7 0123456789CDFGHJKLMNPQSTVWXYZ
8 0123456789BCDFGHJKLMNPQRSTVWXY
9 0123456789BCFGHJKLMNPQRSTVWXYZ
10 01235678CDFGHKLMNPQSTWYZ
11 0123456789BCDFGHJKLMNPQRSTVWXYZ
12 0123456789BCDGHJKLMNPQRSTVWXYZ
13 0123456789BCDFGHJKLMNPQRSTVWXYZ
14 0123456789BCDFGHJKLMNPQRSTVWXYZ
15 0345678BDFHKLMNQSTVWY
16 0123456789BCDFGHJKLMNPQRSTVWXY
17 0123456789BCDGHJKLMNPQRSTVWXZ
18 012345679BCDFGHJKLMNPQRSTVWXYZ
19 0123456789BCDFGHJKLMNPQRSTVXYZ
20 046789BGHJKLMNRTVWXYZ
21 013456789BCDFGHJKLMNPQRSTVWXYZ
22 123456789BCDFGHJKLMNPQRSTVWYZ
23 0123456789BCDFHJKLMNPQRSTVWXYZ
24 012345678BCDFGHJKLMNPQRSTVWXYZ
There are 31 characters in the encoding. They are the
digits 0-9 and the consonants B-Z (including Y). I suspect
the vowels were omitted so no English words would be
generated by accident.

This gives a maximum space of 31**25 or
19232792489931358333837313998767870751
values.

Given
There are 28629150 valid keys in the space of 28629150^5
then that's
19232789130978948891037625982187500000
or smaller than 31**25 by
3358952409442799688016580370751

124 bits are needed to encode the full space because

2**123 < 28629150**5 < 2**124 True
However it's also the case that
2**123 < 31**25 < 2**124 True

This means at most a fraction of a bit (about 0.85)
is available for any checksum, since you say
There is a checksum encoded in the string that provides a basic check.
There is a unquice number encoded in each string that represents each
possible key.

If all possible keys in the space can be encoded by
this scheme then there's no extra space for a checksum.
I therefore don't know what that number means for this problem.

Supposing there is a checksum, it's probably the first digit
in the group. The loss of randomness is a clue -- the checksums
I know don't have that effect.
I can't tell if transpositions change the checksum. The only
anagram group is

#59 5SFHJ-GHP1B-FCSCQ-KBCQB-86J50
^^^^
#37 C9W27-N2SJ3-8HB1P-4VD0F-R9QXL
^^^^

and there may be a seed to the checksum which changes for
each group number, or the mapping from characters to numbers
can change based on the index in the string.

In any case, as others pointed out in this thread, if I really
had the code on my computer I would break out my disassembler
and start tracing code to reverse engineer it. Much easier
than trying to figure out the algorithm on these clues.
Andrew
da***@dalkescientific.com

Jul 18 '05 #13

Nick Craig-Wood

James S <ja**************@echoblast.com> wrote:

I got a few questions about the strings, so I figured I would share the
answers.

You wrote the algorithm that generated these strings?

Yes

How are the strings verified and proven to be legitimate?

There is a checksum encoded in the string that provides a basic check.
There is a unquice number encoded in each string that represents each
possible key.

There are 28629150 valid keys in the space of 28629150^5
ie about 24 bits of serial number in about 123 bits of key. If done
properly 123 bits of keys will ensure it isn't broken.
Does the same algorithm that created them verify them?

The same string is feed back through the algo in reverse. Providing the
origional information that generated it.

Here is my solution to the problem ;-) If your solution is anything
like this then it will never be broken! However if you keep the keys
to the algorithm in code you give to the user then it can be reverse
engineered of course.

$ python serial_key.py 314159265 141421356

Original serial 314159265
User Key is CRYX7T2FedtoJteoeeJx4MKiNvAgYSmKOB
Decoded serial number is 314159265
This is a valid key True

Original serial 141421356
User Key is CL33jINQiZo8xMqI0gnh3CCtutc1cXdfxH
Decoded serial number is 141421356
This is a valid key True
"""
Make a user key using md5

The secret keys must be kept secret, apart from that no harm in
publishing the algorithm

The user key is serial + md5(serial)

The serial numbers are fixed length strings and are encoded
unencrypted (but obfuscated) into the user key.

This is encoded into base 62 using str_to_long from ASCII to long then
long_to_str to get the long into base 62

This method is crypto secure IMHO (perhaps you'd better use SHA256 if
you are feeling really paranoid though!)

"""

import md5

SERIAL_LEN = 9
ASCII = "".join([chr(i) for i in range(256)])
KEYBASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw xyz0123456789"
KEY1 = "secret key"
KEY2 = "another secret key"

def str_to_long(s, chars):
"Return a long representation of a string"
base = len(chars)
n = 0L
for c in s:
n *= base
n += chars.index(c)
return n

def long_to_str(n, chars):
"Return a string representation of a long"
base = len(chars)
s = []
while n:
n, c = divmod(n, base)
s.append(chars[c])
s.reverse()
return "".join(s)

def make_digest(serial):
"Makes a digest from a serial number and secret keys"
return md5.md5(KEY1 + serial + KEY2).digest()

def encode_key(serial, digest):
"Encode the serial + digest in human readable form - returns a string"
assert len(serial) == SERIAL_LEN
return long_to_str(str_to_long(serial + digest, ASCII), KEYBASE)

def decode_key(key):
"Returns a serial and digest decoded from a key"
decoded = long_to_str(str_to_long(key, KEYBASE), ASCII)
return decoded[:SERIAL_LEN], decoded[SERIAL_LEN:]

def check_digest(serial, digest):
"Returns bool whether serial matches digest"
return make_digest(serial) == digest

if __name__ == "__main__":
import sys
if len(sys.argv) <= 1:
print "Put some serials on the command line"
for serial in sys.argv[1:]:
serial = serial.ljust(SERIAL_LEN)[:SERIAL_LEN]
print
print "Original serial", serial
digest = make_digest(serial)
user_key = encode_key(serial, digest)
print "User Key is ", user_key

# Now decode user_key to show how its done
decoded_serial, decoded_digest = decode_key(user_key)
print "Decoded serial number is", decoded_serial
print "This is a valid key", check_digest(decoded_serial, decoded_digest)

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick

Jul 18 '05 #14

Python for Reverse Engineering

Similar topics