
# Python for Reverse Engineering

A friend of mine wrote an algorithm that generates strings. He says that
it's impossible to figure out exactly how the algorithm works. That may
be true, but I think that if I had enough sample strings, I could write a
program to identify patterns in the strings and then produce strings
with similar patterns. He disagrees with me. He gave me 100 strings to
analyze.

I wrote a script to read each string and build a list of the characters
present. I've discovered that he only uses 20 alphanumeric characters to
generate the strings, and that each string sums to a value between
1000 and 1200 when I assign each character its ASCII value.
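A minimal sketch of the kind of script described above, assuming the sample strings are already loaded into a Python list. The function name `summarize` is hypothetical; the poster's actual script isn't shown.

```python
def summarize(strings):
    """Return the sorted distinct characters plus the smallest and largest ASCII sum."""
    # Every character that appears anywhere in the samples.
    alphabet = sorted(set("".join(strings)))
    # One total per string: the sum of ord() over its characters.
    sums = [sum(ord(c) for c in s) for s in strings]
    return alphabet, min(sums), max(sums)
```

Run over the 100 samples, this should reproduce the 20-character set and the 1000 to 1200 range reported above.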

What else might I do to analyze these things? As it stands now, I can
generate an acceptable string on every 100th attempt or so, but I'd like
to do better than this.

-b

Jul 18 '05 #1

Build a Markov chain generator. Basically, build tuples of N consecutive
characters, and use those tuples as keys into a dictionary whose values
are lists (or strings) of the characters that follow each tuple in your
input strings. So, with N equal to 2, if d[('a', 'b')] = ['c', 'd', 'e'],
you'd know that the substrings abc, abd and abe appeared in the text. To
generate a string, pull a random key from the dictionary, then repeatedly
choose a random character from the list stored under the last N
characters (treated as a tuple). Larger values of N generate strings
that more closely resemble the input text.
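A minimal Python sketch of the generator described above, assuming the samples are plain strings. `build_chain` and `generate` are illustrative names, not Mike's own code (which he offers but doesn't post). With the samples "abcabd" and "abe" it yields exactly the `d[('a', 'b')] == ['c', 'd', 'e']` entry from the example.

```python
import random


def build_chain(strings, n=2):
    """Map each n-character tuple to the list of characters that follow it."""
    chain = {}
    for s in strings:
        # Stop n characters early: the last n characters have no follower.
        for i in range(len(s) - n):
            key = tuple(s[i:i + n])
            chain.setdefault(key, []).append(s[i + n])
    return chain


def generate(chain, length, rng=random):
    """Start from a random key, then extend one character at a time."""
    out = list(rng.choice(list(chain)))
    n = len(out)
    while len(out) < length:
        followers = chain.get(tuple(out[-n:]))
        if not followers:  # dead end: this tuple never had a successor
            break
        out.append(rng.choice(followers))
    return "".join(out)
```

Because `build_chain` appends every occurrence, a character seen twice after a tuple appears twice in its list, so `rng.choice` is already weighted by frequency. Generation may stop short of `length` if it reaches a tuple that only ever ended a sample.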

Optionally, make the objects stored in the dictionary keep a count of
how many times each character appears, and choose one randomly
weighted by those counts.
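The count-based variant can be sketched with `collections.Counter` and `random.choices` (both standard library, Python 3.6+). As before, `build_counted_chain` and `next_char` are hypothetical names used only for illustration.

```python
import random
from collections import Counter, defaultdict


def build_counted_chain(strings, n=2):
    """Map each n-character tuple to a Counter of the characters that follow it."""
    chain = defaultdict(Counter)
    for s in strings:
        for i in range(len(s) - n):
            chain[tuple(s[i:i + n])][s[i + n]] += 1
    return chain


def next_char(counts, rng=random):
    """Pick a follower with probability proportional to how often it was seen."""
    chars = list(counts)
    return rng.choices(chars, weights=[counts[c] for c in chars], k=1)[0]
```

Storing counts instead of repeated list entries keeps each dictionary value small when the same follower occurs many times.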

I've got a Markov chain generator that works on words if you really
want to see one.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Jul 18 '05 #2
What input, if any, does the algorithm take? What are the strings
used for? Care to post the 100 strings so others can have a look?

The number of "impossible to figure out" algorithms is far less than
the number of algorithms whose developers initially made that claim.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Jul 18 '05 #3
Michael Fuhr wrote:

> What input, if any, does the algorithm take?

I don't know, but I'm guessing that there is none.

> What are the strings used for?

To unlock an application he wrote (software key).

> Care to post the 100 strings so others can have a look?

I'd rather not do that.

> The number of "impossible to figure out" algorithms is far less than
> the number of algorithms whose developers initially made that claim.

I suspect you're right.
Jul 18 '05 #4
On Fri, 05 Nov 2004 23:36:57 -0500, Brad Tilley wrote:
>> What are the strings used for?
>
> To unlock an application he wrote (software key).

Ah.

Tell your friend the usual order of business when cracking these apps is
to find the conditional that does the final check for validity and
hard-code it to true (or false or whatever). Ask him if he really thinks
he can do better than massive companies like Electronic Arts and the other
large gaming companies, which still routinely experience 0-day (or less!)
warezing of their works.

The only remotely foolproof way to secure an app is to require an online
component, and to check that part. But few apps can really reasonably do
that, and customers *do* notice.

I say this mostly because people should not fool themselves into thinking
they can make this work, when hundreds or thousands of others can try. If
your software can't profit without that kind of protection, it probably
can't profit with it, either.

(I've read several rants from shareware developers bemoaning the piracy
rate and complaining that while people pirate their work, nobody pays for
it. Inevitably, I dig deeper and find they are peddling Yet Another FTP
program or the like (HTML editor, XML editor, text editor, simple image
editor, image viewer, macro program, etc.), *which the market has clearly
set a value of $0 for*. This is off-topic, but I'd check this too; you may
be protecting something of no value on the open market. I don't know, but
based on my readings and a quick browse through Tucows, the odds are
pretty decent.)

Jul 18 '05 #5
>> Care to post the 100 strings so others can have a look?
> I'd rather not do that.

If your friend was worth his salt as an algorithm writer, he'd be more than
willing to let you post a mere 100 strings. For any good generator, this
would be paltry compared to the number of strings it would take to figure out
the algorithm.

But--he is probably making impossible claims here, anyway...

Perhaps your friend should pay heed to this as well.
James

--
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
611 Charles E. Young Dr. S.
MBI 205, UCLA 951570
Los Angeles CA 90095-1570
http://www.jamesstroud.com/
Jul 18 '05 #6

"James Stroud" <js*****@mbi.ucla.edu> wrote:
>>> Care to post the 100 strings so others can have a look?
>> I'd rather not do that.
>
> If your friend was worth his salt as an algorithm writer, he'd be more
> than willing to let you post a mere 100 strings. For any good generator,
> this would be paltry compared to the number of strings it would take to
> figure out the algorithm.
>
> But--he is probably making impossible claims here, anyway...

[snip]

Doesn't that depend on exactly what is meant by "figure out exactly how the
algorithm works"? If it means identify (with absolute certainty) the
algorithm used to generate the strings, then it surely can't be possible.

Duncan

Jul 18 '05 #7
Duncan Smith wrote:
>> If your friend was worth his salt as an algorithm writer, he'd be more...
>> But--he is probably making impossible claims here, anyway...
>
> [snip]
>
> Doesn't that depend on exactly what is meant by "figure out exactly how the
> algorithm works"? If it means identify (with absolute certainty) the
> algorithm used to generate the strings, then it surely can't be possible.
>
> Duncan

That's my thought as well. I don't want to know exactly how the
algorithm generates strings. But, I think that if I analyze enough
strings I should know, on some level, what an acceptable string looks like.

Samba coders never see Microsoft's file and print sharing source code,
yet they are able to emulate an NT server quite well just by observing
packets.
Jul 18 '05 #8

"Brad Tilley" <rt*****@vt.edu> wrote:
> Duncan Smith wrote:
>>> If your friend was worth his salt as an algorithm writer, he'd be more...
>>> But--he is probably making impossible claims here, anyway...
>>
>> [snip]
>>
>> Doesn't that depend on exactly what is meant by "figure out exactly how the
>> algorithm works"? If it means identify (with absolute certainty) the
>> algorithm used to generate the strings, then it surely can't be possible.
>>
>> Duncan
>
> That's my thought as well. I don't want to know exactly how the
> algorithm generates strings. But, I think that if I analyze enough
> strings I should know, on some level, what an acceptable string looks
> like.
>
> Samba coders never see Microsoft's file and print sharing source code,
> yet they are able to emulate an NT server quite well just by observing
> packets.

Right, so you might be able to come up with something that produces similar
output. Do you know if the strings are generated independently? If so,
there must be some stochastic component (or unknown inputs) or the strings
would be identical. How about the frequencies of the characters? Are some
(significantly) more frequent than others? Do some characters follow others
with unusually high frequency? Do characters tend to cluster (more or less
than you'd expect from independently generated characters)? Unless the
strings are very long you probably can't answer these questions too reliably
with only 100 strings.
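The first two questions (overall character frequencies, and which characters follow which) can be tallied with the standard library alone. This sketch assumes the samples are a list of strings, and the helper names are hypothetical. The chi-square helper measures departure from a uniform distribution over the characters actually seen (characters that never occur are ignored); its value would be compared against a chi-square table with k - 1 degrees of freedom, k being the number of distinct characters, or checked with stats.py or R.

```python
from collections import Counter


def char_frequencies(strings):
    """How often each character occurs across all samples."""
    return Counter("".join(strings))


def bigram_frequencies(strings):
    """How often each ordered pair of adjacent characters occurs within a string."""
    pairs = Counter()
    for s in strings:
        # zip(s, s[1:]) yields (s[0], s[1]), (s[1], s[2]), ...
        pairs.update(zip(s, s[1:]))
    return pairs


def chi_square_uniform(freqs):
    """Pearson chi-square statistic against a uniform spread over the seen characters."""
    total = sum(freqs.values())
    expected = total / len(freqs)
    return sum((obs - expected) ** 2 / expected for obs in freqs.values())
```

`Counter.most_common()` on either tally lists the most frequent characters or pairs first, which is a quick way to spot followers that show up "with unusually high frequency".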

Markov chains are a possibility (as already mentioned). I'd probably start
by looking at the simpler things first. The 1000 to 1200 sum might be a
clue, particularly if you're having trouble emulating it. Of course, if
this turns out to be some sort of code and you're looking at some encoded
text, then that's something I know little about, and the above might be next
to useless.
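One simple way to see how much the 1000 to 1200 sum constrains things is to measure how often uniformly random strings satisfy it. A sketch, assuming a fixed string length and a known alphabet; both are placeholder parameters, since the thread doesn't give the string length, and the function names are hypothetical.

```python
import random


def sum_in_window(s, low=1000, high=1200):
    """True if the ASCII values of s add up to a number in [low, high]."""
    return low <= sum(ord(c) for c in s) <= high


def acceptance_rate(alphabet, length, trials=10_000, rng=random):
    """Fraction of uniformly random strings over alphabet that land in the sum window."""
    hits = 0
    for _ in range(trials):
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        hits += sum_in_window(candidate)  # True counts as 1
    return hits / trials
```

Comparing this rate with the observed hit rate gives a rough idea of how much of the rejection the sum constraint alone might account for.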

Maybe the basic statistical tests in Gary Strangman's stats.py would be
useful, unless you already have R and RPy?

Duncan
Jul 18 '05 #9
Well for the fun of it I'm gonna post 100 strings that are probably like
the strings referred to earlier. I'll answer a limited set of questions
about the keys if you want, but for the most part they are a set of 100
unique keys, with two variable inputs and a few static inputs.

#0 5V6XB-TV6N6-5H7J3-WWTWQ-6H74B
#1 CBPPC-LTJ5X-1S8ZS-5LVBZ-YRFVW
#2 QJCT6-VXYLT-2S9QZ-SQ02G-MJD9S
#3 SF46P-67SK3-6BWFD-BQ9Y0-XJ4J2
#4 ZR4VZ-BS499-TPMDT-Y2MBF-LN2VL
#5 YKR4K-ZNJ41-W2N2D-0G2FL-ZZSFK
#6 4ZBVQ-388NT-H7742-MM7N8-NYRSM
#7 468HK-3DX11-H13CD-MVPQL-N9G5K
#8 0W853-1V66Q-HPMHK-6H105-45LX1
#9 0XSRZ-1XW7N-H3LLK-620JR-4ND0J
#10 JF3XN-C7SCR-QBP0G-VQPJ2-0JG0G
#11 4BKKK-TZW8T-ZV713-80QB6-J8NV7
#12 R5W5D-V1YV8-83DRQ-DXTPX-9P7ZH
#13 C5S1Z-NJQMS-89JP0-45LVP-R7WY4
#14 CS3VY-L4SY9-1MPNM-50P0T-Y8GXX
#15 ZY7JC-P8KRK-K1511-80RZN-J8VQY
#16 KQ5D3-2F4P2-DRT19-347KH-K1R6Z
#17 KZ0ZH-QXC2Z-NH4WS-S9KDD-M0P7D
#18 5LLXY-G0GNY-F0GJW-KD1WH-8LL4Z
#19 5K7C3-G9VBQ-FBP3K-K43M5-81ZL1
#20 4TX4X-TJ8H6-Z5QTL-8F8RR-JSYCJ
#21 47LHR-TH59K-Z2Y8C-8XPL0-JPGD2
#22 DDK4B-YWD46-ZD123-3SGFQ-KX3FB
#23 4T8D6-346BS-HTM9C-MM17Z-NYL8W
#24 YTK41-ZH3H0-WKJTP-0Y3R9-ZWZCT
#25 JRH3Q-CF1HG-QYXMB-VRX4Y-0QTPP
#26 RL645-V06HF-807TF-DDTR1-9L7C8
#27 K66DH-Q3HPB-NJR1J-SH5KQ-M5C6B
#28 B6X7T-7RNKW-3S2MW-752K2-B7S6G
#29 CP30J-Y2SQB-STPBQ-H6PPB-WFGZ0
#30 4GPG5-TJGT4-Z11WX-8PWQP-JBM54
#31 K5Y7N-QD0K3-NV0M6-S70KD-MMD6D
#32 S0J7B-6MDXK-65VCV-BKVW1-XKF48
#33 DYGQK-YK1WP-ZKQNN-3C9L8-KD4DM
#34 5D6F1-G7VPC-F4H7F-K3H5M-8V9WR
#35 YQVX4-Z2YCF-W0607-0T6JF-Z3K0L
#36 6MHYP-50DQG-36NY4-T07GB-T8RM0
#37 C9W27-N2SJ3-8HB1P-4VD0F-R9QXL
#38 S0CXD-6MZ08-65X8Q-BKH6X-XK92H
#39 J5KC0-Q3HJP-GBWSC-5WXJV-YHT03
#40 JPRCN-QK7QP-G04G7-5GQF3-YZNFN
#41 C07RC-NGS0W-83G95-4444H-R15PZ
#42 4MJT3-3C192-HY369-MBJQH-N6H5Z
#43 LSVN2-2SYJC-LW6JM-QP617-7BK3F
#44 J100C-CBQQX-QVWBS-VTWPZ-03MZW
#45 T8PLF-XGTG8-WNSXX-L16VJ-GGKY5
#46 T9T3Y-X5YS9-WB04M-L9LHT-G0WTX
#47 J4VMP-Q79GS-GLY38-56JG5-YFHM1
#48 YG31S-BLSFJ-M8P0Y-BNPY4-X4GJV
#49 J45D0-C14BN-QXT9R-V977C-00R86
#50 JHXZ6-QF21M-GD189-5VCXK-Y9JBC
#51 KST2Q-QHL4T-NC8P2-SB868-M6Y2M
#52 4W3MS-G6ST5-7FPV6-YTPST-L3GKX
#53 SNWGZ-KC91N-Y445K-NY43R-VW5HJ
#54 Y8HWH-B310B-MXX2J-BQXLQ-XJTDB
#55 09H9B-1GDX6-HVNR3-6N73Q-44RHB
#56 6JRBR-59J85-34NP0-TJ216-TCS37
#57 S3B01-KP820-YZ72P-NC709-VDRXT
#58 L5VSF-21LLM-L3GJN-QXWGV-7PMM3
#59 5SFHJ-GHP1B-FCSCQ-KBCQB-86J50
#60 C7WG1-N9D6C-8DZ7F-40LJM-R8W0R
#61 JM04J-QSY3J-GX5GD-5M8Y7-YYYJF
#62 S7XLQ-K39TG-YQBNB-N3R5Y-VVVWP
#63 SV40X-KVSFL-YHWTY-NW9BK-VH4VC
#64 DY6YC-98HCK-Q1R61-F054N-H8CPY
#65 D7TXK-YGY0P-ZG08N-3FL68-KSW2M
#66 TWV4G-XV9VM-WPYKV-LHJ2G-G5H9S
#67 5GSBC-TLLMK-582F1-WNNCN-6481Y
#68 BRC6N-7SM63-3P5P6-725MD-BNCLD
#69 5QT9F-TSL88-5H8HX-WG8FJ-6ZYF5
#70 JSJ7J-C4D6B-QMVWQ-V0V7B-08F80
#71 KJ5V3-29HNQ-D4K4K-3JKN5-KCPS1
#72 BD6S1-77HL0-34RJP-735G9-BVCMT
#73 5DXDD-GW9BX-FDB9Z-KSR7L-8XV8K
#74 ZSDCJ-BHMBP-TCC3G-YBSMN-L61LY
#75 YKW9P-PLZKS-CV008-NH9S5-V54K1
#76 5BB3S-35M4J-PJZWY-7XKT4-BPPRV
#77 488XZ-TD88L-ZFV5B-8202S-JND9Q
#78 DXCG2-9KMDC-QC5XM-FR5F7-HQCFF
#79 J0M4B-CZTVK-QXDKV-VWZ21-0H698
#80 TWBL7-X6XGH-WFHX1-LTXV7-G3TYF
#81 Q99DN-VW5F2-2T64L-SY5PX-MWCZH
#82 ZZRHH-B8WRB-T7DVJ-YMDCQ-LYQ1B
#83 DV9BG-9H8MZ-QR1FL-FKMCS-HK21Q
#84 CJ0RQ-Y92LG-S4MBB-HJ6VY-WCKYP
#85 RWQ9X-66JML-0FG7Y-QTGRK-733CC
#86 CRKJ9-LS33V-1PJS5-5239S-YNZNQ
#87 C06Y5-LZHCF-1XR6F-5W541-YHCP8
#88 0PPLS-12TG5-HTSX6-666VT-4FKYX
#89 SR2MR-6FFG5-6YR30-BRRG6-XQVM7
#90 JDXYQ-Q4WYG-G9C7B-5NGVY-Y43YP
#91 40YL4-3M032-H505H-MK0J3-NKD0N
#92 JZ6K4-QZWG7-GQ3QK-58Z7J-YT685
#93 YKSL9-Z0WTV-WTLN5-0S05S-ZXDWQ
#94 57CL4-T3M32-5Q55H-W35J3-6VC0N
#95 CJ0PK-Y9251-S4MZD-HJ6BL-WCKVK
#96 C6G4K-LD1HP-11QTN-5V9R8-Y94CM
#97 LZSP2-287WC-L7BGM-QMBZ7-7YBQF
#98 YQ70R-PNLBY-C1SZQ-NJ7WT-VCR4X
#99 42LNF-GB358-71QSX-YFQQJ-LSN55

James
--
-----------------------------------------------------------------------
James Sapara
Software Architect

Front Logic Inc. Tel: 306.653.2725 x14
Suite 300, Scotia Center Toll Free: 1.800.521.4510
111 Second Ave South Fax: 306.653.0972
http://www.frontlogic.com ja***@frontlogic.com

-----------------------------------------------------------------------
Jul 18 '05 #10
