Python for Reverse Engineering

A friend of mine wrote an algorithm that generates strings. He says that
it's impossible to figure out exactly how the algorithm works. That may
be true, but I think if I had enough sample strings that I could write a
program to identify patterns in the strings and then produce strings
with similar patterns. He disagrees with me. He gave me 100 strings to
analyze.

I wrote a script to read each string and build a list of characters
present. I've discovered that he only uses 20 alpha/numeric chars to
generate the strings and that each string sums up to a value between
1000 and 1200 when I assign each char in the string its ASCII value.
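The two measurements described above can be sketched in a few lines. This is only an illustration: the `samples` list holds made-up placeholders (the real 100 strings were not posted), and the function names are my own.

```python
def alphabet(strings):
    """Return the sorted list of distinct characters used across all samples."""
    return sorted(set("".join(strings)))


def ascii_sum(s):
    """Sum the character codes of s (the 'ASCII value' total from the post)."""
    return sum(ord(c) for c in s)


# Placeholder data, not the friend's real strings.
samples = ["AB12", "B2A1", "1A2B"]

print("alphabet:", "".join(alphabet(samples)))
print("sums:", [ascii_sum(s) for s in samples])
```

Printing the minimum and maximum of the sums over the real samples would reproduce the 1000 to 1200 range mentioned above.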

What else might I do to analyze these things? As it stands now, I can
generate an acceptable string on every 100th attempt or so, but I'd like
to do better than this.

-b

Jul 18 '05 #1
Brad Tilley <rt*****@vt.edu > writes:
A friend of mine wrote an algorithm that generates strings. He says
that it's impossible to figure out exactly how the algorithm
works. That may be true, but I think if I had enough sample strings
that I could write a program to identify patterns in the strings and
then produce strings with similar patterns. He disagrees with me. He
gave me 100 strings to analyze.

I wrote a script to read each string and build a list of characters
present. I've discovered that he only uses 20 alpha/numeric chars to
generate the strings and that each string sums up to a value between
1000 and 1200 when I assign each char in the string its ASCII value.

What else might I do to analyze these things? As it stands now, I can
generate an acceptable string on every 100th attempt or so, but I'd
like to do better than this.


Build a Markov chain generator. Basically, build tuples of characters
of length N, and use those to index a dictionary containing a list (or
string) of the characters that follow that tuple in your input strings.
So, with N equal to 2 and d[('a', 'b')] = ['c', 'd', 'e'], you'd know
that the strings abc, abd and abe appeared in the text. To generate a
string, pull a random key from the dictionary, then repeatedly append a
random character from the list that the last N characters, treated as
a tuple, index in the dictionary. Larger values of N generate strings
that more closely resemble the input text.
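A minimal sketch of that character-level generator might look like the following. The names `build_chain` and `generate` are mine, and stopping early at a dead end is one possible choice rather than the only one.

```python
import random
from collections import defaultdict


def build_chain(strings, n=2):
    """Map each n-character tuple to the list of characters seen right after it."""
    chain = defaultdict(list)
    for s in strings:
        for i in range(len(s) - n):
            chain[tuple(s[i:i + n])].append(s[i + n])
    return chain


def generate(chain, length, rng=random):
    """Start from a random key, then extend using the last n characters as the key."""
    out = list(rng.choice(list(chain)))
    n = len(out)
    while len(out) < length:
        followers = chain.get(tuple(out[-n:]))
        if not followers:
            # Dead end: this context only ever appeared at the end of a sample.
            break
        out.append(rng.choice(followers))
    return "".join(out)


chain = build_chain(["abc", "abd", "abe"], n=2)
print(chain[("a", "b")])
print(generate(chain, 3))
```

Because the follower list keeps repeats, a character that follows a given context more often is also drawn more often.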

Optionally, make the objects stored in the dictionary keep a count of
how many times each character appears, and choose one randomly
weighted by those counts.
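One way to store counts instead of raw lists is a `Counter` per context, sampled with `random.choices`. This is a sketch with invented helper names; contexts are kept as plain substrings here purely for brevity.

```python
import random
from collections import Counter, defaultdict


def build_counts(strings, n=2):
    """Map each n-character context to a Counter of the characters that follow it."""
    counts = defaultdict(Counter)
    for s in strings:
        for i in range(len(s) - n):
            counts[s[i:i + n]][s[i + n]] += 1
    return counts


def pick_next(counts, context, rng=random):
    """Pick a follower of context at random, weighted by how often it was seen."""
    followers = counts.get(context)
    if not followers:
        return None  # context never seen with a following character
    chars = list(followers)
    weights = list(followers.values())
    return rng.choices(chars, weights=weights, k=1)[0]


counts = build_counts(["abc", "abc", "abd"], n=2)
print(counts["ab"])
print(pick_next(counts, "ab"))
```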

I've got a Markov chain generator that works on words if you really
want to see one.

<mike
--
Mike Meyer <mw*@mired.or g> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 18 '05 #2
Brad Tilley <rt*****@vt.edu > writes:
A friend of mine wrote an algorithm that generates strings. He says that
it's impossible to figure out exactly how the algorithm works. That may
be true, but I think if I had enough sample strings that I could write a
program to identify patterns in the strings and then produce strings
with similar patterns. He disagrees with me. He gave me 100 strings to
analyze.
What input, if any, does the algorithm take? What are the strings
used for? Care to post the 100 strings so others can have a look?
What else might I do to analyze these things? As it stands now, I can
generate an acceptable string on every 100th attempt or so, but I'd like
to do better than this.


Reading about cryptanalysis might prove fruitful. I suspect the
number of "impossible to figure out" algorithms is far less than
the number of algorithms whose developers initially made that claim.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Jul 18 '05 #3
Michael Fuhr wrote:
Brad Tilley <rt*****@vt.edu > writes:
A friend of mine wrote an algorithm that generates strings. He says that
it's impossible to figure out exactly how the algorithm works. That may
be true, but I think if I had enough sample strings that I could write a
program to identify patterns in the strings and then produce strings
with similar patterns. He disagrees with me. He gave me 100 strings to
analyze.

What input, if any, does the algorithm take?


I don't know, but I'm guessing that there is none.
What are the strings used for?
To unlock an application he wrote (software key).
Care to post the 100 strings so others can have a look?
I'd rather not do that.
Reading about cryptanalysis might prove fruitful. I suspect the
number of "impossible to figure out" algorithms is far less than
the number of algorithms whose developers initially made that claim.


I suspect you're right.
Jul 18 '05 #4
On Fri, 05 Nov 2004 23:36:57 -0500, Brad Tilley wrote:
What are the strings used for?


To unlock an application he wrote (software key).


Ah.

Tell your friend the usual order of business when cracking these apps is
to find the conditional that does the final check for validity and
hard-code it to true (or false or whatever). Ask him if he really thinks
he can do better than massive companies like Electronic Arts and the other
large gaming companies, which still routinely experience 0-day (or less!)
warezing of their works.

The only remotely foolproof way to secure an app is to require an online
component, and to check that part. But few apps can really reasonably do
that, and customers *do* notice.

I say this mostly because people should not fool themselves into thinking
they can make this work, when hundreds or thousands of others can try. If
your software can't profit without that kind of protection, it probably
can't profit with it, either.

(I've read several rants from shareware developers bemoaning the piracy
rate and complaining that while people pirate their work, nobody pays for
it. Inevitably, I dig deeper and find they are peddling Yet Another FTP
program or the like (HTML editor, XML editor, text editor, simple image
editor, image viewer, macro program, etc.), *which the market has clearly
set a value of $0 for*. This is off-topic, but I'd check this too; you may
be protecting something of no value on the open market. I don't know, but
based on my readings and a quick browse through Tucows, the odds are
pretty decent.)

Jul 18 '05 #5
Care to post the 100 strings so others can have a look?
I'd rather not do that.


If your friend was worth his salt as an algorithm writer, he'd be more than
willing to let you post a mere 100 strings. For any good generator, this
would be paltry compared to the number of strings it would take to figure out
the algorithm.

But--he is probably making impossible claims here, anyway...
Reading about cryptanalysis might prove fruitful.


Perhaps your friend should pay heed to this as well.
James

--
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
611 Charles E. Young Dr. S.
MBI 205, UCLA 951570
Los Angeles CA 90095-1570
http://www.jamesstroud.com/
Jul 18 '05 #6

"James Stroud" <js*****@mbi.uc la.edu> wrote in message
news:ma******** *************** *************** @python.org...
Care to post the 100 strings so others can have a look?
I'd rather not do that.


If your friend was worth his salt as an algorithm writer, he'd be more than
willing to let you post a mere 100 strings. For any good generator, this
would be paltry compared to the number of strings it would take to figure out
the algorithm.

But--he is probably making impossible claims here, anyway...


[snip]

Doesn't that depend on exactly what is meant by "figure out exactly how the
algorithm works"? If it means identify (with absolute certainty) the
algorithm used to generate the strings, then it surely can't be possible.

Duncan

Jul 18 '05 #7
Duncan Smith wrote:
If your friend was worth his salt as an algorithm writer, he'd be more...
But--he is probably making impossible claims here, anyway...

[snip]

Doesn't that depend on exactly what is meant by "figure out exactly how the
algorithm works"? If it means identify (with absolute certainty) the
algorithm used to generate the strings, then it surely can't be possible.

Duncan


That's my thought as well. I don't want to know exactly how the
algorithm generates strings. But, I think that if I analyze enough
strings I should know, on some level, what an acceptable string looks like.

Samba coders never see Microsoft's file and print sharing source code,
yet they are able to emulate an NT server quite well just by observing
packets.
Jul 18 '05 #8

"Brad Tilley" <rt*****@vt.edu > wrote in message
news:cm******** **@solaris.cc.v t.edu...
Duncan Smith wrote:
If your friend was worth his salt as an algorithm writer, he'd be more...
But--he is probably making impossible claims here, anyway...
[snip]

Doesn't that depend on exactly what is meant by "figure out exactly how the algorithm works"? If it means identify (with absolute certainty) the
algorithm used to generate the strings, then it surely can't be possible.
Duncan


That's my thought as well. I don't want to know exactly how the
algorithm generates strings. But, I think that if I analyze enough
strings I should know, on some level, what an acceptable string looks like.
Samba coders never see Microsoft's file and print sharing source code,
yet they are able to emulate an NT server quite well just by observing
packets.


Right, so you might be able to come up with something that produces similar
output. Do you know if the strings are generated independently? If so,
there must be some stochastic component (or unknown inputs) or the strings
would be identical. How about the frequencies of the characters? Are some
(significantly) more frequent than others? Do some characters follow others
with unusually high frequency? Do characters tend to cluster (more or less
than you'd expect from independently generated characters)? Unless the
strings are very long you probably can't answer these questions too reliably
with only 100 strings.
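Those three questions map onto simple counts. The sketch below uses my own function names; `repeat_rate` is just one crude stand-in for "clustering". If the characters were independent and uniform over k symbols, the repeat rate would be about 1/k.

```python
from collections import Counter


def char_frequencies(strings):
    """How often each character occurs across all samples."""
    return Counter("".join(strings))


def bigram_frequencies(strings):
    """How often each ordered pair of adjacent characters occurs."""
    pairs = Counter()
    for s in strings:
        pairs.update(zip(s, s[1:]))
    return pairs


def repeat_rate(strings):
    """Fraction of adjacent pairs in which the same character appears twice."""
    total = sum(max(len(s) - 1, 0) for s in strings)
    same = sum(a == b for s in strings for a, b in zip(s, s[1:]))
    return same / total if total else 0.0


samples = ["aab", "abb"]  # placeholder data
print(char_frequencies(samples).most_common())
print(bigram_frequencies(samples).most_common(3))
print(repeat_rate(samples))
```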

Markov Chains are a possibility (as already mentioned). I'd probably start
by looking at the simpler things first. The 1000 to 1200 sum might be a
clue, particularly if you're having trouble emulating it. Of course, if
this turns out to be some sort of code and you're looking at some encoded
text, then that's something I know little about, and the above might be next
to useless.

Maybe the basic statistical tests in Gary Strangman's stats.py would be
useful, unless you already have R and RPy?
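If neither package is to hand, a basic goodness-of-fit statistic is easy to compute directly. The sketch below is not taken from stats.py or R. It computes a Pearson chi-square statistic against the assumption that every character in the alphabet is equally likely, and leaves the comparison with a critical value (for the returned degrees of freedom) to a table or a stats package.

```python
from collections import Counter


def chi_square_uniform(strings, alphabet):
    """Return (statistic, degrees_of_freedom) for a uniform-frequency hypothesis."""
    symbols = sorted(set(alphabet))
    counts = Counter(c for s in strings for c in s if c in symbols)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no characters from the alphabet found in the samples")
    expected = total / len(symbols)  # same expected count for every symbol
    stat = sum((counts[c] - expected) ** 2 / expected for c in symbols)
    return stat, len(symbols) - 1


print(chi_square_uniform(["aaab"], "ab"))
```

A large statistic relative to the critical value suggests some characters really are favoured; with only 100 short strings the test has limited power.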

Duncan
Jul 18 '05 #9
Well for the fun of it I'm gonna post 100 strings that are probably like
the strings referred to earlier. I'll answer a limited set of questions
about the keys if you want, but for the most part they are a set of 100
unique keys, with two variable inputs and a few static inputs.

#0 5V6XB-TV6N6-5H7J3-WWTWQ-6H74B
#1 CBPPC-LTJ5X-1S8ZS-5LVBZ-YRFVW
#2 QJCT6-VXYLT-2S9QZ-SQ02G-MJD9S
#3 SF46P-67SK3-6BWFD-BQ9Y0-XJ4J2
#4 ZR4VZ-BS499-TPMDT-Y2MBF-LN2VL
#5 YKR4K-ZNJ41-W2N2D-0G2FL-ZZSFK
#6 4ZBVQ-388NT-H7742-MM7N8-NYRSM
#7 468HK-3DX11-H13CD-MVPQL-N9G5K
#8 0W853-1V66Q-HPMHK-6H105-45LX1
#9 0XSRZ-1XW7N-H3LLK-620JR-4ND0J
#10 JF3XN-C7SCR-QBP0G-VQPJ2-0JG0G
#11 4BKKK-TZW8T-ZV713-80QB6-J8NV7
#12 R5W5D-V1YV8-83DRQ-DXTPX-9P7ZH
#13 C5S1Z-NJQMS-89JP0-45LVP-R7WY4
#14 CS3VY-L4SY9-1MPNM-50P0T-Y8GXX
#15 ZY7JC-P8KRK-K1511-80RZN-J8VQY
#16 KQ5D3-2F4P2-DRT19-347KH-K1R6Z
#17 KZ0ZH-QXC2Z-NH4WS-S9KDD-M0P7D
#18 5LLXY-G0GNY-F0GJW-KD1WH-8LL4Z
#19 5K7C3-G9VBQ-FBP3K-K43M5-81ZL1
#20 4TX4X-TJ8H6-Z5QTL-8F8RR-JSYCJ
#21 47LHR-TH59K-Z2Y8C-8XPL0-JPGD2
#22 DDK4B-YWD46-ZD123-3SGFQ-KX3FB
#23 4T8D6-346BS-HTM9C-MM17Z-NYL8W
#24 YTK41-ZH3H0-WKJTP-0Y3R9-ZWZCT
#25 JRH3Q-CF1HG-QYXMB-VRX4Y-0QTPP
#26 RL645-V06HF-807TF-DDTR1-9L7C8
#27 K66DH-Q3HPB-NJR1J-SH5KQ-M5C6B
#28 B6X7T-7RNKW-3S2MW-752K2-B7S6G
#29 CP30J-Y2SQB-STPBQ-H6PPB-WFGZ0
#30 4GPG5-TJGT4-Z11WX-8PWQP-JBM54
#31 K5Y7N-QD0K3-NV0M6-S70KD-MMD6D
#32 S0J7B-6MDXK-65VCV-BKVW1-XKF48
#33 DYGQK-YK1WP-ZKQNN-3C9L8-KD4DM
#34 5D6F1-G7VPC-F4H7F-K3H5M-8V9WR
#35 YQVX4-Z2YCF-W0607-0T6JF-Z3K0L
#36 6MHYP-50DQG-36NY4-T07GB-T8RM0
#37 C9W27-N2SJ3-8HB1P-4VD0F-R9QXL
#38 S0CXD-6MZ08-65X8Q-BKH6X-XK92H
#39 J5KC0-Q3HJP-GBWSC-5WXJV-YHT03
#40 JPRCN-QK7QP-G04G7-5GQF3-YZNFN
#41 C07RC-NGS0W-83G95-4444H-R15PZ
#42 4MJT3-3C192-HY369-MBJQH-N6H5Z
#43 LSVN2-2SYJC-LW6JM-QP617-7BK3F
#44 J100C-CBQQX-QVWBS-VTWPZ-03MZW
#45 T8PLF-XGTG8-WNSXX-L16VJ-GGKY5
#46 T9T3Y-X5YS9-WB04M-L9LHT-G0WTX
#47 J4VMP-Q79GS-GLY38-56JG5-YFHM1
#48 YG31S-BLSFJ-M8P0Y-BNPY4-X4GJV
#49 J45D0-C14BN-QXT9R-V977C-00R86
#50 JHXZ6-QF21M-GD189-5VCXK-Y9JBC
#51 KST2Q-QHL4T-NC8P2-SB868-M6Y2M
#52 4W3MS-G6ST5-7FPV6-YTPST-L3GKX
#53 SNWGZ-KC91N-Y445K-NY43R-VW5HJ
#54 Y8HWH-B310B-MXX2J-BQXLQ-XJTDB
#55 09H9B-1GDX6-HVNR3-6N73Q-44RHB
#56 6JRBR-59J85-34NP0-TJ216-TCS37
#57 S3B01-KP820-YZ72P-NC709-VDRXT
#58 L5VSF-21LLM-L3GJN-QXWGV-7PMM3
#59 5SFHJ-GHP1B-FCSCQ-KBCQB-86J50
#60 C7WG1-N9D6C-8DZ7F-40LJM-R8W0R
#61 JM04J-QSY3J-GX5GD-5M8Y7-YYYJF
#62 S7XLQ-K39TG-YQBNB-N3R5Y-VVVWP
#63 SV40X-KVSFL-YHWTY-NW9BK-VH4VC
#64 DY6YC-98HCK-Q1R61-F054N-H8CPY
#65 D7TXK-YGY0P-ZG08N-3FL68-KSW2M
#66 TWV4G-XV9VM-WPYKV-LHJ2G-G5H9S
#67 5GSBC-TLLMK-582F1-WNNCN-6481Y
#68 BRC6N-7SM63-3P5P6-725MD-BNCLD
#69 5QT9F-TSL88-5H8HX-WG8FJ-6ZYF5
#70 JSJ7J-C4D6B-QMVWQ-V0V7B-08F80
#71 KJ5V3-29HNQ-D4K4K-3JKN5-KCPS1
#72 BD6S1-77HL0-34RJP-735G9-BVCMT
#73 5DXDD-GW9BX-FDB9Z-KSR7L-8XV8K
#74 ZSDCJ-BHMBP-TCC3G-YBSMN-L61LY
#75 YKW9P-PLZKS-CV008-NH9S5-V54K1
#76 5BB3S-35M4J-PJZWY-7XKT4-BPPRV
#77 488XZ-TD88L-ZFV5B-8202S-JND9Q
#78 DXCG2-9KMDC-QC5XM-FR5F7-HQCFF
#79 J0M4B-CZTVK-QXDKV-VWZ21-0H698
#80 TWBL7-X6XGH-WFHX1-LTXV7-G3TYF
#81 Q99DN-VW5F2-2T64L-SY5PX-MWCZH
#82 ZZRHH-B8WRB-T7DVJ-YMDCQ-LYQ1B
#83 DV9BG-9H8MZ-QR1FL-FKMCS-HK21Q
#84 CJ0RQ-Y92LG-S4MBB-HJ6VY-WCKYP
#85 RWQ9X-66JML-0FG7Y-QTGRK-733CC
#86 CRKJ9-LS33V-1PJS5-5239S-YNZNQ
#87 C06Y5-LZHCF-1XR6F-5W541-YHCP8
#88 0PPLS-12TG5-HTSX6-666VT-4FKYX
#89 SR2MR-6FFG5-6YR30-BRRG6-XQVM7
#90 JDXYQ-Q4WYG-G9C7B-5NGVY-Y43YP
#91 40YL4-3M032-H505H-MK0J3-NKD0N
#92 JZ6K4-QZWG7-GQ3QK-58Z7J-YT685
#93 YKSL9-Z0WTV-WTLN5-0S05S-ZXDWQ
#94 57CL4-T3M32-5Q55H-W35J3-6VC0N
#95 CJ0PK-Y9251-S4MZD-HJ6BL-WCKVK
#96 C6G4K-LD1HP-11QTN-5V9R8-Y94CM
#97 LZSP2-287WC-L7BGM-QMBZ7-7YBQF
#98 YQ70R-PNLBY-C1SZQ-NJ7WT-VCR4X
#99 42LNF-GB358-71QSX-YFQQJ-LSN55
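For anyone who wants to poke at the list, a position-by-position look is a reasonable first step. The sketch below assumes each line has the form `#n GROUP-GROUP-GROUP-GROUP-GROUP` as shown above; the helper names are invented for illustration.

```python
from collections import defaultdict


def parse_key(line):
    """Drop a leading '#n' label if present and return the hyphen-separated groups."""
    return line.split()[-1].split("-")


def leading_chars(line):
    """First character of each group, joined into one string."""
    return "".join(group[0] for group in parse_key(line))


def column_alphabets(lines):
    """For each character position (hyphens removed), the set of characters seen there."""
    cols = defaultdict(set)
    for line in lines:
        for i, c in enumerate("".join(parse_key(line))):
            cols[i].add(c)
    return {i: "".join(sorted(chars)) for i, chars in sorted(cols.items())}


lines = [
    "#8 0W853-1V66Q-HPMHK-6H105-45LX1",
    "#9 0XSRZ-1XW7N-H3LLK-620JR-4ND0J",
]
for line in lines:
    print(leading_chars(line))
print(column_alphabets(lines))
```

Keys #8 and #9 share the same leading character in every group, which hints that some positions depend on each other rather than being chosen independently.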

James
Duncan Smith wrote:

Right, so you might be able to come up with something that produces similar
output. Do you know if the strings are generated independently? If so,
there must be some stochastic component (or unknown inputs) or the strings
would be identical. How about the frequencies of the characters? Are some
(significantly) more frequent than others? Do some characters follow others
with unusually high frequency? Do characters tend to cluster (more or less
than you'd expect from independently generated characters)? Unless the
strings are very long you probably can't answer these questions too reliably
with only 100 strings.

Markov Chains are a possibility (as already mentioned). I'd probably start
by looking at the simpler things first. The 1000 to 1200 sum might be a
clue, particularly if you're having trouble emulating it. Of course, if
this turns out to be some sort of code and you're looking at some encoded
text, then that's something I know little about, and the above might be next
to useless.

Maybe the basic statistical tests in Gary Strangman's stats.py would be
useful, unless you already have R and RPy?

Duncan

--
-----------------------------------------------------------------------
James Sapara
Software Architect

Front Logic Inc. Tel: 306.653.2725 x14
Suite 300, Scotia Center Toll Free: 1.800.521.4510
111 Second Ave South Fax: 306.653.0972
Saskatoon, SK S7K 1K6
Canada
http://www.frontlogic.com ja***@frontlogi c.com

Find out what TYPENGO(tm) N300 Search Technology can do for your
company: http://www.frontlogic.com/interactiv...ngo/index.html
-----------------------------------------------------------------------
Jul 18 '05 #10
