
Obfuscator, EXE, etc. - a solution

There have been many many many many discussions about obfuscating
python. To my dismay, most who answer are those who frequently post,
and they say things such as:
1) what's the point, in theory anything could eventually be decompiled
2) python is used for mostly internal stuff anyway, cuz it's a "glue"
language, so why bother
3) use licensing and a good lawyer, it's the ONLY way
4) many programmers seem comfortable releasing their java and .net and
other interpreted code products into the market, so why not you?

I found most of these comments dismissive, and sometimes quite arrogant.
Frankly, the reason why anyone would want to protect their code is
simple and should be respected because we are all programmers: we want to
protect our hard work.

Addressing the above points:
1) Anything could eventually be decompiled.... yes that's true. In a
perfect world. Have you ever tried to decompile C code and make sense
of it? Try a large C program. Good luck, you philosophers.
2) I don't see Python as merely a glue language. I see it as a serious
language for serious applications. Indeed, there are many commercial
examples of this, and Python works very well and is cost-efficient to
use. Incidentally, IBM and Microsoft have adopted Python for various
applications.... not that that in itself should necessarily mean anything.
3) Using licensing and a good lawyer. I'm all for that! Now your code
has been stolen... and you are going to hire a lawyer to fight it out in
court. Months go by, maybe into years. The law offers no guarantees,
except to law makers. You've mortgaged your house to protect your
investment. If you win.
4) Others release their java and .net programs. Many obfuscate their
code before doing so, for the very same reasons a Python programmer
would want to do so.

I'm sick and tired of intelligent people acting like idiots.
Programmers should offer solutions, rather than anecdotal discussions
based on obvious points.

Here's my solution, it's not perfect, but it works well:
Use Pyrex, which translates your Python sources (virtually unchanged) to
.c files, which are then compiled and linked. You get natively compiled
.pyd files (i.e., DLLs), just as though you had written a C program and
compiled & linked it
yourself. I used this on all my source files except the one that starts
my program. I used py2exe (latest version) on the source file that
starts my program to create an EXE, and it also puts all my .pyd files
into the library.zip. The result is a program that is as difficult to
understand after decompile as a natively compiled C program, except for
the beginning source file (which should contain only a very small
fraction of your program logic anyway). I have done this on a
client-side Python program composed of over 40 .py files of
between 200 and 500 lines each. It uses the wxPython widgets
for the GUI, Twisted for client/server communication, Pyro for
peer-to-peer communication, and the Crypto package for RSA public key
encryption. It runs without problems of any kind, especially ones that
may be related to the GUI or Twisted or Pyro or Crypto, and the increase
in speed of execution is very obvious.
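
To make the split described above concrete, here is a minimal sketch of how a build script might separate the one entry script (left as Python for py2exe) from the modules handed to Pyrex. The names `plan_build`, `src_dir` and `entry_script` are hypothetical, and the sketch only produces the two lists; it does not call Pyrex or py2exe itself.

```python
from pathlib import Path


def plan_build(src_dir, entry_script):
    """Split a source tree into the entry script and the modules to compile.

    Hypothetical helper: the entry script stays plain Python (for py2exe),
    every other .py file is a candidate for translation to C by Pyrex.
    """
    src = Path(src_dir)
    entry = src / entry_script
    if not entry.is_file():
        raise FileNotFoundError(entry)
    # Everything except the entry script goes to the native build step.
    to_compile = sorted(p for p in src.rglob("*.py") if p != entry)
    return entry, to_compile
```

The smaller the entry script, the less logic is left in a form that is easy to read after decompiling.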

Note on Pyrex: it can't handle "import *" or the augmented assignment
construct "x += 1". So you may have to do a little bit of recoding, but
that is all the recoding I found that I had to do.
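
To find the spots that need recoding before handing a file to Pyrex, something like the following could help. It is only a sketch using the standard `ast` module of a current Python; the function name `find_unsupported` is made up for illustration, and it checks only the two constructs named above, not everything Pyrex may reject.

```python
import ast


def find_unsupported(source, filename="<string>"):
    """Return sorted (line, description) pairs for the two constructs
    mentioned above: star imports and augmented assignment."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.ImportFrom) and any(
            alias.name == "*" for alias in node.names
        ):
            # node.module is None for "from . import *"
            findings.append((node.lineno, "from %s import *" % (node.module or ".")))
        elif isinstance(node, ast.AugAssign):
            findings.append((node.lineno, "augmented assignment"))
    return sorted(findings)
```

Running it over each .py file gives a list of line numbers to rewrite, e.g. turning "x += 1" into "x = x + 1".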

If you would like to discuss this constructively, email me at
ap**********@yahoo.com . I welcome a good programmer's discussion.

Jul 18 '05 #1
2 Replies


Jason wrote:
> There have been many many many many discussions about obfuscating
> python. To my dismay, most who answer are those who frequently post,
> and they say things such as:
> 1) what's the point, in theory anything could eventually be decompiled
> 2) python is used for mostly internal stuff anyway, cuz its a "glue"
> language, so why bother

Haven't heard #2 much, btw.

> 3) use licensing and a good lawyer, it's the ONLY way
> 4) many programmers seem comfortable releasing their java and .net and
> other interpreted code products into the market, so why not you?
>
> I found most of these comments dismissive, and sometimes quite arrogant.
> Frankly, the reasons why anyone would want to protect their code is
> simple and should be observed because we are all programmers: we want to
> protect our hard work.

I think you missed a very large class of responders who feel that some minor
obfuscation is okay - something that keeps the honest people honest - but that
anything beyond that is a waste of time because no wall you put up will be high
enough to keep out someone bent on cracking your code. So, e.g., if your build
process spits out all your .py files in a zip file that has been passed through
some simple encryption and you use a custom import hook, then you're good to
go.
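
For what it's worth, the zip-plus-import-hook idea can be sketched in a few lines with the standard library of a current Python. Everything named here (`KEY`, `xor`, `build_archive`, `ScrambledZipFinder`) is hypothetical, the "encryption" is a plain XOR that only keeps casual eyes out, and the sketch handles top-level modules only, not packages.

```python
import importlib.abc
import importlib.util
import sys
import zipfile

KEY = b"not-a-secret"  # hypothetical key; it ships with the program anyway


def xor(data, key=KEY):
    """Scramble or unscramble bytes with a repeating-key XOR."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def build_archive(sources, path):
    """Write {module_name: source_text} into a zip, scrambling each entry."""
    with zipfile.ZipFile(path, "w") as zf:
        for name, text in sources.items():
            zf.writestr(name + ".py", xor(text.encode("utf-8")))


class ScrambledZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import hook that finds and loads modules from the scrambled zip."""

    def __init__(self, path):
        self.path = path
        with zipfile.ZipFile(path) as zf:
            self.names = set(zf.namelist())

    def find_spec(self, fullname, path=None, target=None):
        if fullname + ".py" in self.names:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal import machinery handle it

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        entry = module.__name__ + ".py"
        with zipfile.ZipFile(self.path) as zf:
            source = xor(zf.read(entry)).decode("utf-8")
        exec(compile(source, entry, "exec"), module.__dict__)
```

The build step calls `build_archive`, and the program's start-up code does `sys.meta_path.append(ScrambledZipFinder("app.zip"))` before importing anything from the archive. The developer keeps writing ordinary Python, which is the whole point.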

IMO the real problem with so many of the schemes people have proposed is that
they place an inordinate burden on the developer for very little improvement in
"security". And THAT is what leads the discussion to some of the answers you
listed above. If an obfuscation scheme makes debugging too difficult, or makes
the programmer avoid a big set of useful features, etc, then it's not worth it
because at best it will improve security only slightly.

I'd wager that many people aren't opposed to obfuscation per se, they're
opposed to obfuscation whose cost is greater than the benefit, and that is the
case with 99% of the schemes suggested. That's the group I fall into - I think
code obfuscation is interesting but rarely worth it.

I'd like to add to your list:

5) Source code is much less valuable than many people think. With the exception
of stuff like code to read/write proprietary file formats or wildly efficient
implementations of certain algorithms, there's just not a lot of code out there
that, in itself, is all that special. I mean, I take pride in my work and don't
want to get ripped off, but I'm not fooling myself - it's stuff people could
figure out on their own (so it raises the question: specifically what program do
you have that is so innovative that it warrants something more than a license and
perhaps the most trivial code obfuscation?)

Note that file formats and algorithms themselves are patentable and (setting
aside whether or not you agree that these types of patents are a Good Thing)
are even better than protecting the code because you're protected even in the
case of a clean room implementation.

Also, just getting your hands on the source code is a far cry from
understanding it well enough to maintain it, improve it, and extend it. In
order to do that reliably, you have to become quite familiar with the code, and
to do that you go through much of the same process one would go through to
write it from scratch in the first place (this is also why, for example, when a
developer leaves a company it's not uncommon for that developer's code to get
tossed/replaced soon - the company still has the code but it doesn't have the
knowledge and understanding that went into making the code, and re-acquiring
that knowledge requires about as much work as rewriting it does).

A common use case for code protection is for code that ensures the user has the
rights to use the software (e.g. you can't play the game unless you have a
valid CD key). This is an interesting one to me, but the abundance of cracks
and warez out there is a clear indication that hiding the source code does
little to hinder the crackers. The solutions that do work or at least provide
some relief apply to Python programs as well as they do to C programs.

6) When people talk about obfuscating their code, they don't really seem to
spend much time thinking about who exactly they are hiding it from. There are
people who will take your code to make money off it (businesses) and everyone
else. Businesses are generally _very_ careful about following license
agreements wrt code because of the liabilities - I'd LOVE it if a business
stole my code. To a business, a license is a far bigger deterrent than any code
hiding.

The non-business group of people who will not be stopped by simple code
obfuscation are also the people who can't really be beat by ANY obfuscation:
for any amount of time T that you're willing to spend creating schemes to hide
code from them, they are willing to spend some amount of time U to circumvent
it, where U >> T. Heck, for many of them it's a game. So you have to decide how
much time you're willing to spend in an arms race with people who you're not
going to make any money off anyway. It's not like these are would-be customers,
and most of the people who use their cracks aren't would-be customers either,
and hiding the source code does little to prevent cracks anyway.

> 1) Anything could eventually be decompiled.... yes that's true. In a
> perfect world. Have you ever tried to decompile C code and make sense
> of it?

It's non-trivial, but certainly not extremely hard, especially once you know
the compiler that was used (and that's often easy to determine / guess).

> Try a large C program. Good luck, you philosophers.

Philosophy? Not needed - there are *excellent* commercial decompilers
available. That's why most software licenses explicitly forbid reverse
engineering and decompiling - because any Joe can find the tools to do it.

> 2) I don't see Python as merely a glue language.

Does anybody?

> 3) Using licensing and a good lawyer. I'm all for that! Now your code
> has been stolen... and you are going to hire a lawyer to fight it out in
> court.

Well, do you have a specific example of something that's steal-worthy? There is
so little steal-worthy code that it's the exception rather than the rule, so
for most people and most code this is an irrational scenario to fret about.
What, your code was the missing link in their business, and now they are raking
in the cash thanks to you? Not all that likely...

I'm not saying that there are no cases that should end up in a court
battle, just that in reality they'd be really rare, and when they did happen it
WOULD be worth it for you to stick it out to the end because (1) there would be
significant upside and (2) if they did steal your code and use it you'd
probably have a good chance of proving it.

Stealing code is extremely risky for a company (almost unlimited liability).

> 4) Others release their java and .net programs. Many obfuscate their
> code before doing so, for the very same reasons a Python programmer
> would want to do so.

But just because other people do it doesn't mean that (1) what they're doing
really makes sense or that (2) the benefit exceeds the cost or that (3) what
they are protecting is worth stealing.

(besides, if they're using Java or .Net then their judgement skills are already
suspect ;-) It's a joke. Laugh. )

> Here's my solution, it's not perfect, but it works well:
> Use Pyrex, which translates your python sources (virtually unchanged) to


Hey, if this works for you, more power to you - that's great that you've found
something that suits your needs.

For me, this is yet another example of a "solution" that puts way too much
burden on the developer while providing (at best) modest gains. Pyrex is
awesome for writing extensions, but last time I checked it didn't support (in
addition to what you listed) list comprehensions, generators, functions defined
inside functions, and Unicode. These are all things I'm willing to live without
when writing Pyrex extensions, but certainly not in general - they are features
that for me provide tangible benefit (in terms of productivity, code
maintainability, etc.), so avoiding them is an actual cost. The benefit over
e.g. a trivially-encrypted zip of the .pyc files is tiny, so the scheme doesn't
seem worth it.

Again though, if it works for you, great. I definitely wouldn't consider it a
widely applicable method of code obfuscation though.

-Dave
Jul 18 '05 #2


[Jason]
> the increase in speed of execution is very obvious.


That's interesting - you mean you get a significant increase in speed even
without using Pyrex's 'cdef' feature?

(Not trying to hijack the thread - I mostly agree with your main point, but
I have nothing to add.)

--
Richie Hindle
ri****@entrian.com
Jul 18 '05 #3
