How to protect proprietary Python code? (bytecode obfuscation? what's better?)

seberino

How can a proprietary software developer protect their Python code?
People often ask me about obfuscating Python bytecode. They don't want
people to easily decompile their proprietary Python app.

I suppose another idea is to rewrite the entire Python app in C, if compiled
C code is harder to decompile.

Any ideas?

Apr 17 '06 #1

Terry Reedy

<se******@spawar.navy.mil> wrote in message
news:11*********************@v46g2000cwv.googlegroups.com...

> How can a proprietary software developer protect their Python code?
> People often ask me about obfuscating Python bytecode. They don't want
> people to easily decompile their proprietary Python app.
>
> I suppose another idea is to rewrite the entire Python app in C, if compiled
> C code is harder to decompile.
>
> Any ideas?

Go to Google's newsgroup archives for c.l.p (accessible via google.com) and
search for some of the numerous past threads on this issue, which give
several ideas and viewpoints. There may or may not also be something in
the Python FAQ or Wiki at python.org.

Apr 17 '06 #2

gangesmaster

well, you can do something silly: create a c file into which you embed
your code, ie.,

#include<python.h>

char code[] = "print 'hello moshe'";

void main(...)
{
    Py_ExecString(code);
}

then you can compile the C file into an object file, and use regular
obfuscators/anti-debuggers. of course people who really want to get the
source will be able to do so, but it will take more time. and isn't that
the big idea of using obfuscation?

but anyway, it's stupid. why be a dick? those who *really* want to get
to the source will be able to, no matter what you use. after all, the
code is executing on their CPU, and if the CPU can execute it, so
can really enthused men. and those who don't want to use your product,
don't care anyway if you provide the source or not. so share.
-tomer

Apr 17 '06 #3

Serge Orlov

se******@spawar.navy.mil wrote:

> How can a proprietary software developer protect their Python code?
> People often ask me about obfuscating Python bytecode. They don't want
> people to easily decompile their proprietary Python app.
>
> I suppose another idea is to rewrite the entire Python app in C, if compiled
> C code is harder to decompile.
>
> Any ideas?

Shuffle opcode values in random order, recompile Python, recompile
stdlib, recompile py2exe (or whatever you use for bundling). It will
keep an attacker busy for several hours.
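
To make the opcode-shuffling idea concrete, here is a minimal sketch in modern Python of what a permuted opcode table looks like. It only builds the mapping; the actual work (patching the interpreter source and rebuilding everything against it) is not shown. The function name `shuffled_opcode_table` and the seed are made up for illustration, and a real shuffle would also have to respect whatever constraints the interpreter places on opcode numbering (for example the `HAVE_ARGUMENT` boundary).

```python
import dis
import random

def shuffled_opcode_table(seed):
    """Return a dict mapping each stock opcode name to a permuted number.

    Hypothetical helper: it reuses the stock opcode numbers but hands them
    out in a different order, which is what a private interpreter build
    would bake in.
    """
    names = sorted(dis.opmap)                  # stock opcode names
    numbers = [dis.opmap[n] for n in names]    # stock opcode numbers
    random.Random(seed).shuffle(numbers)       # deterministic permutation
    return dict(zip(names, numbers))

stock = dis.opmap
custom = shuffled_opcode_table(seed=2006)

# Same names and same set of numbers, but assigned differently, so the
# stock `dis` module would mis-decode bytecode produced by the custom build.
changed = [name for name in stock if stock[name] != custom[name]]
print(f"{len(changed)} of {len(stock)} opcodes renumbered")
```

An attacker who has the modified interpreter binary can recover the permutation by comparing its behaviour with a stock build, which is why the estimate above is hours rather than years.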

Apr 17 '06 #4

Alex Martelli

gangesmaster <to*********@gmail.com> wrote:
...

> but anyway, it's stupid. why be a dick? those who *really* want to get
> to the source will be able to, no matter what you use. after all, the
> code is executing on their CPU, and if the CPU can execute it, so
> can really enthused men. and those who don't want to use your product,
> don't care anyway if you provide the source or not. so share.

Alternatively, if you have secrets that are REALLY worth protecting,
keep a tiny part of your app, embedding all worthwhile secrets, on YOUR
well-secured server -- expose it as a webservice, or whatever, so the
"fat client" (most of the app) can get at it. This truly gives you
complete control: you don't care any more if anybody decompiles the part
you distribute (which may be 90% or 99% of the app), indeed you can
publish the webservice's specs or some API to encourage more and more
people to write to it, and make your money by whatever business model
you prefer (subscription, one-off sale, pay-per-use, your choice!). If
you keep your client thin rather than fat, the advantages increase (your
app can be used much more widely, etc), but you may need substantial
amounts of servers and other resources to support widespread use.
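
A rough sketch of this split, using only the standard library: the secret computation lives in a server-side function and the distributed client knows nothing but a URL and a JSON shape. Everything named here (`secret_score`, `SecretHandler`, `remote_score`, the `/score` path, the JSON fields) is hypothetical, and the demo runs both halves in one process on a local port purely so that it is self-contained. A real deployment would add authentication, TLS and error handling.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def secret_score(values):
    """Stand-in for the proprietary algorithm; it never leaves the server."""
    return sum(v * v for v in values) % 97

class SecretHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and answer with the computed score.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": secret_score(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def remote_score(url, values):
    """Client side: this is all the distributed app needs to contain."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Ignore any proxy settings, since the demo talks to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["score"]

# Demo: serve on an ephemeral local port in a background thread.
server = HTTPServer(("127.0.0.1", 0), SecretHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/score"
result = remote_score(url, [1, 2, 3])
print(result)
server.shutdown()
server.server_close()
```

Decompiling the client reveals only `remote_score`, which is the property being argued for: the part worth protecting is never shipped.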

When I started proposing this approach, years and years ago, the fact
that your app can work only when connected to the net might have been
considered a real problem in many cases; but today, connectivity is SO
pervasive that all sorts of apps require such connectivity anyway --
e.g., look at Google Earth for a "fat client", Google Maps for a "thin"
one accessing a subset of roughly the same data but running (the client
side) inside a browser (with more limited functionality, to be sure).

Alex

Apr 18 '06 #5

Daniel Nogradi

> #include<python.h>

> char code[] = "print 'hello moshe'";
>
> void main(...)
> {
>     Py_ExecString(code);
> }

I don't get this: with Python 2.4 there is no function called
Py_ExecString in any of the header files. I found something that might
do the job, PyRun_SimpleString() in pythonrun.h, but couldn't get it
to work either. So what is really the way to execute Python code in a
string from a C program?

Apr 18 '06 #6

gangesmaster

okay, i got the name wrong. i wasn't trying to provide production-level
code, just a snippet. the function you want is
PyRun_SimpleString(const char *command)

#include <python.h>

char secret_code[] = "print 'moshe'";

int main()
{
    return PyRun_SimpleString(secret_code);
}

and you need to link with python24.lib or whatever the object file is
for your platform.

-tomer

Apr 18 '06 #7

Daniel Nogradi

> #include <python.h>

> char secret_code[] = "print 'moshe'";
>
> int main()
> {
>     return PyRun_SimpleString(secret_code);
> }
>
> and you need to link with python24.lib or whatever the object file is
> for your platform.

Are you sure? On a Linux platform I tried linking with libpython2.4.so
(I assume this is the correct object file) but it segfaults in
PyImport_GetModuleDict().

Apr 18 '06 #8

Fredrik Lundh

"Daniel Nogradi" wrote:

>> char secret_code[] = "print 'moshe'";
>>
>> int main()
>> {
>>     return PyRun_SimpleString(secret_code);
>> }
>>
>> and you need to link with python24.lib or whatever the object file is
>> for your platform.
>
> Are you sure? On a linux platform I tried linking with libpython2.4.so
> (I assume this is the correct object file) but it segfaults in
> PyImport_GetModuleDict().

I still don't understand why you think that embedding the *source code* in a variable
named "secret" will do a better job than just putting the byte code in some non-obvious
packaging, but if you insist on embedding the code, reading the documentation might
help:

http://docs.python.org/ext/embedding.html
"At the very least, you have to call the function Py_Initialize()"

http://docs.python.org/ext/high-level-embedding.html
(minimal PyRun_SimpleString example)
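
For comparison, the "byte code in some non-obvious packaging" alternative can be sketched in a few lines of modern Python. This is an illustration only: the helper names `pack` and `run` are made up, the compress-then-base64 wrapping is just one arbitrary choice of packaging, and the marshal format is specific to the Python version that produced it. It hides the text from a casual `strings` search, nothing more.

```python
import base64
import marshal
import zlib

SOURCE = "greeting = 'hello ' + 'moshe'"

def pack(source):
    """Compile source to a code object and wrap it in an opaque-looking blob."""
    code = compile(source, "<packed>", "exec")
    return base64.b64encode(zlib.compress(marshal.dumps(code)))

def run(blob):
    """Undo the packaging and execute the code object in a fresh namespace."""
    code = marshal.loads(zlib.decompress(base64.b64decode(blob)))
    namespace = {}
    exec(code, namespace)
    return namespace

blob = pack(SOURCE)          # this blob is what would be shipped
namespace = run(blob)
print(namespace["greeting"])
```

Anyone who finds `run` can reverse the packaging just as easily, so this raises the effort slightly rather than protecting anything.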

</F>

Apr 18 '06 #9

Daniel Nogradi

> >> char secret_code[] = "print 'moshe'";

> >>
> >> int main()
> >> {
> >>     return PyRun_SimpleString(secret_code);
> >> }
> >>
> >> and you need to link with python24.lib or whatever the object file is
> >> for your platform.
> >
> > Are you sure? On a linux platform I tried linking with libpython2.4.so
> > (I assume this is the correct object file) but it segfaults in
> > PyImport_GetModuleDict().
>
> I still don't understand why you think that embedding the *source code*
> in a variable named "secret" will do a better job than just putting the
> byte code in some non-obvious packaging, but if you insist on embedding
> the code, reading the documentation might help:
>
> http://docs.python.org/ext/embedding.html
> "At the very least, you have to call the function Py_Initialize()"
>
> http://docs.python.org/ext/high-level-embedding.html
> (minimal PyRun_SimpleString example)

Well, I was not the original poster in this thread; I just picked up
the idea of executing Python code that is assigned to a string from
within C and tried to do it with no particular goal, that's all. And
thanks a lot for the links, the docs are pretty clear, I should have
checked them before....

Apr 18 '06 #10

bruno at modulix

se******@spawar.navy.mil wrote:

> How can a proprietary software developer protect their Python code?
> People often ask me about obfuscating Python bytecode. They don't want
> people to easily decompile their proprietary Python app.

Do they ask the same thing for Java or .NET apps ?-)

> I suppose another idea is to rewrite the entire Python app in C, if compiled
> C code is harder to decompile.

Do you really think "native" code is harder to reverse-engineer than
Python's byte-code ?

> Any ideas?

I'm afraid that the only *proven* way to protect code from
reverse-engineering is to not distribute it *at all*.
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"

Apr 18 '06 #11

Richard Brodie

"bruno at modulix" <on***@xiludom.gro> wrote in message
news:44**********************@news.free.fr...

> Do they ask the same thing for Java or .NET apps ?-)

If you Google for "bytecode obfuscation", you'll find that a large number
of products already exist for Java and .NET.

Apr 18 '06 #12

Fredrik Lundh

Richard Brodie wrote:

>> Do they ask the same thing for Java or .NET apps ?-)
>
> If you Google for "bytecode obfuscation", you'll find a large number
> of products already exist for Java and .Net

and if you google for "python obfuscator", you'll find tools for python. including
tools that use "psychologically inspired techniques to produce extra confusion in
human readers" (probably by inserting small snippets of Perl here and there...).

</F>

Apr 18 '06 #13

Ben Sizer

bruno at modulix wrote:

> se******@spawar.navy.mil wrote:
>> I suppose another idea is to rewrite the entire Python app in C, if compiled
>> C code is harder to decompile.
>
> Do you really think "native" code is harder to reverse-engineer than
> Python's byte-code ?

Yes, until there's a native code equivalent of "import dis" that
telepathically contacts the original programmer to obtain variable
names that aren't in the executable.
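
The point about names is easy to check. In the sketch below (modern Python; the function `price_with_tax` is an invented example), the compiled code object still carries every local and global name, and `dis` prints them next to the instructions. A stripped native binary typically keeps none of this.

```python
import dis

def price_with_tax(net_price, tax_rate):
    gross_price = net_price * (1 + tax_rate)
    return round(gross_price, 2)

code = price_with_tax.__code__
print(code.co_varnames)   # argument and local variable names
print(code.co_names)      # global and attribute names the code refers to
dis.dis(price_with_tax)   # readable disassembly, names included
```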

--
Ben Sizer

Apr 19 '06 #14

bruno at modulix

Ben Sizer wrote:

> bruno at modulix wrote:
>> se******@spawar.navy.mil wrote:
>>> I suppose another idea is to rewrite the entire Python app in C, if compiled
>>> C code is harder to decompile.
>>
>> Do you really think "native" code is harder to reverse-engineer than
>> Python's byte-code ?
>
> Yes, until there's a native code equivalent of "import dis" that
> telepathically contacts the original programmer to obtain variable
> names that aren't in the executable.

Lol !-)

Ok, granted. Let's rephrase it:
"do you really think that native code is harder *enough* to
reverse-engineer ?"

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"

Apr 19 '06 #15

Ben Sizer

bruno at modulix wrote:

> Let's rephrase it:
> "do you really think that native code is harder *enough* to
> reverse-engineer ?"

I don't know. In terms of copy protection, popular off-the-shelf
software is going to get cracked whether it's written in Python or x86
ASM, that much is true. But in terms of perhaps protecting innovative
algorithms from competitors, or something similar, compilation into
native code does a great job of hiding your work. Not a perfect job,
but a good enough job.

I know some people talk a lot about using web services to keep the
proprietary data behind a secure server, but there is a large number of
applications where this is not practical - e.g. image/audio processing,
computer games, artificial intelligence, or several other applications
with heavy real-time or CPU-intensive requirements, or embedded systems
that don't have web access.

Perhaps the inclusion of ctypes will make it more practical to migrate
any sensitive code into native code libraries.

--
Ben Sizer

Apr 20 '06 #16

Alex Martelli

Ben Sizer <ky*****@gmail.com> wrote:

> bruno at modulix wrote:
>> Let's rephrase it:
>> "do you really think that native code is harder *enough* to
>> reverse-engineer ?"
>
> I don't know. In terms of copy protection, popular off-the-shelf
> software is going to get cracked whether it's written in Python or x86
> ASM, that much is true. But in terms of perhaps protecting innovative
> algorithms from competitors, or something similar, compilation into
> native code does a great job of hiding your work. Not a perfect job,
> but a good enough job.

If they're truly worth protecting, they're worth reverse engineering.

Remember, the competition includes excellent programmers working in
countries where $10 an hour is a luxury salary and IP law enforcement is
non-existent, so the cost to reveng is not as high as you might think.

> I know some people talk a lot about using web services to keep the
> proprietary data behind a secure server, but there is a large number of

Ah yes, that would be me;-). Except that I don't limit my advice to
proprietary DATA -- it also applies to CODE worth keeping secret.

> applications where this is not practical - eg. image/audio processing,
> computer games, artificial intelligence, or several other applications
> with heavy real-time or cpu-intensive requirements, or embedded systems
> that don't have web access.

Fewer and fewer systems "intrinsically lack" net access. For example,
good (costly) computer games more and more need net access to be played
in the best way (multiplayer etc).

"CPU intensive" is a weird reason to want to avoid keeping in a well
protected environment any code that's really worth money -- if it IS
worth that much you're no doubt charging enough for it to afford
supplying the CPU power to your customers (whatever your business model,
say pay-per-use or subscription levels with different maxima, etc etc).

> Perhaps the inclusion of ctypes will make it more practical to migrate
> any sensitive code into native code libraries.

Naah, ctypes shines when you access *pre-existing* dynamic libraries; if
you're building those libraries yourself, it makes more sense to make
them immediately usable from Python, e.g. via Pyrex, or SWIG, or SIP, or
the C API, etc, etc. And if your secrets are truly valuable, none of
those will really help keep them safe.
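
As an illustration of the "pre-existing dynamic libraries" case, here is a minimal ctypes sketch that calls `strlen` from the C runtime that is already on the system. It assumes a POSIX platform (Linux or macOS), where `find_library("c")` either locates libc or returns `None`, in which case `CDLL(None)` falls back to the symbols already loaded into the process. On Windows the lookup would need to be different.

```python
import ctypes
import ctypes.util

# Load the C runtime that already exists on the system (POSIX assumption).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"proprietary")
print(length)
```

No wrapper code had to be compiled here, which is the convenience being described. For a library you are writing yourself, a purpose-built extension module avoids declaring every signature by hand.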

If your secrets are worth diddlysquat, and the only reason to "protect"
them is (e.g.) to keep some PHB happy (relying on the fact that he or
she has no clue as to reality anyway), then go ahead -- use a Caesar
cipher (as a just-arrested Mafia "capo di tutti i capi" appears to have
done -- Italian police easily broke it, enabling them to arrest several
other mafiosi!), or native code, or any other ineffectual approach. But
if your wallet (or jailtime;-) is really on the line, do realize that
they ARE ineffectual.
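
To show just how ineffectual a Caesar cipher is, here is a short sketch that encrypts a message and then recovers it by trying every possible shift. The message and the helper names `caesar` and `brute_force` are invented for the example.

```python
import string

def caesar(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return text.translate(table)

def brute_force(ciphertext):
    """Return all 26 candidate plaintexts; index i undoes a shift of i."""
    return [caesar(ciphertext, -shift) for shift in range(26)]

secret = caesar("meet at the usual place", 3)
candidates = brute_force(secret)
print(secret)
print(candidates[3])   # the candidate that undoes a shift of 3
```

With only 26 keys, a reader or a word list picks out the right candidate at a glance.
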
Alex

Apr 20 '06 #17

Ben Sizer

Alex Martelli wrote:

> Ben Sizer <ky*****@gmail.com> wrote:
>> I don't know. In terms of copy protection, popular off-the-shelf
>> software is going to get cracked whether it's written in Python or x86
>> ASM, that much is true. But in terms of perhaps protecting innovative
>> algorithms from competitors, or something similar, compilation into
>> native code does a great job of hiding your work. Not a perfect job,
>> but a good enough job.
>
> If they're truly worth protecting, they're worth reverse engineering.

It's a sliding scale though. You don't need to be able to stop
everybody to make it worthwhile.

> Remember, the competition includes excellent programmers working in
> countries where $10 an hour is a luxury salary and IP law enforcement is
> non-existent, so the cost to reveng is not as high as you might think.

Whether $10 is a lot or a little is not as important as whether that
$10 could be better spent. It's easy to drill down far enough to break
copy protection but nowhere near as easy to derive a high level
algorithm from the assembly language. So in the latter case, a little
protection goes a long way.

>> I know some people talk a lot about using web services to keep the
>> proprietary data behind a secure server, but there is a large number of
>
> Ah yes, that would be me;-). Except that I don't limit my advice to
> proprietary DATA -- it also applies to CODE worth keeping secret.

Code is data, data is code. :) I meant it to refer to all information
stored that way.

>> applications where this is not practical - eg. image/audio processing,
>> computer games, artificial intelligence, or several other applications
>> with heavy real-time or cpu-intensive requirements, or embedded systems
>> that don't have web access.
>
> Fewer and fewer systems "intrinsically lack" net access. For example,
> good (costly) computer games more and more need net access to be played
> in the best way (multiplayer etc).

Sure, but there are still many, many programs that don't fit those
criteria. Nor are people generally happy about being compelled to use
online services to 'activate' their games.

> "CPU intensive" is a weird reason to want to avoid keeping in a well
> protected environment any code that's really worth money -- if it IS
> worth that much you're no doubt charging enough for it to afford
> supplying the CPU power to your customers (whatever your business model,
> say pay-per-use or subscription levels with different maxima, etc etc).

Maybe I wasn't making myself clear - I just meant that you can't be
doing round-trips to a web server for per-pixel calculations.
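
A back-of-the-envelope calculation makes the per-pixel point concrete. The frame size and the 50 ms round-trip time below are assumed numbers chosen for illustration, not measurements.

```python
WIDTH, HEIGHT = 640, 480        # assumed frame size
ROUND_TRIP_SECONDS = 0.05       # assumed 50 ms per request

pixels = WIDTH * HEIGHT
per_pixel_total = pixels * ROUND_TRIP_SECONDS   # one request per pixel
per_frame_total = 1 * ROUND_TRIP_SECONDS        # one request per frame, latency only

print(f"{pixels} pixels")
print(f"per-pixel requests: {per_pixel_total / 3600:.1f} hours per frame")
print(f"one batched request: {per_frame_total * 1000:.0f} ms plus transfer time")
```

Under these assumptions a single frame costs hours if every pixel needs its own request, so any server-side approach would have to ship whole frames or batches, and even then the latency rules out tight real-time loops.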

--
Ben Sizer

Apr 20 '06 #18
