C source cruncher wanted

I have a project where I need to distribute 300kB of C source code as part
of a shell script, and would like to compress it as much as possible.

Does anyone know where I can get a (open source) tool for crunching C
programs? That is, removal of whitespace, comments, extraneous characters,
renaming identifiers to make them as short as possible, etc. Actual
outright obfuscation is not my goal; I just want to reduce the source size.

(Yes, I know I could use a tool such as gzip, but for various reasons I'd
like to make the actual source code as small as possible as well.)

It seems to be rather hard to find crunchers these days --- I know there
certainly used to be some, and you can still get them for Javascript, but I
can't find anything that works on C...

I'm using a Unix environment.

--
+- David Given --McQ-+ "I must have spent at least ten minutes out of my
| dg@cowlark.com | life talking to this joker like he was a sane
| (dg@tao-group.com) | person. I want a refund." --- Louann Miller, on
+- www.cowlark.com --+ rasfw

Nov 15 '05 #1
David Given wrote:
> I have a project where I need to distribute 300kB of C source code as part
> of a shell script, and would like to compress it as much as possible.
>
> Does anyone know where I can get a (open source) tool for crunching C
> programs? That is, removal of whitespace, comments, extraneous characters,
> renaming identifiers to make them as short as possible, etc. Actual
> outright obfuscation is not my goal; I just want to reduce the source size.
>
> (Yes, I know I could use a tool such as gzip, but for various reasons I'd
> like to make the actual source code as small as possible as well.)
>
> It seems to be rather hard to find crunchers these days --- I know there
> certainly used to be some, and you can still get them for Javascript, but I
> can't find anything that works on C...

Likely because it makes people go "now what good is that", like I'm
going right now. Now what good is that? :-)

It's easy enough to write something like that, though. Just grab any
random C parser + pretty printer from the net and modify it so it prints
small instead of pretty.

Renaming identifiers is slightly trickier because you have to take care
to do it only for non-external symbols (if your code is self-contained,
this doesn't matter). Additional complications arise depending on
whether you want to collapse units into one or not, and whether you're
willing to use #defines or not (I wouldn't bother; too much opportunity
for error). Then there's the ISO C limit on line length (I forget this
one; 509 characters?) that you'll have to respect if you want code to
remain portable.
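
For reference, C90 only guarantees 509 characters in a logical source line; C99 raised that minimum to 4095. A minimal sketch, in Python, of a check that crunched output stays under a chosen limit. The name `long_logical_lines` and its default are illustrative assumptions, not part of any existing tool, and trigraphs are ignored:

```python
def long_logical_lines(source, limit=509):
    """Return (first_line_number, length) for each logical line over `limit`.

    A logical line is a physical line after backslash-newline splicing.
    Assumption: a dangling backslash on the very last line is ignored.
    """
    offenders = []
    logical = ""
    start = 1
    continuing = False
    for number, line in enumerate(source.split("\n"), start=1):
        if not continuing:
            # A new logical line begins here.
            start = number
            logical = ""
        continuing = line.endswith("\\")
        # Drop the trailing backslash when the line is continued.
        logical += line[:-1] if continuing else line
        if not continuing and len(logical) > limit:
            offenders.append((start, len(logical)))
    return offenders
```

A cruncher could run this over its output and insert a newline between two tokens wherever a line is reported, since a newline outside a literal or preprocessor directive is just whitespace.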

But I'll still go on record as saying it's not worth it. Any platform
that can compile C has gzip (or is capable of decompressing the format,
at least). The source produced this way is nearly useless to
maintainers, especially if identifiers are renamed, whether obfuscation
is your goal or not.

You also don't save on compilation time: either the code is stable, in
which case a one-time saving of this magnitude is probably irrelevant,
or it's not stable, in which case you have to "compress" it every time
you change the original, which takes more time than just feeding it to
the compiler, unless your compiler really sucks. About the only thing I
can imagine this is good for is reduced transmission times over a
network, but again, gzip is your friend. In fact, HTTP has built-in support.

S.
Nov 15 '05 #2
David Given <dg@cowlark.com> wrote in
news:m0**************@newsfe6-win.ntli.net:
> I have a project where I need to distribute 300kB of C source code as
> part of a shell script, and would like to compress it as much as
> possible.
>
> Does anyone know where I can get a (open source) tool for crunching C
> programs? That is, removal of whitespace, comments, extraneous
> characters, renaming identifiers to make them as short as possible,
> etc. Actual outright obfuscation is not my goal; I just want to reduce
> the source size.
>
> (Yes, I know I could use a tool such as gzip, but for various reasons
> I'd like to make the actual source code as small as possible as well.)
>
> It seems to be rather hard to find crunchers these days --- I know
> there certainly used to be some, and you can still get them for
> Javascript, but I can't find anything that works on C...
>
> I'm using a Unix environment.


Well, there's always sed. You could use that to remove all the spaces,
tabs, newline characters and whatever else you want to be rid of. Just
replace the unwanted characters with nothing (e.g., s/\ //g to get rid of
spaces).
Nov 15 '05 #3


David Given wrote On 10/12/05 08:07,:
> I have a project where I need to distribute 300kB of C source code as part
> of a shell script, and would like to compress it as much as possible.
>
> Does anyone know where I can get a (open source) tool for crunching C
> programs? That is, removal of whitespace, comments, extraneous characters,
> renaming identifiers to make them as short as possible, etc. Actual
> outright obfuscation is not my goal; I just want to reduce the source size.
> [...]


CB Falconer (anybody know why he's been so silent of late?)
has made mention of an identifier-renaming program he wrote;
you might be able to modify it to squeeze out excess white space
at the same time. I don't have a link to his code repository,
but if you Google your way through some of his postings to this
group you'll probably find it.

(Still, I've got to echo Skarmander's question: "Now, what
good is that?")

--
Er*********@sun.com

Nov 15 '05 #4
Dale wrote:
> David Given <dg@cowlark.com> wrote in
> news:m0**************@newsfe6-win.ntli.net:
>
>> I have a project where I need to distribute 300kB of C source code as
>> part of a shell script, and would like to compress it as much as
>> possible.
>>
>> Does anyone know where I can get a (open source) tool for crunching C
>> programs? That is, removal of whitespace, comments, extraneous
>> characters, renaming identifiers to make them as short as possible,
>> etc. Actual outright obfuscation is not my goal; I just want to reduce
>> the source size.
>>
>> (Yes, I know I could use a tool such as gzip, but for various reasons
>> I'd like to make the actual source code as small as possible as well.)
>>
>> It seems to be rather hard to find crunchers these days --- I know
>> there certainly used to be some, and you can still get them for
>> Javascript, but I can't find anything that works on C...
>>
>> I'm using a Unix environment.
>
> Well, there's always sed. You could use that to remove all the spaces,
> tabs, newline characters and whatever else you want to be rid of. Just
> replace the unwanted characters with nothing (e.g., s/\ //g to get rid of
> spaces).


Idon'tthinkyouwanttoremoveallspaces,especiallyinquotedstrings.
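
To make that concrete: a rough sketch, in Python, of whitespace squeezing that copies string and character literals verbatim instead of deleting every space. `squeeze_whitespace` is a hypothetical helper; it assumes comments have already been removed (an apostrophe inside a comment would confuse it) and that no literal is split across lines with a backslash.

```python
def squeeze_whitespace(source):
    """Collapse runs of spaces/tabs outside literals and drop blank lines."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # Copy a string or character literal untouched,
            # stepping over backslash escapes such as \" and \'.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        elif ch in " \t":
            # Replace a run of blanks with a single space.
            while i < n and source[i] in " \t":
                i += 1
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    # Newlines are kept (preprocessor directives need them);
    # only leading/trailing blanks and empty lines are removed.
    lines = (line.strip() for line in "".join(out).split("\n"))
    return "\n".join(line for line in lines if line)
```

This only collapses whitespace; deciding which of the remaining single spaces are safe to delete (between `int` and `x` they are not) needs a real tokenizer.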

I really don't understand the need for this "crunching", unless
it is for obfuscation, and then there are probably better ways
of doing that. Running it through a "pretty-printer" like indent
would "decrunch" the source.

Now, if the original poster would specify why he wants to do
this, we can comment on it intelligently.

Nov 15 '05 #5
Kevin Handy wrote:
> [...]
> I really don't understand the need for this "crunching", unless
> it is for obfuscation, and then there are probably better ways
> of doing that. Running it through a "pretty-printer" like indent
> would "decrunch" the source.
>
> Now, if the original poster would specify why he wants to do
> this, we can comment on it intelligently.


Surely that's irrelevant? I do know what I'm looking for, and I do have
specific reasons for wanting it.

FWIW, what I've got is a build utility consisting of a shell script
containing a script and a chunk of source code which is the interpreter for
the script. When the utility is run for the first time, it will unpack the
interpreter, compile it, stash the binary somewhere, and then use it to
invoke the script.

The interpreter is currently pretty chunky. I want people to be able to
deploy the utility by just dropping it in to a source distribution, which
means I want to make it as small as possible. Being able to read the code
isn't an issue, because if you're developing, you use the full,
uncompressed source.

I'm currently building several versions of the shell script package, using
different encodings for the interpreter source. The uncompressed version is
about 400kB. The non 7-bit clean version, which is diff unfriendly and uses
a gzip compressed data chunk, is 100kB. The 7-bit clean version, which uses
gzip and then uuencode, is 150kB. If I can reduce the size of the
interpreter source then I can reduce the size of the package, even if it is
using gzip. It's worth noting that using 'cobfusc -dem' I can reduce the
source code size by 40%, which reduces the gzip compressed version by 25%,
so using a code cruncher *is* useful; but cobfusc was not intended for code
compression, so I can't achieve any further savings.
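
Figures like these can be reproduced with a small script. A sketch, assuming the original and crunched sources are available as bytes; `size_report` is a made-up name and nothing in it is specific to cobfusc:

```python
import gzip


def size_report(original: bytes, crunched: bytes) -> dict:
    """Compare raw and gzip (level 9) sizes of two versions of one source."""

    def gz_len(data: bytes) -> int:
        # Length of the gzip stream at maximum compression.
        return len(gzip.compress(data, compresslevel=9))

    return {
        "raw": (len(original), len(crunched)),
        "gzip": (gz_len(original), gz_len(crunched)),
        # Fraction of bytes saved by crunching, before and after gzip.
        "raw_saving": 1 - len(crunched) / len(original),
        "gzip_saving": 1 - gz_len(crunched) / gz_len(original),
    }
```

The saving after gzip is normally smaller than the raw saving, because whitespace and repeated identifiers are exactly what gzip already compresses well.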

None of this is particularly on-topic, which is why I didn't mention it to
begin with...

--
+- David Given --McQ-+ "They laughed at Newton. They laughed at Einstein.
| dg@cowlark.com | Of course, they also laughed at Bozo the Clown."
| (dg@tao-group.com) | --- Carl Sagan
+- www.cowlark.com --+

Nov 15 '05 #6

In article <MI*****************@newsfe5-win.ntli.net>, David Given <dg@cowlark.com> writes:

> FWIW, what I've got is a build utility consisting of a shell script
> containing a script and a chunk of source code which is the interpreter for
> the script. When the utility is run for the first time, it will unpack the
> interpreter, compile it, stash the binary somewhere, and then use it to
> invoke the script.

Ah, it's "Revenge of the Shell Archive".

> The interpreter is currently pretty chunky. I want people to be able to
> deploy the utility by just dropping it in to a source distribution, which
> means I want to make it as small as possible. Being able to read the code
> isn't an issue, because if you're developing, you use the full,
> uncompressed source.
>
> I'm currently building several versions of the shell script package, using
> different encodings for the interpreter source. The uncompressed version is
> about 400kB. The non 7-bit clean version, which is diff unfriendly and uses
> a gzip compressed data chunk, is 100kB. The 7-bit clean version, which uses
> gzip and then uuencode, is 150kB.

uuencode is a lousy encoding (its expansion ratio is 5:3). Base64
would be significantly better (4:3). You should drop about 20KB with
Base64.
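
The expansion of each encoding is easy to measure rather than recall. A small sketch using Python's standard `binascii` and `base64` modules; the helper names are invented for illustration. Both encodings pack 3 input bytes into 4 output characters and differ only in per-line overhead (uuencode: 45 bytes per line plus a length character and a newline; MIME Base64: 57 bytes per line plus a newline), so the measured gap may well come out smaller than the ratios quoted above.

```python
import base64
import binascii
import os


def uu_size(data: bytes) -> int:
    """Size of the uuencoded body, excluding the begin/end lines."""
    # binascii.b2a_uu encodes at most 45 bytes per call (one output line).
    return sum(len(binascii.b2a_uu(data[i:i + 45]))
               for i in range(0, len(data), 45))


def b64_size(data: bytes) -> int:
    """Size of MIME-style Base64 (76 characters per line plus a newline)."""
    return len(base64.encodebytes(data))


def expansion(data: bytes) -> dict:
    """Output size divided by input size for each encoding."""
    return {
        "uuencode": uu_size(data) / len(data),
        "base64": b64_size(data) / len(data),
    }


if __name__ == "__main__":
    # Random bytes stand in for a gzip-compressed payload.
    print(expansion(os.urandom(100_000)))
```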

(Are you using gzip with maximum compression?)

> If I can reduce the size of the
> interpreter source then I can reduce the size of the package, even if it is
> using gzip.


Someone already suggested modifying a source reformatter (like
indent), since you basically need a C parser and a backend that
writes C source in something close to its minimal representation.
Personally, I wouldn't bother with renaming identifiers - I think
that's past the point of diminishing returns.

It shouldn't be hard to remove comments (assuming no pathological
cases; see the thread starting at [1]) and leading/trailing
whitespace from your source, if you want to do it yourself. That
alone should get you some savings.
1. http://groups.google.com/group/comp....a4486b8ae7dcc1
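
A minimal sketch of the comment-removal step, written in Python for brevity. `strip_comments` is a hypothetical name, and the pathological cases are deliberately left out: trigraphs, backslash-newline inside a comment marker or at the end of a `//` comment, and `#include <...>` names containing quotes are all assumed not to occur.

```python
def strip_comments(source):
    """Remove /* */ and // comments, leaving string/char literals intact."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # Copy a literal verbatim, stepping over backslash escapes,
            # so comment markers inside it are not misread.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        elif source.startswith("/*", i):
            # A block comment counts as one space, so a/**/b stays two tokens.
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")
        elif source.startswith("//", i):
            # A line comment runs up to, but not including, the newline.
            end = source.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Stripping leading and trailing whitespace afterwards is a per-line `strip()`, which is safe as long as no string literal is continued across lines.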

--
Michael Wojcik mi************@microfocus.com

World domination has encountered a momentary setback. Talk amongst
yourselves. -- Darby Conley
Nov 15 '05 #7
David Given <dg@cowlark.com> writes:
> I have a project where I need to distribute 300kB of C source code as part
> of a shell script, and would like to compress it as much as possible.
>
> Does anyone know where I can get a (open source) tool for crunching C
> programs? That is, removal of whitespace, comments, extraneous characters,
> renaming identifiers to make them as short as possible, etc. Actual
> outright obfuscation is not my goal; I just want to reduce the source size.


IIRC there's something like that in "A Book on C" by Kelley & Pohl.
--
Jan van den Broek
ba******@xs4all.nl 0xAFDAD00D
http://huizen.dds.nl/~balglaas/
Nov 15 '05 #8
