Has anyone tried something like this to speed up programs?

Snis Pilbor

Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
by the program. In otherwords, the program would examine its own code.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it. The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed. (Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)

Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

Thanks, you guys rock!

May 25 '06 #1

Subscribe Post Reply

2221

Jack Klein

On 24 May 2006 21:08:41 -0700, "Snis Pilbor" <sn********@yahoo.com>
wrote in comp.lang.c:

Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
Here you've already left the C language behind. There is no way in
the C language to do anything with a function other than to call it or
take its address. There is no defined way at all to cast a function
to a string, or to examine the contents of a function.
by the program. In otherwords, the program would examine its own code.
Again, no such thing is defined in C.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it. The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed. (Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)
Here again, it is quite possible for a program to write bytes into a
buffer that form a sequence of machine code instructions, there is no
defined way in C to call that sequence as a function.

Forgetting that, because this can probably be made to do what you want
on some platforms/compilers, there is the fact that some operating
systems have security features that would not allow this. Applications
executing in user, as opposed to system, mode cannot write to memory
containing the executable's machine code, and can't transfer control
to any memory they can write to. I think that even 64-bit Windows has
this security.
Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

Thanks, you guys rock!

Aside from the fact that this is off-topic, due to the fact that it is
all undefined in standard C, it doesn't seem to make sense, at least
to me. So let's look at why I get that impression:

The executable image is running on some hardware architecture under
some operating system. It had to be built with some tool set for that
processor/OS combination in the first place.

At the time you built the code, you know everything that the program
itself can learn from examining it's own machine code. So any
optimizations that it could achieve by such convoluted tricks at run
time could also be achieved at build time, possibly with preprocessor
conditionals to select certain optimized sequences on symbols defined
for the specific processor/OS combination.

I'm going to paraphrase Knuth here, although I doubt my version is
clever enough to ever be quoted as often as the original:

"Overly ambitious optimization is the square root of all evil (or at
least most of it) in programming."

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

May 25 '06 #2

jacob navia

Snis Pilbor a écrit :

Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
by the program.
In C you do not need to typecast to string. Just use plain function
pointers directly.

In otherwords, the program would examine its own code.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it.
The "machine code of the machine running it". Mmm you mean the hardware
running it? Or the microcode of the hardware running it?
The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed.
I have written a JIT (Just In Time compiler) that does dynamically
code generation but no, it does not care what type of model the CPU
is. It generates code that is portable across models, or generates
code for a specific model of CPU but then, that is done statically,
not dynamically. I bet you would loose so much time doing those
tests that the gains would be nothing

(Of course the initial deductions would take time, so this would only really be useful if the actual main purpose of the program
was extremely time sensitive)

Well, I bet just to MEASURE how fast a program runs you would need
some milli-seconds at least. THEN you have to run several tests
of that, so you will end up lossing at least a second for this

The speedup should be much more than a second delta, quite a feat.
Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

As far as I know, nobody has done something like this.

May 25 '06 #3

Gordon Burditt

> Here is an idea I've been toying with to speed up programs but

still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
Due to the role of the nul terminator in strings and the common presence
of that byte in machine code, I suggest you don't literally use
"strings", but blocks of memory.
by the program. In otherwords, the program would examine its own code.
Are you assuming here that the program learns the assembly language
of the machine it is running on, and then figures out how to generate
fast code from that, with no prior knowledge of the structure of
machine code or what target machine it's running on? I would assume
that this would take *at least* 10 times the time necessary to
compile GCC. Actually I think it would have trouble beating a human
infant trying to learn its first language.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it. The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed. (Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)
I have to wonder if it would be faster to breed some human programmers
from scratch. You'd also need some mechanisms for it to SAVE the
deductions so the next startup of the code wouldn't take years.
Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

If you can get this thing to work in milliseconds, it would seem
to have some very interesting applications if the "target machine
code" was a human language (or an encrypted one).

Gordon L. Burditt

May 25 '06 #4

robertwessel2

jacob navia wrote:

Snis Pilbor a écrit :
Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
by the program.

In C you do not need to typecast to string. Just use plain function
pointers directly.

In otherwords, the program would examine its own code.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it.

The "machine code of the machine running it". Mmm you mean the hardware
running it? Or the microcode of the hardware running it?
The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed.

I have written a JIT (Just In Time compiler) that does dynamically
code generation but no, it does not care what type of model the CPU
is. It generates code that is portable across models, or generates
code for a specific model of CPU but then, that is done statically,
not dynamically. I bet you would loose so much time doing those
tests that the gains would be nothing

(Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)

Well, I bet just to MEASURE how fast a program runs you would need
some milli-seconds at least. THEN you have to run several tests
of that, so you will end up lossing at least a second for this

The speedup should be much more than a second delta, quite a feat.
Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

As far as I know, nobody has done something like this.

While this is all severely OT, dynamic optimization has been endlessly
discussed and implemented in various research projects. HP's Dynamo is
a recent and fairly interesting one (although targeted at PA-RISC).

Fairly common are dynamic optimizers that are part of binary
recompilers (for example HP's Aries which binary recompiles and
optimized PA-RISC code for IPF). Many of the JVMs (not to mention MSIL
implementations) do the same thing - the monitor the code and attempt
to optimize it as the run proceeds.

May 25 '06 #5

Keith Thompson

"ro***********@yahoo.com" <ro***********@yahoo.com> writes:
[...]

Snis Pilbor a écrit :
> Here is an idea I've been toying with to speed up programs but
> still keep them portable. It's just a very handwavey rough description
> right now since I haven't worked out details. [...] > In otherwords, the program would examine its own code.
> By doing this to these little well-behaved functions, the program
> would attempt to deduce the rudimentary facts about the machine code of
> the machine running it.
[...]
While this is all severely OT, dynamic optimization has been endlessly
discussed and implemented in various research projects. HP's Dynamo is
a recent and fairly interesting one (although targeted at PA-RISC).

Fairly common are dynamic optimizers that are part of binary
recompilers (for example HP's Aries which binary recompiles and
optimized PA-RISC code for IPF). Many of the JVMs (not to mention MSIL
implementations) do the same thing - the monitor the code and attempt
to optimize it as the run proceeds.

But that's different from what the OP is talking about. Dynamic
optimization, if I understand it correctly, optimizes code based on
the program's actual execution patterns. The OP is (I think)
proposing run-time optimization based entirely on information that's
already available during compilation, assuming the compiler knows the
performance attributes of the target CPU.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

May 25 '06 #6

santosh

Snis Pilbor wrote:

Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
by the program. In otherwords, the program would examine its own code.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it. The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed. (Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)
Data Execution Prevention, (DEP), is being increasingly implemented in
modern operating systems, which would effectively prevent you from
executing blocks of generated data. Of course you can get around this
by using specific OS APIs or writing the data as an executable image to
the harddisk and executing it.

An optimising compiler, targetting the specific CPU family, with
optimising turned on is probably emitting the best machine code
possible for a mechanical translator. If you want to optimise even more
you'll have to write the routines in hand optimised assembly or fine
tune the compiler assembler output.

Runtime analysis and generation of code would actually slow down your
time-critical program enormously. Additionally you'll have to embedd
more or less a complete optimiser and code generator in each of you're
programs.

Java and .NET CLR do what you're attempting in a much better manner.
Maybe you should study their techniques.
Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?
Look at JIT and CLR. A lot of research is on in this field currently.
Thanks, you guys rock!

For the future, confine your posts to this group to it's topic - ISO C
language. For your benifit you might look at the following:

<http://cfaj.freeshell.org/google/>
<http://clc-wiki.net/wiki/Introduction_to_comp.lang.c>
<http://www.safalra.com/special/googlegroupsreply/>

May 25 '06 #7

jacob navia

ro***********@yahoo.com a écrit :

jacob navia wrote:
Snis Pilbor a écrit :
Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
by the program.

In C you do not need to typecast to string. Just use plain function
pointers directly.

In otherwords, the program would examine its own code.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it.

The "machine code of the machine running it". Mmm you mean the hardware
running it? Or the microcode of the hardware running it?

The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed.

I have written a JIT (Just In Time compiler) that does dynamically
code generation but no, it does not care what type of model the CPU
is. It generates code that is portable across models, or generates
code for a specific model of CPU but then, that is done statically,
not dynamically. I bet you would loose so much time doing those
tests that the gains would be nothing

(Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)

Well, I bet just to MEASURE how fast a program runs you would need
some milli-seconds at least. THEN you have to run several tests
of that, so you will end up lossing at least a second for this

The speedup should be much more than a second delta, quite a feat.

Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

As far as I know, nobody has done something like this.

While this is all severely OT, dynamic optimization has been endlessly
discussed and implemented in various research projects. HP's Dynamo is
a recent and fairly interesting one (although targeted at PA-RISC).

Fairly common are dynamic optimizers that are part of binary
recompilers (for example HP's Aries which binary recompiles and
optimized PA-RISC code for IPF). Many of the JVMs (not to mention MSIL
implementations) do the same thing - the monitor the code and attempt
to optimize it as the run proceeds.

Dynamic optimizers do not try to adapt to the model the programm is
running on.

They measure execution speed and will try to change the code
optimization for a given setting/input data, not what the
original poster is asking for:

" the program would attempt to deduce the rudimentary facts about the
machine code of the machine running it."

jacob

May 25 '06 #8

robertwessel2

jacob navia wrote:

ro***********@yahoo.com a écrit :
While this is all severely OT, dynamic optimization has been endlessly
discussed and implemented in various research projects. HP's Dynamo is
a recent and fairly interesting one (although targeted at PA-RISC).

Fairly common are dynamic optimizers that are part of binary
recompilers (for example HP's Aries which binary recompiles and
optimized PA-RISC code for IPF). Many of the JVMs (not to mention MSIL
implementations) do the same thing - the monitor the code and attempt
to optimize it as the run proceeds.

Dynamic optimizers do not try to adapt to the model the programm is
running on.

Take a look at Dynamo. It will do all sorts of target specific
transforms to enhance performance. It, of course, uses PGO, but also
uses profiling to determine which hotspots need to be rebuilt.

May 25 '06 #9

jacob navia

ro***********@yahoo.com a écrit :

jacob navia wrote:
ro***********@yahoo.com a écrit :
While this is all severely OT, dynamic optimization has been endlessly
discussed and implemented in various research projects. HP's Dynamo is
a recent and fairly interesting one (although targeted at PA-RISC).

Fairly common are dynamic optimizers that are part of binary
recompilers (for example HP's Aries which binary recompiles and
optimized PA-RISC code for IPF). Many of the JVMs (not to mention MSIL
implementations) do the same thing - the monitor the code and attempt
to optimize it as the run proceeds.

Dynamic optimizers do not try to adapt to the model the programm is
running on.

Take a look at Dynamo. It will do all sorts of target specific
transforms to enhance performance.

Yes I know that. But it doesn't discover those target specific
transforms at runtime. It has been programmed to use the advantages
of specific machine models, or to follow a profiler, not to discover
the type of model it is running and to modify its code accordingly.

May 25 '06 #10

Kenneth Brody

Snis Pilbor wrote:
[...]

The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it,

[...]

Ignoring the rest of this post, the fact is that many modern computers
can't execute code that's not specifically marked (in some extremely
system-specific manor) as being code.

For example, on my CPU, 0xC3 is a "return from subroutine" opcode. But
the following cannot be executed:

==========

int main()
{
void (*fakefunc)(void);
char fakecode[] = { 0xc3 };

fakefunc = (void (*)(void))fakecode;

fakefunc(); /* Crashes, as fakefunc doesn't point to code. */

return(0);
}

==========

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

May 25 '06 #11

Snis Pilbor

Guys, thanks so much for all your help, and I do apologize if this is
slightly off topic.

Some people read too much into what I wrote. You assumed I am talking
about runtime optimization of the main program itself. What I'm doing
is a Turing Machine simulator and I want it to run very fast. Much
time in a TM sim is spent stepping through routine TM instructions. It
is not difficult to, in runtime, figure out the behavior of some sets
of TM instructions. So to make the sim much faster, I am thinking
along the lines of: "realize that once -this- state is reached, the
next -blah- steps are predictable and do precisely -blah-. memorize
this fact and in the future, whenever this state is reached, instead of
babystepping through a dozen head movements, just run -this-
dynamically generated block of code."

Of course one can do some less optimal optimization, creating a
"command" structure which contains a list of tasks, basically instead
of replacing a bunch of TM transitions with dynamic code, replace it
with a simple script. This still speeds up the TM sim a lot, though
not as much as dynamic code would... (bear in mind in any TM which
"matters", the majority of time will be spent traversing the same
states over and over again).

Thanks for all the help you guys gave. I really did think this was on
topic since I'm just talking about what's possible in C (namely asking
about possibility of reading and writing machine code like a char
array).

Once again, you guys rock!

Snis Pilbor wrote:

Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
by the program. In otherwords, the program would examine its own code.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it. The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed. (Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)

Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

Thanks, you guys rock!

May 25 '06 #12

Stephen Sprunk

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:44***********************@news.wanadoo.fr...

ro***********@yahoo.com a écrit :
Take a look at Dynamo. It will do all sorts of target specific
transforms to enhance performance.

Yes I know that. But it doesn't discover those target specific
transforms at runtime. It has been programmed to use the advantages
of specific machine models, or to follow a profiler, not to discover
the type of model it is running and to modify its code accordingly.

DynamoRIO has all sorts of hooks for plugins to modify the code; I'm not
aware of anyone having done it, but in theory one could have a plugin that
detected the specific CPU model in use and modify the code to use
instruction sequences that work best on that model. Still, those changes
would still be static ones that are coded into the plugin, not something the
optimizer "learned" while it was running.

Today's software simply can't "learn" anything it wasn't programmed to
learn, which defeats most purposes of "learning".

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin
*** Posted via a free Usenet account from http://www.teranews.com ***

May 27 '06 #13

Rod Pemberton

"Snis Pilbor" <sn********@yahoo.com> wrote in message
news:11*********************@y43g2000cwc.googlegro ups.com...

Hello,

Here is an idea I've been toying with to speed up programs but
still keep them portable. It's just a very handwavey rough description
right now since I haven't worked out details.

The program would contain lots of little utility functions which do
small amounts of work. None of these would actually be run. Rather,
they would be typecast as strings and those strings would be examined
by the program. In otherwords, the program would examine its own code.
By doing this to these little well-behaved functions, the program
would attempt to deduce the rudimentary facts about the machine code of
the machine running it. The program would then use this knowledge to
write new machine code on the fly, typecast it as a function and run
it, a sort of portable self-rewriting code to achieve both portability
and speed. (Of course the initial deductions would take time, so this
would only really be useful if the actual main purpose of the program
was extremely time sensitive)

Does anyone know of anything like this, and if so, does anyone know
where I can find info about it?

As I see it, you are right on the boundary between two technologies:
interpreters and binary translators.

To understand interpreters, I'd start with FORTH:
http://www.zetetics.com/bj/papers/index.html

To understand binary translators, I'd start with Digital's FX!32:
http://www.realworldtech.com/page.cf...2803224105&p=2
Rod Pemberton

May 27 '06 #14

Has anyone tried something like this to speed up programs?

Similar topics