
compiling assembler just in time and jumping to the result from a c++ program

Mostly for testing reasons, I'd like to see if it makes sense to choose
the following approach for just-in-time compilation of shaders for a
renderer:
Seeing as the shaders themselves consist mostly of very basic operations,
I'd like to translate them into assembly, have an assembler compile the
binary code and then call the resulting machine code from C++.

The thing is that up until now I have only used inline assembly in my
C++ projects, so there are a few things I hardly know anything about, and
I would be very grateful if anyone here could point me in the right
direction:
- Having a set of asm instructions, say "addl $5, %%eax" or "add eax, 5"
respectively, how would I go about translating just this one line into
binary? (In a way that doesn't mean I'll have to rewrite the whole
thing when porting to a different OS, if at all possible :)
- How do I jump into the binary from my C++ app in a way that I can jmp
back at the end of my assembly code segment?

Thanks!

Dec 23 '06 #1
14 Replies


In a real operating system you usually cannot simply create some data in
memory and then say "It's my routine, jump there" (it could work in
DOS, but who cares about DOS now...).

There is no standard C++ solution for this. But you can do it this way:
- generate your assembler (or C, C++, it does not matter) source into
some file
- compile it with your favourite compiler
- link it into a dynamic library
- load this library into memory, resolve symbols and call your function

For compilation you need to start a new process. Starting processes from
your program is non-portable. Use fork and exec (or execl or similar)
on POSIX systems or CreateProcess on Windows.
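
As a rough illustration of the process-launching step just described, here is a minimal sketch that starts an external tool (for example an assembler or compiler) and waits for its exit code. The name run_tool is made up for this example, and the Windows branch joins the arguments naively with spaces without quoting them, which is an assumption that only holds for simple arguments.

```cpp
#include <string>
#include <vector>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <sys/types.h>
#  include <sys/wait.h>
#  include <unistd.h>
#endif

// Hypothetical helper: runs args[0] with the given arguments and returns its
// exit code, or -1 if the process could not be started or did not exit normally.
int run_tool(const std::vector<std::string>& args) {
    if (args.empty()) return -1;
#ifdef _WIN32
    // Naive command line: arguments containing spaces would need quoting.
    std::string cmd;
    for (const std::string& a : args) {
        if (!cmd.empty()) cmd += ' ';
        cmd += a;
    }
    STARTUPINFOA si = {};
    si.cb = sizeof si;
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessA(nullptr, &cmd[0], nullptr, nullptr, FALSE, 0,
                        nullptr, nullptr, &si, &pi))
        return -1;
    WaitForSingleObject(pi.hProcess, INFINITE);
    DWORD code = 0;
    GetExitCodeProcess(pi.hProcess, &code);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return static_cast<int>(code);
#else
    // Build a null-terminated argv array for execvp.
    std::vector<char*> argv;
    for (const std::string& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if execvp failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
#endif
}
```

A call such as run_tool({"g++", "-shared", "-fPIC", "shader.cpp", "-o", "libshader.so"}) would then build the library; the file names and compiler flags here are only placeholders for whatever toolchain is actually installed.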

Loading a dynamic library is also non-standard from the C++ point of view.
However, it is standardized on POSIX platforms with the dlopen family, and
Windows offers LoadLibrary. The functions have very similar APIs, so the
difference may simply be #ifdef'ed.

Instead of #ifdefs you can use some library which hides platform-specific
issues (like ACE, wxWidgets or anything else) and which implements
starting processes and loading libraries.
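
The #ifdef approach to loading the generated library could look roughly like the sketch below. The function name load_symbol is an assumption for this example; it leaves the library loaded on purpose so that the returned pointer stays valid, and on some older Linux systems the program has to be linked with -ldl.

```cpp
#include <string>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <dlfcn.h>
#endif

// Hypothetical helper: loads a shared library and looks up one exported symbol.
// Returns nullptr if the library cannot be loaded or the symbol is missing.
// The library handle is never released, so the pointer remains usable.
void* load_symbol(const std::string& library_path, const std::string& symbol) {
#ifdef _WIN32
    HMODULE lib = LoadLibraryA(library_path.c_str());
    if (!lib) return nullptr;
    return reinterpret_cast<void*>(GetProcAddress(lib, symbol.c_str()));
#else
    void* lib = dlopen(library_path.c_str(), RTLD_NOW);
    if (!lib) return nullptr;
    return dlsym(lib, symbol.c_str());
#endif
}
```

The result would typically be cast to the expected function type before the call, for example reinterpret_cast<float (*)(float)>(load_symbol("./libshader.so", "shade")), where both names are placeholders. The generated function should be declared extern "C" so that its exported name is not mangled.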

Dec 23 '06 #2

Ondra Holub wrote:
> In a real operating system you usually cannot simply create some data in
> memory and then say "It's my routine, jump there" (it could work in
> DOS, but who cares about DOS now...).
I'm not sure what "real operating system" you had in mind, but in
Windows you most certainly *can* write machine code directly into
memory 'on the fly' and execute it there immediately. That's how the
assembler in BBC BASIC for Windows works, and without that capability
the language would be severely crippled! Some modern processors can be
set to prevent code execution in 'data' memory, but it's not the norm
in 32-bit Windows.

Richard.
Author of BBC BASIC for Windows.
http://www.rtrussell.co.uk/

Dec 23 '06 #3

> - Having a set of asm instructions, say "addl $5, %%eax" or "add eax, 5"
> respectively, how would I go about translating just this one line into
> binary?
Look for a solution that does this already. It's probably something
commercial. Google is your friend.

Who knows, maybe someone has written an x86 assembler for free, in pure
C++ code. Writing compilers isn't hard at all actually, as long as you
stick to a minimal implementation and implement the 1% of the features
that are used 99% of the time ;) Writing an assembler should be
really easy to do yourself, actually. Just pick up an x86 manual and
away you go.
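
To give an idea of what "translating one line into binary" means for the "add eax, 5" example from the question, here is a minimal sketch of a hand-written encoder for that single instruction, based on the opcode tables in the Intel manuals. The function names are invented for this example, and a real assembler would of course need to cover far more instructions and addressing modes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes the IA-32 instruction "add eax, imm"
// as raw machine-code bytes.
std::vector<std::uint8_t> encode_add_eax_imm(std::int32_t imm) {
    std::vector<std::uint8_t> code;
    if (imm >= -128 && imm <= 127) {
        // Short form "83 /0 ib": 8-bit immediate, sign-extended by the CPU.
        // ModRM byte 0xC0 = register-direct mode, opcode extension 0, EAX.
        code.push_back(0x83);
        code.push_back(0xC0);
        code.push_back(static_cast<std::uint8_t>(imm));
    } else {
        // Long form "05 id": 32-bit immediate stored in little-endian order.
        code.push_back(0x05);
        const std::uint32_t u = static_cast<std::uint32_t>(imm);
        for (int shift = 0; shift < 32; shift += 8)
            code.push_back(static_cast<std::uint8_t>(u >> shift));
    }
    return code;
}

// Hypothetical helper: encodes a near "ret", which returns to the caller.
std::vector<std::uint8_t> encode_ret() {
    return std::vector<std::uint8_t>{0xC3};
}
```

With this, encode_add_eax_imm(5) yields the three bytes 83 C0 05. Because the bytes depend only on the processor and not on the operating system, this part carries over unchanged between OSs on the same CPU.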
> (in a way that doesn't mean I'll have to rewrite the whole
> thing when porting to a different OS, if at all possible :)
Assuming the OS is still on the same processor... shouldn't be a
problem I imagine! Getting the compiler on both OSs will be a different
matter, though.
> - How do I jump into the binary from my C++ app in a way that I can jmp
> back at the end of my assembly code segment?
No idea, sorry. The answer should be simple though.

Dec 24 '06 #4

"Ondra Holub" <sp******@crayne.org> writes:
> In a real operating system you usually cannot simply create some data in
> memory and then say "It's my routine, jump there" (it could work in
> DOS, but who cares about DOS now...).
Well, the problem is not only the OS. The OS could indeed prevent
execution from data pages, but you can usually get some memory page
with both the write and execute access rights.

The other difficulty is that modern processors have distinct
instruction and data caches. So once you've written the code to the
data block, the code can still be sitting in the data cache when the
processor fetches it through the instruction cache, and instead of your
new code you get whatever was stored there previously. So you need to
flush the instruction cache every time you've created a new function.
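
For processors where this matters, the flush could be wrapped roughly as in the sketch below. The wrapper name sync_instruction_cache is made up for this example. It relies on FlushInstructionCache on Windows and on the GCC/Clang builtin __builtin___clear_cache elsewhere. On x86 the builtin typically compiles to nothing, which is consistent with the replies further down the thread.

```cpp
#include <cstddef>
#ifdef _WIN32
#  include <windows.h>
#endif

// Hypothetical wrapper: makes freshly written machine code in
// [code, code + length) visible to instruction fetch.
// Returns true if a flush mechanism was available and succeeded.
bool sync_instruction_cache(void* code, std::size_t length) {
#if defined(_WIN32)
    return FlushInstructionCache(GetCurrentProcess(), code, length) != 0;
#elif defined(__GNUC__)
    // GCC and Clang builtin; emits whatever the target architecture needs.
    char* begin = static_cast<char*>(code);
    __builtin___clear_cache(begin, begin + length);
    return true;
#else
    (void)code;
    (void)length;
    return false;  // no known mechanism for this toolchain
#endif
}
```

The call belongs after the last write to the code buffer and before the first jump into it.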

> There is no standard C++ solution for this. But you can do it this way:
> - generate your assembler (or C, C++, it does not matter) source into
> some file
> - compile it with your favourite compiler
> - link it into a dynamic library
> - load this library into memory, resolve symbols and call your function
This is the standard way. If you don't want to be processor-specific,
it's a good idea to go through the program that knows which bytes the
current processor likes.

> For compilation you need to start a new process. Starting processes from
> your program is non-portable. Use fork and exec (or execl or similar)
> on POSIX systems or CreateProcess on Windows.
Isn't MS-Windows a POSIX system?

> Loading a dynamic library is also non-standard from the C++ point of view.
> However, it is standardized on POSIX platforms with the dlopen family, and
> Windows offers LoadLibrary. The functions have very similar APIs, so the
> difference may simply be #ifdef'ed.
>
> Instead of #ifdefs you can use some library which hides platform-specific
> issues (like ACE, wxWidgets or anything else) and which implements
> starting processes and loading libraries.
Why are you bothering with MS-Windows anyways?
--
__Pascal Bourguignon__ http://www.informatimago.com/

READ THIS BEFORE OPENING PACKAGE: According to certain suggested
versions of the Grand Unified Theory, the primary particles
constituting this product may decay to nothingness within the next
four hundred million years.

Dec 24 '06 #5

In article <11**********************@a3g2000cwd.googlegroups.com>
sp******@crayne.org "ne**@rtrussell.co.uk" writes:
> Ondra Holub wrote:
>> In a real operating system you usually cannot simply create some data in
>> memory and then say "It's my routine, jump there" (it could work in
>> DOS, but who cares about DOS now...).
Many more programmers than you think, I suspect. Not all devices
and applications require GB of memory or to be network-aware.
Horses for courses...
> I'm not sure what "real operating system" you had in mind, but in
> Windows you most certainly *can* write machine code directly into
> memory 'on the fly' and execute it there immediately.
As malware creators are abundantly aware :( The sad part is that
most such PCs (are they "personal" anymore?) are connected to the
Internet -- a hostile environment where big bucks are concerned.
> That's how the
> assembler in BBC BASIC for Windows works, and without that capability
> the language would be severely crippled! Some modern processors can be
> set to prevent code execution in 'data' memory, but it's not the norm
> in 32-bit Windows.
One man's meat is another man's poison... Anything that is really
good/useful can be subverted for nefarious usage, sadly.
> Richard.
> Author of BBC BASIC for Windows.
> http://www.rtrussell.co.uk/
Pete
--
"We have not inherited the earth from our ancestors,
we have borrowed it from our descendants."

Dec 24 '06 #6

> Isn't MS-Windows a POSIX system?
Well, a POSIX layer may be installed (for example Windows Services for
UNIX on Windows XP or 2003), but I am not sure that you get POSIX
functions in your C/C++ libraries (I do not mean Cygwin or similar).
Maybe it works; I do not have any experience with it.

Dec 24 '06 #7

Pascal Bourguignon wrote:
> The other difficulty is that modern processors have distinct
> instruction and data caches. So once you've written the code to the
> data block, the code can still be in the data cache when you try to
> load it in the instruction cache, and you won't get it, but what was
> stored previously. So you need to flush the instruction cache
> every time you've created a new function.
Not usually you don't. In Intel processors at least, and I would be
very surprised if AMD are any different, great care is taken to ensure
that self-modifying-code works correctly (since it is so prevalent).
There can certainly be a very significant performance hit (typically
because a data write may invalidate the code cache, causing a re-fetch)
but there is definitely no need for the programmer to take special
action of the sort you describe in the case of the IA-32 architecture.

Quoting from the Intel Optimization Manual: "Self-modifying code (SMC)
that ran correctly on Pentium III processors and prior implementations
will run correctly on subsequent implementations, including Pentium 4
and Intel Xeon processors":

http://developer.intel.com/design/Pe...als/245127.htm

Richard.
http://www.rtrussell.co.uk/

Dec 24 '06 #8

ne**@rtrussell.co.uk wrote:
> I'm not sure what "real operating system" you had in mind, but in
> Windows you most certainly *can* write machine code directly into
> memory 'on the fly' and execute it there immediately. That's how the
> assembler in BBC BASIC for Windows works, and without that capability
> the language would be severely crippled! Some modern processors can be
> set to prevent code execution in 'data' memory, but it's not the norm
> in 32-bit Windows.
I believe in fact it is set to prevent such, but the trap handler does
"what you want" (for suitable choice of what to want...) by flushing
the cache and changing the pages from R/W data to read-only code.
It might get really expensive if the code wrote to its own page.

Dec 24 '06 #9

Robert Mabee wrote:
> I believe in fact it is set to prevent such, but the trap handler does
> "what you want" (for suitable choice of what to want...) by flushing
> the cache and changing the pages from R/W data to read-only code.
Your evidence for that is what exactly? I'm pretty sure it's not the
case.
> It might get really expensive if the code wrote to its own page.
Which of course is exactly what self-modifying code does. Yes it's
expensive in performance, but IA-32 processors support it without any
intervention from the OS (see my quote from the Intel Optimization
Manual elsewhere in this thread).

Richard.
http://www.rtrussell.co.uk/

Dec 24 '06 #10

ne**@rtrussell.co.uk wrote:
> Robert Mabee wrote:
>> I believe in fact it is set to prevent such, but the trap handler does
>> "what you want" (for suitable choice of what to want...) by flushing
>> the cache and changing the pages from R/W data to read-only code.
>
> Your evidence for that is what exactly? I'm pretty sure it's not the
> case.
The evidence that Windows has such a trap handler is your prior claim
that such code works. A cache flush is needed on any CPU that has
an I cache that doesn't snoop the bus. I recall discussions about such
a CPU from Intel but couldn't say if this is a problem with recent
chips, but the code has to be right on the worst case that might still
be running.
>> It might get really expensive if the code wrote to its own page.
>
> Which of course is exactly what self-modifying code does. Yes it's
> expensive in performance, but IA-32 processors support it without any
> intervention from the OS (see my quote from the Intel Optimization
> Manual elsewhere in this thread).
Then where does the expense come from? My remark was meant to warn the
OP to do all the writing to the fabricated code before jumping into it,
which will either make no difference (your model) or be a vast improvement
(my model) compared with an implementation that mixes writes and code
fetches to the same page.

I checked the quote -- it unfortunately doesn't say anything about
whether prior implementations also do this right.

Dec 25 '06 #11

All this talk of cache flush penalties: How is
a call to a just-created block of code any
different to the CPU than an indirect call through
a register? In either case the CPU must load
new code into the pipeline that can't be prefetched.

I assume the OP would only use generated code
for performance, which implies it will be in a loop
that gets used enough times to overcome any
setup penalties... just like most performance code.

Best regards,

Bob Masta
dqatechATdaqartaDOTcom

D A Q A R T A
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Home of DaqGen, the FREEWARE signal generator

Dec 25 '06 #12

Robert Mabee wrote:
> Then where does the expense come from?
This is what the Intel optimization document says:

"Software should avoid writing to a code page in the same 1 KB subpage
of that is being executed or fetching code in the same 2 KB subpage of
that is currently being written. In addition, sharing a page containing
directly or speculatively executed code with another processor as a
data page can trigger an SMC condition that causes the entire pipeline
of the machine and the trace cache to be cleared. This is due to the
self-modifying code condition".
> I mean by my remark to warn the
> OP to do all the writing to the fabricated code before jumping into it,
> which will either make no difference (your model) or a vast improvement
> (my model) versus a possible implementation that mixes writes and code
> fetches to the same page.
I never suggested that there was no performance hit, I simply wanted to
emphasise that writing machine code to data memory and then executing
it is supported by the processor, and requires no user or OS
intervention (such as flushing the instruction cache) - the necessary
steps are carried out by the CPU itself. This is the 'SMC condition'
referred to by Intel.
> I checked the quote -- it unfortunately doesn't say anything about
> whether prior implementations also do this right.
It refers to "Self-modifying code (SMC) that ran correctly on Pentium
III processors and prior implementations". Since 'prior
implementations' presumably include the 80386, the implication is that
all IA-32 processors have supported self-modifying code. Certainly I
have never encountered any unexpected behavior from dynamically
assembling code and then executing it in data memory, which all
versions of BBC BASIC (right back to the 6502) have done.

Richard.
http://www.rtrussell.co.uk/

Dec 25 '06 #13

You can use or study the GNU lightning library:

http://www.gnu.org/software/lightning

GNU lightning is a library that generates assembly language code at
run-time; it is very fast, making it ideal for Just-In-Time compilers, and
it abstracts over the target CPU, as it exposes to the clients a
standardized RISC instruction set inspired by the MIPS and SPARC chips.

GNU lightning 1.0 has been released and is usable in complex code
generation tasks. The available backends cover the x86, SPARC and PowerPC
architectures; the floating point interface is still experimental though,
and developed for the x86 only.

regards,
lajos

Dec 27 '06 #14

Hi,

I have a project, SoftWire, that does exactly what you intend to do:
https://gna.org/projects/softwire/. It's free for use under the LGPL
license. Its commercial successor is used in SwiftShader, an advanced
software renderer.

Executing the binary code is as simple as treating the pointer to the
memory buffer as a function pointer, and calling it. Memory can be made
executable with the following code (straight from SoftWire):

#ifdef WIN32
// Requires #include <windows.h>
unsigned long oldProtection;
VirtualProtect(machineCode, length, PAGE_EXECUTE_READWRITE, &oldProtection);
#elif __unix__
// Requires #include <sys/mman.h>; machineCode must be page-aligned
mprotect(machineCode, length, PROT_READ | PROT_WRITE | PROT_EXEC);
#endif
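
Putting the pieces together, a complete round trip might look like the sketch below. This is not SoftWire code: all function names are invented for the example, the six hand-encoded bytes ("mov eax, 42; ret") only make sense on x86 or x86-64, and the buffer is written first and only then switched to read+execute, so all writing is finished before the first call.

```cpp
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <sys/mman.h>
#endif

// Hypothetical helper: copies machine code into fresh page-aligned memory
// and makes that memory executable. Returns nullptr on failure.
void* make_executable_copy(const unsigned char* code, std::size_t length) {
#ifdef _WIN32
    void* mem = VirtualAlloc(nullptr, length, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
    if (!mem) return nullptr;
    std::memcpy(mem, code, length);
    DWORD oldProtection;
    if (!VirtualProtect(mem, length, PAGE_EXECUTE_READ, &oldProtection)) {
        VirtualFree(mem, 0, MEM_RELEASE);
        return nullptr;
    }
    return mem;
#else
    void* mem = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    std::memcpy(mem, code, length);
    if (mprotect(mem, length, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, length);
        return nullptr;
    }
    return mem;
#endif
}

// Hypothetical helper: frees memory obtained from make_executable_copy.
void release_code(void* mem, std::size_t length) {
#ifdef _WIN32
    (void)length;
    VirtualFree(mem, 0, MEM_RELEASE);
#else
    munmap(mem, length);
#endif
}

// Runs "mov eax, 42; ret" and returns its result.
// Returns -1 on non-x86 targets or if executable memory is unavailable.
int call_generated_code() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    static const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    void* mem = make_executable_copy(code, sizeof code);
    if (!mem) return -1;
    // Treat the buffer as a function pointer; "ret" returns to the caller.
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    const int result = fn();
    release_code(mem, sizeof code);
    return result;
#else
    return -1;
#endif
}
```

The trailing "ret" is what answers the "how do I jump back" part of the question: the buffer is entered with an ordinary call, so an ordinary return gets back to the C++ code.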

Kind regards,

Nicolas Capens
Dec 27 '06 #15
