compiling assembler just in time and jumping to the result from a c++ program

Mostly for testing reasons I'd like to see if it makes sense to choose
the following approach for just-in-time compilation of shaders for a
renderer:
Since the shaders themselves consist mostly of very basic operations,
I'd like to translate them into assembly, have an assembler compile the
binary code and then call the resulting machine code from C++.

The thing is that up until now I have only used inline assembly in my
C++ projects, so there are a few things I hardly know anything about, and
I would be very grateful if anyone here could point me in the right
direction:
- Having a set of asm instructions, say "addl $5, %%eax" or "add eax, 5"
respectively, how would I go about translating just this one line into
binary? (In a way that doesn't mean I'll have to rewrite the whole
thing when porting to a different OS, if at all possible. :)
- How do I jump into the binary from my C++ app in a way that I can jmp
back at the end of my assembly code segment?

Thanks!

Dec 23 '06 #1
In a real operating system you usually cannot simply create some data in
memory and then say "It's my routine, jump there" (it could work in
DOS, but who cares about DOS now...).

There is no standard C++ solution for this. But you can do it this way:
- write your generated assembler (or C or C++, it does not matter) source
to a file
- compile it with your favourite compiler
- link it into a dynamic library
- load this library into memory, resolve the symbols and call your function

For compilation you need to start a new process. Starting processes from
your program is non-portable: use fork and exec (execl or similar)
on POSIX systems, or CreateProcess on Windows.
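
As a rough illustration of the POSIX half of this, here is a minimal sketch that starts an external tool with fork and execvp and waits for it to finish. The name `runProcess` and the convention of reporting 127 when the program cannot be started are assumptions made for this example, not something from the thread; a Windows version would be built on CreateProcess instead.

```cpp
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs args[0] with the given arguments and returns its
// exit code, 127 if it could not be executed, or -1 on any other failure.
int runProcess(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // execvp wants a null-terminated array of char*.
    std::vector<char*> argv;
    for (const std::string& arg : args) {
        argv.push_back(const_cast<char*>(arg.c_str()));
    }
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid < 0) return -1;          // fork failed
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);                  // only reached if execvp failed
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;   // retry only when interrupted
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, `runProcess({"as", "shader.s", "-o", "shader.o"})` would invoke an assembler, assuming one called `as` is on the PATH; the tool and file names are only placeholders.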

Loading a dynamic library is also non-standard from the C++ point of view.
However, it is standardized on POSIX platforms with the dlopen family, and
Windows has LoadLibrary. The functions have very similar APIs, so the
difference can simply be #ifdef'ed.
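
A minimal sketch of what such an #ifdef'ed wrapper might look like. The wrapper names `openLibrary`, `findSymbol` and `closeLibrary` are made up for this example; the calls underneath (dlopen, dlsym, dlclose, LoadLibraryA, GetProcAddress, FreeLibrary) are the real platform functions.

```cpp
#if defined(_WIN32)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Opens a shared library. A null path asks for the running program itself.
// Returns nullptr on failure.
void* openLibrary(const char* path) {
#if defined(_WIN32)
    return path ? static_cast<void*>(LoadLibraryA(path))
                : static_cast<void*>(GetModuleHandleA(nullptr));
#else
    return dlopen(path, RTLD_NOW);
#endif
}

// Looks up an exported symbol; returns nullptr if it is not found.
void* findSymbol(void* library, const char* name) {
    if (library == nullptr) return nullptr;
#if defined(_WIN32)
    return reinterpret_cast<void*>(
        GetProcAddress(static_cast<HMODULE>(library), name));
#else
    return dlsym(library, name);
#endif
}

// Releases a handle obtained from openLibrary.
void closeLibrary(void* library) {
    if (library == nullptr) return;
#if defined(_WIN32)
    FreeLibrary(static_cast<HMODULE>(library));
#else
    dlclose(library);
#endif
}
```

A caller would cast the result to the right function type, for example `reinterpret_cast<int (*)(int)>(findSymbol(lib, "shade"))`, where `shade` is a hypothetical exported function. On some POSIX systems the program has to be linked with `-ldl`.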

Instead of #ifdefs you can use a library that hides the platform-specific
issues (such as ACE, wxWidgets or anything else) and implements
process starting and library loading.

Dec 23 '06 #2
Ondra Holub wrote:
> In a real operating system you usually cannot simply create some data in
> memory and then say "It's my routine, jump there" (it could work in
> DOS, but who cares about DOS now...).
I'm not sure what "real operating system" you had in mind, but in
Windows you most certainly *can* write machine code directly into
memory 'on the fly' and execute it there immediately. That's how the
assembler in BBC BASIC for Windows works, and without that capability
the language would be severely crippled! Some modern processors can be
set to prevent code execution in 'data' memory, but it's not the norm
in 32-bit Windows.

Richard.
Author of BBC BASIC for Windows.
http://www.rtrussell.co.uk/

Dec 23 '06 #3
> - Having a set of asm instructions, say "addl $5, %%eax" or "add eax, 5"
> respectively, how would I go about translating just this one line into
> binary?
Look for a solution that does this already. It's probably something
commercial. Google is your friend.

Who knows, maybe someone has written an x86 compiler for free, in pure
C++ code. Writing compilers isn't hard at all actually, as long as you
stick to minimal implementation and implement the 1% of the features
that is used 99% of the time ;) Writing an asm compiler should be
really easy to do yourself actually. Just pick up an x86 manual and
away you go.
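
To give an idea of how small the job is for a single instruction, here is a sketch that encodes `add eax, imm` by hand. The function name `encodeAddEaxImm` is invented for this example; the opcodes are the usual 32-bit forms from the x86 manual (`83 /0 ib` for a sign-extended 8-bit immediate, `05 id` for a full 32-bit immediate) and assume 32-bit or 64-bit mode, not 16-bit code.

```cpp
#include <cstdint>
#include <vector>

// Returns the machine code bytes for "add eax, imm".
std::vector<std::uint8_t> encodeAddEaxImm(std::int32_t imm) {
    std::vector<std::uint8_t> bytes;
    if (imm >= -128 && imm <= 127) {
        // Short form: opcode 83, ModRM C0 (register-direct, /0 = ADD, rm = EAX),
        // followed by one immediate byte that the CPU sign-extends.
        bytes.push_back(0x83);
        bytes.push_back(0xC0);
        bytes.push_back(static_cast<std::uint8_t>(imm));
    } else {
        // Long form: opcode 05 followed by the immediate, least significant
        // byte first (x86 is little-endian).
        bytes.push_back(0x05);
        const std::uint32_t value = static_cast<std::uint32_t>(imm);
        for (int shift = 0; shift < 32; shift += 8) {
            bytes.push_back(static_cast<std::uint8_t>(value >> shift));
        }
    }
    return bytes;
}
```

So "add eax, 5" comes out as the three bytes 83 C0 05. A real assembler needs a table of such encodings plus a parser in front of it, which is where most of the work goes.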
> (in a way that doesn't mean I'll have to rewrite the whole
> thing when porting to a different OS, if at all possible :)
Assuming the OS is still on the same processor... shouldn't be a
problem I imagine! Getting the compiler on both OSs will be a different
matter, though.
> - How do I jump into the binary from my C++ app in a way that I can jmp
> back at the end of my assembly code segment?
No idea sorry. The answer should be simple though.

Dec 24 '06 #4
"Ondra Holub" <sp******@crayne.org> writes:
> In a real operating system you usually cannot simply create some data in
> memory and then say "It's my routine, jump there" (it could work in
> DOS, but who cares about DOS now...).
Well, the problem is not only the OS. The OS could indeed prevent
execution from data pages, but you can usually get some memory page
with both the write and execute access rights.
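
To make that concrete, and to show the "jump in and come back" part of the original question, here is a minimal sketch for POSIX systems on x86 or x86-64. It writes the code first and then switches the pages to read+execute, rather than keeping write and execute rights at the same time. The names `callGenerated` and `runMovEax42` are invented for this example. The generated code simply ends with `ret`, so calling it through a function pointer returns to the C++ caller like any other function.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "This sketch hard-codes x86 machine code."
#endif

// Copies `length` bytes of machine code into fresh pages, makes them
// executable and calls them as an int() function.
// Returns -1 if the memory could not be set up (so a generated function that
// itself returns -1 is indistinguishable from failure in this sketch).
int callGenerated(const std::uint8_t* code, std::size_t length) {
    if (code == nullptr || length == 0) return -1;

    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t size = ((length + page - 1) / page) * page;

    void* memory = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (memory == MAP_FAILED) return -1;

    std::memcpy(memory, code, length);   // do all the writing first

    if (mprotect(memory, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(memory, size);
        return -1;
    }

    // Treat the buffer as a function; its final `ret` returns control here.
    int (*function)() = reinterpret_cast<int (*)()>(memory);
    const int result = function();

    munmap(memory, size);
    return result;
}

int runMovEax42() {
    // mov eax, 42 ; ret
    const std::uint8_t code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    return callGenerated(code, sizeof code);
}
```

Some hardened systems refuse to make anonymous memory executable, in which case `mprotect` fails and the sketch returns -1.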

The other difficulty is that modern processors have distinct
instruction and data caches. Once you've written the code to the
data block, the new bytes can still be sitting in the data cache when
the processor fetches from that address through the instruction cache,
so it may execute whatever was stored there previously instead. So you
need to flush the instruction cache every time you've created a new
function.
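
A minimal sketch of such a flush, assuming GCC or Clang (which provide the `__builtin___clear_cache` builtin) or Windows (`FlushInstructionCache`). The wrapper name `syncInstructionCache` is made up for this example. On x86 the builtin compiles to nothing, which matches what is said elsewhere in this thread about IA-32; on architectures with non-coherent instruction caches it does the real work.

```cpp
#include <cstddef>
#if defined(_WIN32)
#include <windows.h>
#endif

// Makes freshly written code in [start, start + length) visible to
// instruction fetch. Call it after writing the code and before jumping to it.
void syncInstructionCache(void* start, std::size_t length) {
#if defined(_WIN32)
    FlushInstructionCache(GetCurrentProcess(), start, length);
#elif defined(__GNUC__)
    char* begin = static_cast<char*>(start);
    __builtin___clear_cache(begin, begin + length);
#else
    (void)start;   // no known flush primitive for this compiler
    (void)length;
#endif
}
```

The call does not touch the contents of the buffer; it only synchronizes the caches.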

> There is no standard C++ solution for this. But you can do it this way:
> - write your generated assembler (or C or C++, it does not matter) source
> to a file
> - compile it with your favourite compiler
> - link it into a dynamic library
> - load this library into memory, resolve the symbols and call your function
This is the standard way. If you don't want to be processor specific,
it's a good idea to go through a program that knows which bytes the
current processor likes.

> For compilation you need to start a new process. Starting processes from
> your program is non-portable: use fork and exec (execl or similar)
> on POSIX systems, or CreateProcess on Windows.
Isn't MS-Windows a POSIX system?

> Loading a dynamic library is also non-standard from the C++ point of view.
> However, it is standardized on POSIX platforms with the dlopen family, and
> Windows has LoadLibrary. The functions have very similar APIs, so the
> difference can simply be #ifdef'ed.
>
> Instead of #ifdefs you can use a library that hides the platform-specific
> issues (such as ACE, wxWidgets or anything else) and implements
> process starting and library loading.
Why are you bothering with MS-Windows anyways?
--
__Pascal Bourguignon__ http://www.informatimago.com/

READ THIS BEFORE OPENING PACKAGE: According to certain suggested
versions of the Grand Unified Theory, the primary particles
constituting this product may decay to nothingness within the next
four hundred million years.

Dec 24 '06 #5
In article <11**********************@a3g2000cwd.googlegroups.com>
sp******@crayne.org "ne**@rtrussell.co.uk" writes:
> Ondra Holub wrote:
>> In a real operating system you usually cannot simply create some data in
>> memory and then say "It's my routine, jump there" (it could work in
>> DOS, but who cares about DOS now...).

Many more programmers than you think, I suspect. Not all devices
and applications require GB of memory or to be network-aware.
Horses for courses...

> I'm not sure what "real operating system" you had in mind, but in
> Windows you most certainly *can* write machine code directly into
> memory 'on the fly' and execute it there immediately.

As malware creators are abundantly aware :( The sad part is that
most such PCs (are they "personal" anymore?) are connected to the
Internet -- a hostile environment where big bucks are concerned.

> That's how the
> assembler in BBC BASIC for Windows works, and without that capability
> the language would be severely crippled! Some modern processors can be
> set to prevent code execution in 'data' memory, but it's not the norm
> in 32-bit Windows.

One man's meat is another man's poison... Anything that is really
good/useful can be subverted for nefarious usage, sadly.

> Richard.
> Author of BBC BASIC for Windows.
> http://www.rtrussell.co.uk/

Pete
--
"We have not inherited the earth from our ancestors,
we have borrowed it from our descendants."

Dec 24 '06 #6
> Isn't MS-Windows a POSIX system?

Well, a POSIX layer may be installed (for example Windows Services for
UNIX on Windows XP or 2003), but I am not sure that you get the POSIX
functions in your C/C++ libraries (I do not mean Cygwin or similar).
Maybe it works; I have no experience with it.

Dec 24 '06 #7
Pascal Bourguignon wrote:
> The other difficulty is that modern processors have distinct
> instruction and data caches. So once you've written the code to the
> data block, the code can still be in the data cache when you try to
> load it in the instruction cache, and you won't get it, but what was
> stored previously. So you need to flush the instruction cache
> every time you've created a new function.

Not usually you don't. In Intel processors at least, and I would be
very surprised if AMD are any different, great care is taken to ensure
that self-modifying-code works correctly (since it is so prevalent).
There can certainly be a very significant performance hit (typically
because a data write may invalidate the code cache, causing a re-fetch)
but there is definitely no need for the programmer to take special
action of the sort you describe in the case of the IA-32 architecture.

Quoting from the Intel Optimization Manual: "Self-modifying code (SMC)
that ran correctly on Pentium III processors and prior implementations
will run correctly on subsequent implementations, including Pentium 4
and Intel Xeon processors":

http://developer.intel.com/design/Pe...als/245127.htm

Richard.
http://www.rtrussell.co.uk/

Dec 24 '06 #8
ne**@rtrussell.co.uk wrote:
> I'm not sure what "real operating system" you had in mind, but in
> Windows you most certainly *can* write machine code directly into
> memory 'on the fly' and execute it there immediately. That's how the
> assembler in BBC BASIC for Windows works, and without that capability
> the language would be severely crippled! Some modern processors can be
> set to prevent code execution in 'data' memory, but it's not the norm
> in 32-bit Windows.

I believe in fact it is set to prevent such, but the trap handler does
"what you want" (for suitable choice of what to want...) by flushing
the cache and changing the pages from R/W data to read-only code.
It might get really expensive if the code wrote to its own page.

Dec 24 '06 #9
Robert Mabee wrote:
> I believe in fact it is set to prevent such, but the trap handler does
> "what you want" (for suitable choice of what to want...) by flushing
> the cache and changing the pages from R/W data to read-only code.

Your evidence for that is what exactly? I'm pretty sure it's not the
case.
> It might get really expensive if the code wrote to its own page.

Which of course is exactly what self-modifying code does. Yes it's
expensive in performance, but IA-32 processors support it without any
intervention from the OS (see my quote from the Intel Optimization
Manual elsewhere in this thread).

Richard.
http://www.rtrussell.co.uk/

Dec 24 '06 #10
ne**@rtrussell.co.uk wrote:
> Robert Mabee wrote:
>> I believe in fact it is set to prevent such, but the trap handler does
>> "what you want" (for suitable choice of what to want...) by flushing
>> the cache and changing the pages from R/W data to read-only code.
>
> Your evidence for that is what exactly? I'm pretty sure it's not the
> case.

The evidence that Windows has such a trap handler is your prior claim
that such code works. A cache flush is needed on any CPU whose I-cache
doesn't snoop the bus. I recall discussions about such a CPU from Intel,
and I couldn't say whether this is still a problem with recent chips,
but the code has to be right for the worst case that might still
be running.

>> It might get really expensive if the code wrote to its own page.
>
> Which of course is exactly what self-modifying code does. Yes it's
> expensive in performance, but IA-32 processors support it without any
> intervention from the OS (see my quote from the Intel Optimization
> Manual elsewhere in this thread).

Then where does the expense come from? I mean by my remark to warn the
OP to do all the writing to the fabricated code before jumping into it,
which will either make no difference (your model) or a vast improvement
(my model) versus a possible implementation that mixes writes and code
fetches to the same page.

I checked the quote -- it unfortunately doesn't say anything about
whether prior implementations also do this right.

Dec 25 '06 #11
All this talk of cache flush penalties: How is
a call to a just-created block of code any
different to the CPU than an indirect call through
a register? In either case the CPU must load
new code into the pipeline that can't be prefetched.

I assume the OP would only use generated code
for performance, which implies it will be in a loop
that gets used enough times to overcome any
setup penalties... just like most performance code.

Best regards,

Bob Masta
dqatechATdaqartaDOTcom

D A Q A R T A
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Home of DaqGen, the FREEWARE signal generator

Dec 25 '06 #12
Robert Mabee wrote:
> Then where does the expense come from?

This is what the Intel optimization document says:

"Software should avoid writing to a code page in the same 1 KB subpage
of that is being executed or fetching code in the same 2 KB subpage of
that is currently being written. In addition, sharing a page containing
directly or speculatively executed code with another processor as a
data page can trigger an SMC condition that causes the entire pipeline
of the machine and the trace cache to be cleared. This is due to the
self-modifying code condition".
> I mean by my remark to warn the
> OP to do all the writing to the fabricated code before jumping into it,
> which will either make no difference (your model) or a vast improvement
> (my model) versus a possible implementation that mixes writes and code
> fetches to the same page.

I never suggested that there was no performance hit, I simply wanted to
emphasise that writing machine code to data memory and then executing
it is supported by the processor, and requires no user or OS
intervention (such as flushing the instruction cache) - the necessary
steps are carried out by the CPU itself. This is the 'SMC condition'
referred to by Intel.
> I checked the quote -- it unfortunately doesn't say anything about
> whether prior implementations also do this right.

It refers to "Self-modifying code (SMC) that ran correctly on Pentium
III processors and prior implementations". Since 'prior
implementations' presumably include the 80386, the implication is that
all IA-32 processors have supported self-modifying code. Certainly I
have never encountered any unexpected behavior from dynamically
assembling code and then executing it in data memory, which all
versions of BBC BASIC (right back to the 6502) have done.

Richard.
http://www.rtrussell.co.uk/

Dec 25 '06 #13

You can use or study the GNU lightning library:

http://www.gnu.org/software/lightning

GNU lightning is a library that generates assembly language code at
run-time; it is very fast, making it ideal for Just-In-Time compilers, and
it abstracts over the target CPU, as it exposes to the clients a
standardized RISC instruction set inspired by the MIPS and SPARC chips.

GNU lightning 1.0 has been released and is usable in complex code
generation tasks. The available backends cover the x86, SPARC and PowerPC
architectures; the floating point interface is still experimental though,
and developed for the x86 only.

regards,
lajos

Dec 27 '06 #14
Hi,

I have a project, SoftWire, that does exactly what you intend to do:
https://gna.org/projects/softwire/. It's free for use under the LGPL
license. Its commercial successor is used in SwiftShader, an advanced
software renderer.

Executing the binary code is as simple as treating the pointer to the
memory buffer as a function pointer, and calling it. Memory can be made
executable with the following code (straight from SoftWire):

#ifdef WIN32
// #include <windows.h>
unsigned long oldProtection;
VirtualProtect(machineCode, length, PAGE_EXECUTE_READWRITE, &oldProtection);
#elif defined(__unix__)
// #include <sys/mman.h>
// Note: mprotect expects machineCode to be aligned to a page boundary.
mprotect(machineCode, length, PROT_READ | PROT_WRITE | PROT_EXEC);
#endif
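
One detail worth spelling out for the mprotect branch: on Linux the address passed to mprotect has to be aligned to a page boundary, whereas VirtualProtect applies to every page that the range touches. A minimal sketch of the rounding, assuming the page size is a power of two; `PageSpan` and `pageSpan` are hypothetical names for this example and are not part of SoftWire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The whole-page region that covers a buffer.
struct PageSpan {
    std::uintptr_t start;  // address rounded down to a page boundary
    std::size_t size;      // length rounded up to whole pages
};

// Computes the page-aligned region covering [address, address + length).
PageSpan pageSpan(std::uintptr_t address, std::size_t length, std::size_t pageSize) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);  // power of two
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(pageSize) - 1);
    const std::uintptr_t start = address & mask;
    const std::uintptr_t end = (address + length + pageSize - 1) & mask;
    return PageSpan{start, static_cast<std::size_t>(end - start)};
}
```

The result would then be passed on as `mprotect(reinterpret_cast<void*>(span.start), span.size, ...)`, with the page size taken from `sysconf(_SC_PAGESIZE)`. Keep in mind that this changes the protection of everything else that happens to live on those pages, which is one reason to give generated code pages of its own.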

Kind regards,

Nicolas Capens
Dec 27 '06 #15
