
compiling to python byte codes

Hi,

I remember reading an MSc thesis about compiling Perl to Java bytecode
(as in Java class files). At least, it seems that someone has compiled
Scheme to Java class files quite successfully. I'm wondering if
something like this has been attempted in Python, as in compiling
language X into .pyc files. I do not understand the structure of .pyc files, but
I assume that they are the so-called Python bytecode files.

Or is there any documentation or book that is the Python equivalent of
"Programming for the Java Virtual Machine" by Joshua Engel?

Thanks
Maurice
--
Maurice Han Tong LING, BSc(Hons)(MCB), AdvDipComp, SSN, FIFA
Doctor of Philosophy (Science) Candidate, The University of Melbourne
mobile: +61 4 22781753
+65 96669233
mailing address: Department of Zoology, The University of Melbourne
Royal Parade, Parkville, Victoria 3010, Australia
residential address: 9/41 Dover Street
Flemington, Victoria 3031, Australia
email: ma*********@acm.org
resume: http://maurice.vodien.com/maurice_resume.pdf
www: http://www.geocities.com/beldin79/

The information contained in this message, including its attachment(s),
is CONFIDENTIAL and solely intended to its addressee(s) only. The
content of this message, including its attachment(s), may be subjected
to copyright and privacy laws. If you have received this email in error,
please let me know by returning this email, and then destroy all copies.

"I cannot discover anyone knows enough to say definitely what is
and what is not possible" -Henry Ford
"The difference between the impossible and the possible lies
in a person's determination" -Tommy Charles Lasorda
Jul 18 '05 #1
29 Replies


Maurice LING wrote:
I remembered reading a MSc thesis about compiling Perl to Java bytecodes
(as in java class files).
You don't have to look that far. Jython compiles Python code into Java
bytecode; IronPython compiles Python code into Microsoft intermediate
language.
I'm wondering if
something of such had been attempted in python, as in compiling X
language into .pyc.
The easiest way to create a .pyc file is to create a Python file,
and then compile that. There are various tools that compile X to
.pyc. For example, Fnorb compiles OMG IDL into .pyc files.
I do not understand the schematics of .pyc files but
I assume that they are the so called python bytecode files.
That's correct.
Or is there any documentation or books that is the python equivalent of
"Programming for the Java Virtual Machine" by Joshua Engel?


There is the dis module and its documentation. However, as I said, in
Python, you don't really *need* to create .pyc files directly, as
the Python compiler is always available through the compile() builtin
function. This is unlike Java or .NET, where the compiler is not
available in the JRE, or the .NET commercial framework.
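A minimal sketch of the compile() route in modern Python 3 (the thread predates it, so details differ from 2004-era Python). The `generated` string stands in for whatever a hypothetical language-X front end would emit; compile() turns it into a code object, exec() runs it, and the dis module shows the bytecode:

```python
import dis

# Hypothetical Python source emitted by some "language X" front end.
generated = "total = 0\nfor i in range(5):\n    total += i\n"

# compile() turns source text into a code object; no .pyc file is written.
code = compile(generated, "<generated>", "exec")

namespace = {}
exec(code, namespace)       # run the bytecode in a fresh namespace
print(namespace["total"])   # 10

# Show the bytecode; opcode names differ between Python versions.
dis.dis(code)
```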

Regards,
Martin
Jul 18 '05 #2

Maurice LING wrote:
Or is there any documentation or books that is the python equivalent of
"Programming for the Java Virtual Machine" by Joshua Engel?


Python's byte code isn't very stable, so you might have to recreate your
entire code base with every new Python version. I would suggest
generating Python code (not byte code) instead and compiling that.
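To illustrate generating source rather than bytecode, here is a sketch with a made-up toy input language (prefix-form tuples; `OPS` and `to_python` are hypothetical names). The translator only has to produce Python text; Python's own compiler handles the bytecode, so changes in the bytecode format between versions do not affect it:

```python
# Hypothetical toy language: binary operations as prefix-form tuples.
OPS = {"add": "+", "sub": "-", "mul": "*"}

def to_python(expr):
    """Translate a prefix-form expression tree into Python source text."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f"({to_python(left)} {OPS[op]} {to_python(right)})"
    return repr(expr)   # numbers become Python literals

source = to_python(("add", 1, ("mul", 2, 3)))
print(source)                                    # (1 + (2 * 3))
value = eval(compile(source, "<toy>", "eval"))   # Python's compiler does the rest
print(value)                                     # 7
```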
Jul 18 '05 #3

"Martin v. Löwis" <ma****@v.loewis.de> wrote in message news:<41*********************@news.freenet.de>...
Maurice LING wrote:
I remembered reading a MSc thesis about compiling Perl to Java bytecodes
(as in java class files).


You don't have to look that far. Jython compiles Python code into Java
bytecode; IronPython compiles Python code into Microsoft intermediate
language.
I'm wondering if
something of such had been attempted in python, as in compiling X
language into .pyc.


The easiest way to create a .pyc file is to create a Python file,
and then compile that. There are various tools that compile X to
.pyc. For example, Fnorb compiles OMG IDL into .pyc files.
I do not understand the schematics of .pyc files but
I assume that they are the so called python bytecode files.


That's correct.
Or is there any documentation or books that is the python equivalent of
"Programming for the Java Virtual Machine" by Joshua Engel?


There is the dis module and its documentation. However, as I said, in
Python, you don't really *need* to create .pyc files directly, as
the Python compiler is always available through the compile() builtin
function. This is unlike Java or .NET, where the compiler is not
available in the JRE, or the .NET commercial framework.

Regards,
Martin

But that still doesn't answer the OP's question, which is about writing
code in another language to generate Python bytecode....

Which is interesting.. but not that interesting I suppose.
Is python bytecode *that* different to Java bytecode (not in detail
but in concept ?). There's no reason why another compiler couldn't
emit python bytecode to run on the 'python virtual machine' ? (plenty
of reasons not to do it I suppose just no reasons why it shouldn't be
possible).

Regards,

Fuzzy

http://www.voidspace.org.uk/atlantib...thonutils.html
Jul 18 '05 #4

Maurice LING <ma*********@acm.org> writes:
Hi,

I remembered reading a MSc thesis about compiling Perl to Java
bytecodes (as in java class files). At least, it seems that someone
had compiled scheme to java class files quite successfully. I'm
wondering if something of such had been attempted in python, as in
compiling X language into .pyc.
Not to my knowledge. It wouldn't be very interesting: the Python
bytecode is pretty Python specific.
I do not understand the schematics of .pyc files but I assume that
they are the so called python bytecode files.

Or is there any documentation or books that is the python equivalent
of "Programming for the Java Virtual Machine" by Joshua Engel?


Nope. As others point out, the details tend to change each (major)
version of Python. The documentation for the standard library module
'dis' might help. You could also look at the 'bytecodehacks' package
(google, and make sure you get the CVS version).
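One concrete sign of that version dependence, shown for modern CPython 3 (the `importlib.util` module did not exist in 2004): every .pyc file starts with a magic number that changes whenever the bytecode format does, and the import system refuses to use a .pyc whose magic number does not match the running interpreter.

```python
import importlib.util

# The first four bytes of every .pyc file written by this interpreter.
magic = importlib.util.MAGIC_NUMBER
print(magic.hex())

# Two bytes of version number (little-endian) followed by b"\r\n".
print(int.from_bytes(magic[:2], "little"))
```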

Cheers,
mwh

--
39. Re graphics: A picture is worth 10K words - but only those
to describe the picture. Hardly any sets of 10K words can be
adequately described with pictures.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Jul 18 '05 #5

Michael Foord wrote:
But that still doesn't answer the OPs question which is about writing
code in another language to generate python bytecode....
I did. I told him about the compile() function, and indeed,

compile("2+4","<string>","eval")

generates Python bytecode.
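Sketching what that call returns in modern Python 3: compile() hands back a code object, whose `co_code` attribute holds the raw bytecode and which eval() can run directly. The compiler may also fold `2+4` into the constant 6 at compile time.

```python
import types

code = compile("2+4", "<string>", "eval")

print(isinstance(code, types.CodeType))   # True: a code object, not a file
print(type(code.co_code))                 # <class 'bytes'>: the raw bytecode
print(eval(code))                         # 6
```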
Is python bytecode *that* different to Java bytecode (not in detail
but in concept ?).
Yes. Java bytecode is typed; Python bytecode is not.
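A small illustration of the untyped side, assuming modern CPython: the function below compiles to one generic add instruction (visible with dis; its exact name varies by version), and that same bytecode adds ints, strings and lists. The JVM, by contrast, has separate typed instructions such as iadd, ladd, fadd and dadd.

```python
import dis

def add(a, b):
    return a + b

# One untyped add instruction covers ints, strings and lists; the operand
# types are only examined when the instruction executes.
dis.dis(add)
print(add(1, 2))        # 3
print(add("py", "c"))   # pyc
print(add([1], [2]))    # [1, 2]
```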
There's no reason why another compiler couldn't
emit python bytecode to run on the 'python virtual machine' ?


It is certainly possible. Indeed, the Python compiler does generate
Python bytecode from source code, so it must be possible :-)

Regards,
Martin

Jul 18 '05 #6

Hi Martin,

Martin v. Löwis wrote:
Michael Foord wrote:
But that still doesn't answer the OPs question which is about writing
code in another language to generate python bytecode....

I did. I told him about the compile() function, and indeed,

compile("2+4","<string>","eval")

generates Python bytecode.


Can I feed a python source file into compile(), line by line, and expect
it to generate a working .pyc file? I suppose my intended use is to be
able to handle python codes written at run time, to execute python codes
line by line, in a python program. It is somewhat like a tracer routine
that can interpret a line of python code, read out the variables, before
going to the next line of python code. Can compile() do this, or do I
have to use pexpect to run an instance of python?
Is python bytecode *that* different to Java bytecode (not in detail
but in concept ?).

Yes. Java bytecode is typed; Python bytecode is not.
There's no reason why another compiler couldn't
emit python bytecode to run on the 'python virtual machine' ?


I was thinking that it may be simpler to say, write a PHP-to-Python
compiler which compiles PHP into an intermediate form, which is then
converted into python bytecodes, rather than trying to automate source
code conversion from PHP to Python. Well, PHP is just an off-hand
example, it may be COBOL or Pascal. Any ideas?

Regards
Maurice


It is certainly possible. Indeed, the Python compiler does generate
Python bytecode from source code, so it must be possible :-)

Regards,
Martin

Jul 18 '05 #7

Hi Leif,

Leif K-Brooks wrote:
Maurice LING wrote:
Or is there any documentation or books that is the python equivalent
of "Programming for the Java Virtual Machine" by Joshua Engel?

Python's byte code isn't very stable, so you might have to recreate your
entire code base with every new Python version. I would suggest
generating Python code (not byte code) instead and compiling that.


So are you suggesting that say I want to write a compiler to compile
Pascal to Python Virtual Machine, it will be wiser to do source code
conversion from Pascal to Python?

maurice
Jul 18 '05 #8

Maurice LING wrote:
Hi Leif,

Leif K-Brooks wrote:
Maurice LING wrote:
Or is there any documentation or books that is the python equivalent
of "Programming for the Java Virtual Machine" by Joshua Engel?


Python's byte code isn't very stable, so you might have to recreate
your entire code base with every new Python version. I would suggest
generating Python code (not byte code) instead and compiling that.

So are you suggesting that say I want to write a compiler to compile
Pascal to Python Virtual Machine, it will be wiser to do source code
conversion from Pascal to Python?

maurice


What about generating an Abstract Syntax Tree (compiler.ast) and using
the compiler module (compiler.pycodegen) to write the bytecode?
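The `compiler` package (with `compiler.ast` and `compiler.pycodegen`) existed only in Python 2 and was removed in Python 3.0. As a swapped-in modern equivalent, not the package named above, the `ast` module lets a front end build a tree by hand and pass it straight to compile(); this sketch assumes Python 3.8 or later for `ast.Constant`:

```python
import ast

# Hand-built tree for "2 + 4 * 10", as a front end for another language
# might produce it instead of source text.
tree = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(2),
        op=ast.Add(),
        right=ast.BinOp(left=ast.Constant(4), op=ast.Mult(), right=ast.Constant(10)),
    )
)
ast.fix_missing_locations(tree)   # compile() needs line/column information
code = compile(tree, "<ast>", "eval")
print(eval(code))                 # 42
```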
Jul 18 '05 #9

On Thu, 02 Sep 2004 23:10:11 +0000, Maurice LING wrote:
Can I feed a python source file into compile(), line by line, and expect
it to generate a working .pyc file? I suppose my intended use is to be
able to handle python codes written at run time, to execute python codes
line by line, in a python program. It is somewhat like a tracer routine
that can interpret a line of python code, read out the variables, before
going to the next line of python code. Can compile() do this, or do I
have to use pexpect to run an instance of python?


Why don't you clearly spell out your intended use and ask about that,
instead?

If, based on your use of "I suppose" and "somewhat", you are still unclear
on your intended use, figuring that out would be step #1. :-)

Many good modules exist for many things already; if you're trying to trace
for instance, there is a module for that. Let's start at the beginning:
What are you trying to do?

Jul 18 '05 #10

Jeremy Bowers wrote:
On Thu, 02 Sep 2004 23:10:11 +0000, Maurice LING wrote:
Can I feed a python source file into compile(), line by line, and expect
it to generate a working .pyc file? I suppose my intended use is to be
able to handle python codes written at run time, to execute python codes
line by line, in a python program. It is somewhat like a tracer routine
that can interpret a line of python code, read out the variables, before
going to the next line of python code. Can compile() do this, or do I
have to use pexpect to run an instance of python?

Why don't you clearly spell out your intended use and ask about that,
instead?

If, based on your use of "I suppose" and "somewhat", you are still unclear
on your intended use, figuring that out would be step #1. :-)

Many good modules exist for many things already; if you're trying to trace
for instance, there is a module for that. Let's start at the beginning:
What are you trying to do?


I am using SBML (Systems Biology Markup Language) as a front-end
modelling language for my project. For ease of further maintenance
of the model and for interoperability, my project requires me to
convert it into an intermediate form (MA), which is somewhat assembly-like in
structure, as in, each instruction takes the form of <opcode>
<operand>*. Here I am, attempting to write a virtual machine that can
run MA, using Python. So, it becomes an MA virtual machine running on
the Python virtual machine.

My concern is: is it simpler to convert MA to Python source code or to Python
bytecode? What are the pros and cons? Assuming that converting to
Python source code is an option, I'm thinking that the MA virtual machine can
read an MA instruction and output the corresponding Python source
code, but are there facilities in Python to run Python code, line by
line, as it is being emitted by the MA virtual machine?

As a side note, does anyone think that this project might be suitable
enough to apply for PSF Grant?

Thanks
Maurice

Jul 18 '05 #11

On Fri, 03 Sep 2004 00:48:55 +0000, Maurice LING wrote:
I am using SBML (system biology markup language) as a front-end
modelling language for my project. And for ease of further maintenance
of the model and interoperability purposes, my project requires me to
convert it into an intermediate form (MA), which is somewhat assembly is
structure, as in, each instruction takes the form of <opcode>
<operand>*. Here I am, attempting to write a virtual machine that can
run MA, using python. So, it becomes a MA virtual machine running on
python virtual machine.


Hmmm, could you post an example of this assembly-like code? It might be
easiest to implement a Python interpreter directly; if the assembly-like
code is simple enough it isn't even worth a true parser.

Without knowing about your code, I can't be sure, but I would be surprised
if MA is similar enough to Python to make it worth running MA on the
Python machine directly.

Assembly language is right up there with LISP (without macros) in terms of
ease of parsing, if no opcode ever crosses multiple lines.
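To show how little parsing an assembly-like format needs, here is a sketch with an invented syntax (MA itself was not disclosed, so the opcodes, the `;` comment convention and the `parse` helper are all hypothetical): one instruction per line, split on whitespace into an opcode and its operands.

```python
def parse(text):
    """Split each non-blank line into (opcode, [operands]).

    Assumes one instruction per line and ';' starting a comment,
    both hypothetical conventions for an MA-like syntax.
    """
    program = []
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        opcode, *operands = line.split()
        program.append((opcode.upper(), operands))
    return program

source = """
    LOAD  x 5     ; put 5 into x
    ADD   x x 1
    PRINT x
"""
print(parse(source))
```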
Jul 18 '05 #12

Jeremy Bowers wrote:

Hmmm, could you post an example of this assembly-like code? It might be
easiest to implement a Python interpreter directly; if the assembly-like
code is simple enough it isn't even worth a true parser.
What do you mean by implementing a Python interpreter directly? Sorry, I
am unable to provide an example of this assembly-like code. This is
currently still unpublished work, so I'm not able to disclose much,
especially in a public forum.
Without knowing about your code, I can't be sure, but I would be surprised
if MA is similar enough to Python to make it worth running MA on the
Python machine directly.
Do you mean that there is only a very slight chance that it is worthwhile
converting MA directly to Python bytecode? This is how I read it.
Please tell me if I've misunderstood you.

Assembly language is right up there with LISP (without macros) in terms of
ease of parsing, if no opcode ever crosses multiple lines.


Some parts of MA are still undergoing development and cleanup, but I
certainly do not see why any opcode should cross multiple lines. As far
as I can see, 70% of the opcodes can be represented by multiple
lines of Python code. I've not thought hard enough about this yet.

All I can say is that MA looks similar to any assembly language in structure,
with directives.

Sorry that I am not able to disclose much, but hope to get some opinions
based on what I can say.

Thank you,
Maurice
Jul 18 '05 #13

Jason Lai wrote:
[talking about compiling some language besides Python to Python bytecode]
What about generating an Abstract Syntax Tree (compiler.ast) and using
the compiler module (compiler.pycodegen) to write the bytecode?


That would certainly be possible, but it seems to me like it might be
easier to generate Python code. You're using Python logic if you use its
AST, after all.
Jul 18 '05 #14

Maurice LING wrote:

Some parts of MA is still undergoing development and cleaning up but I
certainly do not see why any opcode should cross multiple lines. As far
as I can see, 70% of the opcodes are able to be represented by multiple
lines of python codes. I've not thought hard enough on this yet.

All I can say is that MA looks similar to any assembly is structure,
with directives.


Well, x86/PPC assembly operates on registers. Python uses a stack.
See http://docs.python.org/lib/bytecodes.html

If you're using registers, I guess you'd have to store the values in
variables, and load/store them through the stack whenever you do an
operation -- maybe with some optimization if you can keep the result on
the stack.
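To see the register-to-stack translation concretely, this sketch (modern CPython; the register instruction is hypothetical) renders `ADD r3, r1, r2` as Python source with the registers as plain variables, then lists the resulting bytecode, which loads both operands onto the stack, adds them, and stores the result:

```python
import dis

# Hypothetical register instruction "ADD r3, r1, r2", rendered as Python
# source in which each register is an ordinary variable.
code = compile("r3 = r1 + r2", "<reg>", "exec")

# The bytecode pushes r1 and r2 onto the stack, adds, then pops into r3.
for ins in dis.get_instructions(code):
    print(ins.opname, ins.argrepr)

registers = {"r1": 2, "r2": 3}
exec(code, registers)
print(registers["r3"])   # 5
```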

Python pretty much only lets you run a block of code at once (using exec
or eval). So if you compile it line by line on the fly, your VM would
have to ask Python to run each line, and take care of unstructured jumps
itself. Python doesn't really like arbitrary gotos anyway. I assume if
you were translating to Python code, you'd have to have the whole block
for if, while, etc, ahead of time. Or only jump backwards, since you
can't jump to something that hasn't been written yet.
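A sketch of that arrangement, assuming the generated code is a list of complete single-line statements (the program and the `jumps` table are made up for illustration): each line is compiled separately and run with exec() against one shared namespace, while the VM's own program counter handles the jumps Python cannot express.

```python
# Hypothetical generated program: every line is a complete statement.
lines = [
    "n = 3",
    "acc = 1",
    "acc = acc * n",   # index 2: loop body
    "n = n - 1",
]
# Control flow stays in the VM: after index 3, jump back to 2 while n > 0.
jumps = {3: ("n", 2)}

compiled = [compile(src, f"<line {i}>", "exec") for i, src in enumerate(lines)]
ns = {}   # one shared namespace keeps variables alive between lines
pc = 0
while pc < len(compiled):
    exec(compiled[pc], ns)
    if pc in jumps and ns[jumps[pc][0]] > 0:
        pc = jumps[pc][1]
    else:
        pc += 1
print(ns["acc"])   # 6
```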

- Jason
Jul 18 '05 #15

Jason wrote:
Well, x86/PPC assembly operates on registers. Python uses a stack.
See http://docs.python.org/lib/bytecodes.html

If you're using registers, I guess you'd have to store the values in
variables, and load/store them through the stack whenever you do an
operation -- maybe with some optimization if you can keep the result on
the stack.


I don't quite get this. Since x86/PPC use register operations,
why are virtual machines, like Python's and Java's, designed as stack
machines? Why not just stick to registers?

maurice
Jul 18 '05 #16

Maurice LING wrote:
Can I feed a python source file into compile(), line by line, and expect
it to generate a working .pyc file?
compile() requires the complete source code. However, it might be that
the complete source code is just a single statement, or a single
function executing a single statement.

Did you read the documentation of compile()? It would have told you that
compile() does not generate .pyc files at all. Instead, it generates
code objects, and you use the marshal module to save them into .pyc
files.
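A minimal sketch of that relationship: marshal serializes a code object to bytes and restores it. A real .pyc file additionally carries a version-specific header (starting with the magic number) in front of the marshalled data, and the header layout has changed between versions, so the py_compile module is the safer way to write complete .pyc files.

```python
import marshal

code = compile("x = 6 * 7", "<demo>", "exec")

# marshal turns the code object into bytes (the payload of a .pyc file)...
data = marshal.dumps(code)

# ...and rebuilds an equivalent code object from those bytes.
restored = marshal.loads(data)
ns = {}
exec(restored, ns)
print(ns["x"])   # 42
```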
I suppose my intended use is to be
able to handle python codes written at run time, to execute python codes
line by line, in a python program.
Single-step execution is an issue entirely independent of generating
Python bytecode, or source code. Regardless of how you have generated
the Python bytecode (directly, or through source code), Python supports
single-stepping of byte code (on a line-per-line-of-source-code basis).
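In today's CPython, the hook for this is sys.settrace(). This sketch (the traced `program` function is a stand-in) records each source line of `program` as it is about to run; a real stepper would pause at that point and let the user inspect variables:

```python
import sys

executed = []   # line offsets seen inside program()

def tracer(frame, event, arg):
    # Returning tracer from the 'call' event also enables 'line' events
    # for that frame; a stepper could pause here and show frame.f_locals.
    if event == "line" and frame.f_code.co_name == "program":
        executed.append(frame.f_lineno - frame.f_code.co_firstlineno)
    return tracer

def program():
    a = 1
    b = a + 1
    return a + b

sys.settrace(tracer)
try:
    result = program()
finally:
    sys.settrace(None)   # always switch tracing off again

print(result, executed)
```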

However, when you generate Python code (source or byte), and you know
you are going to need single-stepping, you should put single-stepping
*into the generated code*. I.e. if your input language reads

    action 1
    action 2
    action 3

you generate

    def program():
        starting()
        do_action_1()
        step_done()
        do_action_2()
        step_done()
        do_action_3()
        step_done()

Then, a proper implementation of step_done() will allow for user
interaction, giving you single-step capabilities.
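A runnable version of that idea, as a sketch: `starting` and `step_done` follow the names above, while `do_action(name)` is a hypothetical stand-in for the numbered `do_action_N` functions. The generator emits Python source with a step_done() call after each action, and the runtime functions are supplied through the namespace the code runs in.

```python
# Hypothetical input program: one action per line.
actions = ["action 1", "action 2", "action 3"]

def generate(actions):
    """Emit Python source that calls step_done() after every action."""
    lines = ["def program():", "    starting()"]
    for action in actions:
        lines.append(f"    do_action({action!r})")
        lines.append("    step_done()")
    return "\n".join(lines) + "\n"

log = []
runtime = {
    "starting": lambda: log.append("start"),
    "do_action": lambda name: log.append(name),
    # An interactive step_done() could wait for the user here; this one
    # only records that a step finished.
    "step_done": lambda: log.append("step"),
}

exec(compile(generate(actions), "<generated>", "exec"), runtime)
runtime["program"]()
print(log)
```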
I was thinking that it may be simpler to say, write a PHP-to-Python
compiler which compiles PHP into an intermediate form, which is then
converted into python bytecodes, rather than trying to automate source
code conversion from PHP to Python. Well, PHP is just an off-hand
example, it may be COBOL or Pascal. Any ideas?


No. Generating source code is *always* simpler. There are three reasons
why one would not generate source code even though it is simpler:
- you don't have a compiler for the source code available on the target
system. This is the compile-to-JVM example.
- the compiler for the source language gives inefficient byte code,
and you can do better. Although it is theoretically possible, it is
unlikely to happen in practice (not because the compiler is already
optimal, but because it is very difficult to do better - if it wasn't,
the authors of the compiler would have improved it already)
- certain VM opcodes are not available through source code. This sounds
theoretical, too - why would the VM include opcodes that will never
occur in practice? The real-world example is .NET, though, which
supports many languages, and thus supports constructs not available
in, say, C# (like global fields). This is not the case for Python,
though: Python uses virtually all of its opcodes.

Regards,
Martin
Jul 18 '05 #17

Maurice LING wrote:
Hmmm, could you post an example of this assembly-like code? It might be
easiest to implement a Python interpreter directly; if the assembly-like
code is simple enough it isn't even worth a true parser.

What do you mean by implementing a Python interpreter directly? Sorry, I
am unable to provide an example of this assembly-like code. This is
currently still unpublished work, so I'm not able to disclose much,
especially in a public forum.


He didn't mean to suggest that you write an interpreter *of*
Python, but an interpreter *of* your language *in* Python. Instead
of compiling your intermediate language into Python bytecode,
directly implement the VM (if you prefer that term over "interpreter")
for MA in Python.
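A sketch of such an interpreter, under loudly stated assumptions: MA was not disclosed, so the instruction set here (SET, MUL, SUB, JNZ with named registers) is invented. Each instruction is dispatched directly in Python, with a program counter handling jumps, and no Python bytecode is generated at all.

```python
def run(program):
    """Execute (opcode, operands) tuples; registers live in a dict."""
    regs = {}
    pc = 0

    def val(x):
        # An operand is either a register name or an integer literal.
        return regs[x] if x in regs else int(x)

    while pc < len(program):
        op, args = program[pc]
        pc += 1
        if op == "SET":            # SET dst value
            regs[args[0]] = val(args[1])
        elif op == "MUL":          # MUL dst a b
            regs[args[0]] = val(args[1]) * val(args[2])
        elif op == "SUB":          # SUB dst a b
            regs[args[0]] = val(args[1]) - val(args[2])
        elif op == "JNZ":          # JNZ reg target: jump if reg != 0
            if val(args[0]) != 0:
                pc = int(args[1])
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs

# Factorial of 5 in the hypothetical instruction set.
factorial = [
    ("SET", ["n", "5"]),
    ("SET", ["acc", "1"]),
    ("MUL", ["acc", "acc", "n"]),   # index 2: loop body
    ("SUB", ["n", "n", "1"]),
    ("JNZ", ["n", "2"]),
]
print(run(factorial)["acc"])   # 120
```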

Regards,
Martin
Jul 18 '05 #18

Maurice LING wrote:
If you're using registers, I guess you'd have to store the values in
variables, and load/store them through the stack whenever you do an
operation -- maybe with some optimization if you can keep the result
on the stack.

I dont't quite get this right. Since x86/PPC uses register operations,
why do virtual machines, like python's and java's, are designed as stack
machines? Why not just stick to registers?


I really think you should study programming language implementations
for some time before approaching your problem.

For an interpreter, what the processor does is largely irrelevant
(not entirely if you have a just-in-time compiler, as that needs
to generate machine code, but totally irrelevant if you have a pure
interpreter). Using a stack-based implementation simplifies
the opcodes: many opcodes need no parameters, or at most a single
parameter. This lets you get by with fewer than 256 opcodes, which
is why these opcodes are called "byte code". That, in turn,
allows for an implementation that uses an "interpreter loop", which
consists of a "giant switch".

In x86, a single instruction has between 1 and 15 bytes, and the
decoding process (finding out what the instruction does) is
very lengthy. For a microprocessor, this doesn't matter, since it
is done in hardware, and in parallel with executing other
instructions (pipelining). For an interpreter, the decoding process
must be superfast, and therefore supersimple.
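To make the "interpreter loop" concrete, here is a toy stack machine (the opcodes and their numbering are hypothetical, not CPython's). Each opcode is one byte, decoding is a single index lookup, and only PUSH takes an operand; Python's if/elif chain plays the role of the giant switch.

```python
# Hypothetical one-byte opcodes; only PUSH is followed by an operand byte.
PUSH, ADD, MUL, PRINT, HALT = range(5)

def execute(bytecode):
    """Run a bytes object of opcodes on a value stack; return printed values."""
    stack, out, pc = [], [], 0
    while True:
        op = bytecode[pc]          # decoding is just an index lookup
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == PRINT:
            out.append(stack.pop())
        elif op == HALT:
            return out
        else:
            raise ValueError(f"bad opcode {op}")

# (2 + 3) * 4: no register numbers needed, operands come from the stack.
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT])
print(execute(program))   # [20]
```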

Regards,
Martin
Jul 18 '05 #19

Maurice LING wrote:

I dont't quite get this right. Since x86/PPC uses register operations,
why do virtual machines, like python's and java's, are designed as stack
machines? Why not just stick to registers?


Because stacks are common to _all_ processors, whereas registers differ
from architecture to architecture - the x86 hasn't been very gifted in that
respect (not sure if that changed recently - at least the SIMD instructions
introduced registers, but you can't rely on those being available).

So resorting to stacks makes the implementation totally independent of the
actual processor architecture - and stacks are as good as registers in
terms of abstract use.

What a JIT then does is purely up to its implementors - but that's another
topic.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #20

On Fri, 03 Sep 2004 09:42:58 +0200, Martin v. Löwis wrote:
Maurice LING wrote:
I dont't quite get this right. Since x86/PPC uses register operations,
why do virtual machines, like python's and java's, are designed as stack
machines? Why not just stick to registers?


I really think you should study programming language implementations
for some time before approaching your problem.


With respect Maurice, I think I have to agree on this.

In your situation, I can promise you that it is *faster* to take the time
to learn about this stuff correctly than to try to power through it
without learning; it is one of those places in computer technology
where there are such powerful tools to help you that it is better to
learn how to use them than to kludge through. Unfortunately, it is too
large a topic to cover in a Usenet posting.

If your institution offers a compilers class (a sadly diminishing number),
try to take or audit that. (You most likely don't want a *compiler*, but
an *interpreter*; the course will explain the difference. An interpreter
typically uses much the same technology to implement, parsers and abstract
syntax trees and such, but is usually much easier to implement.) (I think
you hinted this was a thesis project, hence this suggestion. Failing that,
you may need a compilers book and some self-study time. Again, I promise
you this is faster almost immediately than trying to power through this
without it.)

If you are responsible for creating the opcodes directly, you may find a
better way to do what you are doing anyhow. Assembly is easy to implement
but (with apologies to the more experienced among us) sucks to program in.

Stepping up a level, are you sure you can't just implement a C or Python
library and let people write their own programs in Python? You'll never be
able to match Python-the-language's feature set.
Jul 18 '05 #21

>
In your situation, I can promise you that it is *faster* to take the time
to learn about this stuff correctly then to try to power through it
without learning; it is one of those places in computer technology
where there are such powerful tools to help you that it is better to
learn how to use them then to kludge through. Unfortunately, it is too
large a topic to cover in a Usenet posting.
I realised that there are powerful tools such as lex and yacc around
that can save me a lot of time. I'll be using PLY for my purpose.

If your institution offers a compilers class (a sadly diminishing number),
try to take or audit that. (You most likely don't want a *compiler*, but
an *interpreter*; the course will explain the difference. An interpreter
typically uses much the same technology to implement, parsers and abstract
syntax trees and such, but is usually much easier to implement.) (I think
you hinted this was thesis project, hence this suggestion. Failing that,
you may need a compilers book and some self-study time. Again, I promise
you this is faster almost immediately than trying to power through this
without it.)
Self-study is my only option, and good books on compiler
construction are rare. I am a molecular biologist by professional
training. Some things are tough for me to understand, and just
finding the answer about stack vs. register computers will take ages,
so I always appreciate people who do not treat me as an idiot. I'm sure
there are many more idiotic questions being asked in newsgroups.

Stepping up a level, are you sure you can't just implement a C or Python
library and let people write their own programs in Python? You'll never be
able to match Python-the-language's feature set.


What I'm doing is a special-purpose language (for modelling purposes).
Jul 18 '05 #22

Maurice LING wrote:
I can only have the self-study options and good books on compiler
construction are rare. I am a molecular biologist by professional
training. There are things that are tough for me to understand and to
just find the answer about stacks vs register computers will take ages,
and I always appreciate people who do not treat me as an idiot. I'm sure
there are much more idiotic questions being asked in newsgroups.


I certainly did not mean to declare you an idiot. Instead, I tried to
point out that this is a complex topic, one where a Usenet thread can
hardly give sufficient introduction. Instead, in such threads, posters
typically assume common background, with respect to grammars, syntax,
abstract syntax, intermediate representation (using trees or opcodes),
interpretation vs. compilation, and so on.

Regards,
Martin
Jul 18 '05 #23

Probably my question should be phrased as: given that x86/PPC processors
are register-based (even more than a decade after the publication
of the book "Stack Computers: The New Wave") and there aren't many
examples of stack-based processors, why is there a difference? It seems
weird to me that if stack-based machines (physical processors or VMs)
are so good, processor engineering hasn't caught up.

You've totally missed my question, but thanks anyway, I've learnt. My
actual question had been partially answered.

Thanks
Jul 18 '05 #24

Maurice LING wrote:
Probably my question should be phrased as, given what x86/PPC processors
are register-based (even after more than a decade from the publication
of the book "Stack Machines - the new wave") and there isn't much
examples of stack-based processors, why is there a difference? It seems
wierd to me that if stack-based machines (physical processors or VMs)
are so good, why hadn't the processor engineering caught up?


Stack-based microprocessors would be very inefficient. If you don't
have registers, every operation will need to access the stack, which
is an access to main memory, which is expensive. The counter-argument
of interpreters against registers (difficult to decode opcodes, long
opcodes) does not hold for microprocessors, as they can decode the
instruction in parallel with doing other things (which an interpreter
couldn't).

For interpreters, the same rationale does not hold - even registers
would live in main memory, so there would be no performance gained.

Virtual machines are quite different from real machines, in many
respects.

Regards,
Martin
Jul 18 '05 #25

On Sun, 05 Sep 2004 23:20:59 +0000, Maurice LING wrote:

In your situation, I can promise you that it is *faster* to take the time
to learn about this stuff correctly then to try to power through it
without learning; it is one of those places in computer technology
where there are such powerful tools to help you that it is better to
learn how to use them then to kludge through. Unfortunately, it is too
large a topic to cover in a Usenet posting.
I realised that there are powerful tools such as lex and yacc around
that can save me a lot of time. I'll be using PLY for my purpose.


That helps.
Self-study is my only option, and good books on compiler
construction are rare. I am a molecular biologist by professional
training. Some things are tough for me to understand, and just
finding the answer about stack vs. register computers would take ages,
so I always appreciate people who do not treat me as an idiot. I'm sure
far more idiotic questions get asked in newsgroups.
I'm not sure if you feel I'm treating you as an idiot, or if you mean that
literally. Regardless, it isn't my intent. It is challenging because we
end up unable to share vocabulary.

Stepping up a level, are you sure you can't just implement a C or Python
library and let people write their own programs in Python? You'll never be
able to match Python-the-language's feature set.


What I'm doing is a special-purpose language (for modelling purposes).


OK, makes sense.

Now that you say you are using PLY, that at least gives us a common frame
of reference with code. Take a look at the calc.py file that should have
been included with your PLY distribution.

It implements a simple calculator interpreter. It is an "interpreter"
because as it encounters the input, it is dynamically executing it. A
"compiler" would actually just store it as a tree, then later output it
into some other format without execution, which is what a C++ compiler
does, outputting the opcodes for the CPU.
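The difference can be sketched without PLY. Below, one hand-written recursive-descent parser (a stand-in for the grammar rules in calc.py, not PLY's actual API) is run with two different sets of actions: one computes values as soon as a rule matches, like an interpreter, and the other builds a tree that is later turned into Python source without being executed, like a compiler. The tuple-based tree format and all function names are hypothetical.

```python
import re


def tokenize(text):
    # Integers, '+', '*', and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[+*()]", text)


def parse(tokens, num, binop):
    """Recursive-descent parser for + and * with the usual precedence.

    `num` and `binop` are the actions run whenever a rule matches; they play
    the role of the bodies of the grammar-rule functions in calc.py.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():    # expr : term ('+' term)*
        left = term()
        while peek() == "+":
            take()
            left = binop("+", left, term())
        return left

    def term():    # term : factor ('*' factor)*
        left = factor()
        while peek() == "*":
            take()
            left = binop("*", left, factor())
        return left

    def factor():  # factor : NUMBER | '(' expr ')'
        tok = take()
        if tok == "(":
            inner = expr()
            take()  # consume ')'
            return inner
        return num(int(tok))

    return expr()


def interpret(text):
    """Interpreter: each action computes a value immediately."""
    return parse(tokenize(text),
                 num=lambda n: n,
                 binop=lambda op, a, b: a + b if op == "+" else a * b)


def build_tree(text):
    """Compiler front end: each action builds a node; nothing is evaluated."""
    return parse(tokenize(text),
                 num=lambda n: ("num", n),
                 binop=lambda op, a, b: (op, a, b))


def emit_python(node):
    """Compiler back end: turn the tree into Python source text."""
    if node[0] == "num":
        return str(node[1])
    op, left, right = node
    return f"({emit_python(left)} {op} {emit_python(right)})"


print(interpret("2 + 3 * 4"))                # 14, computed while parsing
print(emit_python(build_tree("2 + 3 * 4")))  # (2 + (3 * 4)), not yet run
```

Note that `build_tree` never computes anything; the emitted string is only evaluated if some later stage runs it, which is the extra step in the middle that makes a compiler harder to write and debug.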

Because of that extra step in the middle, "building a tree", a compiler is
typically harder to write than an interpreter. Getting the output right
can also be tricky, and a challenge to debug.

Thus, in terms of PLY, my suggestion has been to write an interpreter,
like calc.py, not a compiler. Unfortunately, as I said earlier, control
flow can be a pain, because you can't execute it directly like calc.py does.

You want something else, though, that builds something and then executes
it later.
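One common way to "build something and then execute it later" is to turn the tree into nested Python closures once, then call them as often as needed. A loop in the source language becomes an ordinary Python loop inside a closure, which sidesteps the control-flow problem of executing during the parse. This is a sketch of that technique under stated assumptions: the tuple node types (num, var, add, lt, set, while) are hypothetical and are not something PLY provides.

```python
def build(node):
    """Turn one tree node into a Python closure taking an environment dict.

    Nothing is executed here; the closures only run when called later.
    """
    kind = node[0]
    if kind == "num":
        value = node[1]
        return lambda env: value
    if kind == "var":
        name = node[1]
        return lambda env: env[name]
    if kind == "add":
        left, right = build(node[1]), build(node[2])
        return lambda env: left(env) + right(env)
    if kind == "lt":
        left, right = build(node[1]), build(node[2])
        return lambda env: left(env) < right(env)
    if kind == "set":
        name, expr = node[1], build(node[2])

        def run_set(env):
            env[name] = expr(env)
        return run_set
    if kind == "while":
        cond = build(node[1])
        body = [build(stmt) for stmt in node[2]]

        def run_while(env):
            while cond(env):
                for stmt in body:
                    stmt(env)
        return run_while
    raise ValueError(f"unknown node type: {kind!r}")


# total = 0 + 1 + 2 + 3 + 4, written as a tree a parser might produce.
program = [
    ("set", "i", ("num", 0)),
    ("set", "total", ("num", 0)),
    ("while", ("lt", ("var", "i"), ("num", 5)), [
        ("set", "total", ("add", ("var", "total"), ("var", "i"))),
        ("set", "i", ("add", ("var", "i"), ("num", 1))),
    ]),
]

steps = [build(stmt) for stmt in program]  # build phase: nothing runs yet
env = {}
for step in steps:                         # execute phase
    step(env)
print(env["total"])  # 10
```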
Jul 18 '05 #26

Maurice LING wrote:
Probably my question should be phrased as: given that x86/PPC processors
are register-based (even more than a decade after the publication
of the book "Stack Computers: The New Wave") and there aren't many
examples of stack-based processors, why is there a difference?


If you're implementing a machine in hardware, access to
registers is much faster than access to memory. Since the
current trend in hardware design seems to be "as fast as
possible, whatever it takes", today's architectures are
increasingly register-based.

But with an interpreter, things are very different. The
"registers" of the VM probably aren't going to be real
registers, but memory locations. Even if you do manage to
keep them in real registers, the time spent accessing them
is going to be small compared to the time spent fetching
instructions, decoding them and figuring out what operands
they refer to, so the speed advantage would be quite minimal.

Given that, and the fact that stack architectures are much
easier to generate code for, it's not surprising that most
VMs tend to have stack architectures.
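Greg's point that stack code is easy to generate shows up in a small sketch: emitting code for a stack machine is just a post-order walk of the expression tree (operands first, then the operator), with no need to decide which register holds each intermediate result. The instruction names (PUSH, ADD, MUL) and the tuple tree format are hypothetical. CPython's own bytecode is also stack-based, but its actual opcodes differ and change between versions.

```python
def gen_stack_code(node):
    """Generate stack-machine code by a post-order walk of the tree."""
    if node[0] == "num":
        return [("PUSH", node[1])]
    op, left, right = node            # op is "add" or "mul"
    return gen_stack_code(left) + gen_stack_code(right) + [(op.upper(),)]


def run_stack_vm(code):
    """Execute stack-machine code; operands are implicit (top of the stack)."""
    stack = []
    for instruction in code:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])
        elif op in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return stack.pop()


# 2 + 3 * 4
tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
code = gen_stack_code(tree)
print(code)  # [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL',), ('ADD',)]
print(run_stack_vm(code))  # 14
```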

Greg
Jul 18 '05 #27

> I'm not sure if you feel I'm treating you as an idiot, or if you mean that
literally. Regardless, it isn't my intent. It is challenging because we
end up unable to share vocabulary.

I meant it figuratively. I'd like to keep believing that
Python users are helpful. :)

It implements a simple calculator interpreter. It is an "interpreter"
because as it encounters the input, it is dynamically executing it. A
"compiler" would actually just store it as a tree, then later output it
into some other format without execution, which is what a C++ compiler
does, outputting the opcodes for the CPU.
I'm looking at generating either Python source or MA (an intermediate
representation in assembly-like form) on the fly, as the lines are being
interpreted. MA (a two-operand code) works much like a set of
functions: each "opcode" can be represented by a function in Python (or
in any other language, I presume), and the "operands" are like its
parameters. MA was originally meant to target Java, but I suppose it
could target Python as well.

So I think the "MA virtual machine" is like a Python library.
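Maurice's description of MA (each opcode maps to a function, its operands to the function's parameters) suggests a dispatch table. The sketch below assumes a hypothetical MA syntax of one instruction per line, `<opcode> <dst> <src>`, with made-up opcodes `set`, `add` and `mul`; the real MA instruction set is not shown in the thread.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def value(state, operand):
    """An operand is either a numeric literal or the name of a variable."""
    if NUMBER.fullmatch(operand):
        return float(operand)
    return state[operand]


# Each hypothetical opcode is a plain Python function; its operands become
# the function's parameters.
def op_set(state, dst, src):
    state[dst] = value(state, src)


def op_add(state, dst, src):
    state[dst] = state[dst] + value(state, src)


def op_mul(state, dst, src):
    state[dst] = state[dst] * value(state, src)


OPCODES = {"set": op_set, "add": op_add, "mul": op_mul}


def run_ma(lines, state=None):
    """Execute MA-style lines one at a time against a shared state dict."""
    state = {} if state is None else state
    for line in lines:
        line = line.strip()
        if not line:
            continue
        opcode, *operands = line.split()
        OPCODES[opcode](state, *operands)
    return state


state = run_ma(["set x 5", "add x 3", "mul x 2", "set y x"])
print(state)  # {'x': 16.0, 'y': 16.0}
```

Because `run_ma` accepts an existing state dict, lines can be fed in as they are produced, which is roughly the "MA virtual machine as a Python library" idea.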

Because of that extra step in the middle, "building a tree", a compiler is
typically harder to write than an interpreter. Getting the output right
can also be tricky, and a challenge to debug.


I can see this coming. It may be tricky to tell whether an error lies
in the tree-building or in the test code.
Jul 18 '05 #28

Maurice LING wrote:
Sorry if I misunderstood your intentions. The VM is a rather old
concept, dating back to the days of Forth, but it has been revived for
the sake of portability. Apart from the book by Joshua Engel, I haven't
seen any books devoted to VMs. Do you know of any?
One of the older ones is

Adele Goldberg and David Robson. Smalltalk-80: The Language and its Implementation.
Addison-Wesley, 1983.

A thing that just turned up in a Google search is

Christian Queinnec. Lisp in Small Pieces.
Cambridge University Press, 1996.

This covers 11 interpreters and 2 compilers.

For Scheme, there is an online book

http://www.cs.utexas.edu/users/wilso...intro_toc.html

There also is an Icon book

Ralph E. Griswold and Madge T. Griswold.
The Implementation of the Icon Programming Language.
Princeton University Press, 1986.

On the language-independent/cross-language side, we have

Samuel Kamin. Programming Languages: An Interpreter-Based Approach.
Addison-Wesley, 1990.

and, of course

Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman.
Compilers: Principles, Techniques, and Tools.
Addison-Wesley, 1988 (with many reprints).
As for books on compiler construction, many explain the same topics,
which doesn't help much when what I want to know is very specific;
others are just too old to be of much use.


It turns out that this is a very old area of computing
(relative to the total age of electronic computing), and many of its
foundations were laid long ago. So even the old books are still
"valid".

Now, for *specific* questions, Usenet is the right medium, although
comp.compilers may be a better forum.

Regards,
Martin
Jul 18 '05 #29

In article <41********@news.unimelb.edu.au>,
Maurice LING <ma*********@acm.org> wrote:
Jeremy Bowers wrote:
On Thu, 02 Sep 2004 23:10:11 +0000, Maurice LING wrote:

I am using SBML (Systems Biology Markup Language) as a front-end
modelling language for my project. For ease of further maintenance
of the model and for interoperability, my project requires me to
convert it into an intermediate form (MA), which is somewhat
assembly-like in structure: each instruction takes the form
<opcode> <operand>*. So here I am, attempting to write a virtual
machine in Python that can run MA. It becomes an MA virtual machine
running on the Python virtual machine.

My question is: is it simpler to convert MA to Python source code or to
Python bytecode? What are the pros and cons? Assuming that converting to
Python source code is an option, I'm thinking that the MA virtual machine
could read an MA instruction and output the corresponding Python source
code, but are there facilities in Python to run Python code, line by
line, as it is being emitted by the MA virtual machine?


The Python virtual machine is not something that is fixed. It may change
between Python versions. For this reason, it is a bad idea to generate
bytecode directly, since you may have to redo the work many times. It is
much better to have Python as the target language of your SBML compilation.
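Following Jacob's advice, here is a sketch of targeting Python source instead of bytecode. It assumes a hypothetical MA syntax of one instruction per line, `<opcode> <dst> <src>`, with made-up opcodes and statement templates (not the real MA). Each instruction is translated into one Python statement, and the sketch shows both things Maurice asked about: running the generated code line by line with `exec()` in a shared namespace, and compiling a whole program with the built-in `compile()`, which leaves the version-specific bytecode generation to Python itself.

```python
import importlib.util
import keyword
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

# Hypothetical mapping from MA opcodes to Python statement templates.
TEMPLATES = {
    "set": "{dst} = {src}",
    "add": "{dst} = {dst} + {src}",
    "mul": "{dst} = {dst} * {src}",
}


def is_name(operand):
    return operand.isidentifier() and not keyword.iskeyword(operand)


def ma_to_python(line):
    """Translate one MA instruction into one line of Python source."""
    opcode, dst, src = line.split()
    # Only allow variable names and numeric literals, so arbitrary text
    # cannot end up inside the generated code.
    if not is_name(dst):
        raise ValueError(f"bad destination: {dst!r}")
    if not (is_name(src) or NUMBER.fullmatch(src)):
        raise ValueError(f"bad source operand: {src!r}")
    return TEMPLATES[opcode].format(dst=dst, src=src)


ma_program = ["set x 5", "add x 3", "mul x 2", "set y x"]

# 1. Line by line: each statement is compiled and run as soon as it is
#    generated; the shared dict keeps variables alive between statements.
line_by_line_ns = {}
for ma_line in ma_program:
    exec(compile(ma_to_python(ma_line), "<ma>", "exec"), line_by_line_ns)

# 2. Whole program: generate all the source first, then let compile()
#    produce a code object (bytecode) for the running Python version.
source = "\n".join(ma_to_python(ma_line) for ma_line in ma_program)
code_object = compile(source, "<ma-program>", "exec")
whole_program_ns = {}
exec(code_object, whole_program_ns)

print(source)
print(line_by_line_ns["x"], whole_program_ns["y"])  # 16 16

# The .pyc magic number identifies the bytecode format of this interpreter;
# it changes between Python versions, which is Jacob's point.
print(importlib.util.MAGIC_NUMBER)
```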

Jacob Hallén

--
Jul 18 '05 #30
