re: Assembler Parser/Lexer in Python
Simon Foster wrote:
> Anyone have any experience or pointers to how to go about creating
> a parser/lexer for assembler in Python. I was thinking of using PLY
> but wonder whether it's too heavyweight for what I want. Anyone have
> any thoughts?

There are, of course, lots of tools available to help you with this,
but as you probably realize, most of them have heavyweight features
that help with higher-level languages but not so much with assembler.
Also, most of them will probably not give you much help with assembly
language macros, which are typically more full-featured than C macros
in that they know something about the actual program being built.
(Please note that I am _not_ cross-posting this to comp.lang.lisp, and
also that if you want to parse pre-existing assembly language, it will
look _nothing_ like Lisp, so Lisp's built-in parser wouldn't help you
in any case. Also note that YMMV, but while I use macros _extensively_
in assembly language, I have personally never felt the need for any
sort of macro processor in Python :)

I had a similar problem while maintaining a system with over 10MB of
crufty, ancient assembly language. I had two conflicting goals: I
wanted to use Python so I could easily and correctly do different
things with the source code (code rewriting, automatic HTML generation,
some lint-like checks, etc.), and I wanted operations to complete
quickly enough that I could work in the usual experimental Python style.

I wrote a lexer using a tiny bit of C and a Pyrex wrapper. The work is
partitioned so that the C code knows nothing about Python, and the
Pyrex interface handles the higher layers of the tokenization. The
lexer makes a single tokenization pass over an entire file (with the
Pyrex layer calling the C code once per line) and returns a list of
token tuples, one tuple per line. Macro lines that invoke text-pasting
operations are flagged, and the lexer is re-run on those lines when
the parser encounters them.

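For anyone who wants to try the same layering without C, here is a
minimal pure-Python sketch of it. `lex_line` stands in for the per-line
C scanner, `lex_file` does the single pass that returns one token tuple
per line and flags text-pasting lines, and `tokens_at_parse_time`
re-lexes a flagged line after macro expansion. The syntax (`;` comments,
`,` and `:` as separators, `&` as the text-pasting operator) and all the
function names are hypothetical assumptions, not the code described
above.

```python
# Minimal pure-Python sketch of a line-oriented assembler lexer.
# Assumed, hypothetical syntax: ';' starts a comment, ',' and ':' are
# separator tokens, and '&' is a macro text-pasting operator.
PASTE_OP = "&"  # hypothetical text-pasting operator


def lex_line(line):
    """Split one source line into a tuple of token strings (stand-in for the C layer)."""
    code = line.split(";", 1)[0]  # drop any trailing comment
    tokens = []
    current = []
    for ch in code:
        if ch.isspace() or ch in ",:":
            if current:  # finish the token being built
                tokens.append("".join(current))
                current = []
            if ch in ",:":  # separators are tokens in their own right
                tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tuple(tokens)


def lex_file(text):
    """One pass over the file: a token tuple per line, plus the set of
    line indices that use text pasting and must be re-lexed later."""
    lines = text.splitlines()
    token_lines = [lex_line(line) for line in lines]
    needs_relex = {i for i, line in enumerate(lines)
                   if PASTE_OP in line.split(";", 1)[0]}
    return lines, token_lines, needs_relex


def tokens_at_parse_time(lines, token_lines, needs_relex, lineno, expand):
    """Return the tokens for one line; flagged lines are re-lexed after the
    caller-supplied (hypothetical) macro expander `expand` has run."""
    if lineno in needs_relex:
        return lex_line(expand(lines[lineno]))
    return token_lines[lineno]
```

For example, lexing the two lines `start: mov r1, r2 ; copy` and
`  jmp lbl&1` gives `('start', ':', 'mov', 'r1', ',', 'r2')` for the
first line and flags the second for re-lexing at parse time.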
A separate (and very simple!) Python script generates a .h file that
contains the lowest-level lexer tables.

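As a rough illustration of such a table-generating script, the
following hypothetical sketch prints a C header with a 256-entry
character-class table that a table-driven C scanner could index by byte
value. The class names, the character assignments, and the header guard
are assumptions for illustration; a real assembler would define its own.

```python
# Hypothetical sketch of a table generator: prints a C header containing a
# 256-entry character-class table for a table-driven lexer written in C.
import string

# Hypothetical character classes; the index in this list is the C value.
CLASSES = ["CC_OTHER", "CC_SPACE", "CC_EOL", "CC_IDENT",
           "CC_DIGIT", "CC_PUNCT", "CC_COMMENT"]


def classify(code):
    """Map one byte value (0-255) to a character-class name."""
    ch = chr(code)
    if ch == "\n":
        return "CC_EOL"
    if ch in " \t\r":
        return "CC_SPACE"
    if ch in string.digits:
        return "CC_DIGIT"
    if ch in string.ascii_letters or ch in "_.$":
        return "CC_IDENT"
    if ch == ";":
        return "CC_COMMENT"
    if ch in ",:()+-*/&#[]":
        return "CC_PUNCT"
    return "CC_OTHER"


def make_header(guard="LEXTABLES_H"):
    """Return the text of the generated header."""
    out = [f"#ifndef {guard}", f"#define {guard}", ""]
    for value, name in enumerate(CLASSES):
        out.append(f"#define {name} {value}")
    out += ["", "static const unsigned char char_class[256] = {"]
    for row in range(0, 256, 8):  # 8 entries per line keeps it readable
        cells = ", ".join(classify(c) for c in range(row, row + 8))
        out.append(f"    {cells},")
    out += ["};", "", f"#endif /* {guard} */"]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Redirect stdout to produce the header, e.g. into lextables.h.
    print(make_header(), end="")
```

Generating the table from Python keeps the character classification in
one readable place, while the C side only ever does a single array
lookup per input byte.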
I did an earlier version of this in mxTextTools, which is not too bad
for this kind of thing if (a) for whatever reason you don't want to
write your own C extensions, and (b) you're not doing much maintenance
on the lexer itself.

I also played around with the re module, but if you go that route you
will quickly come to realize why lexer generators are popular :)

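For comparison, the usual re-based approach is to join named groups
into one master pattern and dispatch on `match.lastgroup`. This is a
minimal sketch over a hypothetical token set. Note that the order of
the alternatives matters (labels must be tried before plain
identifiers), which is exactly the kind of bookkeeping that pushes
people toward lexer generators.

```python
import re

# Hypothetical token set for a simple assembler. Order matters: the
# alternation is tried left to right, so LABEL must come before IDENT.
TOKEN_SPEC = [
    ("COMMENT",  r";[^\n]*"),
    ("LABEL",    r"[A-Za-z_.$][\w.$]*:"),
    ("NUMBER",   r"0[xX][0-9A-Fa-f]+|\d+"),
    ("IDENT",    r"[A-Za-z_.$][\w.$]*"),
    ("COMMA",    r","),
    ("SKIP",     r"[ \t]+"),
    ("MISMATCH", r"."),  # anything else is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize_line(line):
    """Return a tuple of (kind, text) pairs for one line of source."""
    tokens = []
    for m in MASTER.finditer(line):
        kind = m.lastgroup  # name of the alternative that matched
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {m.group()!r} at column {m.start()}")
        tokens.append((kind, m.group()))
    return tuple(tokens)
```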
More recently, I played a little with psyco. If I didn't care quite as
much about speed and didn't already have the C code, I might consider
using one of the existing parser/lexer generators in conjunction with
psyco. Unfortunately, I've only dabbled with a few of these packages,
so I couldn't begin to compare their strengths and weaknesses.

Hope this helps.
Pat