472,353 Members | 1,411 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,353 software developers and data experts.

A simple lexer

I'm a royal n00b to writing translators, but you have to start
someplace.

In my Python project, I've decided that writing the dispatch code
to sit between the Glulx virtual machine and the Glk API will be
best done automatically, using the handy prototypes.

Below is the prototype of the lexer, and I'd like some comments
in case I'm doing something silly already.

My particular concern are:

The loop checks for each possible lexeme one at a time, and has
rather lame error checking.

I made regexes for matching a couple of really trivial cases for
the sake of consistency. In general, is there a better way to use
re module for lexing.

Ultimately, I'm going to need to build up an AST from the lines,
and use that to generate Python code to dispatch Glk functions. I
realize I'm throwing away the id of the lexeme right now;
suggestions on the best way to store that information are
welcome.

I do know of and have experimented with PyParsing, but for now I
want to use the standard modules. After I understand what I'm
doing, I think a PyParsing solution will be easy to write.

import re

def match(regex, proto, ix, lexed_line):
m = regex.match(proto, ix)
if m:
lexed_line.append(m.group())
ix = m.end()
return ix

def parse(proto):
""" Return a lexed version of the prototype string. See the
Glk specification, 0.7.0, section 11.1.4
>>parse('0:')
['0', ':']
>>parse('1:Cu')
['1', ':', 'Cu']
>>parse('2<Qb:Cn')
['2', '<', 'Qb', ':', 'Cn']
>>parse('4Iu&#![2SF]>+Iu:Is')
['4', 'Iu', '&#!', '[', '2', 'S', 'F', ']', '>+', 'Iu', ':', 'Is']
"""
arg_count = re.compile('\d+')
qualifier = re.compile('[&<>][+#!]*')
type_name = re.compile('I[us]|C[nus]|[SUF]|Q[a-z]')
o_bracket = re.compile('\\[')
c_bracket = re.compile('\\]')
colon = re.compile(':')
ix = 0
lexed_line = []
m = lambda regex, ix: match(regex, proto, ix, lexed_line)
while ix < len(proto):
old = ix
ix = m(arg_count, ix)
ix = m(qualifier, ix)
ix = m(type_name, ix)
ix = m(o_bracket, ix)
ix = m(c_bracket, ix)
ix = m(colon, ix)
if ix == old:
print "Parse error at %s of %s" % (proto[ix:], proto)
ix = len(proto)
return lexed_line

if __name__ == "__main__":
import doctest
doctest.testmod()

--
Neil Cerutti
We dispense with accuracy --sign at New York drug store
Jan 9 '07 #1
0 1161

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Simon Foster | last post by:
Anyone have any experience or pointers to how to go about creating a parser lexer for assemble in Python. I was thinking of using PLY but wonder...
1
by: Thorsten Claus | last post by:
Hi, what's the environment variable for in the scite configuration file for the currently used lexer/syntax highlight? I want to have the current...
6
by: Mike C# | last post by:
Hi all, Can anyone recommend a good and *easy to use* lexer and parser generator? Preferably one that was written specifically for VC++ and not...
26
by: jacob navia | last post by:
Summary: I have changed (as proposed by Chuck) the code to use isalpha() instead of (c>='a' && c <= 'z') etc. I agree that EBCDIC exists :-) ...
6
by: aparna881 | last post by:
Hello every one.. I am planning to make a lexcial analyzer in C++, the keywords, opertators etc for the language that the lexer would take lexmes...
4
by: Thomas | last post by:
Hello, I am a CS student and I want to write simple lisp interpreter. The code should be entierly in C. I don't want to use any compiler...
2
by: gasfusion | last post by:
Hey guys. I'm currently taking a course where everyone in the class will code a compiler for a simple C-like language. I'm doing mine in C# and have...
1
by: Itanium | last post by:
Hi all! I'm new to .NET Platform and got some simple questions about efficiency... To put you in situation, to say that I'm involved in the...
14
by: Thomas Mlynarczyk | last post by:
Hello, I started to write a lexer in Python -- my first attempt to do something useful with Python (rather than trying out snippets from...
1
by: Kemmylinns12 | last post by:
Blockchain technology has emerged as a transformative force in the business world, offering unprecedented opportunities for innovation and...
0
by: Naresh1 | last post by:
What is WebLogic Admin Training? WebLogic Admin Training is a specialized program designed to equip individuals with the skills and knowledge...
0
jalbright99669
by: jalbright99669 | last post by:
Am having a bit of a time with URL Rewrite. I need to incorporate http to https redirect with a reverse proxy. I have the URL Rewrite rules made...
0
by: Matthew3360 | last post by:
Hi there. I have been struggling to find out how to use a variable as my location in my header redirect function. Here is my code. ...
0
by: AndyPSV | last post by:
HOW CAN I CREATE AN AI with an .executable file that would suck all files in the folder and on my computerHOW CAN I CREATE AN AI with an .executable...
0
by: Arjunsri | last post by:
I have a Redshift database that I need to use as an import data source. I have configured the DSN connection using the server, port, database, and...
0
hi
by: WisdomUfot | last post by:
It's an interesting question you've got about how Gmail hides the HTTP referrer when a link in an email is clicked. While I don't have the specific...
0
by: Matthew3360 | last post by:
Hi, I have been trying to connect to a local host using php curl. But I am finding it hard to do this. I am doing the curl get request from my web...
0
by: Carina712 | last post by:
Setting background colors for Excel documents can help to improve the visual appeal of the document and make it easier to read and understand....

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.