A simple lexer

Neil Cerutti

I'm a royal n00b to writing translators, but you have to start
someplace.

In my Python project, I've decided that writing the dispatch code
to sit between the Glulx virtual machine and the Glk API will be
best done automatically, using the handy prototypes.

Below is the prototype of the lexer, and I'd like some comments
in case I'm doing something silly already.

My particular concerns are:

The loop checks for each possible lexeme one at a time, and has
rather lame error checking.

I made regexes for matching a couple of really trivial cases for
the sake of consistency. In general, is there a better way to use
the re module for lexing?

Ultimately, I'm going to need to build up an AST from the lines,
and use that to generate Python code to dispatch Glk functions. I
realize I'm throwing away the id of the lexeme right now;
suggestions on the best way to store that information are
welcome.

I do know of and have experimented with PyParsing, but for now I
want to use the standard modules. After I understand what I'm
doing, I think a PyParsing solution will be easy to write.

import re

def match(regex, proto, ix, lexed_line):
    # Try regex at position ix; on success, record the lexeme and
    # return the position just past it, otherwise return ix unchanged.
    m = regex.match(proto, ix)
    if m:
        lexed_line.append(m.group())
        ix = m.end()
    return ix

def parse(proto):
    """ Return a lexed version of the prototype string. See the
    Glk specification, 0.7.0, section 11.1.4

    >>> parse('0:')
    ['0', ':']

    >>> parse('1:Cu')
    ['1', ':', 'Cu']

    >>> parse('2<Qb:Cn')
    ['2', '<', 'Qb', ':', 'Cn']

    >>> parse('4Iu&#![2SF]>+Iu:Is')
    ['4', 'Iu', '&#!', '[', '2', 'S', 'F', ']', '>+', 'Iu', ':', 'Is']
    """
    arg_count = re.compile(r'\d+')
    qualifier = re.compile(r'[&<>][+#!]*')
    type_name = re.compile(r'I[us]|C[nus]|[SUF]|Q[a-z]')
    o_bracket = re.compile(r'\[')
    c_bracket = re.compile(r'\]')
    colon = re.compile(':')
    ix = 0
    lexed_line = []
    m = lambda regex, ix: match(regex, proto, ix, lexed_line)
    while ix < len(proto):
        old = ix
        # Try each kind of lexeme in turn at the current position.
        ix = m(arg_count, ix)
        ix = m(qualifier, ix)
        ix = m(type_name, ix)
        ix = m(o_bracket, ix)
        ix = m(c_bracket, ix)
        ix = m(colon, ix)
        if ix == old:
            # Nothing matched: report the error and stop lexing.
            print("Parse error at %s of %s" % (proto[ix:], proto))
            ix = len(proto)
    return lexed_line

if __name__ == "__main__":
    import doctest
    doctest.testmod()
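
One possible direction for the second and third concerns, as a rough sketch rather than a definitive answer: join the patterns into a single regex with named groups, so that the match object's lastgroup attribute reports which kind of lexeme was found. The token names (ARG_COUNT and so on) and the tokenize function are hypothetical, made up for this illustration; the patterns themselves are the ones used in parse() above. Raising ValueError instead of printing is also only an assumption about how errors might be reported.

```python
import re

# Hypothetical token names paired with the patterns from parse().
TOKEN_SPEC = [
    ("ARG_COUNT", r"\d+"),
    ("QUALIFIER", r"[&<>][+#!]*"),
    ("TYPE_NAME", r"I[us]|C[nus]|[SUF]|Q[a-z]"),
    ("O_BRACKET", r"\["),
    ("C_BRACKET", r"\]"),
    ("COLON", r":"),
]

# One alternation of named groups: (?P<ARG_COUNT>\d+)|(?P<QUALIFIER>...)|...
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(proto):
    """Return a list of (kind, text) pairs for the prototype string.

    Raises ValueError at the first position where no pattern matches.
    """
    tokens = []
    ix = 0
    while ix < len(proto):
        m = MASTER.match(proto, ix)
        if m is None:
            raise ValueError("Parse error at %r of %r" % (proto[ix:], proto))
        # lastgroup is the name of the named group that matched.
        tokens.append((m.lastgroup, m.group()))
        ix = m.end()
    return tokens

if __name__ == "__main__":
    print(tokenize("1:Cu"))
    # [('ARG_COUNT', '1'), ('COLON', ':'), ('TYPE_NAME', 'Cu')]
```

Keeping each lexeme as a (kind, text) pair means a later parsing stage can switch on the kind without re-matching the text. Since the six patterns here start with disjoint characters, the order of the alternatives should not change the result.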

--
Neil Cerutti
We dispense with accuracy --sign at New York drug store

Jan 9 '07 #1
