On Nov 11, 1:59 pm, André <andre.robe...@gmail.com> wrote:
Hi everyone,
I would like to implement a parser for a mini-language
and would appreciate some pointers. The type of
text I would like to parse is an extension of:
http://www.websequencediagrams.com/examples.html
For those that don't want to go to the link, consider
the following, *very* simplified, example:
=======
programmer Guido
programmer "Fredrik Lundh" as effbot
programmer "Alex Martelli" as martellibot
programmer "Tim Peters" as timbot
note left of effbot: cutting sense of humor
note over martellibot:
    Offers detailed note, explaining a problem,
    accompanied by culinary diversion
    to the delight of the reader
note over timbot: programmer "clever" as fox
timbot -> Guido: I give you doctest
Guido --> timbot: Have you checked my time machine?
=======
From this, I would like to be able to extract
("programmer ", "Guido")
("programmer as", "Fredrik Lundh", "effbot")
...
("note left of", "effbot", "cutting sense of humor")
("note over", "martellibot", "Offers...")
("note over", "timbot", 'programmer "clever" as fox')
Even if you choose not to use pyparsing, a pyparsing example might
give you some insights into your problem. See how the grammar is
built up from separate pieces. Parse actions in pyparsing implement
callbacks to do parse-time conversion - in this case, the multiline
note body is converted from the parsed list of separate strings into a
single newline-separated string.
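Before the full example, here is a minimal sketch of what a parse action does on its own. The names word, sentence and joined are made up for this illustration and are not part of the grammar further down; it only assumes the standard Word, OneOrMore, setParseAction and parseString calls.

```python
from pyparsing import Word, alphas, OneOrMore

# a "sentence" is just one or more alphabetic words
word = Word(alphas)
sentence = OneOrMore(word)

# without a parse action, the result is a list of separate tokens
print(sentence.parseString("to the delight of the reader").asList())

# the parse action is called at parse time with the matched tokens;
# whatever it returns replaces them - here, one space-joined string
joined = OneOrMore(word).setParseAction(lambda t: " ".join(t))
print(joined.parseString("to the delight of the reader").asList())
```

The multiline note in the grammar below uses the same idea, except that it joins the lines of the indented block with newlines instead of joining words with spaces.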
Here is the pyparsing example:
from pyparsing import Suppress, Combine, LineEnd, Word, alphas, alphanums, \
    quotedString, Keyword, Optional, oneOf, restOfLine, indentedBlock, \
    removeQuotes, empty, OneOrMore, Group
# used to manage indentation levels when parsing indented blocks
indentstack = [1]
# define some basic punctuation and terminal words
COLON = Suppress(":")
ARROW = Combine(Word('-')+'>')
NL = LineEnd().suppress()
ident = Word(alphas,alphanums+"-_")
quotedString.setParseAction(removeQuotes)
# programmer definition
progDefn = Keyword("programmer") + Optional(quotedString("alias") + \
    Optional("as")) + ident("name")
# new pyparsing idiom - embed simple asserts to verify bits of the
# overall grammar in isolation
assert "programmer Guido" == progDefn
assert 'programmer "Tim Peters" as timbot' == progDefn
# note specification - only complicated part is the indented block
# form of the note; we use a pyparsing parse action to convert the
# nested token lists into a multiline string
OF = Optional("of")
notelocn = oneOf("over under") | "left" + OF | "right" + OF
notetext = restOfLine.setName("notetext")
noteblock = indentedBlock(notetext, indentstack).setName("noteblock")
noteblock.setParseAction(lambda t:'\n'.join(tt[0] for tt in t[0]))
note = Keyword("note") + notelocn("location") + ident("subject") + COLON + \
    (~NL + empty + notetext("note") | noteblock("note"))
assert 'note over timbot: programmer "clever" as fox ' == note
# message definition
msg = ident("from") + ARROW + ident("to") + COLON + empty + notetext("note")
assert 'Guido --> timbot: Have you checked my time machine?' == msg
# a seqstatement is one of these 3 types of statements
seqStatement = progDefn | note | msg
# parse the sample text (seqtext is assumed to be a string holding
# the sample shown between the ======= lines above)
parsedStatements = OneOrMore(Group(seqStatement)).parseString(seqtext)
# print out token/field dumps for each statement
for s in parsedStatements:
    print(s.dump())
Prints:
['programmer', 'Guido']
- name: Guido
['programmer', 'Fredrik Lundh', 'as', 'effbot']
- alias: Fredrik Lundh
- name: effbot
['programmer', 'Alex Martelli', 'as', 'martellibot']
- alias: Alex Martelli
- name: martellibot
['programmer', 'Tim Peters', 'as', 'timbot']
- alias: Tim Peters
- name: timbot
['note', 'left', 'of', 'effbot', 'cutting sense of humor ']
- location: left
- note: cutting sense of humor
- subject: effbot
['note', 'over', 'martellibot', 'Offers ...']
- location: over
- note: Offers detailed note, explaining a problem,
accompanied by culinary diversion
to the delight of the reader
- subject: martellibot
['note', 'over', 'timbot', 'programmer "clever" as fox ']
- location: over
- note: programmer "clever" as fox
- subject: timbot
['timbot', '->', 'Guido', 'I give you doctest ']
- from: timbot
- note: I give you doctest
- to: Guido
['Guido', '-->', 'timbot', 'Have you checked my time machine?']
- from: Guido
- note: Have you checked my time machine?
- to: timbot
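If you do choose not to use pyparsing, the single-line statements can also be picked apart with the standard re module. This is only a rough sketch of that alternative, not a translation of the grammar above: PROGRAMMER, NOTE and extract are hypothetical names, only the programmer and one-line note forms are covered, and the indented multiline note is deliberately left unhandled (extract returns None for it).

```python
import re

# hypothetical patterns, covering the single-line statement forms only
PROGRAMMER = re.compile(r'programmer\s+(?:"([^"]*)"\s+as\s+)?([\w-]+)\s*$')
NOTE = re.compile(r'note\s+(left of|right of|over)\s+([\w-]+)\s*:\s*(.+?)\s*$')

def extract(line):
    """Return a tuple for a recognized single-line statement, else None."""
    m = PROGRAMMER.match(line)
    if m:
        alias, name = m.groups()
        if alias is None:
            return ("programmer", name)
        return ("programmer as", alias, name)
    m = NOTE.match(line)
    if m:
        location, subject, text = m.groups()
        return ("note " + location, subject, text)
    return None

print(extract('programmer "Fredrik Lundh" as effbot'))
print(extract('note over timbot: programmer "clever" as fox'))
```

This gives tuples in the shape you asked for, but it gets awkward quickly once indented blocks and further statement types come in, which is where building the grammar from separate pieces pays off.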
Best of luck in your project,
-- Paul