Python parser that records source ranges

The parser library module only records source line numbers for tokens. I
need a parser that records ranges of line and character locations for
each AST node, so I can map back to the source. Does anyone know of such
a thing? Thanks

Jonathan

Jul 18 '05 #1
The tokenize module will give column information for each token, but
it produces a stream of tokens only, not an AST.
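
For illustration, here is a minimal sketch of reading those positions with the standard `tokenize` module. The helper name `token_positions` and the sample source line are made up for this example; rows are 1-based and columns are 0-based.

```python
import io
import tokenize

def token_positions(source):
    """Return (token name, text, start, end) for each significant token."""
    # Layout-only tokens carry no text of interest for this sketch.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        # tok.start and tok.end are (row, column) pairs.
        result.append((tokenize.tok_name[tok.type], tok.string, tok.start, tok.end))
    return result

positions = token_positions("a = f(20, val)\n")
for name, text, start, end in positions:
    print(name, repr(text), start, end)
```

This yields a flat list only; nothing here groups the tokens into expressions or statements.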

Jeff

Jul 18 '05 #2
You know there's not going to be a one-to-one relationship, right?
Most AST nodes are symbols and aren't going to match any tokens.
Python ASTs also use a lot of intermediate nodes to enforce operator
precedence.

Anyway, I have some rather specialized code in PyXR that syncs tokens
to an AST. You probably won't be able to use it out of the box, but it
should give you a good start:

http://www.cathoderaymission.net/~logistix/PyXR/

The source file of particular interest to you would be astToHtml.py:

http://tinyurl.com/p3cn
Jul 18 '05 #3
So the basic idea is to match up the leaves of the AST with the list of
tokens from the tokenize module, which do contain location info. I had
thought of that, but was hoping there was a more informative parser out
there. Thanks.
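
A rough sketch of that matching idea follows. It swaps in the standard `ast` module rather than the `parser` module discussed in this thread (`parser` was removed in Python 3.10), and it assumes Python 3.8 or later, where AST nodes carry `end_lineno` and `end_col_offset`. The function names `node_ranges` and `match_names_to_tokens` are invented for the example.

```python
import ast
import io
import tokenize

def node_ranges(source):
    """List (node type, (line, col), (end_line, end_col)) for positioned nodes."""
    ranges = []
    for node in ast.walk(ast.parse(source)):
        # Module and expression-context nodes have no position attributes.
        if hasattr(node, "lineno") and getattr(node, "end_lineno", None) is not None:
            ranges.append((type(node).__name__,
                           (node.lineno, node.col_offset),
                           (node.end_lineno, node.end_col_offset)))
    return ranges

def match_names_to_tokens(source):
    """Pair each ast.Name leaf with the NAME token that starts at the same spot."""
    # Note: ast columns are UTF-8 byte offsets while tokenize columns are
    # character offsets; they agree only on ASCII-only lines.
    names = {tok.start: tok
             for tok in tokenize.generate_tokens(io.StringIO(source).readline)
             if tok.type == tokenize.NAME}
    matches = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            tok = names[(node.lineno, node.col_offset)]
            matches.append((node.id, tok.start, tok.end))
    return sorted(matches, key=lambda m: m[1])

for entry in node_ranges("a = f(20, val)\n"):
    print(entry)
print(match_names_to_tokens("a = f(20, val)\n"))
```

Interior nodes such as the call get a range of their own here, so the token lookup is only needed when the exact token text or type matters.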

Jonathan
Jul 18 '05 #4
It's really not that bad. The more I think about it, the code
reference I sent you is way overcomplicated. General pseudocode for
walking ASTs generated via parser.ast2tuple(parser.suite(code)) is:

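def walk_node(node):
    # A terminal is a (token_type, token_string) pair; anything else is a symbol.
    if len(node) == 2 and not isinstance(node[1], tuple):
        walk_token(node)
    else:
        walk_symbol(node)

def walk_symbol(node):
    symbol_type = node[0]     # grammar symbol number
    symbol_leaves = node[1:]  # child nodes
    for leaf in symbol_leaves:
        walk_node(leaf)

def walk_token(node):
    token_type = node[0]      # token number
    token_value = node[1]     # the token's source text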
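
A runnable sketch of the same walk, extended to pair the leaves with `tokenize` positions, might look like the following. The `parser` module was removed in Python 3.10, so the tree below is a hand-written stand-in shaped like `ast2tuple` output, not real parser output; the symbol numbers 900 and 901 are placeholders, and `collect_tokens` and `attach_positions` are names invented for this example.

```python
import io
import token
import tokenize

def collect_tokens(node, out=None):
    """Depth-first walk of a nested-tuple tree, gathering terminal leaves in order."""
    if out is None:
        out = []
    if len(node) == 2 and not isinstance(node[1], tuple):
        out.append((node[0], node[1]))      # terminal: (token_type, text)
    else:
        for child in node[1:]:              # symbol: node[0] is its number
            collect_tokens(child, out)
    return out

def attach_positions(leaves, source):
    """Zip non-empty leaves with tokenize tokens, checking the texts agree."""
    layout = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
              tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in layout]
    significant = [leaf for leaf in leaves if leaf[1].strip()]
    if [text for _, text in significant] != [t.string for t in toks]:
        raise ValueError("leaves and tokens are out of sync")
    return [(kind, text, t.start, t.end)
            for (kind, text), t in zip(significant, toks)]

# Hypothetical tree for "a = 1"; 900 and 901 stand in for grammar symbols.
tree = (900,
        (901, (token.NAME, "a")),
        (token.EQUAL, "="),
        (901, (token.NUMBER, "1")),
        (token.NEWLINE, ""))
leaves = collect_tokens(tree)
located = attach_positions(leaves, "a = 1\n")
print(located)
```

Because the leaves come out in source order, a plain zip is enough as long as layout-only tokens are filtered from both sides.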
Jul 18 '05 #5
If I understand you correctly, then the SimpleParse parser may be just what
you are looking for:

http://simpleparse.sourceforge.net

It is very powerful but still easy to use. The AST it produces gives the
start and end points of the matching tokens. Below is an example for parsing
a statement (from a VB grammar) ... you will see each node comprises a tuple
of (token_name, start_char, end_char, [sub_node1, sub_node2, ...]).

The example below looks rather complex because of the grammar, but you can
see that most of the nested sub_node matches cover the same characters in
the source, so each node maps directly to the corresponding text in the
source.

Paul
c("a = f(20, val)", verbose=1)

1 15
[('line_body',
  0,
  15,
  [('single_statement',
    0,
    14,
    [('assignment_statement',
      0,
      14,
      [('object', 0, 1, [('primary', 0, 1, [('identifier', 0, 1, [])])]),
       ('expression',
        4,
        14,
        [('par_expression',
          4,
          14,
          [('base_expression',
            4,
            14,
            [('simple_expr',
              4,
              14,
              [('call',
                4,
                14,
                [('object',
                  4,
                  14,
                  [('primary',
                    4,
                    5,
                    [('identifier', 4, 5, [])]),
                   ('parameter_list',
                    5,
                    14,
                    [('list',
                      5,
                      14,
                      [('bare_list',
                        6,
                        13,
                        [('bare_list_item',
                          6,
                          8,
                          [('expression',
                            6,
                            8,
                            [('par_expression',
                              6,
                              8,
                              [('base_expression',
                                6,
                                8,
                                [('simple_expr',
                                  6,
                                  8,
                                  [('atom',
                                    6,
                                    8,
                                    [('literal',
                                      6,
                                      8,
                                      [('integer',
                                        6,
                                        8,
                                        [('decimalinteger',
                                          6,
                                          8,
                                          None)])])])])])])])]),
                         ('bare_list_item',
                          10,
                          13,
                          [('expression',
                            10,
                            13,
                            [('par_expression',
                              10,
                              13,
                              [('base_expression',
                                10,
                                13,
                                [('simple_expr',
                                  10,
                                  13,
                                  [('call',
                                    10,
                                    13,
                                    [('object',
                                      10,
                                      13,
                                      [('primary',
                                        10,
                                        13,
                                        [('identifier',
                                          10,
                                          13,
                                          [])])])])])])])])])])])])])])])])])])])]),
   ('line_end', 14, 15, [('NEWLINE', 14, 15, None)])])]
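
To show how such (token_name, start_char, end_char, sub_nodes) tuples map back to the source, here is a small sketch that works on plain tuples and does not import SimpleParse at all. The tree is a hand-abridged version of the output above, and `iter_nodes` and `offset_to_line_col` are helper names invented for this example.

```python
def iter_nodes(nodes, depth=0):
    """Yield (depth, tag, start, end) for every node, parents before children."""
    # The child list may be [] or None, as in the output above.
    for tag, start, end, children in nodes or ():
        yield depth, tag, start, end
        yield from iter_nodes(children, depth + 1)

def offset_to_line_col(source, offset):
    """Convert a character offset to (1-based line, 0-based column)."""
    line = source.count("\n", 0, offset) + 1
    col = offset - (source.rfind("\n", 0, offset) + 1)
    return line, col

source = "a = f(20, val)\n"
# Abridged by hand from the result tree in this post.
tree = [("line_body", 0, 15, [
    ("assignment_statement", 0, 14, [
        ("identifier", 0, 1, []),
        ("call", 4, 14, [
            ("identifier", 4, 5, []),
            ("decimalinteger", 6, 8, None),
            ("identifier", 10, 13, []),
        ]),
    ]),
    ("NEWLINE", 14, 15, None),
])]

spans = [(tag, source[start:end]) for _, tag, start, end in iter_nodes(tree)]
for depth, tag, start, end in iter_nodes(tree):
    print("  " * depth + tag, offset_to_line_col(source, start), repr(source[start:end]))
```

Slicing the source with each node's offsets recovers its text, and the offset conversion gives the line and character ranges asked for in the original question.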
Jul 18 '05 #6
