Parsing and Editing Source

Hi all,

I'd like to be able to do the following to a Python source file
programmatically:
* Read in a source file
* Add/Remove/Edit Classes, methods, functions
* Add/Remove/Edit Decorators
* List the Classes
* List the imported modules
* List the functions
* List methods of classes

And then save out the result back to the original file (or elsewhere).

I've begun by using the tokenize module to generate a token-tuple list
and am building data structures around it that enable the above
methods. I'm finding that I'm getting a little caught up in the details
and thought I'd step back and ask if there's a more elegant way to
approach this, or if anyone knows a library that could assist.

So far, I've got code that generates a dictionary mapping line numbers
to token-tuple lists, and am working on a data structure describing
where the classes begin and end, indexed by their name, such that they
can be modified later.

Any thoughts?
Thanks,
Paul
Aug 15 '08 #1
Consider using the 'compiler' module, which will give you more help
than 'tokenize'.

For example, the following demo lists all the method names in a file:

# Python 2 only: the compiler module was removed in Python 3.
import compiler
import sys

class MethodFinder:
    """Print the names of all the methods.

    Each visit method takes two arguments, the node and its
    current scope.
    The scope is the name of the current class or None.
    """

    def visitClass(self, node, scope=None):
        # Visit the class body with the class name as the scope.
        self.visit(node.code, node.name)

    def visitFunction(self, node, scope=None):
        if scope is not None:
            print "%s.%s" % (scope, node.name)
        # Definitions nested in a function are not methods of the class.
        self.visit(node.code, None)

def main(files):
    mf = MethodFinder()
    for file in files:
        f = open(file)
        buf = f.read()
        f.close()
        ast = compiler.parse(buf)
        compiler.walk(ast, mf)

if __name__ == "__main__":
    # Skip sys.argv[0], which is the name of this script.
    main(sys.argv[1:])
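
For readers on Python 3, where the compiler module no longer exists, roughly the same idea can be sketched with the standard ast module. This is only an illustration, not the code from the post above: the DefinitionLister class, the list_definitions helper and the SAMPLE string are hypothetical names made up for this example.

```python
import ast


class DefinitionLister(ast.NodeVisitor):
    """Collect imports, classes, functions and methods from a module."""

    def __init__(self):
        self.imports = []
        self.classes = []
        self.functions = []
        self.methods = []
        self._class_stack = []  # names of the classes we are currently inside

    def visit_Import(self, node):
        self.imports.extend(alias.name for alias in node.names)

    def visit_ImportFrom(self, node):
        # node.module is None for relative imports such as "from . import x"
        self.imports.append(node.module or ".")

    def visit_ClassDef(self, node):
        self.classes.append(node.name)
        self._class_stack.append(node.name)
        self.generic_visit(node)  # descend into the class body
        self._class_stack.pop()

    def visit_FunctionDef(self, node):
        if self._class_stack:
            self.methods.append("%s.%s" % (self._class_stack[-1], node.name))
        else:
            self.functions.append(node.name)
        # Deliberately no generic_visit(): nested definitions are ignored.

    visit_AsyncFunctionDef = visit_FunctionDef


def list_definitions(source):
    lister = DefinitionLister()
    lister.visit(ast.parse(source))
    return lister


SAMPLE = '''\
import os
from sys import argv

class Greeter:
    def hello(self):
        return "hi"

def main():
    pass
'''

found = list_definitions(SAMPLE)
print(found.imports, found.classes, found.functions, found.methods)
```

The visitor only reads the tree. Whether it suits the editing half of the question depends on how much of the original formatting has to survive.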
Aug 15 '08 #2
On Aug 15, 3:45 pm, eliben <eli...@gmail.com> wrote:
> Consider using the 'compiler' module which will lend you more help
> than 'tokenize'.

Thanks! Will I be able to make changes to the AST, such as "rename
decorator", "add decorator", etc., and write them back out to a file
as Python source?

Regards,
Paul
Aug 15 '08 #3
I can't help much...yet, but I am also heavily interested in this as I
will be approaching a project which will require me to write code
which writes code back to a file or new file after being manipulated.
I had planned on using the inspect module's getsource(), getmodule()
and getmembers() methods rather than doing any sort of file reading.
Have you tried any of these yet? Have you found any insurmountable
limitations?

It looks like everything needed is there. Some quick thoughts
regarding inspect.getmembers(module) results...
* Module objects can be written based on their attribute name and
__name__ values. If they are the same, then just write "import %s" %
mod.__name__. If they are different, write "import %s as %s" %
(mod.__name__, name).

* Skipping built-in stuff is easy, and everything else is either an
attribute name/value pair or an object of type 'function' or 'class',
both of which work with inspect.getsource(), I believe.

* If the module used any from-import-* lines, it doesn't look like
there is any difference between items defined in the module and those
imported into the module's namespace. Writing this back directly
would 'flatten' this call to individual module imports and local
module attributes. Maybe reading the file just to test for this would
be the answer. You could then import the module and subtract items
which haven't changed. This is easy for attributes but harder for
functions and classes...right?
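
The points above can be tried out with a small sketch. It is not taken from any existing project: the scratch module, the SOURCE string and the describe_module helper are hypothetical, and comparing each object's __module__ attribute with the module's own name is just one assumed way of telling local definitions from imported ones.

```python
import inspect
import types

SOURCE = '''\
import os
import json as js
from textwrap import dedent

class Person:
    name = "unknown"

def greet():
    return "hi"
'''

# Build a throwaway module object so the example needs no file on disk.
scratch = types.ModuleType("scratch")
exec(SOURCE, scratch.__dict__)


def describe_module(module):
    """Split a module's members into import lines, local and foreign names."""
    imports, local, foreign = [], [], []
    for name, value in inspect.getmembers(module):
        if name.startswith("__"):
            continue  # skip __builtins__, __name__ and friends
        if inspect.ismodule(value):
            if name == value.__name__:
                imports.append("import %s" % name)
            else:
                imports.append("import %s as %s" % (value.__name__, name))
        elif inspect.isclass(value) or inspect.isfunction(value):
            # Assumption: __module__ names the module that defined the object.
            if value.__module__ == module.__name__:
                local.append(name)
            else:
                foreign.append("from %s import %s" % (value.__module__, name))
    return imports, local, foreign


imports, local, foreign = describe_module(scratch)
print(imports)
print(local)
print(foreign)
```

Plain attribute values carry no __module__ of their own, so names pulled in by a star import that are neither classes nor functions would still be indistinguishable from local attributes with this approach.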

Beyond this initial bit of code, I'm hoping to be able to write new
code where I only want the new object to have attributes which were
changed. So if I have an instance of a Person object whose name has
been changed from its default, I only want a new class which inherits
from the Person class and has an attribute 'name' with the new value.
Basically using Python as a text-based storage format instead of
something like XML. Thoughts on this would be great for me if it
doesn't hijack the thread ;) I know there are quite a few who have done
this already.

Cheers,

- Rafe


Aug 15 '08 #4
On Aug 15, 4:16 pm, Rafe <rafesa...@gmail.com> wrote:
> I can't help much...yet, but I am also heavily interested in this as I
> will be approaching a project which will require me to write code
> which writes code back to a file or new file after being manipulated.
> I had planned on using the inspect module's getsource(), getmodule()
> and getmembers() methods rather than doing any sort of file reading.
> Have you tried any of these yet? Have you found any insurmountable
> limitations?

The inspect module's getsource() returns the source code as originally
defined. It does not return any changes that have been made at
runtime. So, if you attached a new class to a module, I don't believe
that getsource() would be of any use for extracting the code again to
be saved. I have rejected this approach for this reason. getmembers()
seems to be fine for this purpose; however, I don't see any way to get
class decorators and method decorators out.

> It looks like everything needed is there. Some quick thoughts
> regarding inspect.getmembers(module) results...
> * Module objects can be written based on their attribute name and
> __name__ values. If they are the same, then just write "import %s" %
> mod.__name__. If they are different, write "import %s as %s" %
> (mod.__name__, name).
>
> * Skipping built in stuff is easy and everything else is either an
> attribute name,value pair or an object of type 'function' or 'class'.
> Both of which work with inspect.getsource() I believe.

True, but if you add a function or class at runtime,
inspect.getsource() will not pick it up. It's reading the source from
a file, not doing some sort of AST unparse magic as I'd hoped. You'll
also have to check whether getsource() returns the decorators of an
object too.
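
For what it's worth, later Python versions do ship an unparser: ast.unparse() was added in Python 3.9. The sketch below shows roughly what "edit the tree, then write source back out" could look like with it. The AddDecorator class and the decorator name traced are hypothetical and exist only for this illustration.

```python
import ast

SOURCE = '''\
class Greeter:
    # say hi
    def hello(self):
        return "hi"
'''


class AddDecorator(ast.NodeTransformer):
    """Prepend a bare @<name> decorator to every function definition."""

    def __init__(self, decorator_name):
        self.decorator_name = decorator_name

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested definitions first
        decorator = ast.Name(id=self.decorator_name, ctx=ast.Load())
        node.decorator_list.insert(0, decorator)
        return node


tree = AddDecorator("traced").visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)  # new nodes have no line numbers yet
new_source = ast.unparse(tree)   # requires Python 3.9 or later
print(new_source)
```

The regenerated text is valid Python, but it is rebuilt from the tree alone, so comments and the original layout are lost (the "# say hi" comment does not survive). That matters if the result is meant to overwrite a hand-written file.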

> * If the module used any from-import-* lines, it doesn't look like
> there is any difference between items defined in the module and those
> imported in to the modules name space. writing this back directly
> would 'flatten' this call to individual module imports and local
> module attributes. Maybe reading the file just to test for this would
> be the answer. You could then import the module and subtract items
> which haven't changed. This is easy for attributes but harder for
> functions and classes...right?

Does getmodule() not tell you where objects are defined?

> Beyond this initial bit of code, I'm hoping to be able to write new
> code where I only want the new object to have attributes which were
> changed. So if I have an instance of a Person object who's name has
> been changed from it's default, I only want a new class which inherits
> the Person class and has an attribute 'name' with the new value.
> Basically using python as a text-based storage format instead of
> something like XML. Thoughts on this would be great for me if it
> doesn't hijack the thread ;) I know there a quite a few who have done
> this already.

You want to be able to make class attribute changes and then have some
automated way of generating overriding subclasses that reflect this
change? Sounds difficult. Be sure to keep me posted on your journey!

Regards,
Paul
Aug 15 '08 #5
Look at the 2to3 tool, which is good at this sort of thing. It lets you
define custom "fixers" that work on a fairly high-level representation
of the parse tree and then write the source back, leaving everything
the fixer did not touch exactly unchanged.
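
lib2to3 has since been deprecated and removed from recent Python releases, so the sketch below does not use it. Instead it illustrates the same goal (change one thing, leave every other character alone) with a different technique: use the ast module only to locate the decorator, then splice the original text. The rename_decorator function and the sample source are hypothetical, and the column arithmetic assumes ASCII-only source lines, since ast reports columns as UTF-8 byte offsets.

```python
import ast


def rename_decorator(source, old, new):
    """Return source with every bare ``@old`` decorator renamed to ``@new``."""
    lines = source.splitlines(keepends=True)
    targets = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, targets):
            continue
        for deco in node.decorator_list:
            # Only plain names are handled; @pkg.old or @old(...) are skipped.
            if isinstance(deco, ast.Name) and deco.id == old:
                row = deco.lineno - 1  # ast line numbers start at 1
                line = lines[row]
                lines[row] = (
                    line[:deco.col_offset] + new + line[deco.end_col_offset:]
                )
    return "".join(lines)


SOURCE = '''\
# comments and spacing survive
@old_style
def f(x):   return x   # odd spacing kept

@other
def g(): pass
'''

result = rename_decorator(SOURCE, "old_style", "new_style")
print(result)
```

Because only the matched span is replaced, comments, blank lines and unusual spacing come through untouched. Larger edits, such as inserting a whole method, need more careful bookkeeping of line offsets than this sketch attempts.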
Aug 16 '08 #6
On Aug 16, 3:51 am, Benjamin <musiccomposit...@gmail.com> wrote:
> Look at the 2to3 tool which is good at this sort of thing. It lets you
> define custom "fixers" that work on a fairly high-level representation
> of the parse tree and then write the source back exactly unchanged.

Thanks for the hint. I've looked at lib2to3 and there might be some
useful stuff in there!

Thank you,
Paul
Aug 16 '08 #7
