Parsing and Editing Source

Hi all,

I'd like to be able to do the following to a Python source file
programmatically:
* Read in a source file
* Add/Remove/Edit Classes, methods, functions
* Add/Remove/Edit Decorators
* List the Classes
* List the imported modules
* List the functions
* List methods of classes

And then save out the result back to the original file (or elsewhere).

I've begun by using the tokenize module to generate a list of token
tuples and am building data structures around it that enable the above
operations. I'm finding that I'm getting a little caught up in the
details and thought I'd step back and ask if there's a more elegant way
to approach this, or if anyone knows a library that could assist.

So far, I've got code that builds a dictionary mapping each line number
to its list of token tuples, and I'm working on a data structure that
records where each class begins and ends, indexed by class name, so
that the classes can be modified later.

Any thoughts?
Thanks,
Paul
Aug 15 '08 #1
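A minimal sketch of the line-number index described in the post above,
written for Python 3. The names tokens_by_line and class_spans are made
up for illustration, and class_spans swaps in the ast module (Python
3.8+ for end_lineno) rather than deriving the spans from the tokens.

```python
import ast
import io
import tokenize
from collections import defaultdict


def tokens_by_line(source):
    """Map each 1-based line number to the tokens that start on that line."""
    index = defaultdict(list)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        index[tok.start[0]].append(tok)
    return dict(index)


def class_spans(source):
    """Map each class name to its (first_line, last_line) in the source.

    The first line is the ``class`` statement itself; decorator lines
    above it are not included.
    """
    spans = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            spans[node.name] = (node.lineno, node.end_lineno)
    return spans


if __name__ == "__main__":
    demo = "class A:\n    def f(self):\n        pass\n\nclass B:\n    pass\n"
    print([tok.string for tok in tokens_by_line(demo)[1]])
    print(class_spans(demo))
```

The token index keeps exact column positions, which matters when editing
in place; the ast-based spans are easier to compute but know nothing
about comments or spacing.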
On Aug 15, 4:21 pm, "Paul Wilson" <paulalexwil...@gmail.com> wrote:
> I've begun by using the tokenize module to generate a token-tuple list
> and am building data structures around it that enable the above
> methods. I'm finding that I'm getting a little caught up in the details
> and thought I'd step back and ask if there's a more elegant way to
> approach this, or if anyone knows a library that could assist.

Consider using the 'compiler' module which will lend you more help
than 'tokenize'.

For example, the following demo lists all the method names in a file:

# Python 2 only: the compiler module was removed in Python 3.
import compiler
import sys

class MethodFinder:
    """Print the names of all the methods.

    Each visit method takes two arguments, the node and its
    current scope.
    The scope is the name of the current class or None.
    """

    # compiler.walk() attaches self.visit to the visitor before walking.
    def visitClass(self, node, scope=None):
        self.visit(node.code, node.name)

    def visitFunction(self, node, scope=None):
        if scope is not None:
            print "%s.%s" % (scope, node.name)
        # Functions nested inside a method are not methods themselves.
        self.visit(node.code, None)

def main(files):
    mf = MethodFinder()
    for file in files:
        f = open(file)
        buf = f.read()
        f.close()
        ast = compiler.parse(buf)
        compiler.walk(ast, mf)

if __name__ == "__main__":
    # Skip sys.argv[0], which is the script's own path.
    main(sys.argv[1:])
Aug 15 '08 #2
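The compiler module used in the demo above exists only in Python 2.
A rough Python 3 counterpart can be sketched with the standard ast
module. This is an illustration rather than a port anyone in the thread
posted; the helper name list_methods is an assumption.

```python
import ast


class MethodFinder(ast.NodeVisitor):
    """Collect "Class.method" names, tracking the enclosing class as scope."""

    def __init__(self):
        self.methods = []
        self._scope = None  # name of the enclosing class, or None

    def visit_ClassDef(self, node):
        outer, self._scope = self._scope, node.name
        self.generic_visit(node)
        self._scope = outer

    def visit_FunctionDef(self, node):
        if self._scope is not None:
            self.methods.append("%s.%s" % (self._scope, node.name))
        # Functions nested inside a method are not methods themselves.
        outer, self._scope = self._scope, None
        self.generic_visit(node)
        self._scope = outer

    visit_AsyncFunctionDef = visit_FunctionDef


def list_methods(source):
    """Return the method names found in a string of Python source."""
    finder = MethodFinder()
    finder.visit(ast.parse(source))
    return finder.methods


if __name__ == "__main__":
    print(list_methods("class A:\n    def f(self):\n        pass\n"))
```

Returning a list instead of printing makes the finder easier to reuse
for the other listing tasks (classes, imports, functions).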
On Aug 15, 3:45 pm, eliben <eli...@gmail.com> wrote:
> Consider using the 'compiler' module which will lend you more help
> than 'tokenize'.

Thanks! Will I be able to make changes to the ast such as "rename
decorator", "add decorator", etc.. and write them back out to a file
as Python source?

Regards,
Paul
Aug 15 '08 #3
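On the question of editing decorators and writing the result back out:
with the Python 3 ast module (3.9+ for ast.unparse) something like the
following is possible, with the important caveat that comments and the
original formatting are lost. DecoratorEditor and edit_decorators are
hypothetical names, and only plain-name decorators such as @memo are
handled in this sketch.

```python
import ast


class DecoratorEditor(ast.NodeTransformer):
    """Rename plain-name decorators and add new ones by definition name."""

    def __init__(self, rename=None, add=None):
        self.rename = rename or {}  # old decorator name -> new name
        self.add = add or {}        # function/class name -> decorator to add

    def _edit(self, node):
        self.generic_visit(node)  # handle nested definitions first
        for deco in node.decorator_list:
            # Only bare names like @memo; @pkg.memo and @memo(3) are skipped.
            if isinstance(deco, ast.Name) and deco.id in self.rename:
                deco.id = self.rename[deco.id]
        if node.name in self.add:
            new = ast.Name(id=self.add[node.name], ctx=ast.Load())
            node.decorator_list.insert(0, new)
        return node

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _edit


def edit_decorators(source, rename=None, add=None):
    """Return new source text with the decorator edits applied."""
    tree = DecoratorEditor(rename, add).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # regenerates code; comments are dropped


if __name__ == "__main__":
    src = "@memo\ndef f():\n    pass  # note\n\ndef g():\n    pass\n"
    print(edit_decorators(src, rename={"memo": "cached"}, add={"g": "traced"}))
```

Because the output is regenerated from the tree, this suits generated
code better than hand-maintained files.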
On Aug 15, 9:21 pm, "Paul Wilson" <paulalexwil...@gmail.com> wrote:
> And then save out the result back to the original file (or elsewhere).

I can't help much... yet, but I am also very interested in this, as I
will soon be starting a project that requires me to write code which
manipulates code and writes it back to the same file or a new one.
I had planned on using the inspect module's getsource(), getmodule()
and getmembers() functions rather than doing any sort of file reading.
Have you tried any of these yet? Have you found any insurmountable
limitations?

It looks like everything needed is there. Some quick thoughts
regarding inspect.getmembers(module) results...
* Module objects can be written based on their attribute name and
__name__ values. If they are the same, then just write "import %s" %
mod.__name__. If they are different, write "import %s as %s" %
(mod.__name__, name), since the attribute name is the alias.

* Skipping built-in stuff is easy, and everything else is either an
attribute (name, value) pair or an object of type 'function' or 'class',
both of which work with inspect.getsource(), I believe.

* If the module used any from-import-* lines, it doesn't look like
there is any difference between items defined in the module and those
imported into the module's namespace. Writing this back directly
would 'flatten' the star import into individual module imports and local
module attributes. Maybe reading the file just to test for this would
be the answer. You could then import the module and subtract items
which haven't changed. This is easy for attributes but harder for
functions and classes... right?
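
The first bullet above could be sketched roughly as follows (Python 3).
import_lines is a hypothetical helper, and the in-memory "demo" module
exists only to have something to inspect; from-imports and dotted
imports such as "import os.path" are not handled.

```python
import inspect
import types


def import_lines(module):
    """Rebuild plain ``import`` statements from a module's namespace."""
    lines = []
    for name, obj in inspect.getmembers(module, inspect.ismodule):
        if name.startswith("__"):
            continue  # skip interpreter-provided entries
        if name == obj.__name__:
            lines.append("import %s" % obj.__name__)
        else:
            # The attribute name is the alias; __name__ is the real module.
            lines.append("import %s as %s" % (obj.__name__, name))
    return lines


# Hypothetical module built in memory, just to have something to inspect.
demo = types.ModuleType("demo")
exec("import os\nimport json as js\n", demo.__dict__)
print(import_lines(demo))
```

getmembers() returns its results sorted by name, so the original order
of the import statements is not recoverable this way.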
Beyond this initial bit of code, I'm hoping to be able to write new
code where the new object has only the attributes which were
changed. So if I have an instance of a Person object whose name has
been changed from its default, I only want a new class which inherits
from the Person class and has an attribute 'name' with the new value.
Basically, using Python as a text-based storage format instead of
something like XML. Thoughts on this would be great for me if it
doesn't hijack the thread ;) I know there are quite a few who have done
this already.
Cheers,

- Rafe


Aug 15 '08 #4
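One way the "only the changed attributes" idea from the post above might
look in Python 3. This is a sketch under assumptions: the Person class
and the helper override_subclass_source are invented for illustration,
defaults are taken to be class attributes, and values must have a repr()
that is valid Python.

```python
class Person:
    """Hypothetical class whose defaults live as class attributes."""
    name = "Unnamed"
    age = 0


_MISSING = object()  # sentinel for attributes the class does not define


def override_subclass_source(instance, subclass_name):
    """Return source for a subclass holding only the changed attributes."""
    cls = type(instance)
    changed = {
        attr: value
        for attr, value in sorted(vars(instance).items())
        if getattr(cls, attr, _MISSING) != value
    }
    lines = ["class %s(%s):" % (subclass_name, cls.__name__)]
    if changed:
        lines.extend("    %s = %r" % item for item in changed.items())
    else:
        lines.append("    pass")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    person = Person()
    person.name = "Rafe"
    print(override_subclass_source(person, "RenamedPerson"))
```

Attributes set on the instance to the same value as the class default
are treated as unchanged and left out of the generated subclass.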
On Aug 15, 4:16 pm, Rafe <rafesa...@gmail.com> wrote:
> I can't help much...yet, but I am also heavily interested in this as I
> will be approaching a project which will require me to write code
> which writes code back to a file or new file after being manipulated.
> I had planned on using the inspect module's getsource(), getmodule()
> and getmembers() methods rather than doing any sort of file reading.
> Have you tried any of these yet? Have you found any insurmountable
> limitations?

The inspect module's getsource() returns the source code as originally
defined in the file. It does not reflect any changes that have been made
at runtime. So, if you attached a new class to a module, I don't believe
that getsource() would be any use for extracting the code again to be
saved. I have rejected this approach for this reason. getmembers()
seems to be fine for this purpose; however, I don't see any way to get
class decorators and method decorators out.
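
The limitation described above can be checked with a small Python 3
experiment. The module name demo_mod, its contents, and the helper
names are invented for this sketch; it writes a throwaway module to a
temporary directory, changes it at runtime, and asks inspect for its
source.

```python
import importlib.util
import inspect
import os
import tempfile

SOURCE = "def original():\n    return 1\n"


def load_from_file(path, name):
    """Import a module from an explicit file path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def source_after_runtime_change():
    """Return what inspect.getsource() reports after a runtime edit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_mod.py")
        with open(path, "w") as fh:
            fh.write(SOURCE)
        module = load_from_file(path, "demo_mod")
        module.added_at_runtime = lambda: 2  # exists only in memory
        # getsource() re-reads the file, so the new attribute is absent.
        return inspect.getsource(module)


if __name__ == "__main__":
    print(source_after_runtime_change())
```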

> It looks like everything needed is there. Some quick thoughts
> regarding inspect.getmembers(module) results...
> * Module objects can be written based on their attribute name and
> __name__ values. If they are the same, then just write "import %s" %
> mod.__name__. If they are different, write "import %s as %s" %
> (mod.__name__, name)
>
> * Skipping built in stuff is easy and everything else is either an
> attribute name,value pair or an object of type 'function' or 'class'.
> Both of which work with inspect.getsource() I believe.

True, but if you add a function or class at runtime,
inspect.getsource() will not pick it up. It's reading the source from
a file, not doing some sort of AST unparse magic as I'd hoped. You'll
also have to check whether getsource() returns an object's decorators
too.

> * If the module used any from-import-* lines, it doesn't look like
> there is any difference between items defined in the module and those
> imported in to the modules name space. writing this back directly
> would 'flatten' this call to individual module imports and local
> module attributes. Maybe reading the file just to test for this would
> be the answer. You could then import the module and subtract items
> which haven't changed. This is easy for attributes but harder for
> functions and classes...right?

Does getmodule() not tell you where objects are defined?
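
For functions and classes, a quick Python 3 check suggests that it
does. In this sketch the module name getmodule_demo and the helpers
load_module and defined_locally are made up; plain values such as
integers or strings carry no __module__, so they cannot be told apart
this way.

```python
import inspect
import sys
import types

SOURCE = """
from os.path import join
from collections import OrderedDict

def local_helper():
    pass

class LocalThing:
    pass
"""


def load_module(name, source):
    """Build a module in memory and register it so getmodule() can find it."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(compile(source, name + ".py", "exec"), module.__dict__)
    return module


def defined_locally(module):
    """Names of functions and classes defined in the module itself."""
    names = []
    for name, obj in inspect.getmembers(module):
        is_def = inspect.isfunction(obj) or inspect.isclass(obj)
        if is_def and inspect.getmodule(obj) is module:
            names.append(name)
    return names


module = load_module("getmodule_demo", SOURCE)
print(defined_locally(module))  # join and OrderedDict are filtered out
```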

> Beyond this initial bit of code, I'm hoping to be able to write new
> code where I only want the new object to have attributes which were
> changed. So if I have an instance of a Person object who's name has
> been changed from it's default, I only want a new class which inherits
> the Person class and has an attribute 'name' with the new value.
> Basically using python as a text-based storage format instead of
> something like XML. Thoughts on this would be great for me if it
> doesn't hijack the thread ;) I know there a quite a few who have done
> this already.

You want to be able to make class attribute changes and then have some
automated way of generating overriding subclasses that reflect these
changes? Sounds difficult. Be sure to keep me posted on your journey!

Regards,
Paul
Aug 15 '08 #5
On Aug 15, 9:21 am, "Paul Wilson" <paulalexwil...@gmail.com> wrote:
> So far, I've got code that generates a line number to token-tuple list
> dictionary, and am working on a data structure describing where the
> classes begin and end, indexed by their name, such that they can be
> later modified.

Look at the 2to3 tool, which is good at this sort of thing. It lets you
define custom "fixers" that work on a fairly high-level representation
of the parse tree, and it then writes the source back with everything
you did not touch left exactly unchanged.
Aug 16 '08 #6
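lib2to3 was later deprecated and then removed from the standard library
in Python 3.13, so the sketch below does not use it. It swaps in the
tokenize module to illustrate the same format-preserving idea: change
one token and write everything else back as it was. rename_decorator is
a hypothetical helper that handles only simple @name decorators.

```python
import io
import tokenize

# Token types after which an "@" starts a decorator rather than a matmul.
LINE_START = {None, tokenize.NEWLINE, tokenize.NL,
              tokenize.INDENT, tokenize.DEDENT}


def rename_decorator(source, old, new):
    """Rename ``@old`` to ``@new`` at token level, leaving other text alone."""
    out = []
    prev_type = None   # type of the previous token
    after_at = False   # True right after a decorator's "@"
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if after_at and tok.type == tokenize.NAME and tok.string == old:
            tok = tok._replace(string=new)
        after_at = (tok.type == tokenize.OP and tok.string == "@"
                    and prev_type in LINE_START)
        prev_type = tok.type
        out.append(tok)
    # Full 5-tuples let untokenize() keep the original spacing and comments.
    return tokenize.untokenize(out)


if __name__ == "__main__":
    src = ("import functools  # keep this comment\n\n"
           "@old_deco\ndef f(x):\n    return x   # odd   spacing\n")
    print(rename_decorator(src, "old_deco", "new_deco"))
```

Working on tokens keeps comments in place, at the cost of having no
tree structure to query; dotted decorators such as @pkg.old_deco would
need extra handling.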
On Aug 16, 3:51 am, Benjamin <musiccomposit...@gmail.com> wrote:
> Look at the 2to3 tool which is good at this sort of thing. It lets you
> define custom "fixers" that work on a fairly high-level representation
> of the parse tree and then write the source back exactly unchanged.

Thanks for the hint. I've looked at lib2to3 and there might be some
useful stuff in there!

Thank you,
Paul
Aug 16 '08 #7

This discussion thread is closed
