AST Manipulation and Common Subexpression Elimination

Robert Kern

I'm in the middle of writing a scientific program with a number of very, very
long formulae. You can see a typical one below. What I would like to do is to
parse this code to yield an AST, identify identical subtrees, replace them in
the AST with a dummy variable to which I will assign the common subexpression.

v21c = ((1-2*nu)*((2*(1-nu)ctB*ctB-nu)*log(Ry3bar) - \
(2*(1-nu)*ctB*ctB+1-2*nu)*cB*log(Rz3bar)) - \
(1-2*nu)/Ry3bar*(y1*ctB*(1-2*nu-a/Rbar) + nu*ybar3 - a + \
y2*y2/Ry3bar*(nu+a/Rbar)) - \
(1-2*nu)*zbar3*ctB/Rz3bar*(cB+a/Rbar) - \
a*y1*(ybar3-a)*ctB/Rbar3 + \
(ybar3-a)/Ry3bar*(-2*nu+((1-2*nu)*y1*ctB-a)/Rbar + \
y2*y2/Rbar/Ry3bar*(2*nu+a/Rbar)+a*y2*y2/Rbar3) + \
(ybar3-a)/Rz3bar*(cB*cB-((1-2*nu)*zbar1*ctB+a*cB)/Rbar +
a*ybar3*zbar1*ctB/Rbar3 - \
(y2*y2*cB*cB-a*zbar1*ctB/Rbar*(Rbar*cB+ybar3))/Rbar/Rz3bar))/(4*pi*(
1-nu))

For example, I would like to globally replace (1-2*nu) with, say, the variable
nu12=(1-2*nu). Some of these variables are already the evaluations of
subexpressions I found convenient when typing in these formulae (or the original
authors when deriving them).

Each of these variables is either a float or a Numeric array (or the log()
function). No variables changes value during the computation, nor are there
branches or loops, so the CSE algorithm doesn't have to be sophisticated enough
to account for these.

Before you start on about "premature optimization," note that I'm more concerned
with keeping the formulae readable, manageable, and debuggable than anything
else. Heck, with the size of the arrays I'm thinking about, doing lots of CSE
will probably kill my memory.

My questions are:

1. Can I do better than brute-force traversing the AST and trying to match each
subtree to the other subtrees (using something like the match() function given
in the parser module example)?

2. Are there good tools out there for doing AST transformations like this? Are
there any GUI tools for messing with the AST? I know that I will want to
interactively select which subexpressions are eliminated in this fashion.

3. Is there any example code around showing how to output Python code from the
AST? I can probably figure out, then, how to do the same for
Maxima/Yacas/Maple/Mathematica as I'm sure I'm going to want to slap these
formulae into a CAS and play around with them.

4. And as a corollary to that last thought: are there any automatic
differentiation tools for Python that operate directly on the AST and generate
new code?

Googling hasn't come up with much for me, but my brain is fried from typing in
these $%&@! formulae. Pointers to introductory compiler technique references
(online or dead tree) or other useful resources are welcome.

Many thanks.

--
Robert Kern
ke**@ugcs.caltech.edu

"In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Jul 18 '05 #1

Subscribe Post Reply

2268

Martin v. Löwis

Robert Kern wrote:

1. Can I do better than brute-force traversing the AST and trying to match each
subtree to the other subtrees (using something like the match() function given
in the parser module example)?
Certainly: you could do hashing or sorting. For hashing, define some
hash function for expressions, preferably recursively
2. Are there good tools out there for doing AST transformations like this? Are
there any GUI tools for messing with the AST? I know that I will want to
interactively select which subexpressions are eliminated in this fashion.

GUI yes, interactive no: You might want to take a look at the
dotty/graphviz tool suite.

Regards,
Martin

Jul 18 '05 #2

Heiko Wundram

Martin v. Löwis wrote:

Robert Kern wrote:
1. Can I do better than brute-force traversing the AST and trying to
match each
subtree to the other subtrees (using something like the match()
function given
in the parser module example)?

Certainly: you could do hashing or sorting. For hashing, define some
hash function for expressions, preferably recursively

Coolest way to do this (IMHO) would be the application of something like
the LZW compression algorithm.

Hmm... Guess I'm still too tired to explain this morning, so I'll just
have to give an example:

Expression: x = (n1+2)*(n1-1)*(n3+2)*(n1-1)*(n1+2)*(n3+2)*(n1-3)

Step 1

dummy1 = (n1+2)
x = dummy1

First Term isn't known yet, so assign it to dummy1 and stop processing.
Continue step two at second term.

Step 2

dummy1 = (n1+2)
dummy2 = (n1-1)
x = dummy1*dummy2

Second term isn't known yet, so assign to dummy2 and stop processing.
Continue step three at third term.

Step 3

dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
x = dummy1*dummy2*dummy3

Third term isn't known yet, so assign to dummy3 and stop processing.
Continue step four at fourth term.

Step 4

dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
dummy4 = dummy1*dummy2
dummy5 = dummy1*dummy3
dummy6 = dummy2*dummy3
dummy7 = dummy3*dummy4 # Remember that this is dummy1*dummy2*dummy3
x = dummy1*dummy2*dummy3*dummy2*dummy1*dummy3

Fourth term is known already in a "simple" dummy (dummy2). Now continue
to match "simple dummies" until you find a term that isn't already
present (n1-3). Repush the non-match onto the factor stack. This makes
the longest matching sequence dummy2*dummy1*dummy3. Write down all
possible combinations of at least length 2 of the found terms to dummy
variables (regardless of order). In these dummies, replace all
occurences of "complex dummies" with references to earlier "complex
dummies" (dummy7 = dummy1*dummy2*dummy3, when you start at the
beginning, this makes dummy3*dummy4). Continue step five at seventh term.

Step 5

dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
dummy4 = dummy1*dummy2
dummy5 = dummy1*dummy3
dummy6 = dummy2*dummy3
dummy7 = dummy3*dummy4
dummy8 = (n1-3)
x = dummy1*dummy2*dummy3*dummy2*dummy1*dummy3*dummy8

Seventh term isn't know yet, so assign to dummy8 and stop processing. As
this is the last term, stop compression and continue with cleanup.

Step 6

dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
dummy4 = dummy1*dummy2
dummy5 = dummy1*dummy3
dummy6 = dummy2*dummy3
dummy7 = dummy3*dummy4 # This is dummy1*dummy2*dummy3
dummy8 = (n1-3)
x = dummy7*dummy7*dummy8

Apply same algorithm as with step 4, replacing all occurences of
"complex dummies" found in x with references to other "complex dummies",
starting to match at the beginning.

Step 8

Finally Remove all terms that aren't directly or indirectly referenced
from x. (Remove dummy5 and dummy6).

This algorithm should create longest matching subsequences. If I
understand correctly, you want to deal with more than just
multiplication, but I guess you could modify this algorithm so that it
works on sums and multiplication, as you need it to (it would have to
know about bracket rules, which it doesn't at the moment).

If the above description isn't algorithmically clear, feel free to mail
me, and I might (if I find the time) code a reference implementation.

HTH!

Heiko.

Jul 18 '05 #3

logistix at cathoderaymission.net

Robert Kern <ke**@ugcs.caltech.edu> wrote in message news:<bm**********@naig.caltech.edu>...

I'm in the middle of writing a scientific program with a number of very, very
long formulae. You can see a typical one below. What I would like to do is to
parse this code to yield an AST, identify identical subtrees, replace them in
the AST with a dummy variable to which I will assign the common subexpression.

You should probably read about "directed acyclic graphs", or dags.
That term should produce plenty or results on google, although I doubt
there will be anything that's python-based out there.

Jul 18 '05 #4

Bengt Richter

On Sat, 18 Oct 2003 07:09:51 +0000 (UTC), Robert Kern <ke**@ugcs.caltech.edu> wrote:

I'm in the middle of writing a scientific program with a number of very, very
long formulae. You can see a typical one below. What I would like to do is to
parse this code to yield an AST, identify identical subtrees, replace them in
the AST with a dummy variable to which I will assign the common subexpression.

v21c = ((1-2*nu)*((2*(1-nu)ctB*ctB-nu)*log(Ry3bar) - \ ^--missing * ? (2*(1-nu)*ctB*ctB+1-2*nu)*cB*log(Rz3bar)) - \
(1-2*nu)/Ry3bar*(y1*ctB*(1-2*nu-a/Rbar) + nu*ybar3 - a + \
y2*y2/Ry3bar*(nu+a/Rbar)) - \
(1-2*nu)*zbar3*ctB/Rz3bar*(cB+a/Rbar) - \
a*y1*(ybar3-a)*ctB/Rbar3 + \
(ybar3-a)/Ry3bar*(-2*nu+((1-2*nu)*y1*ctB-a)/Rbar + \
y2*y2/Rbar/Ry3bar*(2*nu+a/Rbar)+a*y2*y2/Rbar3) + \
(ybar3-a)/Rz3bar*(cB*cB-((1-2*nu)*zbar1*ctB+a*cB)/Rbar +
a*ybar3*zbar1*ctB/Rbar3 - \
(y2*y2*cB*cB-a*zbar1*ctB/Rbar*(Rbar*cB+ybar3))/Rbar/Rz3bar))/(4*pi*(
1-nu))

For example, I would like to globally replace (1-2*nu) with, say, the variable
nu12=(1-2*nu). Some of these variables are already the evaluations of
Do you want to recognize that (-2*nu+1) is an equivalent expression in one sense, though
the AST's differ at one level of interpretation, e.g.,

compiler.transformer.parse('(2*nu-1)') Module(None, Stmt([Discard(Sub((Mul((Const(2), Name('nu'))), Const(1))))]))

vs
compiler.transformer.parse('(-1+2*nu)')

Module(None, Stmt([Discard(Add((UnarySub(Const(1)), Mul((Const(2), Name('nu'))))))]))
And are you considering your integer literal coefficients as abbreviations of their floating point
equivalent? Are you concerned with order of evaluation at all? I presume so if some of
your variables are matrices, but how about the non-matrices? Could you use distinctive
names for the matrices, to make that part? Is it a given that there are no aliases?

It might be interesting to take Martin's suggestion and generate a hash by way of using
canonicalized string representations and letting the hashing be done by adding them to
subexpression dict. E.g., sort the elements of same-level operator terms from the bottom
leaves up. I had an idea to do that by canonicalizing to a prefix notation with sign-tagged
atoms, so 2*nu-1 becomes (+ (* 2 nu) -1). BTW, what is your evaluation precedence for * and / ?
I.e., is a*b/c*d ((a*b)/c)*d or (a*b)/(b*c) ? Python is left->right unless something binds tighter,
I believe.
subexpressions I found convenient when typing in these formulae (or the original
authors when deriving them).

Each of these variables is either a float or a Numeric array (or the log()
function). No variables changes value during the computation, nor are there
branches or loops, so the CSE algorithm doesn't have to be sophisticated enough
to account for these.

Before you start on about "premature optimization," note that I'm more concerned
with keeping the formulae readable, manageable, and debuggable than anything
else. Heck, with the size of the arrays I'm thinking about, doing lots of CSE
will probably kill my memory.
I tried a hack using the text of repr(compiler.transformer.parse(your_formula) and re-evaluating it
to rewrite the tree in prefix format. I got

'((1-2*nu)*((2*(1-nu)*ctB*ctB-nu)*log(Ry3bar) -\n (2*(1-nu)*ctB*ctB+1-2*nu)*cB*log(Rz3bar
)) -\n (1-2*nu)/Ry3bar*(y1*ctB*(1-2*nu-a/Rbar) + nu*ybar3 - a + \n y2*y2/Ry3ba
r*(nu+a/Rbar)) - \n (1-2*nu)*zbar3*ctB/Rz3bar*(cB+a/Rbar) - \n a*y1*(ybar3-a)*ctB/
Rbar3 + \n (ybar3-a)/Ry3bar*(-2*nu+((1-2*nu)*y1*ctB-a)/Rbar + \n y2*y2/Rbar/Ry
3bar*(2*nu+a/Rbar)+a*y2*y2/Rbar3) +\n (ybar3-a)/Rz3bar*(cB*cB-((1-2*nu)*zbar1*ctB+a*cB)/R
bar + \n a*ybar3*zbar1*ctB/Rbar3 - \n (y2*y2*cB*cB-a*zbar1*ctB/Rbar*(Rbar*
cB+ybar3))/Rbar/Rz3bar))/(4*pi*( 1-nu))' =>

"Module(None, Stmt([Discard(Div((Add((Add((Sub((Sub((Sub((Mul((Sub((Co nst(1), Mul((Const(2), Nam
e('nu'))))), Sub((Mul((Sub((Mul((Mul((Mul((Const(2), Sub((Const(1), Name('nu'))))), Name('ctB'))
), Name('ctB'))), Name('nu'))), CallFunc(Name('log'), [Name('Ry3bar')], None, None))), Mul((Mul(
(Sub((Add((Mul((Mul((Mul((Const(2), Sub((Const(1), Name('nu'))))), Name('ctB'))), Name('ctB'))),
Const(1))), Mul((Const(2), Name('nu'))))), Name('cB'))), CallFunc(Name('log'), [Name('Rz3bar')]
, None, None))))))), Mul((Div((Sub((Const(1), Mul((Const(2), Name('nu'))))), Name('Ry3bar'))), A
dd((Sub((Add((Mul((Mul((Name('y1'), Name('ctB'))), Sub((Sub((Const(1), Mul((Const(2), Name('nu')
)))), Div((Name('a'), Name('Rbar'))))))), Mul((Name('nu'), Name('ybar3'))))), Name('a'))), Mul((
Div((Mul((Name('y2'), Name('y2'))), Name('Ry3bar'))), Add((Name('nu'), Div((Name('a'), Name('Rba
r'))))))))))))), Mul((Div((Mul((Mul((Sub((Const(1), Mul((Const(2), Name('nu'))))), Name('zbar3')
)), Name('ctB'))), Name('Rz3bar'))), Add((Name('cB'), Div((Name('a'), Name('Rbar'))))))))), Div(
(Mul((Mul((Mul((Name('a'), Name('y1'))), Sub((Name('ybar3'), Name('a'))))), Name('ctB'))), Name(
'Rbar3'))))), Mul((Div((Sub((Name('ybar3'), Name('a'))), Name('Ry3bar'))), Add((Add((Add((Mul((U
narySub(Const(2)), Name('nu'))), Div((Sub((Mul((Mul((Sub((Const(1), Mul((Const(2), Name('nu'))))
), Name('y1'))), Name('ctB'))), Name('a'))), Name('Rbar'))))), Mul((Div((Div((Mul((Name('y2'), N
ame('y2'))), Name('Rbar'))), Name('Ry3bar'))), Add((Mul((Const(2), Name('nu'))), Div((Name('a'),
Name('Rbar'))))))))), Div((Mul((Mul((Name('a'), Name('y2'))), Name('y2'))), Name('Rbar3')))))))
)), Mul((Div((Sub((Name('ybar3'), Name('a'))), Name('Rz3bar'))), Sub((Add((Sub((Mul((Name('cB'),
Name('cB'))), Div((Add((Mul((Mul((Sub((Const(1), Mul((Const(2), Name('nu'))))), Name('zbar1')))
, Name('ctB'))), Mul((Name('a'), Name('cB'))))), Name('Rbar'))))), Div((Mul((Mul((Mul((Name('a')
, Name('ybar3'))), Name('zbar1'))), Name('ctB'))), Name('Rbar3'))))), Div((Div((Sub((Mul((Mul((M
ul((Name('y2'), Name('y2'))), Name('cB'))), Name('cB'))), Mul((Div((Mul((Mul((Name('a'), Name('z
bar1'))), Name('ctB'))), Name('Rbar'))), Add((Mul((Name('Rbar'), Name('cB'))), Name('ybar3')))))
)), Name('Rbar'))), Name('Rz3bar'))))))))), Mul((Mul((Const(4), Name('pi'))), Sub((Const(1), Nam
e('nu'))))))))]))" =>

s_push: parser stack overflow
exceptions.MemoryError:
Traceback (most recent call last):
File "C:\pywk\parse\cse.py", line 224, in ?
print 'cksrc returned ok ==%s' % cksrc(src, verbose)
File "C:\pywk\parse\cse.py", line 176, in cksrc
v = eval(ast_repr, env)
MemoryError

when I tried the whole thing.

On a tail part of your expression, it seems to work, so I wonder what I have to set higher.
(I have 320 mb RAM so I hope it's not really out of memory).

I got (input, parse, prefix-notation):

'(\n (ybar3-a)/Ry3bar*(-2*nu+((1-2*nu)*y1*ctB-a)/Rbar + \n y2*y2/Rbar/Ry3bar*(
2*nu+a/Rbar)+a*y2*y2/Rbar3) +\n (ybar3-a)/Rz3bar*(cB*cB-((1-2*nu)*zbar1*ctB+a*cB)/Rbar +
\n a*ybar3*zbar1*ctB/Rbar3 - \n (y2*y2*cB*cB-a*zbar1*ctB/Rbar*(Rbar*cB+yba
r3))/Rbar/Rz3bar))/(4*pi*( 1-nu))' =>

"Module(None, Stmt([Discard(Div((Add((Mul((Div((Sub((Name('ybar3'), Name('a'))), Name('Ry3bar'))
), Add((Add((Add((Mul((UnarySub(Const(2)), Name('nu'))), Div((Sub((Mul((Mul((Sub((Const(1), Mul(
(Const(2), Name('nu'))))), Name('y1'))), Name('ctB'))), Name('a'))), Name('Rbar'))))), Mul((Div(
(Div((Mul((Name('y2'), Name('y2'))), Name('Rbar'))), Name('Ry3bar'))), Add((Mul((Const(2), Name(
'nu'))), Div((Name('a'), Name('Rbar'))))))))), Div((Mul((Mul((Name('a'), Name('y2'))), Name('y2'
))), Name('Rbar3'))))))), Mul((Div((Sub((Name('ybar3'), Name('a'))), Name('Rz3bar'))), Sub((Add(
(Sub((Mul((Name('cB'), Name('cB'))), Div((Add((Mul((Mul((Sub((Const(1), Mul((Const(2), Name('nu'
))))), Name('zbar1'))), Name('ctB'))), Mul((Name('a'), Name('cB'))))), Name('Rbar'))))), Div((Mu
l((Mul((Mul((Name('a'), Name('ybar3'))), Name('zbar1'))), Name('ctB'))), Name('Rbar3'))))), Div(
(Div((Sub((Mul((Mul((Mul((Name('y2'), Name('y2'))), Name('cB'))), Name('cB'))), Mul((Div((Mul((M
ul((Name('a'), Name('zbar1'))), Name('ctB'))), Name('Rbar'))), Add((Mul((Name('Rbar'), Name('cB'
))), Name('ybar3'))))))), Name('Rbar'))), Name('Rz3bar'))))))))), Mul((Mul((Const(4), Name('pi')
)), Sub((Const(1), Name('nu'))))))))]))" =>

(/ (+ (* (/ (+ ybar3 -a) Ry3bar) (+ (* -2 nu) (/ (+ (* (+ 1 (* -2 nu)) y1 ctB) -a) Rbar) (* (/ (
* y2 y2) Rbar Ry3bar) (+ (* 2 nu) (/ a Rbar))) (/ (* a y2 y2) Rbar3))) (* (/ (+ ybar3 -a) Rz3bar
) (+ (* cB cB) (- (/ (+ (* (+ 1 (* -2 nu)) zbar1 ctB) (* a cB)) Rbar)) (/ (* a ybar3 zbar1 ctB)
Rbar3) (- (/ (+ (* y2 y2 cB cB) (* (- (/ (* a zbar1 ctB) Rbar)) (+ (* Rbar cB) ybar3))) Rbar Rz3
bar))))) (* 4 pi (+ 1 -nu)))

It certainly isn't the most efficient way, but it was kind of interesting. I made a step
toward canonicalizing by transforming everything (very alpha here ;-) to
(+ ...) or (* ...), where of course the latter can be elements of the former, and parens
make for some rewrites here and there, e.g.,

Expr> (a+b)-(c-d)
(+ a b -c d)

Expr> -(a+b+c)-(d*e*f)
(+ -a -b -c (* -d e f))

Expr> -(a+b+c)-(d*e*f)*(x*y)
(+ -a -b -c (* -d e f x y))

Expr> -(a+b+c)-(d*e*f)+(x*y)
(+ -a -b -c (* -d e f) (* x y))

Expr> foo(1,2,3)
(foo 1 2 3)

Expr> foo(2*nu-1)+3*bar-4
(+ (foo (+ (* 2 nu) -1)) (* 3 bar) -4)

Expr> foo(2*nu-1, 'another')+3*bar-4
(+ (foo (+ (* 2 nu) -1) 'another') (* 3 bar) -4)

Now if I sort the (+ ...) and (* ...) depth first then I think in the string representations
I might have some keys for a dict of CSE's. At least that's the hunch ;-)

But that stack overflow makes me wonder ...

My questions are:

1. Can I do better than brute-force traversing the AST and trying to match each
subtree to the other subtrees (using something like the match() function given
in the parser module example)? The question in my mind would be what kind of source transformations are valid
given that you have perhaps arbirary scalar and matrix mixes in your formulae?
Are you expecting the system to discover equivalent subexpressions that are not
typed verbatim in the same token order? Need to know the constraints on that.

I borrowed the output of compiler.transformer.parse and let it repr itself
to text, so that it would just look like a giant bunch of nested function calls,
and then wrote some functions to generate scheme-like code (represented by a subclass
of tuple with just a __repr__ override as the only method. And then exec'd the string
in an environment that would call my functions according to the big nest.

It would be faster for the computer (but not me at the moment ;-), to use the actual tree
and walk it to do what I did. I think I saw a tree visitor example someplace, so maybe it's
practically on a silver platter already. Will have to look for it. It's just that seeing
the nested function call representation gave me a simple idea to hack ;-)

2. Are there good tools out there for doing AST transformations like this? Are
there any GUI tools for messing with the AST? I know that I will want to
interactively select which subexpressions are eliminated in this fashion.
I suspect there are at least some useful helper functions to solve the problem,
but I haven't looked yet. Nor GUI. But I can visualize a Tkinter thing ;-)
3. Is there any example code around showing how to output Python code from the
AST? I can probably figure out, then, how to do the same for
Maxima/Yacas/Maple/Mathematica as I'm sure I'm going to want to slap these
formulae into a CAS and play around with them.
Hm, I guess what I did kind of models that, since the output is almost scheme.
(sorry I didn't get to pretty-printing it).
4. And as a corollary to that last thought: are there any automatic
differentiation tools for Python that operate directly on the AST and generate
new code? Symbolic? Like (+ (* x x ) 3) => (* 2 x) ?

Googling hasn't come up with much for me, but my brain is fried from typing in
these $%&@! formulae. Pointers to introductory compiler technique references
(online or dead tree) or other useful resources are welcome.

Sorry. SICP and Dragon book I guess. But the above was just an idea using a variant
of the technique I posted the other day as a possible way to detect unsafe code, which

The code is very grotty right now, so I won't post it as is.

BTW, scheme wasn't my goal, but I find it interesting that that is the direction
things went in the process of representing expressions and manipulating them for
the purpose of generating a canonical representation.

(This doesn't mean I don't prefer Python to write the manipulation code with ;-)

I haven't taken the next step yet, which is to generate the sorted sub-expressions
and track them (count occurrences of canonicalized string representations of them)
in a dict. They could be shown and be selectable for giving names (if there's better
names than CSE_1, CSE2, etc.) and choosing which to use for a rewrite or the overall formula.
I think it's doable. ;-)

Regards,
Bengt Richter

Jul 18 '05 #5

John J. Lee

Robert Kern <ke**@ugcs.caltech.edu> writes:
[...]

3. Is there any example code around showing how to output Python code from the
AST? I can probably figure out, then, how to do the same for
Maxima/Yacas/Maple/Mathematica as I'm sure I'm going to want to slap these
formulae into a CAS and play around with them.

4. And as a corollary to that last thought: are there any automatic
differentiation tools for Python that operate directly on the AST and generate
new code?

[...]

Why not just use Maple / Mathematica / whatever?

Why does Python have to be involved?
John

Jul 18 '05 #6

Duncan Booth

Robert Kern <ke**@ugcs.caltech.edu> wrote in
news:bm**********@naig.caltech.edu:

1. Can I do better than brute-force traversing the AST and trying to
match each subtree to the other subtrees (using something like the
match() function given in the parser module example)?

Use a dictionary.

If you convert the AST to nested tuples, then a code fragment such as the
one below will suffice to make all common subexpressions share the same
tuples. It fills 'info' with exactly one entry for each unique tuple in the
original ast and counts the number of times it occurs. It should be easy
enough then to go through the dictionary in order of increasing tuple
length) and if an expression occurs more than once create a new AST to
precalculate the value and assign to a temporary.

import parser

def optimise(ast, info):
new = []
for t in ast:
if isinstance(t, tuple):
new.append(optimise(t, info))
else:
new.append(t)

new = tuple(new)
new, oldcount = info.get(new, (new, 0))
info[new] = (new, oldcount+1)
return new

t = parser.ast2tuple(parser.expr("(1-x)*(1-x)/(1-x)"))
t_info = {}
t1 = optimise(t, t_info)
for node,v in t_info.values():
print v,repr(node)[:50]

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Jul 18 '05 #7

AST Manipulation and Common Subexpression Elimination

Similar topics