473,327 Members | 2,025 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,327 software developers and data experts.

How does python build its AST

A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Thanks,
Jordan
Dec 7 '07 #1
11 2673
On Dec 7, 3:23 pm, MonkeeSage <MonkeeS...@gmail.comwrote:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Thanks,
Jordan
Python uses a highly optimized table based LL(1) parser to create a
syntax tree. In Python 2.5 it transforms the concrete syntax tree
( CST ) into an AST before compilation. Before that it compiled the
CST directly. I'm not sure what you are asking for ( in parentheses )?
Parser actions or preprocessing the tree? The latter is definitely
possible and you can build your own compilation machinery using the
parser module and the compile function.

Kay
Dec 7 '07 #2
On Dec 7, 9:50 am, Kay Schluehr <kay.schlu...@gmx.netwrote:
On Dec 7, 3:23 pm, MonkeeSage <MonkeeS...@gmail.comwrote:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)
Thanks,
Jordan

Python uses a highly optimized table based LL(1) parser to create a
syntax tree. In Python 2.5 it transforms the concrete syntax tree
( CST ) into an AST before compilation. Before that it compiled the
CST directly. I'm not sure what you are asking for ( in parentheses )?
Parser actions or preprocessing the tree? The latter is definitely
possible and you can build your own compilation machinery using the
parser module and the compile function.

Kay
Thanks for your reply. You answered my main question. The secondary
question is why is it a NameError to try to use a variable/function
prior to the declaration in a source file, since python has already
seen the declaration on the first pass building the CST/AST? At
compile time, shouldn't it already know about it? (Forgive my
ignorance.)

Regards,
Jordan
Dec 7 '07 #3
On Dec 7, 9:03 am, MonkeeSage <MonkeeS...@gmail.comwrote:
On Dec 7, 9:50 am, Kay Schluehr <kay.schlu...@gmx.netwrote:
On Dec 7, 3:23 pm, MonkeeSage <MonkeeS...@gmail.comwrote:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)
Thanks,
Jordan
Python uses a highly optimized table based LL(1) parser to create a
syntax tree. In Python 2.5 it transforms the concrete syntax tree
( CST ) into an AST before compilation. Before that it compiled the
CST directly. I'm not sure what you are asking for ( in parentheses )?
Parser actions or preprocessing the tree? The latter is definitely
possible and you can build your own compilation machinery using the
parser module and the compile function.
Kay

Thanks for your reply. You answered my main question. The secondary
question is why is it a NameError to try to use a variable/function
prior to the declaration in a source file, since python has already
seen the declaration on the first pass building the CST/AST? At
compile time, shouldn't it already know about it? (Forgive my
ignorance.)

Regards,
Jordan
Remember that Python is a highly dynamic language. You can't
guarantee that a name will be accessible until the actual execution
point that you try to access it. For example:
>>Hello() # What should this call?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'Hello' is not defined
>># The standard hello function
def Hello(): print 'Hello!'
....
>>Hello()
Hello!
>>>
# Modification through the globals dictionary
def Vikings(): print 'Spam, spam, spam! Spam!'
....
>>globals()[ 'olleH'[::-1] ] = Vikings
Hello()
Spam, spam, spam! Spam!
>>>
# Another dictionary-based modification
globals()['Hello'] = lambda: 'Go away!'
Hello()
'Go away!'
>>>
# Remove the syntactic sugar and make a function object directly
import new
import compiler
new_code = compiler.compile( 'print "Die in a fire!"', 'Hateful', 'single')
Hello = new.function( new_code, {}, 'Angry')
Hello()
Die in a fire!
>>>
# A callable object (not a function!)
class AnnoyingNeighbor(object):
.... def __call__(self):
.... print 'Hi-diddly-ho, neighbor!'
....
>>Hello = AnnoyingNeighbor()
Hello()
Hi-diddly-ho, neighbor!
>>>
If this was in a file, which version of Hello should the first call go
to? Should the 'Hello' name be bound to the function, the callable
class instance, the result of new.function? What if another
statement

The problem is that, in Python, functions are first-class objects,
just like class definitions and other variables. The 'def' statement
is something that gets executed. It compiles the code block for the
function into a function object, then assigns it to the name of the
function.

When Python hits a function call, it must look up the name in the
current name dictionary. In any but that most simple cases, the name
may not refer to its previous values: any intervening calls or
statements could have rebound that name to another object, either
directly, through side-effects, or even through the C interface. You
can't call a function before its def statement is parsed any more than
you can print the value of a variable before you assign anything to
it.

Unlike C, functions are not special under Python. Unlike C++, classes
aren't terribly special either. They have syntactic sugar to help the
coding go down, but they are only Python objects, and are bound to the
same rules as all other Python objects.

--Jason
Dec 7 '07 #4
On Dec 7, 5:03 pm, MonkeeSage <MonkeeS...@gmail.comwrote:
On Dec 7, 9:50 am, Kay Schluehr <kay.schlu...@gmx.netwrote:
On Dec 7, 3:23 pm, MonkeeSage <MonkeeS...@gmail.comwrote:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)
Thanks,
Jordan
Python uses a highly optimized table based LL(1) parser to create a
syntax tree. In Python 2.5 it transforms the concrete syntax tree
( CST ) into an AST before compilation. Before that it compiled the
CST directly. I'm not sure what you are asking for ( in parentheses )?
Parser actions or preprocessing the tree? The latter is definitely
possible and you can build your own compilation machinery using the
parser module and the compile function.
Kay

Thanks for your reply. You answered my main question. The secondary
question is why is it a NameError to try to use a variable/function
prior to the declaration in a source file, since python has already
seen the declaration on the first pass building the CST/AST? At
compile time, shouldn't it already know about it? (Forgive my
ignorance.)

Regards,
Jordan
When the compiler finds a name not being bound in the function scope
it supposes the name is global and a LOAD_GLOBAL opcode is created
instead of LOAD_FAST ( accessing locals ). Other than the local/
function scope the global ( module level ) scope is dynamic and the
compiler can't know whether a name will be present at runtime.

Kay
Dec 7 '07 #5
MonkeeSage wrote:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Thanks,
Jordan
Python (2.5+) does parse the whole file top-to-bottom into an AST, and then
compile the entire AST, resulting in a module code object that may contain other
compiled code objects for the function bodies.

However, this isn't relevant to your parenthesized question, since compiling
functions does not make them callable. A function becomes available only once
its def statement has been executed. Since execution (generally) proceeds top
to bottom, this is why functions can't be called before their definitions.

HTH
Michael
Dec 7 '07 #6

"MonkeeSage" <Mo********@gmail.comwrote in message
news:79**********************************@x69g2000 hsx.googlegroups.com...
|A quick question about how python parses a file into compiled
| bytecode. Does it parse the whole file into AST first and then compile
| the AST, or does it build and compile the AST on the fly as it reads
| expressions? (If the former case, why can't functions be called before
| their definitions?)

The direct answer is that names cannot be entered into namespaces and bound
to objects to be looked up until the corresponding object is created by
executing the corresponding code. Compiling Python code creates the
internal code needed to create Python objects, but only exceptionally
creates Python objects in the process. In particular, compiling a function
may create code objects (since the code is a constant) referenced by the
function creation code, but not function objects themselves.

A less direct answer is the Python is designed to by usable interactively.
In CPython interactive mode, you enter and the interpreter compiles and
executes one top(module)-level statement at a time. Calling a function you
have not yet entered would be magical.

Terry Jan Reedy

Dec 7 '07 #7
On Dec 7, 4:29 pm, "Terry Reedy" <tjre...@udel.eduwrote:
"MonkeeSage" <MonkeeS...@gmail.comwrote in message

news:79**********************************@x69g2000 hsx.googlegroups.com...
|A quick question about how python parses a file into compiled
| bytecode. Does it parse the whole file into AST first and then compile
| the AST, or does it build and compile the AST on the fly as it reads
| expressions? (If the former case, why can't functions be called before
| their definitions?)

The direct answer is that names cannot be entered into namespaces and bound
to objects to be looked up until the corresponding object is created by
executing the corresponding code. Compiling Python code creates the
internal code needed to create Python objects, but only exceptionally
creates Python objects in the process. In particular, compiling a function
may create code objects (since the code is a constant) referenced by the
function creation code, but not function objects themselves.

A less direct answer is the Python is designed to by usable interactively.
In CPython interactive mode, you enter and the interpreter compiles and
executes one top(module)-level statement at a time. Calling a function you
have not yet entered would be magical.

Terry Jan Reedy
Thanks for your replies Kay, Michael and Terry. To summarize my
understanding of your answers:

- Python (meaning CPython here) first does a parsing pass over the
entire file, with 2.5+ building a syntax tree and prior versions
building a parse tree.

- It then compiles the tree into bytecode, by walking down the nodes
recursively from the top down.

- When in interactive mode (e.g., python prompt), it builds the tree
and compiles it on the fly, as individual expressions are parsed.

Is this correct? If so, may I pick your brain on two more points?

1.) What is the benefit of doing a two phase compilation (parsing/
compiling), rather than a single, joint parse + compile phase (as in
interactive mode)?

2.) Wouldn't it be possible on the parsing phase to "tag" names as
valid, even if they occur prior to the assignment of the name, if on a
later branch that assignment is found (and have the compiler be aware
of such tags)?

The reason I'm wondering about these things is that on a different
group, it came up that perl allows referencing before assignment,
which seems to require a two-phase compilation, which made me wonder
how python does things. And since (if I haven't misunderstood), python
does use two-phase compilation, I just wondered if it would be
possible to do what perl does. I'm not advocating it as a feature for
python (it seems bass-ackwards to me to reference a name before it's
assigned to in the script), this is just curiosity.

Thanks,
Jordan
Dec 8 '07 #8

"MonkeeSage" <Mo********@gmail.comwrote in message
news:7e**********************************@t47g2000 hsc.googlegroups.com...
| 1.) What is the benefit of doing a two phase compilation (parsing/
| compiling), rather than a single, joint parse + compile phase (as in
| interactive mode)?

As far as I know (without looking at the code), there is no difference
between interactive and batch mode except that the unit processed is a
statement versus file.

| 2.) Wouldn't it be possible on the parsing phase to "tag" names as
| valid, even if they occur prior to the assignment of the name, if on a
| later branch that assignment is found (and have the compiler be aware
| of such tags)?

What would be the point? The semantics of Python code is essentially
independent of whether it is executed in interactive or batch mode. (The
exceptions are not relevant to your question.) So there is no point I can
see to doing something in file mode that could not be done in statement
mode.

tjr

Dec 8 '07 #9
On Dec 8, 12:20 am, "Terry Reedy" <tjre...@udel.eduwrote:
"MonkeeSage" <MonkeeS...@gmail.comwrote in message

news:7e**********************************@t47g2000 hsc.googlegroups.com...
| 1.) What is the benefit of doing a two phase compilation (parsing/
| compiling), rather than a single, joint parse + compile phase (as in
| interactive mode)?

As far as I know (without looking at the code), there is no difference
between interactive and batch mode except that the unit processed is a
statement versus file.
I see.
| 2.) Wouldn't it be possible on the parsing phase to "tag" names as
| valid, even if they occur prior to the assignment of the name, if on a
| later branch that assignment is found (and have the compiler be aware
| of such tags)?

What would be the point? The semantics of Python code is essentially
independent of whether it is executed in interactive or batch mode. (The
exceptions are not relevant to your question.) So there is no point I can
see to doing something in file mode that could not be done in statement
mode.
It would pretty much be pointless. Referencing a name before it's
assigned seems confusing and strange to me. I suppose one sane use
would be in implementing something like Haskell's "where" clause, but
I don't think that would work well (or even be very useful) in python.
Anyhow, this was more of an academic curiosity on how CPython does
things, and I appreciate your answers.
tjr
Regards,
Jordan
Dec 8 '07 #10
On Dec 7, 9:23 am, MonkeeSage <MonkeeS...@gmail.comwrote:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)
Python creates certain objects at compile time but doesn't bind them
to names in the modulespace until run time.

Python could--and many other languages do--automatically bind these
objects to names upon import. Python doesn't do it because it sees a
module as "code to be executed" rather than a "list of global object
definitions".

Something like this would be awkward if Python bound the names at
import time:

if X:
def a(): do_this()
else:
def a(): do_that()

Which one gets bound to a?

To do something similar in C would require preprocessor macros (ick).
Carl Banks
Dec 8 '07 #11
On Dec 8, 3:32 am, Carl Banks <pavlovevide...@gmail.comwrote:
On Dec 7, 9:23 am, MonkeeSage <MonkeeS...@gmail.comwrote:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Python creates certain objects at compile time but doesn't bind them
to names in the modulespace until run time.

Python could--and many other languages do--automatically bind these
objects to names upon import. Python doesn't do it because it sees a
module as "code to be executed" rather than a "list of global object
definitions".

Something like this would be awkward if Python bound the names at
import time:

if X:
def a(): do_this()
else:
def a(): do_that()

Which one gets bound to a?

To do something similar in C would require preprocessor macros (ick).

Carl Banks
Gotcha. Thanks again everyone for your answers.

Regards,
Jordan
Dec 8 '07 #12

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

30
by: Christian Seberino | last post by:
How does Ruby compare to Python?? How good is DESIGN of Ruby compared to Python? Python's design is godly. I'm wondering if Ruby's is godly too. I've heard it has solid OOP design but then...
2
by: Martin Kulas | last post by:
Hallo! I have a problem with Py_BuildValue: I want to convert an unsigned int to a PyObject *. http://docs.python.org/api/arg-parsing.html says that I can use "I" as a format string. But it...
0
by: venkatbo | last post by:
Hi, I'm trying to cross compile a C extension - on a i686 linux box, targeting a ppc-linux box. get_config_vars(*args): global _config_vars if _config_vars is None: ... else:
0
by: Kurt B. Kaiser | last post by:
Patch / Bug Summary ___________________ Patches : 431 open ( +3) / 3425 closed ( +8) / 3856 total (+11) Bugs : 916 open (-23) / 6273 closed (+44) / 7189 total (+21) RFE : 244 open...
113
by: John Nagle | last post by:
The major complaint I have about Python is that the packages which connect it to other software components all seem to have serious problems. As long as you don't need to talk to anything outside...
3
by: Daniel Fetchinson | last post by:
Does Python 2.5.2's embedded SQLite support full text searching? Sqlite itself is not distributed with python. Only a python db api compliant wrapper is part of the python stdlib and as such it...
0
by: mg | last post by:
When make gets to the _ctypes section, I am getting the following in my output: building '_ctypes' extension creating build/temp.solaris-2.10-i86pc-2.5/home/ecuser/Python-2.5.1/ Modules/_ctypes...
0
by: Akira Kitada | last post by:
Hi list, I was trying to build Python 2.6 on FreeBSD 4.11 and found it failed to build some of the modules. """ Failed to find the necessary bits to build these modules: _bsddb ...
0
by: Akira Kitada | last post by:
Hi Marc-Andre, Thanks for the suggestion. I opened a ticket for this issue: http://bugs.python.org/issue4204 Now I understand the state of the multiprocessing module, but it's too bad to see...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.