-
Welcome
1 Lecutre -
Design of CPython’s Compiler
6 Lecutre -
Running a buildbot worker
2 Lecutre
Parse Trees
Python’s parser is an LL(1) parser mostly based off of the implementation laid out in the Dragon Book [Aho86].
The grammar file for Python can be found in Grammar/Grammar
with the numeric value of grammar rules stored in Include/graminit.h
. The list of types of tokens (literal tokens, such as :
, numbers, etc.) can be found in Grammar/Tokens
with the numeric value stored in Include/token.h
. The parse tree is made up of node *
structs (as defined in Include/node.h
).
Querying data from the node structs can be done with the following macros (which are all defined in Include/node.h
):
CHILD(node *, int)
- Returns the nth child of the node using zero-offset indexing
RCHILD(node *, int)
- Returns the nth child of the node from the right side; use negative numbers!
NCH(node *)
- Number of children the node has
STR(node *)
- String representation of the node; e.g., will return
:
for aCOLON
token TYPE(node *)
- The type of node as specified in
Include/graminit.h
REQ(node *, TYPE)
- Assert that the node is the type that is expected
LINENO(node *)
- Retrieve the line number of the source code that led to the creation of the parse rule; defined in
Python/ast.c
For example, consider the rule for ‘while’:
while_stmt ::= "while"expression
":"suite
: ["else" ":"suite
]
The node representing this will have TYPE(node) == while_stmt
and the number of children can be 4 or 7 depending on whether there is an ‘else’ statement. REQ(CHILD(node, 2), COLON)
can be used to access what should be the first :
and require it be an actual :
token.