Python’s parser is an LL(1) parser mostly based off of the implementation laid out in the Dragon Book [Aho86].
The grammar file for Python can be found in
Grammar/Grammar with the numeric value of grammar rules stored in
Include/graminit.h. The list of types of tokens (literal tokens, such as
:, numbers, etc.) can be found in
Grammar/Tokens with the numeric value stored in
Include/token.h. The parse tree is made up of
node *structs (as defined in
Querying data from the node structs can be done with the following macros (which are all defined in
CHILD(node *, int)
- Returns the nth child of the node using zero-offset indexing
RCHILD(node *, int)
- Returns the nth child of the node from the right side; use negative numbers!
- Number of children the node has
- String representation of the node; e.g., will return
- The type of node as specified in
REQ(node *, TYPE)
- Assert that the node is the type that is expected
- Retrieve the line number of the source code that led to the creation of the parse rule; defined in
For example, consider the rule for ‘while’:
while_stmt ::= "while"
suite: ["else" ":"
The node representing this will have
TYPE(node) == while_stmt and the number of children can be 4 or 7 depending on whether there is an ‘else’ statement.
REQ(CHILD(node, 2), COLON) can be used to access what should be the first
: and require it be an actual