Expression parsing by precedence climbing and token insertion

This page is out of date.

See the repository Precedence Climbing Parsing based on Binding Powers and Token Insertion in Python on GitHub

To facilitate the parser's work, dummy operands are inserted as additional tokens before prefix and after postfix operators. These operands virtually convert prefix and postfix operators to infix operators. All operators, even prefix and postfix, have left and right binding powers.

The algorithm used here is called precedence climbing. Precedence climbing, Pratt parsing and the shunting yard algorithm are variants of operator precedence parsing.

The parsers are table-driven. The syntax definitions are contained in relatively simple Python data structures.

Before running the Python files change the file extensions from .txt to .py.

minimalpcp: A minimal precedence climbing parser in Python

Expressions consisting of atoms (identifiers, literal numbers) and operators (prefix, unary infix, unary postfix) can be parsed.

Read the notes (pdf).

Python versions 3.* and 2.7.* should work.

There is only a rudimentary tokenizer and no error reporting.

Python code   Parser (minimalpcp)
 Unittest (minimalpcp_test)

simplepcp: A precedence climbing parser in Python

In addition to minimalpcp, simplepcp can parse subexpressions enclosed in parentheses, function invocations and some kinds of control structures. The control structures can have pre- or post-operands (or both). Therefore the "ternary operator" ... ? ... : ... and index expressions, such as a[i], can be parsed.
Intermediate keywords in control structures (such as "then" in "if ... then ... else ...") are not allowed.

A "real" tokenizer is used. There is essentially no error reporting.
Read the notes contained in the source of simplepcp.

Python versions 3.* should work.

Python code   Parser (simplepcp)
 Tokenizer (simplerawtokenizer), required by simplepcp
 Unittest (simplepcp_test)

2018-06-05
Contact:  JoeCreu       j [dot] creu [at] e49 [dot] de