Physics Derivation Graph: abstract syntax tree


Friday, May 1, 2020

sympy to AST using latex

Given an expression in Latex, extract the symbols:
based on https://stackoverflow.com/a/59843709/1164295

>>> import sympy
>>> from sympy.parsing.latex import parse_latex
>>> symp_lat = parse_latex('x^2 + a x + b = 0')

>>> symp_lat.atoms(sympy.Symbol)
{x, b, a}

Given an expression in Latex, generate the graphviz of the AST:
from https://docs.sympy.org/latest/tutorial/manipulation.html
see https://docs.sympy.org/latest/modules/printing.html#sympy.printing.dot.dotprint

>>> graphviz_of_AST_for_expr = sympy.printing.dot.dotprint(symp_lat)

Monday, February 3, 2020

from Latex to Abstract Syntax Tree

In the latest revision to the Physics Derivation Graph, the tuple (unique expression identifier, latex expression) has been replaced with (unique expression identifier, latex expression, abstract syntax tree). This is similar to the split between "presentation MathML" and "content MathML." The distinction requires translating between a representation that is visually pleasing and easy to input (Latex) and a representation that is mathematically meaningful (the AST).

Latex will be input by the user for the PDG; the user will not need to supply the AST as input. To validate a step, the AST is needed. This presents a few challenges:

Is the input valid tex?
Is the valid tex a mathematical expression?
Is the valid mathematical expression consistent with the step?
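The first question can be partly screened before any real parsing is attempted. The following is a minimal sketch of one necessary (not sufficient) condition, namely that unescaped braces are balanced. The function name braces_balanced is hypothetical and not part of the PDG code; passing this check does not mean the input is valid tex.

```python
def braces_balanced(tex: str) -> bool:
    """Return True if every unescaped '{' has a matching '}'.

    This is only a cheap pre-check: balanced braces are necessary for
    valid tex but do not guarantee it.
    """
    depth = 0
    i = 0
    while i < len(tex):
        ch = tex[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes, e.g. \{ or \}
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace
                return False
        i += 1
    return depth == 0


print(braces_balanced(r"\frac{a}{b} = c"))
print(braces_balanced(r"\frac{a}{b = c"))
```

An input that fails this check can be rejected immediately; an input that passes still has to go through a real parser to answer the second and third questions.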

A step in a derivation is defined as the application of a single inference rule with one or more expressions as input, feed, or output.

There are a few options for parsing mathematical tex:

write a custom parser
use an existing parser, e.g. MathJax

Wednesday, June 7, 2017

representing inference rules as both LaTeX and Abstract Syntax Trees

All inference rules in the Physics Derivation Graph are written in LaTeX. See the full list at
https://github.com/allofphysicsgraph/proofofconcept/tree/gh-pages/v4_file_per_expression/inference_rules
For example, the inference rule "add X to both sides" in LaTeX is
Add $#1$ to both sides of Eq.~\ref{eq:#2}.

AST representation in plain text

https://calculem.us/abstract-binding-trees-1/
Inference rules are transformations to the abstract syntax trees that represent expressions.
For example, the "add X to both sides" (addition property of equality) does the following transform:

input:expression
  op
    LHS
    RHS

input:feed
  x

output:expression
  op
    +
      LHS
      x
    +
      RHS
      x

Here I'm using a two space indent to show the tree structure of the AST.
The "LHS" and "RHS" are sides of the expression. The "op" is the operator relating LHS and RHS.
I wanted a format that is visually accessible and not too verbose, while capable of being converted to some other format.
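To illustrate that the indented text can be converted to another format, here is a minimal sketch that reads the two-space-indent notation into nested dictionaries, applies "add X to both sides" as a tree transformation, and writes the result back out. The helper names (node, parse_indented, add_to_both_sides, dump) and the dictionary layout are assumptions made for this example, not part of the PDG code.

```python
def node(label, *children):
    """Build a tree node; children keep the order in which they are given."""
    return {"label": label, "children": list(children)}


def parse_indented(text, indent=2):
    """Parse indented lines into a list of top-level nodes."""
    top = node("<root>")
    stack = [(-1, top)]  # (depth, node) pairs for the current branch
    for line in text.splitlines():
        if not line.strip():
            continue
        body = line.lstrip(" ")
        depth = (len(line) - len(body)) // indent
        new = node(body.rstrip())
        # Climb back up until the top of the stack is the parent of this line
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(new)
        stack.append((depth, new))
    return top["children"]


def add_to_both_sides(op_node, feed):
    """Wrap each side of the relation in a '+' node with the feed."""
    lhs, rhs = op_node["children"]
    return node(op_node["label"], node("+", lhs, feed), node("+", rhs, feed))


def dump(n, depth=0, indent=2):
    """Render a node back into indented lines."""
    lines = [" " * (depth * indent) + n["label"]]
    for child in n["children"]:
        lines.extend(dump(child, depth + 1, indent))
    return lines


source = "input:expression\n  op\n    LHS\n    RHS\n"
(expression,) = parse_indented(source)
result = add_to_both_sides(expression["children"][0], node("x"))
output_text = "\n".join(dump(node("output:expression", result)))
print(output_text)
```

The printed text has the same shape as the output:expression tree above, which suggests the notation can round-trip through a program.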

Order matters

My AST representation needs to include order. The expression "a-b" is distinct from "b-a" even though a tree doesn't specify the order:

input:expression:1
  op
    c
    -
      a
      b

which is distinct from
input:expression:2
  op
    c
    -
      b
      a

This also applies to cross product since it's also non-commutative.
To provide clarification, I'll assume the "top-to-bottom" order in the above format corresponds to "left-to-right." With that specification, the top AST corresponds to "c=a-b" and the bottom AST is "c=b-a".
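As a sketch of that convention, the two trees can be stored as nested tuples, where position in the tuple plays the role of top-to-bottom order. This assumes that "op" stands for "=" in these two expressions; the function name to_infix is hypothetical, and it does not add parentheses, so it is only suitable for small examples like these.

```python
def to_infix(tree):
    """Render a binary tree as text, reading children left to right."""
    if isinstance(tree, str):
        return tree
    operator, left, right = tree
    return f"{to_infix(left)} {operator} {to_infix(right)}"


# (operator, first child, second child); tuple order is the child order
first = ("=", "c", ("-", "a", "b"))
second = ("=", "c", ("-", "b", "a"))

print(to_infix(first))
print(to_infix(second))
print(first == second)
```

Because tuples are ordered, the two trees compare as unequal, which is the behavior needed for subtraction and the cross product.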

AST for integrals and derivatives

Shown here: https://tug.org/TUGboat/tb12-3-4/tb33arnon.pdf

Mentioned here (http://www.math.wpi.edu/IQP/BVCalcHist/calc5.html) but not explored explicitly.

A definite integral in Latex
\int_{low}^{high} LHS d(x) = \int_{low}^{high} RHS d(x)
can be written as an AST:

input:expression
  op
    \int
      low
      high
      LHS
      x
    \int
      low
      high
      RHS
      x

Similarly, a differential equation in Latex
\frac{d}{d(x)} LHS = \frac{d}{d(x)} RHS
can be written as an AST:
input:expression
  op
    dif
      LHS
      x
    dif
      RHS
      x

AST for Dirac notation

Distinguishing input and output expressions

Some inference rules act on multiple expressions, and some inference rules produce multiple expressions (e.g., taking the square root). Here's the AST for "add Eq1 to Eq2":

input:expression:1
  op
    LHS:1
    RHS:1

input:expression:2
  op
    LHS:2
    RHS:2

output:expression
  op
    +
      LHS:1
      LHS:2
    +
      RHS:1
      RHS:2
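A minimal sketch of "add Eq1 to Eq2" as a function of two input trees follows. The tuple layout (op, LHS, RHS) and the name add_expressions are assumptions for illustration. The function returns a list of outputs so that a rule producing several expressions could use the same interface; that interface is a design guess, not something the PDG specifies.

```python
def add_expressions(first, second):
    """Combine two (op, LHS, RHS) trees by adding matching sides."""
    op1, lhs1, rhs1 = first
    op2, lhs2, rhs2 = second
    if op1 != op2:
        # Adding sides is only sketched here for two inputs with the same relation
        raise ValueError(f"cannot add expressions related by {op1!r} and {op2!r}")
    output = (op1, ("+", lhs1, lhs2), ("+", rhs1, rhs2))
    return [output]


eq1 = ("=", "LHS:1", "RHS:1")
eq2 = ("=", "LHS:2", "RHS:2")
outputs = add_expressions(eq1, eq2)
print(outputs)
```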

Complicated expressions as ASTs

Some expressions are more complicated than simply "LHS = RHS". Suppose we have an expression
y = { x^2  for x>0
    { 0    for x<=0

I don't know how to represent this as an AST. Here's an attempt:

op
  y
  set
    domain
      ^
        x
        2
      >
        x
        0
    domain
      0
      <=
        x
        0

I needed to introduce two new symbols: "set" and "domain"
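One way to check whether the attempted tree carries enough information is to evaluate it. The sketch below assumes that each "domain" node holds a value followed by a condition, and that "set" means "use the value of the first domain whose condition holds." Both readings are my interpretation of the attempt above, and the names BINARY and evaluate are hypothetical.

```python
import operator

# Operators that appear in the tree, mapped to Python functions
BINARY = {"^": operator.pow, ">": operator.gt, "<=": operator.le}

# Right-hand side of the expression: ("domain", value, condition) pairs
piecewise = (
    "set",
    ("domain", ("^", "x", 2), (">", "x", 0)),
    ("domain", 0, ("<=", "x", 0)),
)


def evaluate(tree, env):
    """Evaluate a tree given variable values in env."""
    if isinstance(tree, str):
        return env[tree]  # a variable name such as "x"
    if not isinstance(tree, tuple):
        return tree  # a numeric literal
    head, *args = tree
    if head == "set":
        # Use the value of the first domain whose condition is true
        for _, value, condition in args:
            if evaluate(condition, env):
                return evaluate(value, env)
        raise ValueError("no domain matches")
    return BINARY[head](*(evaluate(arg, env) for arg in args))


for x in (3, 0, -2):
    print(x, evaluate(piecewise, {"x": x}))
```

If the tree evaluates to x^2 for positive x and 0 otherwise, then "set" and "domain" are at least sufficient to encode this piecewise definition.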

Related work

Mathlex thesis

http://mathlex.org/doc/how-mathlex-works