Friday, May 1, 2020
sympy to AST using latex
Given an expression in LaTeX, extract the symbols
(based on https://stackoverflow.com/a/59843709/1164295):
>>> import sympy
>>> from sympy.parsing.latex import parse_latex
>>> symp_lat = parse_latex('x^2 + a x + b = 0')
>>> symp_lat.atoms(sympy.Symbol)
{x, b, a}
Given an expression in LaTeX, generate the graphviz of the AST
(from https://docs.sympy.org/latest/tutorial/manipulation.html;
see https://docs.sympy.org/latest/modules/printing.html#sympy.printing.dot.dotprint):
>>> graphviz_of_AST_for_expr = sympy.printing.dot.dotprint(symp_lat)
Monday, February 3, 2020
from Latex to Abstract Syntax Tree
In the latest revision to the Physics Derivation Graph, the tuple (unique expression identifier, LaTeX expression) has been replaced with (unique expression identifier, LaTeX expression, abstract syntax tree). This is similar to the split between "presentation MathML" and "content MathML." The distinction requires translating a visually pleasing, easy-to-input representation into a mathematically meaningful one.
Latex will be input by the user for the PDG; the user will not need to supply the AST as input. To validate a step, the AST is needed. This presents a few challenges:
- Is the input valid tex?
- Is the valid tex a mathematical expression?
- Is the valid mathematical expression consistent with the step?
A step in a derivation is defined as the application of a single inference rule with one or more expressions as input, feed, or output.
There are a few options for parsing mathematical tex:
- write a custom parser
- use an existing parser, e.g. MathJax
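The first option, a custom parser, can be sketched in a few dozen lines. Below is a minimal recursive-descent parser. It covers only a tiny hypothetical subset of tex: letters, integers, `+`, `-`, `^`, `=`, braces, parentheses, and juxtaposition such as "a x" read as multiplication. It is not how the PDG parses tex. The names `tokenize`, `Parser`, and `parse_tex_subset` are illustrative. It also gives a partial answer to the first challenge above, because input outside the subset raises `ValueError`.

```python
import re

# Hypothetical token rule for the subset: a name, an integer, or any one other character.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z]+)|(\d+)|(.))")
SYMBOLS = {"+", "-", "^", "=", "{", "}", "(", ")"}
# Tokens that end an implicit product such as "a x".
STOP = {"+", "-", "^", "=", "}", ")"}


def tokenize(src):
    tokens = []
    for name, number, other in TOKEN_RE.findall(src.strip()):
        if name or number:
            tokens.append(name or number)
        elif other in SYMBOLS:
            tokens.append(other)
        else:
            raise ValueError(f"unsupported character {other!r}")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        if expected is not None and tok != expected:
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def equation(self):
        # equation := expr ["=" expr]
        node = self.expr()
        if self.peek() == "=":
            self.take("=")
            node = ("=", node, self.expr())
        return node

    def expr(self):
        # expr := term {("+" | "-") term}, left-associative
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # term := power {power}; juxtaposition is read as a product
        node = self.power()
        while self.peek() is not None and self.peek() not in STOP:
            node = ("*", node, self.power())
        return node

    def power(self):
        # power := atom ["^" atom]
        base = self.atom()
        if self.peek() == "^":
            self.take("^")
            return ("^", base, self.atom())
        return base

    def atom(self):
        # atom := name | integer | "{" expr "}" | "(" expr ")"
        tok = self.take()
        if tok in ("{", "("):
            node = self.expr()
            self.take("}" if tok == "{" else ")")
            return node
        if tok.isalnum():
            return tok
        raise ValueError(f"unexpected token {tok!r}")


def parse_tex_subset(src):
    """Return a nested-tuple AST, or raise ValueError if src is outside the subset."""
    parser = Parser(tokenize(src))
    tree = parser.equation()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return tree


print(parse_tex_subset("x^2 + a x + b = 0"))
# ('=', ('+', ('+', ('^', 'x', '2'), ('*', 'a', 'x')), 'b'), '0')
```

Each node is a tuple of the form `(operator, child, child)`. Because tuples are ordered, "a - b" and "b - a" produce different trees. Anything beyond the subset, such as `\frac`, is rejected rather than guessed at. That is the trade-off against reusing an existing parser.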
Wednesday, June 7, 2017
representing inference rules as both LaTeX and Abstract Syntax Trees
All inference rules in the Physics Derivation Graph are written in LaTeX. See the full list at
https://github.com/allofphysicsgraph/proofofconcept/tree/gh-pages/v4_file_per_expression/inference_rules
For example, the inference rule "add X to both sides" in LaTeX is
Add $#1$ to both sides of Eq.~\ref{eq:#2}.
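The `#1` and `#2` in this template are LaTeX-style argument placeholders. Below is a minimal sketch of filling them in from Python, assuming the arguments are supplied positionally. The function name `fill_template` and the equation label `1` are hypothetical.

```python
import re


def fill_template(template, *args):
    """Replace LaTeX-style placeholders #1..#9 with positional arguments."""
    def substitute(match):
        index = int(match.group(1)) - 1
        if index >= len(args):
            raise ValueError(f"no argument supplied for #{index + 1}")
        return args[index]

    return re.sub(r"#([1-9])", substitute, template)


add_x_to_both_sides = r"Add $#1$ to both sides of Eq.~\ref{eq:#2}."
print(fill_template(add_x_to_both_sides, "x", "1"))
# Add $x$ to both sides of Eq.~\ref{eq:1}.
```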
AST representation in plain text
https://calculem.us/abstract-binding-trees-1/
Inference rules are transformations to the abstract syntax trees that represent expressions.
For example, the "add X to both sides" (addition property of equality) does the following transform:
input:expression
  op
    LHS
    RHS
input:feed
  x
output:expression
  op
    +
      LHS
      x
    +
      RHS
      x
Here I'm using a two-space indent to show the tree structure of the AST.
"LHS" and "RHS" are the two sides of the expression, and "op" is the operator relating LHS and RHS.
I wanted a format that is visually accessible and not too verbose, while still being convertible to other formats.
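One way to make this plain-text format concrete is to hold each tree as nested Python tuples of the form `(label, child, child, ...)` and render it with a two-space indent. The sketch below assumes that representation, which is not taken from the PDG code. It implements the "add X to both sides" transform shown above. The function names are hypothetical.

```python
def add_to_both_sides(expression, feed):
    """'Add X to both sides': (op, LHS, RHS) -> (op, LHS + X, RHS + X)."""
    op, lhs, rhs = expression
    return (op, ("+", lhs, feed), ("+", rhs, feed))


def to_indented_text(tree, depth=0):
    """Render a nested-tuple AST with one two-space indent per tree level."""
    indent = "  " * depth
    if not isinstance(tree, tuple):
        return indent + tree
    label, *children = tree
    lines = [indent + label]
    lines.extend(to_indented_text(child, depth + 1) for child in children)
    return "\n".join(lines)


output_expression = add_to_both_sides(("op", "LHS", "RHS"), "x")
print(to_indented_text(output_expression))
```

The printed text reproduces the output:expression tree above. Converting the tuples to another format only requires another renderer.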
Order matters
My AST representation needs to include order. The expression "a-b" is distinct from "b-a", yet a tree by itself does not specify the order of a node's children:
input:expression:1
  op
    c
    -
      a
      b
which is distinct from
input:expression:2
  op
    c
    -
      b
      a
This also applies to the cross product, which is also non-commutative.
To be explicit, I'll assume that "top-to-bottom" order in the above format corresponds to "left-to-right." With that convention, the first AST corresponds to "c=a-b" and the second to "c=b-a".
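With nested tuples, which are a hypothetical representation and not the PDG's, the top-to-bottom convention comes for free because tuple elements are ordered. The sketch below also shows one possible refinement, an assumption rather than something proposed above. It sorts the children of nodes declared commutative before comparing, so "a+b" and "b+a" compare equal while "a-b" and "b-a" stay distinct. The set `COMMUTATIVE` and the function `canonical` are illustrative names.

```python
# Children are ordered: tuple position is "top-to-bottom", i.e. left-to-right.
c_equals_a_minus_b = ("op", "c", ("-", "a", "b"))
c_equals_b_minus_a = ("op", "c", ("-", "b", "a"))

# Assumption: only these labels are treated as commutative.
# "-" and the cross product are deliberately absent.
COMMUTATIVE = {"+", "*"}


def canonical(tree):
    """Sort the children of commutative nodes so equivalent sums compare equal."""
    if not isinstance(tree, tuple):
        return tree
    label, *children = tree
    children = [canonical(child) for child in children]
    if label in COMMUTATIVE:
        children.sort(key=repr)
    return (label, *children)


print(canonical(c_equals_a_minus_b) == canonical(c_equals_b_minus_a))  # False
print(canonical(("op", "c", ("+", "a", "b"))) == canonical(("op", "c", ("+", "b", "a"))))  # True
```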
AST for integrals and derivatives
Shown here: https://tug.org/TUGboat/tb12-3-4/tb33arnon.pdf
Mentioned here (http://www.math.wpi.edu/IQP/BVCalcHist/calc5.html) but not explored explicitly.
A definite integral in LaTeX
\int_{low}^{high} LHS d(x) = \int_{low}^{high} RHS d(x)
can be written as an AST:
input:expression
  op
    \int
      low
      high
      LHS
      x
    \int
      low
      high
      RHS
      x
Similarly, a differential equation in LaTeX
\frac{d}{d(x)} LHS = \frac{d}{d(x)} RHS
can be written as an AST:
input:expression
  op
    dif
      LHS
      x
    dif
      RHS
      x
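Going the other way, from these trees back to the LaTeX presentation, can be sketched with one rendering rule per node label. This assumes the nested-tuple representation (hypothetical) and that "op" stands for "=". The output strings mirror the two LaTeX lines above. `to_latex` is an illustrative name.

```python
def to_latex(tree):
    """Render the integral/derivative ASTs above back into LaTeX."""
    if not isinstance(tree, tuple):
        return tree
    label, *args = tree
    if label == "op":  # assumption: the relation is equality
        lhs, rhs = args
        return f"{to_latex(lhs)} = {to_latex(rhs)}"
    if label == r"\int":  # children: low, high, integrand, variable
        low, high, body, var = (to_latex(a) for a in args)
        return rf"\int_{{{low}}}^{{{high}}} {body} d({var})"
    if label == "dif":  # children: expression, variable
        body, var = (to_latex(a) for a in args)
        return rf"\frac{{d}}{{d({var})}} {body}"
    raise ValueError(f"no LaTeX rule for node {label!r}")


integral = ("op", (r"\int", "low", "high", "LHS", "x"), (r"\int", "low", "high", "RHS", "x"))
derivative = ("op", ("dif", "LHS", "x"), ("dif", "RHS", "x"))
print(to_latex(integral))    # \int_{low}^{high} LHS d(x) = \int_{low}^{high} RHS d(x)
print(to_latex(derivative))  # \frac{d}{d(x)} LHS = \frac{d}{d(x)} RHS
```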
AST for Dirac notation
Distinguishing input and output expressions
Some inference rules act on multiple expressions, and some inference rules produce multiple expressions (e.g., taking the square root). Here's the AST for "add Eq1 to Eq2":
input:expression:1
  op
    LHS:1
    RHS:1
input:expression:2
  op
    LHS:2
    RHS:2
output:expression
  op
    +
      LHS:1
      LHS:2
    +
      RHS:1
      RHS:2
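Using the role labels as dictionary keys is one way to keep inputs and outputs apart in code. It also suggests how a step could be checked for consistency: recompute the output from the inputs and compare trees. Below is a minimal sketch assuming nested tuples. `add_eq1_to_eq2` and `validate_step` are hypothetical names, and requiring both inputs to share the same relation is an added assumption.

```python
def add_eq1_to_eq2(inputs):
    """Return the output of 'add Eq1 to Eq2', keyed by its role label."""
    op1, lhs1, rhs1 = inputs["input:expression:1"]
    op2, lhs2, rhs2 = inputs["input:expression:2"]
    if op1 != op2:
        raise ValueError("inputs must use the same relation")
    return {"output:expression": (op1, ("+", lhs1, lhs2), ("+", rhs1, rhs2))}


def validate_step(inputs, claimed_outputs):
    """A step is consistent if re-applying the rule reproduces the claimed outputs."""
    return add_eq1_to_eq2(inputs) == claimed_outputs


inputs = {
    "input:expression:1": ("op", "LHS:1", "RHS:1"),
    "input:expression:2": ("op", "LHS:2", "RHS:2"),
}
print(add_eq1_to_eq2(inputs))
```

The comparison is purely structural, so a claimed output that is merely equivalent, such as "LHS:2 + LHS:1", is reported as inconsistent. A fuller check would need a notion of equivalence.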
Complicated expressions as ASTs
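Some expressions are more complicated than simply "LHS = RHS". Suppose we have the expression
y = \begin{cases} x^2 & \text{for } x > 0 \\ 0 & \text{for } x \leq 0 \end{cases}
I don't know how to represent this as an AST. Here's an attempt:
op
  y
  set
    domain
      ^
        x
        2
      >
        x
        0
    domain
      0
      <=
        x
        0
I needed to introduce two new symbols: "set" and "domain". Here "set" groups the cases, and each "domain" pairs a value with the condition under which it applies.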
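One way to check that "set" and "domain" carry enough meaning is to evaluate the tree. The sketch below makes three assumptions. Trees are nested tuples. Each "domain" node is (domain, value, condition), as in the attempt above. The first domain whose condition holds wins, so overlapping domains are not detected. The operator table covers only the labels used here, and `evaluate` is an illustrative name.

```python
import operator

# Only the node labels that appear in the attempt above.
BINARY = {"^": operator.pow, ">": operator.gt, "<=": operator.le}


def evaluate(tree, env):
    """Evaluate a nested-tuple AST; leaves are variable names or numbers."""
    if not isinstance(tree, tuple):
        return env[tree] if tree in env else float(tree)
    label, *children = tree
    if label == "set":
        # Return the value of the first domain whose condition holds.
        for domain in children:
            tag, value, condition = domain
            if tag != "domain":
                raise ValueError(f"expected a domain node, got {tag!r}")
            if evaluate(condition, env):
                return evaluate(value, env)
        raise ValueError("no domain applies")
    if label in BINARY:
        left, right = (evaluate(child, env) for child in children)
        return BINARY[label](left, right)
    raise ValueError(f"cannot evaluate node {label!r}")


piecewise = (
    "op",
    "y",
    (
        "set",
        ("domain", ("^", "x", "2"), (">", "x", "0")),
        ("domain", "0", ("<=", "x", "0")),
    ),
)

rhs = piecewise[2]  # the "set" subtree
print(evaluate(rhs, {"x": 3}), evaluate(rhs, {"x": -1}))  # 9.0 0.0
```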
Related work