Sunday, July 11, 2021

Roadmap for identifying mathematical variables in LaTeX documents

Context: I think the mathematical expressions in Physics form a graph in which the expressions are nodes and the nodes are related by inference rules. The graph spans textbooks and journal articles. The scope of this post is limited to the graph within one document.

To check the consistency of mathematical content within one document, the first step is to identify the mathematical variables and operators. In practice, variables commonly appear inline as $a$ or within an expression like

\begin{equation}
a = 2
\end{equation}
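
A minimal sketch of this first step, using only regular expressions from the Python standard library. The function names extract_math and candidate_variables are hypothetical, and the patterns assume simple input: no $$...$$ display math, no nested environments, and variables written as isolated single Latin letters. Greek letters such as \alpha are dropped along with other control sequences, and a letter inside an argument (the R in \mathbb{R}) is still reported, so this is a starting point rather than a real parser.

```python
import re

# Inline math: $...$ or \( ... \); display math: equation environments.
# Assumption: no escaped or nested dollar signs and no $$...$$ blocks.
MATH_PATTERNS = [
    re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$", re.DOTALL),
    re.compile(r"\\\((.+?)\\\)", re.DOTALL),
    re.compile(r"\\begin\{equation\}(.+?)\\end\{equation\}", re.DOTALL),
]


def extract_math(tex):
    """Return the math snippets found in a LaTeX string, whitespace-stripped."""
    snippets = []
    for pattern in MATH_PATTERNS:
        snippets.extend(m.group(1).strip() for m in pattern.finditer(tex))
    return snippets


def candidate_variables(snippet):
    """Return the isolated single Latin letters in a math snippet, sorted."""
    # Drop control sequences such as \mathbb or \alpha before looking for letters.
    without_commands = re.sub(r"\\[A-Za-z]+", " ", snippet)
    letters = re.findall(r"(?<![A-Za-z])[A-Za-z](?![A-Za-z])", without_commands)
    return sorted(set(letters))
```

Applied to the two examples above, extract_math finds the inline a and the equation body a = 2, and candidate_variables reports a for both.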

Variables can also appear inside \newcommand definitions, like

\newcommand{\R}{\mathbb{R}}

or

\newcommand{\bb}[1]{\mathbb{#1}}
Other numerical systems have similar notations. 
The complex numbers \( \bb{C} \), the rational 
numbers \( \bb{Q} \) and the integer numbers \( \bb{Z} \).

source

For ways to handle the complexity of expanding \newcommand macros, see https://stackoverflow.com/questions/1509799/how-to-replace-latex-macros-with-their-definitions-using-latex
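
For the two simple forms of \newcommand shown above, a naive expansion can be sketched with regular expressions. This is not the approach from the linked question. The names collect_macros and expand_macros are hypothetical, and the sketch assumes at most one level of braces in a macro body, arguments without braces inside them, no optional default arguments, and no macros that refer to other macros.

```python
import re

# Matches \newcommand{\name}{body} and \newcommand{\name}[n]{body}.
# Assumption: the body contains at most one level of nested braces.
NEWCOMMAND = re.compile(
    r"\\newcommand\{\\([A-Za-z]+)\}(?:\[(\d)\])?\{((?:[^{}]|\{[^{}]*\})*)\}"
)


def collect_macros(tex):
    """Map each macro name to a (number of arguments, body) pair."""
    return {
        m.group(1): (int(m.group(2) or 0), m.group(3))
        for m in NEWCOMMAND.finditer(tex)
    }


def expand_macros(tex):
    """Remove the macro definitions and expand each use of a macro once."""
    macros = collect_macros(tex)
    body = NEWCOMMAND.sub("", tex)
    for name, (nargs, replacement) in macros.items():
        # The lookahead keeps \R from matching the start of a longer name like \Re.
        pattern = re.compile(
            r"\\" + name + r"(?![A-Za-z])" + r"\{([^{}]*)\}" * nargs
        )

        def substitute(match, replacement=replacement, nargs=nargs):
            # Fill #1, #2, ... with the captured arguments.
            result = replacement
            for i in range(nargs):
                result = result.replace("#%d" % (i + 1), match.group(i + 1))
            return result

        body = pattern.sub(substitute, body)
    return body
```

With the \bb definition above, \( \bb{C} \) becomes \( \mathbb{C} \), so a later pass over the text sees only standard LaTeX commands.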


Once variables and operators are identified, the next step is to associate each symbol with its definition (e.g., a Wikipedia link) and its dimensions (e.g., length, time, charge).
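
One possible way to record that association is sketched below. The Symbol class, its field names, and the multiply_dimensions helper are all hypothetical. Dimensions are stored as a mapping from a base dimension to its exponent, which is an assumption about representation and not something this post has settled on.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    latex: str           # how the symbol is written in the document, e.g. "v"
    name: str            # human-readable meaning, e.g. "velocity"
    reference: str = ""  # link to a definition, e.g. a Wikipedia URL
    # Base dimension -> exponent; an empty dict means dimensionless.
    dimensions: dict = field(default_factory=dict)


def multiply_dimensions(a, b):
    """Return the dimensions of a product by adding exponents."""
    total = dict(a)
    for base, exponent in b.items():
        total[base] = total.get(base, 0) + exponent
    # Drop base dimensions that cancel out.
    return {base: exp for base, exp in total.items() if exp != 0}


velocity = Symbol(
    latex="v",
    name="velocity",
    reference="https://en.wikipedia.org/wiki/Velocity",
    dimensions={"length": 1, "time": -1},
)
duration = Symbol(latex="t", name="time", dimensions={"time": 1})
```

Multiplying the dimensions of v and t leaves only length, which is the kind of check that could later flag an expression whose two sides have different dimensions.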



Options for parsing LaTeX in Python

TexSoup: https://pypi.org/project/TexSoup/, https://texsoup.alvinwan.com/, https://github.com/alvinwan/TexSoup, https://stackoverflow.com/users/4855984/alvin-wan

pylatexenc: https://pylatexenc.readthedocs.io/en/latest/latexwalker/, https://github.com/phfaist/pylatexenc/
