Saturday, August 15, 2020

plan of record for parsing Latex expressions

I'm assuming there's an interactive feedback loop with the user in the Physics Derivation Graph, whereas that's not the case for bulk content like arXiv. How to respond to ambiguity depends on whether we can assume the user is available for clarifications.


Given an input string to parse,
  1. Is the string valid Latex? If yes, continue; if no, return to user with complaint
  2. Is the string valid mathematical Latex? If yes, continue; if no, return to user with complaint
  3. Can the mathematical Latex be parsed without ambiguity? If yes, return SymPy to user; if no, continue
  4. If there is ambiguity, can the ambiguity be resolved by using a different flavor of the grammar? If yes, return SymPy to user; if no, return the options to the user so they can select the right parsing.
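
The four steps above can be sketched as a single function. This is only a rough outline under my own assumptions: the name plan_of_record, the Outcome record, and the idea of a "flavor" as a callable returning every parse it finds are all hypothetical, and the validity checks are passed in rather than implemented. The case where no flavor produces any parse is not covered by the steps above; here I treat it as a complaint.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A "flavor" stands in for one variant of the grammar. It returns every
# parse it finds for the input. Hypothetical interface, not a real API.
Flavor = Callable[[str], List[str]]


@dataclass
class Outcome:
    status: str                      # "complaint", "parsed", or "choose"
    message: str = ""
    candidates: List[str] = field(default_factory=list)


def plan_of_record(text: str,
                   is_valid_latex: Callable[[str], bool],
                   is_valid_math: Callable[[str], bool],
                   flavors: List[Flavor]) -> Outcome:
    # Step 1: reject strings that are not Latex at all.
    if not is_valid_latex(text):
        return Outcome("complaint", "not valid Latex")
    # Step 2: reject Latex that is not mathematical Latex.
    if not is_valid_math(text):
        return Outcome("complaint", "not valid mathematical Latex")
    # Steps 3 and 4: try the default flavor first, then the alternatives.
    # The first flavor that yields exactly one parse wins.
    seen: List[str] = []
    for flavor in flavors:
        parses = flavor(text)
        if len(parses) == 1:
            return Outcome("parsed", candidates=parses)
        for parse in parses:
            if parse not in seen:
                seen.append(parse)
    # Assumption: nothing parsed at all is reported as a complaint.
    if not seen:
        return Outcome("complaint", "no grammar flavor could parse the input")
    # Still ambiguous: hand the options back to the user.
    return Outcome("choose", "ambiguous; please pick a parsing", seen)
```

With real grammars plugged in, the "parsed" candidate would be the SymPy expression; strings are used here only to keep the sketch free of dependencies.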

Removing markup that only affects display may help. For example, replacing "\ ", "\quad", and "\qquad" with " ", and replacing "\left(" with "(", would reduce the parser workload.
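
A minimal sketch of that cleanup pass, using only Python's re module. The function name strip_display_markup is my own, and the "\right)" rule is an addition I'm assuming is wanted so that "\left(" does not leave an unmatched "\right)" behind. The lookarounds are there so that a longer command starting with "\quad" and the second backslash of "\\ " are left alone.

```python
import re

# Display-only markup mapped to plain replacements.
# The "\right)" rule is an assumption, added to keep "\left(" balanced.
_DISPLAY_RULES = [
    (re.compile(r"\\qquad(?![A-Za-z])"), " "),
    (re.compile(r"\\quad(?![A-Za-z])"), " "),
    (re.compile(r"(?<!\\)\\ "), " "),      # backslash-space, but not "\\ "
    (re.compile(r"\\left\("), "("),
    (re.compile(r"\\right\)"), ")"),
]


def strip_display_markup(latex: str) -> str:
    """Replace spacing and sizing commands that do not change the math."""
    for pattern, replacement in _DISPLAY_RULES:
        latex = pattern.sub(replacement, latex)
    # Collapse the runs of spaces that the substitutions leave behind.
    return re.sub(r" {2,}", " ", latex).strip()


print(strip_display_markup(r"\left(x + y\right) \quad z"))
```

Other sized delimiters such as "\left[" would need their own rules; I have not tried to cover them here.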

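Example of math Latex to treat as invalid (LaTeX itself compiles it, taking a and b as single-token arguments, but the braces are missing):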
\frac a b

The user probably intended:
\frac{a}{b}
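
Since the user is available for clarification, one option is to suggest the braced form rather than only complaining. The sketch below is an assumption about how that could look, not something the Physics Derivation Graph does today. The helper name suggest_braced_frac is made up, and it only handles arguments that are a single letter or digit; something like "\frac \alpha b" is left untouched.

```python
import re

# \frac followed by two single-character arguments written without braces.
# The lookahead keeps longer command names that start with "\frac" from matching.
_BARE_FRAC = re.compile(r"\\frac(?![A-Za-z])\s*([A-Za-z0-9])\s*([A-Za-z0-9])")


def suggest_braced_frac(latex: str) -> str:
    """Return the input with brace-less \\frac arguments wrapped in braces."""
    return _BARE_FRAC.sub(
        lambda m: "\\frac{" + m.group(1) + "}{" + m.group(2) + "}",
        latex,
    )


suggestion = suggest_braced_frac(r"\frac a b")
print(suggestion)
```

The suggestion would go back to the user for confirmation instead of being applied silently.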
