Wednesday, January 5, 2022

two hours to manually transcribe three typed pages of math into 150 lines of HTML+LaTeX

I have about 20 boxes of notes from roughly 10 years of undergraduate and graduate classes in Math and Physics. I've kept the notes for the past 10 years with the intent of converting them into a structured, computer-readable format. The current site https://derivationmap.net/ serves as a proof of concept that this is technically feasible.

I spent two hours manually transcribing three typed pages of math into https://derivationmap.net/class_notes/math402_mathematical_physics_hale. The resulting HTML with MathJax was about 150 lines. By the end my focus was waning and I needed a break.

The typed notes are merely a transcription into a computer-readable format; they are not in the "equation graph" form necessary for the Physics Derivation Graph. Producing that form is a separate, tedious process.

Observations:

  • An estimate of how long transcribing all the notes would take:
    • Typing HTML+MathJax for 4 hours a day seems like a reasonable pace
    • If a box of notes is 1 ream (500 pages), (500 pages/box)*(2 hours/3 pages)*(20 boxes) ≈ 6667 hours
    • 6667 hours/(4 hours/day) ≈ 1667 days, or about 4.6 years of typing every day
  • There are many diagrams that go with the text and equations. The three pages in this sample didn't have diagrams. Converting the diagrams into TikZ would be time-consuming.
  • The structure of the notes is not compatible with the constraints of the "equation graph" structure. 
  • I expect that scanning + OCR would be of limited value for producing LaTeX. Also, much of the notation in the notes is sloppy and requires translation to more rigorous notation.
  • Even the more "rigorous notation" is merely an improvement; the notation is a long way from the specificity needed for use in a computer algebra system. As an example,
    • the original format for an inner product is (x, y) -- here "x" and "y" are merely bolded
    • my more rigorous notation is $(\vec{x}, \vec{y})$
    • Multiplying the inner product by a complex value, $\alpha(\vec{x}, \vec{y})$, is visually ambiguous -- it could also be read as the function $\alpha$ applied to the arguments $\vec{x}$ and $\vec{y}$. See also this question.
  • Notes for a course can be published either as a single page (with thousands of equations, which MathJax renders slowly) or split among many pages (increasing the number of context switches)
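
The time estimate in the first observation can be reproduced with a short script. This is only a sketch: the function and parameter names are made up for illustration, and the inputs are the rough figures quoted above (one ream per box, two hours per three pages, four hours a day), plus the assumption that typing happens every day of the year. Results are rounded only at the end, so they can differ in the last digit from a hand calculation.

```python
def transcription_estimate(
    boxes=20,                 # number of boxes of notes
    pages_per_box=500,        # assumes each box holds about one ream
    hours_per_page=2 / 3,     # measured: two hours for three pages
    hours_per_day=4,          # assumed sustainable daily typing time
    days_per_year=365,        # assumes no days off
):
    """Return (total_hours, total_days, total_years) for typing all notes."""
    total_pages = boxes * pages_per_box
    total_hours = total_pages * hours_per_page
    total_days = total_hours / hours_per_day
    total_years = total_days / days_per_year
    return total_hours, total_days, total_years


if __name__ == "__main__":
    hours, days, years = transcription_estimate()
    print(f"{hours:.0f} hours, {days:.0f} days, {years:.1f} years")
```

Changing a single input shows how sensitive the total is. For example, `transcription_estimate(pages_per_box=250)` halves every figure, which matters because the page count per box is the least certain number here.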

If typing every page of notes into HTML+MathJax is not reasonable (or useful), then the objective should be to identify which parts of the notes are useful.
  • not useful: Homework problems and exams. These rely on identifying and applying one or more Physics expressions and then numerically computing the result for a given scenario. Homework problems rarely demonstrate connectivity among Physics expressions.
  • not useful: identities that arise from definitions. Example: Cauchy-Schwarz
  • marginally useful: Derivation from definitions 
  • marginally useful: Derivation from experiments 
  • potentially useful: Derivation from another expression 

Simple harmonic oscillator - identify every domain it is used in