Saturday, August 29, 2020

visualization of step validation; 271 steps to address

I created a page listing all the steps from every derivation. The utility is that I can summarize how many steps are valid (180 out of 638, or 28%), how many are declarations and assumptions (179+8, or 29%), and how many fail or are not checked -- the remaining 42%.


My plan is to work on addressing the 638-(180+179+8)=271 failed or unchecked expressions.
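
A quick sketch to double-check the arithmetic, using only the counts quoted above. The variable names are illustrative and not taken from the PDG code base.

```python
# Counts quoted in this post; the names are illustrative only.
total_steps = 638
valid = 180
declarations = 179
assumptions = 8

# Whatever is neither valid nor a declaration/assumption still needs work.
remaining = total_steps - (valid + declarations + assumptions)


def percent(count, total=total_steps):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


print(remaining)                            # failed or unchecked steps
print(percent(valid))                       # share of valid steps
print(percent(declarations + assumptions))  # share of declarations and assumptions
print(percent(remaining))                   # share of failed or unchecked steps
```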

I expect SymPy doesn't support all the inference rules, so I need a way to categorize that issue.

Thursday, August 27, 2020

relation of inference rules to axioms and proofs

Axioms are a set of statements upon which other statements can be formed.

Lemmas are easy but irrelevant.
Corollaries are quick but irrelevant consequences.
Propositions are interesting.
Theorems are important and difficult.

https://www.quora.com/What-is-the-difference-between-a-lemma-theorem-corollary-and-proposition
http://blogs.scienceforums.net/ajb/2013/01/12/lemma-theorem-proposition-or-corollary/
https://math.stackexchange.com/questions/463362/whats-the-difference-between-theorem-lemma-and-corollary

A proof is about consistency of statements with respect to a set of axioms.

****************

To show the connection can be made, pick the simplest PDG inference rules.
* "add X to both sides"
* "multiply both sides by X"
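
As a sketch of what checking these two rules might look like in SymPy: each function takes the input expression, the feed X, and the claimed output expression, and confirms that the stated operation turns one into the other. The function names and the three-argument layout are assumptions for illustration, not the PDG implementation.

```python
import sympy


def add_X_to_both_sides(input_eq, feed, output_eq):
    """True if adding feed to each side of input_eq gives output_eq."""
    lhs_diff = sympy.simplify(input_eq.lhs + feed - output_eq.lhs)
    rhs_diff = sympy.simplify(input_eq.rhs + feed - output_eq.rhs)
    return lhs_diff == 0 and rhs_diff == 0


def multiply_both_sides_by(input_eq, feed, output_eq):
    """True if multiplying each side of input_eq by feed gives output_eq."""
    lhs_diff = sympy.simplify(input_eq.lhs * feed - output_eq.lhs)
    rhs_diff = sympy.simplify(input_eq.rhs * feed - output_eq.rhs)
    return lhs_diff == 0 and rhs_diff == 0


a, b, c = sympy.symbols("a b c")
step_in = sympy.Eq(a, b)

# Two correct steps followed by one deliberately wrong step.
print(add_X_to_both_sides(step_in, c, sympy.Eq(a + c, b + c)))
print(multiply_both_sides_by(step_in, c, sympy.Eq(a * c, b * c)))
print(add_X_to_both_sides(step_in, c, sympy.Eq(a + c, b)))
```

Comparing the two sides separately, rather than the equation as a whole, keeps the check tied to the specific inference rule instead of accepting any algebraically equivalent output.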

Wednesday, August 26, 2020

linear representation of a directed graph

A derivation typically does not show all steps. The steps of a derivation are typically not linear.

 
Figure 1: On the left is a directed acyclic graph in which some nodes are hidden and the nodes can be linearly ordered. In the center, all nodes are shown in a linear presentation. On the right, all nodes are shown in a non-linear display.
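
One way to produce a linear presentation from the graph is a topological sort. The sketch below uses Kahn's algorithm on a list of (source, destination) edges. The node names are invented for the example, and the PDG's actual data layout may differ.

```python
from collections import deque


def linear_order(edges):
    """Return the nodes of a directed acyclic graph in a valid linear order."""
    nodes = set()
    children = {}
    indegree = {}
    for src, dst in edges:
        nodes.update((src, dst))
        children.setdefault(src, []).append(dst)
        indegree[dst] = indegree.get(dst, 0) + 1

    # Start from nodes with no incoming edges; sorted for a repeatable result.
    ready = deque(sorted(n for n in nodes if indegree.get(n, 0) == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so no linear order exists")
    return order


# Hypothetical derivation: two expressions feed step1, which yields expr3, etc.
example_edges = [
    ("expr1", "step1"),
    ("expr2", "step1"),
    ("step1", "expr3"),
    ("expr3", "step2"),
    ("step2", "expr4"),
]
print(linear_order(example_edges))
```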

Sunday, August 23, 2020

significant challenges feel like emotional barriers to progress

The current form of the Physics Derivation Graph has served to validate my claim of feasibility. There are a number of significant challenges that inhibit scalability, regardless of whether I generate more content or other people contribute.

  • Input complexity: the current multi-step webform process is tedious. The many steps create a heavy workload for both the backend developer (many features are necessary) and the front-end user (who has to use all those features). Alternative interaction mechanisms are technically feasible but not in my current skillset.
  • Display complexity: a graph with hundreds of nodes is too large to navigate visually with the current d3.js and Graphviz interfaces. Rendering the graph in 3D might help, but the readability of node labels is important.
  • Limited ability to query: the graph is presented visually but lacks support for responding to user queries. Currently the only access is through custom analysis scripts that read the JSON.
  • Use correct symbol IDs: the current JSON database is populated with incorrect symbol references and a mixture of SymPy+text symbols. 
  • Validation of steps: the current JSON database is populated with incorrect steps. Inference rules are used but either are not feasible in SymPy or are implemented incorrectly. 
  • Validation of dimension: after correcting the symbols and the steps, each expression needs to be verified to have the correct dimension. 
The last three are listed in dependency order: the symbols need to be fixed before validation of steps and dimension is possible.

I don't currently have a plan of attack for either display complexity or query capability.
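
Until there is real query support, the custom-script approach looks roughly like the sketch below. It assumes only that data.json has a top-level 'expressions' dictionary whose entries carry a 'latex' string. The function name, the sample IDs, and the sample expressions are made up for illustration.

```python
def find_expressions(dat, fragment):
    """Return sorted IDs of expressions whose Latex contains fragment."""
    return sorted(
        expr_id
        for expr_id, expr_dict in dat["expressions"].items()
        if fragment in expr_dict["latex"]
    )


# Made-up sample in the same shape as data.json (IDs are hypothetical).
sample = {
    "expressions": {
        "0000000001": {"latex": "a = b"},
        "0000000002": {"latex": "\\sin(\\theta) = c"},
        "0000000003": {"latex": "x = \\gamma (x - v t)"},
    }
}

print(find_expressions(sample, "\\gamma"))
```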


My current task list:
  • Correct the symbol IDs in data.json
  • Change layout of expression input table to distinguish input, output, and feed relation to inference rule
  • Add capability to edit SymPy from web interface
  • Functionally, separate symbol replacement from expression-as-sympy. For the web interface, insert new steps: Latex math expression -> symbols -> SymPy expression
  • web interface for reviewing correctness of Latex -> SymPy -> Latex for expressions

Sunday, August 16, 2020

how to edit the SymPy Latex parser and rebuild the antlr artifacts for a pull request

https://github.com/sympy/sympy/wiki/Development-workflow#fork-sympy-project

Go to https://github.com/sympy/sympy
Fork to https://github.com/bhpayne/sympy/
In https://github.com/bhpayne/sympy/ create a new branch, e.g. "floor-patch"
In the bhpayne/sympy:floor-patch branch, change three files


Then, in a local directory run
mkdir build_sympy
cd build_sympy
git clone https://github.com/bhpayne/sympy/

I use a Docker container to build SymPy
cat <<'EOF' > Dockerfile
FROM phusion/baseimage:0.11

RUN apt-get update && \
    apt-get install -y \
               vim \
               python3 python3-pip python3-dev \
               wget \
               default-jre \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /usr/local/lib
RUN curl -O https://www.antlr.org/download/antlr-4.7.2-complete.jar
COPY sympy/ /opt/
RUN echo "alias python=python3" > /root/.bashrc
RUN ln -s /usr/bin/python3.6 /usr/bin/python
# import the pip package for integration of grammar with Python    
RUN pip3 install antlr4-python3-runtime mpmath
# build antlr grammar
WORKDIR /opt/sympy/parsing/latex
ENV CLASSPATH=".:/usr/local/lib/antlr-4.7.2-complete.jar:$CLASSPATH"
RUN java -jar /usr/local/lib/antlr-4.7.2-complete.jar LaTeX.g4 -no-visitor -no-listener -o _antlr
# from msgoff
COPY rename.py /opt/sympy/parsing/latex
RUN python3 rename.py
# set up Sympy
WORKDIR /opt/
RUN python3 setup.py install
EOF

A second file, created by msgoff, is used for the Antlr build process
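cat <<'EOF' > rename.py
import glob
import os

# Assumption: the listing as posted used "header" without defining it.
# This stand-in is modeled on the notice SymPy puts at the top of its
# generated parser files; adjust the wording to match upstream.
header = """
# encoding: utf-8

# *** GENERATED FROM LaTeX.g4, DO NOT EDIT BY HAND ***
"""

output_dir = "_antlr"
# Lower-case the generated file names; for .py files, replace the first
# two lines written by ANTLR with the header above.
for path in glob.glob(os.path.join(output_dir, "LaTeX*.*")) + glob.glob(
    os.path.join(output_dir, "latex*.*")):
    offset = 0
    new_path = os.path.join(output_dir, os.path.basename(path).lower())
    with open(path, "r") as f:
        lines = [line.rstrip() + "\n" for line in f.readlines()]
    os.unlink(path)
    with open(new_path, "w") as out_file:
        if path.endswith(".py"):
            offset = 2
            out_file.write(header)
        out_file.writelines(lines[offset:])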
EOF

Inside the container,
cd /scratch/sympy/sympy/parsing/latex
java -jar /usr/local/lib/antlr-4.7.2-complete.jar LaTeX.g4 -no-visitor -no-listener -o _antlr
python rename.py

Now rebuild sympy
cd /scratch/sympy/
python setup.py install

leave the container
exit

On the host, add the build artifacts for Antlr
cd sympy/
git status
git add sympy/parsing/latex/_antlr/latexlexer.py sympy/parsing/latex/_antlr/latexparser.py



Testing

https://github.com/sympy/sympy/wiki/Running-tests
>>> import sympy
>>> sympy.test()
takes 2 hours on my MacBook Air

The relevant test is
>>> sympy.test("sympy/parsing/tests/test_latex.py")

transition from "validation of concept" to "usable by other people"

My initial intent with the Physics Derivation Graph was to validate the concept of using a graph as a data structure for mathematical Physics. Creating the derivationmap.net website was an important milestone -- I was proud of the content and the presentation. The previous website (https://allofphysicsgraph.github.io/proofofconcept/) felt like merely a step beyond having a public code repo since the content was limited to displaying graphs.

Now I can direct people to the derivationmap.net website and not feel embarrassed by the limitations imposed by the host. The remaining limitations of derivationmap.net come from the limits of my own creativity and technical skill.

The consequence of feedback from interested people has been increased awareness of the roughness of the code. The code (HTML, Python, JSON) is not something I would find accessible if I were reviewing the project.

Actions I can take to improve the accessibility of the code:
  • better docstrings
  • use doctests
  • document the workflow

Saturday, August 15, 2020

plan of record for parsing Latex expressions

I'm assuming there's an interactive feedback loop with the user in the Physics Derivation Graph, whereas that's not the case for bulk content like arXiv. How to respond to ambiguity depends on whether we can assume the user is available for clarifications.


Given an input string to parse,
  1. Is the string valid Latex? If yes, continue; if no, return to user with complaint
  2. Is the string valid mathematical Latex? If yes, continue; if no, return to user with complaint
  3. Can the mathematical Latex be parsed without ambiguity? If yes, return SymPy to user; if no, continue
  4. If there is ambiguity, can the ambiguity be resolved by using a different flavor of the grammar? If yes, return SymPy to user; if no, return the options to the user so they can select the right parsing.
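
The four checks above can be sketched as a single function. Everything here is hypothetical: the two validity checks and the grammar flavors are passed in as callables, each flavor is assumed to return a list of candidate parses, and the stubs at the bottom only stand in for real checks.

```python
def triage(latex, is_valid_latex, is_math_latex, parsers):
    """Walk the checks in order and return (outcome, payload).

    parsers maps a grammar flavor name to a callable that returns a list of
    candidate parses; the first entry is treated as the default grammar.
    """
    if not is_valid_latex(latex):
        return ("complain", "not valid Latex")
    if not is_math_latex(latex):
        return ("complain", "not valid mathematical Latex")

    options = []
    for flavor, parse in parsers.items():
        candidates = parse(latex)
        if len(candidates) == 1:
            # Exactly one reading under this flavor: no ambiguity left.
            return ("parsed", candidates[0])
        options.extend((flavor, candidate) for candidate in candidates)

    # No flavor gave a single reading, so let the user choose.
    return ("ask user", options)


# Toy stand-ins for the real checks and grammars.
def balanced(s):
    return s.count("{") == s.count("}")


def has_relation(s):
    return "=" in s


stub_parsers = {
    "default": lambda s: ["reading A", "reading B"],
    "physics": lambda s: ["reading A"],
}

print(triage("a = b", balanced, has_relation, stub_parsers))
print(triage("a = {b", balanced, has_relation, stub_parsers))
```

For bulk content where no user is available, the "ask user" outcome would need a different fallback, such as logging the options for later review.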

Removing markup specific to display may be relevant. For example, replacing "\ " with " " and replacing "\quad" with " " and replacing "\qquad" with " " and replacing "\left(" with "(" would reduce the parser workload.
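
A minimal sketch of that cleanup step, using plain string replacement. The function name is made up, and the "\right)" entry is my addition (not in the list above) so that parentheses stay paired after "\left(" is simplified.

```python
# Display-only markup and its plain replacement.
# "\right)" is an assumption added so parentheses stay paired.
DISPLAY_REPLACEMENTS = [
    ("\\left(", "("),
    ("\\right)", ")"),
    ("\\qquad", " "),
    ("\\quad", " "),
    ("\\ ", " "),
]


def strip_display_markup(latex):
    """Replace display-only Latex markup with plain equivalents."""
    for markup, plain in DISPLAY_REPLACEMENTS:
        latex = latex.replace(markup, plain)
    return latex


print(strip_display_markup(r"\left(x + y\right)"))
print(strip_display_markup(r"PE_{\rm Earth\ surface}"))
```

Plain replacement is blunt: it does not know about context, so it would also rewrite these sequences inside text that was meant literally.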

Example of invalid math Latex:
\frac a b

The user probably intended 
\frac{a}{b}

Wednesday, August 12, 2020

disable UFW logging to /var/log/syslog

https://serverfault.com/questions/817565/remove-ufw-block-from-kern-log-and-sys-log
https://askubuntu.com/questions/452125/redirect-ufw-logs-to-own-file

Latex math expressions that cause Sympy's Latex parser to fail

$ git clone https://github.com/allofphysicsgraph/proofofconcept.git
$ cd proofofconcept/v7_pickle_web_interface/flask
$ make dockerlive
$ python
>>> import sympy
>>> sympy.__version__
'1.5.1'
>>> from sympy.parsing.latex import parse_latex
>>> import json
>>> with open('data.json') as json_file:
...     dat = json.load(json_file)

>>> for expr_id, expr_dict in dat['expressions'].items():
...     print(expr_dict['latex'])

>>> for expr_id, expr_dict in dat['expressions'].items():
...     try:
...         x = parse_latex(expr_dict['latex'])
...     except Exception as er:
...         print('expr ID =', expr_id)
...         print(er)

Using that approach, I found the following problems in the current (valid) Latex expressions used in the Physics Derivation Graph.


Subscripts with spaces

expr ID = 8871333437
I don't understand this
PE_{\rm Earth\ surface}
~~~~~~~~~~~~~^

expr ID = 7053449926
I don't understand this
r_{\rm geostationary\ orbit}
~~~~~~~~~~~~~~~~~~~~^


Use of "\left."

expr ID = 0439492440
I don't understand this
\frac{1}{a^2} = \frac{1}{2}W - \frac{1}{2}\left. \frac{W}{2n\pi}\sin\left(\frac{2n\pi}{W} x\right) \right|_0^W
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

Spaces in subscript

expr ID = 7575859306
I don't understand this
\left( \delta^{l}_{\ \ j} \delta^{m}_{\ \ k} - \delta^{l}_{\ \ k} \delta^{m}_{\ \ h} \right) \hat{x}_i \nabla_j \nabla^m E^n = \vec{ \nabla}( \vec{ \nabla} \cdot \vec{E} - \nabla^2 \vec{E})
~~~~~~~~~~~~~~~~~~~^

expr ID = 7575859308
I don't understand this
\left( \delta^{l}_{\ \ j} \delta^{m}_{\ \ k} \hat{x}_i \nabla_j \nabla^m E^n\right)-\left( \delta^{l}_{\ \ k} \delta^{m}_{\ \ h} \hat{x}_i \nabla_j \nabla^m E^n \right)  = \vec{ \nabla}( \vec{ \nabla} \cdot \vec{E} - \nabla^2 \vec{E})
~~~~~~~~~~~~~~~~~~~^


Apostrophe

expr ID = 4662369843
I don't understand this
x' = \gamma (x - v t)
~^

expr ID = 2983053062
I don't understand this
x = \gamma (x' + v t')
~~~~~~~~~~~~~^

expr ID = 3426941928
I don't understand this
x = \gamma ( \gamma (x - v t) + v t' )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

Comma in subscript

expr ID = 9973952056
I expected something else here
-g t = v_y - v_{0, y}
~~~~~~~~~~~~~~~~~^


expr ID = 7391837535
I expected something else here
\cos(\theta) = \frac{v_{0, x}}{v_0}
~~~~~~~~~~~~~~~~~~~~~~~~~^

expr ID = 8949329361
I expected something else here
v_0 \sin(\theta) = v_{0, y}
~~~~~~~~~~~~~~~~~~~~~~~^

Spaces

expr ID = 3920616792
I don't understand this
T_{\rm geostationary orbit} = 24\ {\rm hours}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

Much greater than


expr ID = 9674924517
I don't understand this
K >> G
~~~^
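
The failure categories above could be turned into a quick pre-check that flags risky constructs before calling parse_latex. This is a rough sketch: the regular expressions are my guesses at what triggers each failure, inferred only from the examples listed here, and the names are made up.

```python
import re

# Each pattern is a guess based on the failing expressions listed above.
SUSPECT_PATTERNS = [
    ("escaped space", re.compile(r"\\ ")),
    ("left-dot delimiter", re.compile(r"\\left\.")),
    ("apostrophe", re.compile(r"'")),
    ("comma in subscript", re.compile(r"_\{[^{}]*,[^{}]*\}")),
    ("much greater than", re.compile(r">>")),
]


def flag_suspect_constructs(latex):
    """Return the names of the suspect constructs found in latex."""
    return [name for name, pattern in SUSPECT_PATTERNS if pattern.search(latex)]


print(flag_suspect_constructs(r"x' = \gamma (x - v t)"))
print(flag_suspect_constructs(r"-g t = v_y - v_{0, y}"))
print(flag_suspect_constructs(r"K >> G"))
```

An empty list does not mean the parser will succeed; it only means none of the known trouble spots were seen.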

Sunday, August 9, 2020

content categories and keyword linking - what's the relation to the PDG?

Suppose an article on arXiv had full markup with PhysML and ScienceWise. While useful in aggregate for search, it's not clear how those would relate to the Physics Derivation Graph.

The ScienceWise concept of linking keywords to a database would need to be expanded to linking variables to a database. Then one paper using "x" for position and another paper using "d" for position could link to the same variable definition (independent of symbol used).

my first certificate expiration

This morning I was greeted with a warning from Chrome when visiting https://derivationmap.net. The error message indicated my certificates had expired.

I SSH'd into my DigitalOcean node and ran a scan of the certs that certbot can find

$ sudo certbot renew
----------------------
Processing /etc/letsencrypt/renewal/derivationmap.net.conf
----------------------
Cert not yet due for renewal
----------------------
The following certs are not due for renewal yet:
  /etc/letsencrypt/live/derivationmap.net/fullchain.pem expires on 2020-10-08 (skipped)
No renewals were attempted.

However, when I run a manual scan of the certs used by my site,

$ openssl x509 -dates -noout < /home/pdg/proofofconcept/v7_pickle_web_interface/certs/fullchain.pem
notBefore=May 11 15:26:19 2020 GMT
notAfter=Aug  9 15:26:19 2020 GMT

This corresponds with the command history entry from 2020-05-11,
sudo certbot certonly --webroot \
-w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
--server https://acme-v02.api.letsencrypt.org/directory \
-d derivationmap.net -d www.derivationmap.net

Solution

Delete existing certs
sudo rm -rf /etc/letsencrypt/{live,renewal,archive}/{derivationmap.net,derivationmap.net.conf}/

Request new certs

sudo certbot certonly --webroot \
-w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
--server https://acme-v02.api.letsencrypt.org/directory \
-d derivationmap.net -d www.derivationmap.net

Copy new certs to directory that nginx mounts in Docker-compose

cd /home/pdg/proofofconcept/v7_pickle_web_interface/certs
sudo cp /etc/letsencrypt/live/derivationmap.net/fullchain.pem .
sudo cp /etc/letsencrypt/live/derivationmap.net/privkey.pem .
sudo chown pdg:pdg privkey.pem
openssl dhparam -out dhparam.pem 2048

Restart Docker-compose

docker-compose up --build --force-recreate --remove-orphans --detach

If the docker containers are not restarted, the changes made to the file on the host won't take effect.

Verify in a browser that https://derivationmap.net/ has the updated certificate.

Set a calendar reminder to renew the certificate
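
A small script could back up the calendar reminder. The sketch below turns the notAfter line printed by openssl into a count of remaining days. The function name and the idea of running it periodically are my assumptions, not something already deployed.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after_line, now=None):
    """Days left before the notAfter date printed by openssl x509 -dates."""
    # Keep only the date text after "notAfter=".
    date_text = not_after_line.strip().split("=", 1)[-1]
    # ssl.cert_time_to_seconds parses the GMT date format used in certificates.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(date_text), tz=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400


# The notAfter line from the expired copy, checked against a made-up "today".
expired_line = "notAfter=Aug  9 15:26:19 2020 GMT"
pretend_today = datetime(2020, 8, 1, 15, 26, 19, tzinfo=timezone.utc)
print(days_until_expiry(expired_line, now=pretend_today))
```

A negative result means the certificate has already expired, which is what the copy in the certs directory would have shown on August 9.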

Friday, August 7, 2020

a web-based GUI for drawing graphs with latex

Back in 2014 the EquationMap interface featured the ability to enter latex into nodes and create graphs with edges.



I looked for similar web interfaces for drawing graphs. However, none of the ones I found support rendering Latex or MathJax.


My current approach of generating PNG images from Latex and then rendering the PNGs as a graph using d3.js seems to work sufficiently well. I could modify the interface such that the existing webform is on the same page as the rendered graph.

The user would select an inference rule, provide the expressions, and render the graph all in a single page.
What isn't clear in my mental model is how to connect edges to new steps. 

where to invest my efforts; prioritization depends on objective

Options on where to spend my attention on this project include

  • semantic decoration of text
  • trying to figure out how semantic decoration connects to the underlying inference rules
  • generating PDG Latex content using existing infrastructure
  • verifying existing PDG content by converting to SymPy
  • expanding the SymPy Latex grammar for better parsing and conversion
  • improving the front-end presentation
  • improving the data input interface
  • changing the backend database from JSON + SQL to something cleaner
I'm not completely clear on what my motives for the PDG are. 
  • I find the work interesting
  • I think other people might also be interested
  • I'm not clear what specific value the project provides (other than attempting to document the beauty of the math and Physics) 
I'm not sure whether I'm doing this for myself, for my legacy, or for other people.

My own lack of clarity translates into not being certain about where to invest my attention.
Options like more content, more integration with other aspects, and better code all pull in different directions.
I think that if I better understood my own intentions I could prioritize more easily.