Monday, December 30, 2019

Use of MathJax for expressions

Now that I've migrated my interface to a Flask web page hosted in a Docker container, I could use MathJax to display LaTeX expressions.

http://localhost:5000/list_all_expressions could use MathJax
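As a minimal sketch of what that page could look like (stdlib only; the MathJax v3 CDN URL and the inline `\( ... \)` delimiters are assumptions about how the page would be configured, not the project's current template):

```python
# Render a list of LaTeX expression strings as an HTML page that loads MathJax
# from a CDN; a Flask route such as /list_all_expressions could return this string.

MATHJAX_PAGE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js"></script>
</head>
<body>
  <ul>
  {items}
  </ul>
</body>
</html>"""

def render_expressions(latex_expressions):
    """Wrap each LaTeX string in \\( ... \\) delimiters so MathJax typesets it inline."""
    items = "\n  ".join(
        "<li>\\({expr}\\)</li>".format(expr=expr) for expr in latex_expressions
    )
    return MATHJAX_PAGE.format(items=items)

print(render_expressions([r"a^2 + b^2 = c^2", r"E = m c^2"]))
```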

Automated web interface testing


https://robotframework.org/ is preferable to Selenium because it produces reports

UML to Python Flask and WTForms

UML to Python Flask and WTForms rather than iterative form development

lesson learned for model-view-controller: form workflows

Model-view-controller (MVC) is a way to separate presentation from the backend computation and data transformation.

For my application, "model" = compute.py; "view" = a collection of webpages; "controller" = controller.py

MVC nuances with Flask and WTForms --

  • The business logic workflow is captured exclusively in controller.py
  • Upon form submission, the workflow should return to controller.py rather than linking directly to the next page; controller.py uses redirect(url_for()) to transition between pages.
  • Form data extraction occurs in the controller; manipulations to the form data are made in compute.py (the model). 
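A minimal sketch of that controller pattern (assumes Flask; the route and page names are hypothetical, not the project's actual pages): the form POSTs back to the same route, and the controller, not the template, decides the next page via redirect(url_for()).

```python
from flask import Flask, redirect, request, url_for

app = Flask(__name__)

@app.route("/select_inference_rule", methods=["GET", "POST"])
def select_inference_rule():
    if request.method == "POST":
        # extract form data here in the controller; hand it to compute.py (the model)
        return redirect(url_for("enter_expressions"))
    return "form page"  # normally render_template(...)

@app.route("/enter_expressions")
def enter_expressions():
    return "next step in the workflow"
```

Because every transition passes through redirect(url_for()), the workflow stays in controller.py and the templates never need to know which page comes next.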

Using OCaml to identify equivalent expressions

OCaml: simplify each expression to a canonical form that serves as the basis for its equivalence class
LaTeX --> OCaml AST, then compare against all other OCaml ASTs

The initial scope is +, -, * over integers
Next steps would be to expand to reals, then complex numbers

Use case: find other equivalent expressions

Use case: validation

This requires typed variables (e.g., integer, real, constant, complex, matrix, vector)

Rules:
  • addition rule of equality
  • multiplication rule of equality
  • substitution property of equality
  • associative property
  • commutative property
  • transitive property
  • symmetric property
  • additive inverse property
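The "canonical form per equivalence class" idea above can be sketched without OCaml (stdlib Python; the tuple encoding `('op', arg1, arg2)` is my assumption, not the project's AST): normalize the operands of commutative operators, then two expressions are equivalent when their canonical forms are equal.

```python
# Toy ASTs as nested tuples ('op', arg1, arg2); leaves are integers or variable
# names. Scope matches the note above: +, -, * over integers.

COMMUTATIVE = {"+", "*"}

def canonical(node):
    """Recursively canonicalize an AST; sort the operands of commutative operators."""
    if not isinstance(node, tuple):      # leaf: integer or variable name
        return node
    op, *args = node
    args = [canonical(a) for a in args]
    if op in COMMUTATIVE:
        args = sorted(args, key=repr)    # deterministic order for comparison
    return (op, *args)

def equivalent(expr_a, expr_b):
    """Two expressions are in the same equivalence class iff canonical forms match."""
    return canonical(expr_a) == canonical(expr_b)

print(equivalent(("+", "a", "b"), ("+", "b", "a")))   # True  (commutative)
print(equivalent(("-", "a", "b"), ("-", "b", "a")))   # False (- is not commutative)
```

This only captures the commutative property; the other rules listed above (associativity, inverses, substitution) would each need their own normalization or rewrite step.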

Wednesday, December 25, 2019

when to use a dropdown menu versus list of links in the web interface

For the web interface, there are multiple pages that have a list from which a user can select. One way to render the list would be a set of hyperlinks; another way would be a dropdown menu.

Suppose the user needs to select an inference rule for a step. They should choose only one, so a dropdown is the preferred method.
--> when the user should be restricted to one option, use a dropdown menu.

Suppose the user is presented with a list of derivations to view. They could choose one or more (to open in new tabs), so a list is the preferred method.
--> when the user can choose one or more options, use a dynamically generated list of hyperlinks.
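The two rules above can be captured in a pair of small rendering helpers (stdlib only; the helper names and URL scheme are hypothetical illustrations, not the project's code):

```python
from html import escape

def render_single_choice(name, options):
    """User must pick exactly one option: render a <select> dropdown."""
    opts = "".join('<option value="{0}">{0}</option>'.format(escape(o)) for o in options)
    return '<select name="{0}">{1}</select>'.format(escape(name), opts)

def render_multi_choice(url_prefix, options):
    """User may open one or more options: render a list of hyperlinks."""
    return "".join(
        '<li><a href="{0}{1}">{1}</a></li>'.format(escape(url_prefix), escape(o))
        for o in options
    )

print(render_single_choice("inference_rule", ["add X to both sides", "multiply both sides by X"]))
print(render_multi_choice("/derivation/", ["frequency relations", "wave equation"]))
```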

Thursday, December 12, 2019

SQL vs CSV vs PKL for data storage

The CSV files are sufficient to build a proof of concept. Given that sufficiency, and my lack of familiarity with SQL, perhaps I shouldn't focus on using sqlite for the MVP.
(See https://physicsderivationgraph.blogspot.com/2019/12/mvp-for-pdg-with-sql.html)

CSVs feel hacky and don't enable enforcement of consistency checks. However, they are easier to view and decrease the dependencies.

The easiest solution might be using Pickle (PKL) files rather than converting from in-memory Python variables to a distinct offline representation. A Pickle file also enables consolidation into a single file.

Wednesday, December 11, 2019

mathematical bases for inference rules


"add x to both sides" = addition property of equality
"multiply both sides by x" = multiplication property of equality
"divide both sides by x" = division property of equality


abstract syntax trees for expressions

https://tug.org/TUGboat/tb12-3-4/tb33arnon.pdf

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.5712&rep=rep1&type=pdf

https://pdfs.semanticscholar.org/7214/d4805660042521d4b825eb3324742b215072.pdf

http://mathlex.org/doc/how-mathlex-works

https://calculem.us/abstract-binding-trees-1/
https://semantic-domain.blogspot.com/2015/03/abstract-binding-trees.html
https://arxiv.org/abs/1601.06298

MVP for PDG with SQL

Currently I have a Docker container that runs Flask on Ubuntu to present a web interface that uses forms to enter information.
sandbox/docker_images/flask_ubuntu/web
A Python script on the backend handles conversion from string LaTeX to PNG using dvipng, with graphviz generating the static graph PNG.

The other major component of the backend is an sqlite3 database that holds the data when the container is offline. I don't have experience with SQL, so I need a plan to get to the minimum viable product.

The purpose of the sqlite3 file is to store the multiple tables offline.  I could use a Python Pickle file, but that would be specific to Python; the sqlite approach seems more portable and generic.

The only actions I need are
  • write data structure from memory (in Python) to sqlite
  • read data structure from sqlite into Python
Summary:
SQL tables <--> Python data structures <--> graph structure <--> graph viz, website generation, UI web
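The two actions can be sketched with the stdlib sqlite3 module (the table and column names here are illustrative assumptions):

```python
import sqlite3

expressions = [("a^2 + b^2 = c^2", 59285924), ("E = m c^2", 954849)]

conn = sqlite3.connect(":memory:")   # a file path instead of :memory: would persist between sessions
conn.execute(
    "CREATE TABLE expressions (expression_latex TEXT, expression_id INTEGER)"
)

# write data structure from memory (in Python) to sqlite
conn.executemany("INSERT INTO expressions VALUES (?, ?)", expressions)
conn.commit()

# read data structure from sqlite into Python
rows = conn.execute("SELECT expression_latex, expression_id FROM expressions").fetchall()
print(rows)   # [('a^2 + b^2 = c^2', 59285924), ('E = m c^2', 954849)]
conn.close()
```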


On startup, read data into Python from sqlite.
After that, every time there is a change to the structure in Python, write to sqlite.
This approach is not elegant compared to "write only diff" or "write at end of session" but it eliminates any possibility of inconsistency.
This approach doesn't scale for large databases or multiple users, but those aren't problems I need to solve right now (I'm intentionally incurring technical debt).

If I'm using SQL to store data structures from Python, I'll need to enumerate the table schemas. See
https://allofphysicsgraph.github.io/proofofconcept/site/how_to_build_the_physics_derivation.html
which shows the tables
  • per derivation
    • edge list (expression local ID to inference local ID)
    • expression identifiers (expression ID to local ID)
    • feeds (latex to local ID)
    • inference rule identifiers (inference rule to local ID)
  • global expression latex to expression ID
  • global inference rule to latex, description, CAS representation

Reviewing the options described on
https://stackoverflow.com/questions/695752/how-to-design-a-product-table-for-many-kinds-of-product-where-each-product-has-m
I don't know which is applicable.

Motives for SQLite use:

  • enforce column consistency (each row has N columns)
  • enforce column types (e.g., string, integer)
  • enforce entry length (e.g., local ID must be an integer with M digits)

SQLite options

From the perspective of file management, having one file feels cleaner than a file per derivation. 

6 tables in 1 SQLite file

One option is to implement 6 table schemas:
  • expression latex to expression ID. Columns:
    • expression_latex (string)
    • expression ID (integer)
  • inference rule to latex, description, CAS representation
    • inference rule (string)
    • inference rule latex (string)
    • inference rule description (string)
    • CAS representation (string)
  • derivation edges. Columns:
    • derivation name (string)
    • from local ID (integer)
    • to local ID (integer)
  • derivation feeds. Columns:
    • derivation name (string)
    • latex (string)
    • local ID (integer)
  • derivation expressions. Columns:
    • derivation name (string)
    • expression ID (integer)
    • local ID (integer)
  • derivation inference rules. Columns:
    • derivation name (string)
    • inference rule (string)
    • local ID (integer)
I suspect this layout of tables is suboptimal -- repeating the "derivation name" in a column of each derivation table is an indicator that the table count should be 2+4*D (where "D" is the number of derivations) to eliminate duplication, rather than 6. The same 2+4*D design is apparent in the "dict of derivations" structure described below. My motive for using 6 tables is that with 2+4*D the table names are not static.
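The fixed-table layout above could be written as SQLite DDL roughly as follows (column names are my assumptions based on the bullets; the CHECK/type declarations illustrate the "enforce column types" motive):

```python
import sqlite3

SCHEMA = """
CREATE TABLE expressions (
    expression_latex TEXT NOT NULL,
    expression_id    INTEGER PRIMARY KEY
);
CREATE TABLE inference_rules (
    inference_rule             TEXT PRIMARY KEY,
    inference_rule_latex       TEXT,
    inference_rule_description TEXT,
    cas_representation         TEXT
);
CREATE TABLE derivation_edges (
    derivation_name TEXT NOT NULL,
    from_local_id   INTEGER NOT NULL,
    to_local_id     INTEGER NOT NULL
);
CREATE TABLE derivation_feeds (
    derivation_name TEXT NOT NULL,
    latex           TEXT NOT NULL,
    local_id        INTEGER NOT NULL
);
CREATE TABLE derivation_expressions (
    derivation_name TEXT NOT NULL,
    expression_id   INTEGER NOT NULL REFERENCES expressions(expression_id),
    local_id        INTEGER NOT NULL
);
CREATE TABLE derivation_inference_rules (
    derivation_name TEXT NOT NULL,
    inference_rule  TEXT NOT NULL REFERENCES inference_rules(inference_rule),
    local_id        INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The REFERENCES clauses are one way to get the consistency checks that CSVs can't enforce; SQLite only enforces them when foreign keys are turned on with PRAGMA foreign_keys = ON.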

2+4*D tables in 1 SQLite file

Two tables are independent of derivations:
  • expression latex to expression ID. Columns:
    • expression_latex (string)
    • expression ID (integer)
  • inference rule to latex, description, CAS representation
    • inference rule (string)
    • inference rule latex (string)
    • inference rule description (string)
    • CAS representation (string)
And 4 tables are needed per derivation. The problem with this approach is that the names of the tables aren't known in advance.

2 tables in 1 SQLite file; 4 tables in each of D SQLite files

Same as previous option, except instead of a single SQLite file, the derivations are in separate files.

SQLite to Python

These tables in SQL are equivalently stored in Python as three data structures:

  • list of inference rules = [{'inf rule':'inf rule 1','in':1, 'out': 0},{'inf rule':'inf rule 2', 'in':2, 'out': 3}]
  • list of expressions = [{'expr 1':59285924, 'expr 2': 954849, 'expr 3': 948299}]
  • dict of derivations = {'derivation name 1':[<step1>, <step2>, <step3>]}
where each <step> has the structure
{'inf rule': 'this inf rule',
 'input': [{'expr local ID': 942, 'expr ID': 59285924}],
 'output': [{'expr local ID': 218, 'expr ID': 954849}]}


Friday, August 9, 2019

updated task list for August 2019: SQL, arxiv

Since posting my previous task list in May, I've made good progress with Docker and a Flask-based web interface. The Flask web interface has progressed far enough that I need to connect the front end with the SQL database backend. Listing out next steps,
  • The SQL database needs to be connected to the Flask-based web frontend
  • The SQL database needs to be populated with content from the CSVs (expressions and inference rules and derivations in version 6)

A colleague found that LaTeX source for arXiv articles is available in bulk in S3 buckets. As an alternative to S3, arXiv points to a subset that's available without going through AWS: https://www.cs.cornell.edu/projects/kddcup/datasets.html
The value of having a large number of expressions in LaTeX is that we could use them to predict what a user wants to enter, decreasing the amount of manual entry required. Also, if a derivation contains expressions similar to those in the arXiv content, we could investigate whether the derivation is related to the arXiv paper.

Sunday, July 28, 2019

improving efficiency of manually entered content

One of the major barriers to growth for the Physics Derivation Graph will be manual entry of expressions in support of derivations. Reducing "minor" inconveniences and adding "minor" help features can improve the user experience, the consistency of content, and the throughput of content creation.

As an example, for a given step in a derivation the user will need to specify the inference rule. Manually typing the name is laborious and induces cognitive burden. An "autocomplete" feature reduces labor but still requires thinking. A dropdown menu of all possible inference rules could be used, but a reduction to a dropdown with only relevant inference rules is better.

Similarly, autocompletion is helpful when manually entering expressions. Being able to select an expression that has been previously entered is better.

A drag-and-drop interface for connecting nodes in the graph would be ideal.

Monday, June 3, 2019

SQL schema

Tables as bullets, columns per table as sub-bullets: 
  • expressions
    • unique numeric ID
    • Latex
  • inference rules
    • unique string name
    • text expansion in Latex
    • number of input arguments
    • number of output arguments
For each "name of derivation",
  • edge list
    • source temp index
    • destination temp index
  • expression identifiers
    • unique numeric ID
    • temp index
  • inference rule identifiers
    • unique string name
    • temp index
  • feeds
    • temp index
    • Latex
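The per-derivation tables above imply table names that aren't known in advance (the concern raised in the SQLite sections). One way to handle that is to derive a safe table-name prefix from the derivation name; a sketch (stdlib only; helper and table names are my assumptions):

```python
import re
import sqlite3

def safe_table_prefix(derivation_name):
    """Reduce a derivation name to characters legal in an SQL identifier."""
    return re.sub(r"\W+", "_", derivation_name).strip("_").lower()

def create_derivation_tables(conn, derivation_name):
    # Interpolating into DDL is acceptable only because the prefix is sanitized
    # above; parameter binding (?) cannot be used for table names in SQLite.
    prefix = safe_table_prefix(derivation_name)
    conn.executescript("""
        CREATE TABLE {p}_edge_list (source_temp_index INTEGER, destination_temp_index INTEGER);
        CREATE TABLE {p}_expression_identifiers (unique_numeric_id INTEGER, temp_index INTEGER);
        CREATE TABLE {p}_inference_rule_identifiers (unique_string_name TEXT, temp_index INTEGER);
        CREATE TABLE {p}_feeds (temp_index INTEGER, latex TEXT);
    """.format(p=prefix))

conn = sqlite3.connect(":memory:")
create_derivation_tables(conn, "frequency relations")
```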

Saturday, May 25, 2019

task list for SQL and Docker and Flask

Ordered tasks below. None of these tasks are extraordinary.
  • Convert database from CSV files to single .sqlite3 database. 
  • Rewrite interface to database to read/write .sqlite3 rather than from/to CSV

  • Create Docker image that supports Python Flask and Latex and Graphviz [done]
  • Put current scripts inside Docker image; mount external data; be able to edit content from within container
  • Run static content in Docker image and display images in web browser
  • Rewrite command-line editing interface to use Python Flask as web interface

Update 20190602: created a Docker container that supports Python Flask and Latex. I can generate a PNG from user entered strings.

Flask and Docker

I wanted to improve the portability of the Physics Derivation Graph, and I recognized the value of putting the code in a Docker image. With the code in a Docker image, anyone would be able to get started with editing and contributing quickly rather than first resolving software dependencies.

In addition to improving portability, I also recognize that a command-line interface is not sufficient for most users. With the code in a Docker image, another useful change would be to run a webserver in the container. The web server could both display the current state and serve as the interface for making edits to the content.

I have heard of nginx but not used it. Another option is lighttpd; I haven't used that either. While either of these two options would be sufficient for serving static HTML content or something interactive (e.g., PHP or CGI), my backend code is currently Python. Therefore, I think Flask is a reasonable choice for presenting a web interface and enabling edits to the database.

Update 20200512: good explanation of the relevance of Nginx: https://rushter.com/blog/gunicorn-and-low-and-slow-attacks/

Update 20190602:

cd to your local copy of
https://github.com/allofphysicsgraph/proofofconcept/tree/gh-pages/sandbox/docker_images/flask_ubuntu
docker build -t flask_ub:latest .
To run interactively and manually, use
docker run -it --publish 5000:5000 --mount type=bind,source="$(pwd)",target=/another --entrypoint /bin/bash flask_ub
and then run
python3 app/controller.py
inside the Docker container.  To have the app start automatically, use
docker run -it --publish 5000:5000 flask_ub
In either case, navigate your browser to http://localhost:5000 to use the interface.