- Sphinx, for documenting the code base: https://github.com/sphinx-doc/sphinx
- Black, for code formatting: https://github.com/psf/black
- Prospector, to autodetect basic issues: https://github.com/PyCQA/prospector
- autopep8, to automatically fix PEP 8 issues: https://github.com/hhatto/autopep8
- RedBaron, for refactoring: https://github.com/PyCQA/redbaron
Tuesday, February 25, 2020
Monday, February 24, 2020
ASTs for Integrals
I understand ASTs for simple expressions that involve only binary operators. I don't yet understand how ASTs handle operators that take more than two arguments -- for example, a definite integral, which takes an integrand, a variable of integration, and two limits.
https://reference.wolfram.com/language/ref/TreeForm.html
https://demonstrations.wolfram.com/ExpressionTreesForIntegrals/
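One common answer to the question above is that an AST node is not limited to two children: an n-ary operator simply gets one child per argument. A minimal sketch (the tuple encoding is illustrative, not taken from any particular tool):

```python
# One way an AST can represent an operator with more than two arguments:
# give the node one child per argument. Here the definite integral of
# x^2 from 0 to 1 is a single 'integrate' node with four children.
integral_ast = ("integrate",
                ("^", "x", 2),   # integrand
                "x",             # variable of integration
                0,               # lower limit
                1)               # upper limit
print(len(integral_ast) - 1)  # number of arguments: 4
```

This is essentially what Mathematica's TreeForm (linked above) displays: Integrate is one node with several branches.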
Sunday, February 23, 2020
Integration path for contributions
So far I've been hesitant about collaborations involving software in the Physics Derivation Graph. I didn't have a good path for integrating contributions, especially complex features. I think I can now provide both a more detailed explanation of what would be helpful and a clear integration path.
For example, in this post I provided a set of valid, invalid, and ambiguous Latex examples. I did not provide details on how I would integrate a suggested solution written by a contributor.
Here are three specific aspects I would need for integration of contributed code:
- I will write doctests in Python. That way I can express the function as it would be integrated in the PDG project code
- The contributed Python script should run inside a Docker image. That way the dependencies are made explicit
- The "docker build" can assume to have Internet access, but the "docker run" process should assume no Internet connection
As an example from the above blog post, I can express the interface as a Python 3 function:

def is_expression_valid_latex(expr_latex: str) -> bool:
    """
    >>> is_expression_valid_latex("a = b")
    True
    >>> is_expression_valid_latex("a = b +")
    False
    >>> is_expression_valid_latex(r"\si a")
    False
    """

By reading the expression from sys.argv, we could expose that function in the container such that the following would be an acceptance test:

$ docker run -it --rm demo:latest python3 /opt/my_script.py "a = b"
True
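A minimal sketch of what a contributed /opt/my_script.py could look like. The validation logic here is a crude placeholder assumption (it only rejects a trailing binary operator), not the real parser a contributor would supply:

```python
# Hypothetical /opt/my_script.py: wrap is_expression_valid_latex() so the
# docker acceptance test above can call it with the expression as a
# command-line argument. The body of the function is a placeholder.
import sys

def is_expression_valid_latex(expr_latex: str) -> bool:
    """Placeholder check: reject an expression ending in a binary operator."""
    return not expr_latex.rstrip().endswith(("+", "-", "*", "="))

if __name__ == "__main__" and len(sys.argv) > 1:
    # print True/False so the docker acceptance test can compare stdout
    print(is_expression_valid_latex(sys.argv[1]))
```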
Monday, February 17, 2020
similar projects
Map of Mathematics by topic
https://www.quantamagazine.org/the-map-of-mathematics-20200213/
https://news.ycombinator.com/item?id=22328516
Sunday, February 9, 2020
todo list for February 2020 (completed!)
Current status: I have an interactive web interface using Docker and Flask that I'm reasonably happy with. In this post I outline tasks that need to be done prior to wider exposure.
Functionality
- list all
  - operators
    - in which derivation is each used?
    - popularity: how many references are there to this operator?
  - symbols
    - in which derivation is each used?
    - popularity: how many references are there to this symbol?
  - derivations
    - popularity: include stats -- number of steps, number of inf rules, number of expressions
  - expressions
    - popularity: list which derivations use which expressions
  - inference rules
    - include number of inputs, outputs
    - popularity: which derivations use each inference rule?
- show a complete derivation
- edit
  - an inference rule
    - how to address all the places that inference rule gets used?
  - a derivation
    - edit a step
      - how to address dangling steps?
  - an expression
    - where else is that expression used?
functionality
- Latex to AST
- suggest related expressions
- Web interface
- download pkl file
- upload pkl file
- export derivation PNG
- export derivation to PDF
- CAS integration
- validate a single step of a derivation
- use d3.js instead of graphviz
- visualize trace of flow
- convert trace of flow to Selenium script
- generate PDG website
- host on DigitalOcean droplet
- account management
Previous task list:
https://physicsderivationgraph.blogspot.com/2018/07/snapshot-of-milestones-for-physics.html
see also
https://physicsderivationgraph.blogspot.com/2017/06/not-getting-caught-in-details.html
type hinting and linting in the Docker image
See also
https://physicsderivationgraph.blogspot.com/2018/08/cleaning-up-code-using-pylint-and.html
Usually I start my Docker container using
$ python create_tmp_pkl.py ; docker build -t flask_ub .; docker run -it --rm --publish 5000:5000 flask_ub
However, if I need the command line to run mypy or flake8, I'll start a shell using
$ python create_tmp_pkl.py ; docker build -t flask_ub .; docker run -it --rm --entrypoint='' --publish 5000:5000 flask_ub /bin/bash
Then, in the container, I can run commands like
$ mypy compute.py
Success: no issues found in 1 source file
$ mypy --ignore-missing-imports controller.py
Success: no issues found in 1 source file
see https://mypy.readthedocs.io/en/latest/running_mypy.html#ignore-missing-imports
and linting with
$ flake8 compute.py
compute.py:4:80: E501 line too long (89 > 79 characters)
and check doctest using
$ python3 -m doctest -v compute.py
Code complexity measurement:
$ python3 -m mccabe compute.py
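As a reminder of what the doctest command above checks, here is a sketch of the kind of function it exercises. The `square` function is illustrative, not actual PDG code; `python3 -m doctest` runs the `>>>` examples in each docstring and compares the printed output:

```python
# A hypothetical function of the kind `python3 -m doctest compute.py`
# would check: the doctest module runs the >>> examples in docstrings
# and compares the output line that follows each example.
import doctest

def square(x: int) -> int:
    """Return x squared.

    >>> square(3)
    9
    """
    return x * x

# equivalent to invoking `python3 -m doctest` on this file
results = doctest.testmod()
print(results.failed)  # 0 when all doctests pass
```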
Monday, February 3, 2020
example derivation steps for a CAS or theorem prover to validate
In order of increasing complexity, here is a set of derivation steps for a CAS or theorem prover to validate:
1. start with "a = b"
   add "2" to both sides
   end with "a + 2 = b + 2"
2. start with "\sin x = f(x)"
   multiply both sides by "2"
   end with "2 \sin x = 2 f(x)"
3. start with "\sin x = f(x)"
   substitute "2 y" for "x"
   end with "\sin (2 y) = f(2 y)"
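Even without a full CAS, the first step above can be checked structurally by encoding expressions as ASTs and replaying the inference rule. A minimal sketch in pure Python (the tuple encoding and function names are illustrative, not PDG code):

```python
# Expressions as nested tuples: ('=', lhs, rhs), ('+', arg1, arg2).
# This mirrors the AST representation discussed elsewhere in these posts.

def apply_add_to_both_sides(equation, feed):
    """Apply the inference rule 'add feed to both sides' to ('=', lhs, rhs)."""
    op, lhs, rhs = equation
    assert op == '='
    return ('=', ('+', lhs, feed), ('+', rhs, feed))

def validate_step(input_eq, feed, output_eq):
    """A step is valid if applying the rule to the input yields the output."""
    return apply_add_to_both_sides(input_eq, feed) == output_eq

# start with "a = b"; add "2" to both sides; end with "a + 2 = b + 2"
print(validate_step(('=', 'a', 'b'), 2,
                    ('=', ('+', 'a', 2), ('+', 'b', 2))))  # True
```

A real CAS or theorem prover would additionally need to recognize mathematically equivalent but structurally different outputs, e.g. "2 + a = b + 2".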
example Latex expressions to parse
valid math latex in order of increasing complexity
- a = b
- \sin x
- \sin x \in f
- f \in g
invalid math latex in order of increasing complexity
- a = b +   (operator with no input)
- \sin x \left(   (unpaired "(")
- \sin x \sum   (operator with no input)
valid ambiguous latex in order of increasing complexity
- 1/2\pi = (1/2) \pi OR 1/(2 \pi); source: https://www.ntg.nl/maps/26/16.pdf
- \sin x / y = (\sin x)/y OR \sin (x/y); source: https://www.ntg.nl/maps/26/16.pdf
- \sin x + 2 = (\sin x) + 2 OR \sin (x + 2)
https://math.stackexchange.com/a/1025217
https://math.stackexchange.com/a/1026483
valid ambiguous latex in a step in which the ambiguity can be resolved
input expression: \sin x / y = g
inf rule: multiply both sides by y
output expression: \sin x = g y
Here the input expression is ambiguous -- it isn't clear whether "\sin x / y" means (\sin x)/y OR \sin (x/y).
The output expression implies that (\sin x)/y was the user's intention.
input expression: \sin x + 2 = g
inf rule: subtract "2" from both sides
output expression: \sin x = g - 2
Here the input expression is ambiguous -- it isn't clear whether "\sin x + 2" means (\sin x) + 2 OR \sin (x + 2).
The output expression implies that (\sin x) + 2 was the user's intention.
valid ambiguous latex in a step in which the ambiguity cannot be resolved
a = b

from Latex to Abstract Syntax Tree
In the latest revision to the Physics Derivation Graph, the tuple (unique expression identifier, latex expression) has been replaced with (unique expression identifier, latex expression, abstract syntax tree). This is similar to the split between "presentation MathML" and "content MathML." This distinction requires a translation between a visually pleasing, easy-to-input representation and a mathematically meaningful representation.
Latex will be input by the user for the PDG; the user will not need to supply the AST as input. To validate a step, the AST is needed. This presents a few challenges:
- Is the input valid tex?
- Is the valid tex a mathematical expression?
- Is the valid mathematical expression consistent with the step?
A step in a derivation is defined as the application of a single inference rule with one or more expressions as input, feed, or output.
There are a few options for parsing mathematical tex:
- write a custom parser
- use an existing parser, e.g. MathJax
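To make the new tuple concrete, here is a sketch of what the (identifier, latex, AST) record might look like; the field names, the identifier, and the nested-tuple AST encoding are illustrative assumptions, not actual PDG code:

```python
# Illustrative sketch of the (expression id, latex, AST) tuple described
# above. The latex field is the presentation form the user inputs; the
# AST field is the content form a CAS needs to validate a step.
expression = {
    "expr_id": "4928923942",      # unique expression identifier (made up)
    "latex": "a + 2 = b + 2",     # presentation form
    "AST": ("=", ("+", "a", 2), ("+", "b", 2)),  # content form
}
print(expression["AST"][0])  # the root operator: '='
```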
Sunday, February 2, 2020
significant changes to the Physics Derivation Graph
This weekend I initiated a significant rewrite of the Physics Derivation Graph.
- I revised the data structures, the level of details present in the data structure, and how the data structure is accessed.
- I also better understand the model-view-controller paradigm; this led to a better workflow.
- I improved the logging used in the Python code.
Improved Data Structures
I've investigated many different file formats (XML, CSV, plain text, SQL), each of which imposes different constraints on the data structure and requires a translation between the file format and the representation internal to Python. I recently arrived at the insight that I could avoid both the file format choice and the associated translation work by using Python's serialization -- the pickle module.
In addition to eliminating work associated with translation, it freed my cognitive focus. This second aspect was vital as it led to improved mental agility in analyzing other options. Once I didn't have to worry about choosing the best file format, I could identify what work would lead to rapid progress.
The first big change was having a single data structure (the dictionary "dat") which holds all the other data structures (expressions, inference rules, derivations) as keys. Each of those was initially a list of dictionaries, but that proved cumbersome when implementing data access. I realized I could leverage the unique identifiers present in the Physics Derivation Graph as keys. That led to a dictionary (top-level "dat") of dictionaries (expressions, inference rules, derivations) of dictionaries (each expression, each inference rule, each derivation, respectively). While this may sound messy, accessing specific elements of the PDG is now much easier.
Motivated by a conversation about how the PDG will integrate with a computer algebra system, I decided to include a few additional keys in the top-level data structure.
Enabling validation of steps requires supporting a computer algebra system (CAS). To enable an arbitrary choice of CAS, I need to support abstract syntax trees (ASTs). To enable an AST, I need to define symbols and operators. To enable symbols, I need units and measures. To summarize, I now track the following:
- derivations
- expressions
- latex
- AST
- inference rules
- symbols
- operators
- units
- measures
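A minimal sketch of the nested-dictionary layout described above, together with pickle-based serialization; the identifiers and field names are hypothetical, not copied from the PDG code:

```python
# Top-level "dat" keyed by category; each category keyed by the PDG's
# unique identifiers; each of those maps to that item's fields.
import io
import pickle

dat = {
    "expressions": {
        "4928923942": {"latex": "a = b", "AST": ("=", "a", "b")},
    },
    "inference rules": {
        "add X to both sides": {"number of inputs": 1, "number of outputs": 1},
    },
    "derivations": {},
    "symbols": {},
    "operators": {},
    "units": {},
    "measures": {},
}

# serialize and deserialize with pickle -- no file-format translation layer
buf = io.BytesIO()
pickle.dump(dat, buf)
buf.seek(0)
dat2 = pickle.load(buf)
print(dat2["expressions"]["4928923942"]["latex"])  # a = b
```

Accessing a specific element is a chain of key lookups, e.g. dat["expressions"]["4928923942"]["AST"], which is what makes this layout convenient despite the nesting.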
Improved understanding of the model-view-controller paradigm
Previously I had web form actions that led to a follow-on page. While technically possible, this turned out to be a bad design. The problems were tracking state (which variables get passed between pages) and poor visibility into state changes. I updated the web forms to pass their actions back to "controller.py", which maintains both the variable passing and the flow control (which page calls another page).
By adhering to the model-view-controller paradigm, troubleshooting and implementation were made much easier. This ease resulted in faster implementation of ideas.
Improved logging in Python
I use print statements throughout my code to help in troubleshooting. There are different categories of print statements: trace, debug, error. These are now present in (almost) every print statement. I've also included the name of the file (either "compute" or "controller") in print statements, as well as the function the print statement is in. These changes help track the state of the application.
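The tagged-print pattern described above can be sketched as follows; the tag format and function names are illustrative, not the actual PDG code:

```python
# Build print messages tagged with category (trace/debug/error), file,
# and the name of the calling function, as described above.
import inspect

def tagged(level: str, msg: str) -> str:
    """Return msg prefixed with its category, file, and calling function."""
    caller = inspect.stack()[1].function  # name of the calling function
    return "[" + level + "] compute/" + caller + ": " + msg

def add_expression(latex: str) -> str:
    print(tagged("trace", "def add_expression"))
    return latex

add_expression("a = b")
```

Grepping the output for a prefix like "[trace] compute/" then shows the application's state and call flow during troubleshooting.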