- survey all topics in Physics. [See this 2017-07 post.]
- investigate refactoring the database from JSON to RDF. [See this 2020-12 post.]
- goal: add a new derivation to the Physics Derivation Graph to identify bugs and inefficiencies in the workflow
- instead of entering a new derivation, new goal is to create tutorials for an existing derivation. [See this 2021-08 post.]
- before creating tutorials, write a "gold standard" for one derivation (then create tutorials). [See this 2021-07 post and this page.]
- instead of fixating on a "gold standard," refocus on the original intent of the project: connecting expressions from the breadth of Physics. [See this 2021-08 post and this page.]
Saturday, August 21, 2021
documenting my frequent pivots
Friday, August 20, 2021
refocusing the motivation for the Physics Derivation Graph
The following aspects are "nice-to-have" in the Physics Derivation Graph:
- validation of derivation steps using computer algebra systems
- validation of consistency expression dimensions
- validation of consistency of units when present in expressions
- visualization of the graph
- make the code pretty and easy to navigate for contributors
- document the source code and design decisions
- searching the graph (see this and this and this)
- ability to determine whether paper is written by a crackpot
- ability to detect unintentional errors in an article or book
- cross-document references
- semantic tagging of Latex documents
- converting Latex from arxiv into formats supporting the above objectives
- linking Physics Derivation Graph to existing ontology databases
- identify symbol re-use
- identify inter-related derivations at the level of expressions
- identify every named expression in Physics -- see https://derivationmap.net/central_expessions
- identify every variable in that list of named expressions
- enter all named expressions and variables into the PDG database
- determine which expressions are not connected to anything
searching the Physics Derivation Graph database
The Physics Derivation Graph database is just a JSON file, so searching a plain text file is easy. However, the value of search is in the relations, not just finding a match.
There are multiple aspects of the Physics Derivation Graph database to search: symbols, units, dimensions, inference rules, symbol names, operators.
A workflow for search is
- user searches web interface for "x"
- the search interface returns multiple rows of a table; something like
symbol ID symbol name dimension units 948294 x_m asdf L:0,T:1,M:0 113942 y_x mimi L:-1,T:0,M:0 Kg 501901 XM minX L:0,T:2,M:1 seconds^2 - if the user selects one of the symbols, then
- the search interface shows which expressions contain that symbol
- if the user selects one of the expressions, then
- the search interface shows which derivations contain that expression
Sunday, August 8, 2021
troubleshooting why docker-compose does not successfully launch locally
Normally I make changes to the repo https://github.com/allofphysicsgraph/proofofconcept/tree/gh-pages/v7_pickle_web_interface and then run git pull
on my DigitalOcean VPS Droplet. It has been a long time since I tried running the web server locally on my laptop.
Here's the process I went through to get the web server running locally.
First I had populated the "certs" directory
/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/certson my laptop from the remote VPS.
Then I had to create
/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/.envwith the Google variables for login authentication.
In the directory on my laptop
/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interfacerunning the command
docker-compose up --build
failed. Specifically, the nginx and flask containers would start, but nginx would fail because the flask container wasn't responding. I added
restart: on-failureto the nginx section in docker-compose.yaml
That didn't solve the flask issue, but it allowed the containers to persist while I inspected the logs. Even though I couldn't enter the flask container, I could review the logs produced by gunicorn by running the command
tail -f flask/logs/gunicorn_error.logwhich showed the causal issue
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed File "/home/appuser/app/wsgi.py", line 15, in <module> from controller import app File "/home/appuser/app/controller.py", line 63, in <module> from secure import SecureHeaders # type: ignore ImportError: cannot import name 'SecureHeaders' [2021-08-08 19:56:37 +0000] [11] [INFO] Worker exiting (pid: 11) [2021-08-08 19:56:37 +0000] [1] [WARNING] Worker with pid 11 was terminated due to signal 15 [2021-08-08 19:56:37 +0000] [1] [INFO] Shutting down: Master [2021-08-08 19:56:37 +0000] [1] [INFO] Reason: Worker failed to boot.Root case: In my requirements.txt I hadn't pinned the version of the Python library
secure
. According to this issue there was a recent update . I ended up pinning secure==0.2.1 in requirements.txtNow I am able to run docker-compose up
and get a web page at https://localhost/
domains in Physics that are difficult for the Physics Derivation Graph
Geometry-based derivations
- force diagram or free-body diagram
- Centripetal forces, e.g., https://physics.info/centripetal/
- classical optics
- orbital mechanics
- Feynman diagrams
Experiments
consolidating content from Physics Derivation Graph websites
This post outlines the current resources associated with the project, and serves as a reminder to the author for what actions need to be taken.
https://derivationmap.net/ -- primary website with content
https://github.com/allofphysicsgraph/proofofconcept -- website for source code
https://physicsderivationgraph.blogspot.com/ -- blog containing ideas and design decisions
https://allofphysicsgraph.github.io/ redirects to https://derivationmap.net/
TODO
https://github.com/allofphysicsgraph/proofofconcept/wiki -- unorganized wiki; should be consolidated with https://derivationmap.net/
- Specifically, https://github.com/allofphysicsgraph/proofofconcept/wiki/similar-projects needs to be merged into https://derivationmap.net/other_projects
https://sites.google.com/site/physicsderivationgraph/home -- old website, not maintained; should be consolidated with this blog.
comparison of Latex versus knowledge management efforts
In this post I assess why Latex is widely use and knowledge management efforts are not.
I'm using the spectrum documented in this previous post.
Latex for typesetting scientific documents
Latex was free, open source, well designed, solved a specific problem that had been unaddressed, the problem was felt by the content creators, is comprehensive, written by a single author, and the author was famous.
Now Latex has a user community, developers, libraries of software, reference books, support on multiple operating systems as well as the web.
Anything seeking to augment or displace Latex will need a value differentiation that is felt by content creators.
Semantic enrichment
- Start with Latex and manually annotate
- Start with Latex; use manually annotated corpus to train supervised machine learning model
- Start with a custom domain specific language
Manual annotation or supervised machine learning may be incomplete if there are missing steps or assumptions.
"Better search" isn't a problem felt by content creators. This need is somewhat addressed by citations.
I'm not aware of any models of success for semantic enrichment.
Controlled Natural Languages
Formal Verification
Steps to convert a published scientific paper into the Physics Derivation Graph
This post documents the steps taken to get the derivations in the paper Speed of sound from fundamental physical constants into the Physics Derivation Graph.
1) Read the paper. I need to understand the point of the paper and where content is in the text.
2) Identify the primary result of the derivation. Which equation is the output? Is there more than one? In this paper, equation 1 is the primary result.
Tangent: I tested whether the most popular equation is the most important.
This is measurable using the command
cat sound1.tex | tr " " "\n" |\ # replace spaces with newlines -- word tokenization grep "ref{" |\ # find internal references to labeled expressions sed 's/.*\\ref{//' | sed 's/}.*//' |\ # remove the irrelevant latex "\ref{}" to isolate the expression label ID sort | uniq -c | sort -n # rank the popularity of expression labels 2 v001 3 abinitio 3 all 3 ratio 3 v0 3 v01 3 v1 5 elemental 6 bohr 6 rydberg 11 a 15 v00 15 v3Decoding those latex labels to equations in the PDF, "v3" is equation 9, which is tied for most popular with "v00" -- equation 4. The second most popular label "a" is equation 10. The equation I'm claiming as the primary result, equation 1, is labeled as "v0" and is referenced 3 times.
3) What sections of the paper contain the derivation?
There are actually two derivations of the same result.
- one derivation is on page 2 column 1
- another derivation is on page 2 column 2
4) Determine whether the derivation already exist in the Physics Derivation Graph database. Check
- https://derivationmap.net/list_all_expressions
- https://derivationmap.net/list_all_symbols
- https://derivationmap.net/list_all_derivations
5) On paper, write out equations from the paper. Here I'm limiting the scope to one derivation on page 2 column 1. Fill in additional equations that are missing from the text. Use arrows to denote flow of derivation.
Often I'll make math mistakes or go down paths that turn out to be irrelevant. If I were a typical researcher, this is where the reconstruction process would stop.
6) On a separate paper, rewrite the graph and label the edges in the directed graph to describe the operation being carried out.
7) On a separate paper, rewrite the content from the previous step to refine the "operation being carried out" to a directed graph with inference rules available in the Physics Derivation Graph.
8) Identify all symbols used in the derivation: \rho, V, x, a, m, \alpha, M, K, \hbar, c, \pi, \epsilon_0
9) Identify which symbols in the derivation already exist in the Physics Derivation Graph and which need to be added. Record the symbol ID of each for later use.
10) Add missing symbols into the the Physics Derivation Graph database using https://derivationmap.net/list_all_symbols#add%20symbol
11) Identify which inference rules the derivation uses.
12) Identify which inference rules for this derivation exist in the Physics Derivation Graph and which need to be added.
13) If novel inference rules are needed for this derivation, enter those in https://derivationmap.net/list_all_inference_rules#add_inf_rule
14) Using https://derivationmap.net/start_new_derivation/, enter the expressions and inference rules for the steps of the derivation. For each step,
- ensure the symbols are referenced correctly in each expression
- ensure the SymPy version of the expression is correct
The output of this step is https://derivationmap.net/review_derivation/608598/
15) Offline, in the JSON file containing the derivation, edit the symbol IDs and SymPy representations as needed to get the step validations to be correct.
Review the modifications on https://derivationmap.net/review_derivation/608598/
TODO: The extensive linking of variables to definitions is not made accessible in the output, so there is no way for the user to leverage the linked information.
- expressions in the PDF/HTML output contain variables which have no link to their definitions
- variables in the text of the PDF/HTML output are not linked to the expressions or their definitions
Saturday, August 7, 2021
spectrum of formalized scientific documentation
This post is an expansion of July 2020 post which was informed by Kaliszyk's survey.
Spectrum of formality:
category | verbal natural language | verbal presentation with aids | written text | semantic enrichment | human-oriented controlled natural languages | machine-oriented controlled natural languages | computer algebra systems | proof assistants, theorem provers |
media | human speaking | human speaking; Latex containing text and equations -- Beamer | Latex containing text and equations and hyperlinks | Latex containing text and equations and hyperlinks; sTex |
AIDA, ACE | text: RDF, OWL mathematical physics: Physics Derivation Graph math: ForTheL, Mizar |
SymPy | Coq, Lean, Isabelle |
purpose | social knowledge transfer, one-to-one | social knowledge transfer, one-to-many | human readers; text search | human readers; semantic search | consistency validation of logical claims | inference of novel knowledge | correctness of inference rules; dimensional consistency; unit consistency | mathematical correctness (consistency wrt a logical basis) |
grouping | standard practice | knowledge representation | mathematical correctness | |||||
techniques | manual annotation; supervised machine learning | needs vocabulary and grammar: |
Left side of the spectrum is less rigorous, right side is more rigorous.
Left side of the spectrum has higher de Bruijn factor, right side has lower de Bruijn factor.
The Physics Derivation Graph is a machine-oriented controlled natural language that is integrated with a computer algebra system.
Semantic tagging can relate definitions for terms within a document to the associated variables used in that document.
The motive for the semantic tagging is to enhance search-ability. Since search-ability doesn't benefit the author, the author is not motivated to do the work of tagging.
Semantic tagging can identify common words across documents in a corpus. OWL is interesting conceptually but since the author doesn't benefit, the author is not motivated to do the work of linking.