Physics Derivation Graph: August 2021

Saturday, August 21, 2021

documenting my frequent pivots

survey all topics in Physics. [See this 2017-07 post.]
investigate refactoring the database from JSON to RDF. [See this 2020-12 post.]
goal: add a new derivation to the Physics Derivation Graph to identify bugs and inefficiencies in the workflow
instead of entering a new derivation, new goal is to create tutorials for an existing derivation. [See this 2021-08 post.]
before creating tutorials, write a "gold standard" for one derivation (then create tutorials). [See this 2021-07 post and this page.]
instead of fixating on a "gold standard," refocus on the original intent of the project: connecting expressions from the breadth of Physics. [See this 2021-08 post and this page.]

Friday, August 20, 2021

refocusing the motivation for the Physics Derivation Graph

The following aspects are "nice-to-have" in the Physics Derivation Graph:

validation of derivation steps using computer algebra systems
validation of consistency expression dimensions
validation of consistency of units when present in expressions
visualization of the graph
make the code pretty and easy to navigate for contributors
document the source code and design decisions
searching the graph (see this and this and this)
ability to determine whether paper is written by a crackpot
ability to detect unintentional errors in an article or book
cross-document references
semantic tagging of Latex documents
converting Latex from arxiv into formats supporting the above objectives
linking Physics Derivation Graph to existing ontology databases

The core objective of the Physics Derivation Graph is identifying the mathematical connectivity of the various domains of Physics. The same connectivity should be able to relate "basic" Physics (e.g., F=ma) to advanced Physics (e.g., the theory of the Standard model, string theory).

I know of two ways to document the connectivity:

identify symbol re-use
identify inter-related derivations at the level of expressions

The symbol re-use is less interesting than the re-use of expressions.

I've previously documented a plan,

https://physicsderivationgraph.blogspot.com/2017/07/finding-edges-for-physics-derivation.html

but I didn't specify what would qualify as sufficient to show completeness.

What core expressions are representative of all of Physics? Are there a set of expressions that, if shown to be connected by derivations, would be sufficient to demonstrate the span of the concept?

I expect that, for any given subdomain of Physics, there are central expressions. The task list is then

identify every named expression in Physics -- see https://derivationmap.net/central_expessions
identify every variable in that list of named expressions
enter all named expressions and variables into the PDG database
determine which expressions are not connected to anything

searching the Physics Derivation Graph database

The Physics Derivation Graph database is just a JSON file, so searching a plain text file is easy. However, the value of search is in the relations, not just finding a match.

There are multiple aspects of the Physics Derivation Graph database to search: symbols, units, dimensions, inference rules, symbol names, operators.

A workflow for search is

user searches web interface for "x"

the search interface returns multiple rows of a table; something like

symbol ID	symbol	name	dimension	units
948294	x_m	asdf	L:0,T:1,M:0
113942	y_x	mimi	L:-1,T:0,M:0	Kg
501901	XM	minX	L:0,T:2,M:1	seconds^2

if the user selects one of the symbols, then
the search interface shows which expressions contain that symbol
if the user selects one of the expressions, then
the search interface shows which derivations contain that expression

Currently, the search of symbols ( https://derivationmap.net/list_all_symbols ) is separate from the search of expressions ( https://derivationmap.net/list_all_expressions ) though the two are hyperlinked in both directions.

Sunday, August 8, 2021

troubleshooting why docker-compose does not successfully launch locally

Normally I make changes to the repo https://github.com/allofphysicsgraph/proofofconcept/tree/gh-pages/v7_pickle_web_interface and then run git pull on my DigitalOcean VPS Droplet. It has been a long time since I tried running the web server locally on my laptop.

Here's the process I went through to get the web server running locally.

First I had populated the "certs" directory

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/certs

on my laptop from the remote VPS.

Then I had to create

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/.env

with the Google variables for login authentication.

In the directory on my laptop

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface

running the command docker-compose up --build failed. Specifically, the nginx and flask containers would start, but nginx would fail because the flask container wasn't responding. I added

restart: on-failure

to the nginx section in docker-compose.yaml

That didn't solve the flask issue, but it allowed the containers to persist while I inspected the logs. Even though I couldn't enter the flask container, I could review the logs produced by gunicorn by running the command

tail -f flask/logs/gunicorn_error.log

which showed the causal issue

  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/appuser/app/wsgi.py", line 15, in <module>
    from controller import app
  File "/home/appuser/app/controller.py", line 63, in <module>
    from secure import SecureHeaders  # type: ignore
ImportError: cannot import name 'SecureHeaders'
[2021-08-08 19:56:37 +0000] [11] [INFO] Worker exiting (pid: 11)
[2021-08-08 19:56:37 +0000] [1] [WARNING] Worker with pid 11 was terminated due to signal 15
[2021-08-08 19:56:37 +0000] [1] [INFO] Shutting down: Master
[2021-08-08 19:56:37 +0000] [1] [INFO] Reason: Worker failed to boot.

Root case: In my requirements.txt I hadn't pinned the version of the Python library secure. According to this issue there was a recent update . I ended up pinning secure==0.2.1 in requirements.txt

Now I am able to run docker-compose up and get a web page at https://localhost/

domains in Physics that are difficult for the Physics Derivation Graph

Geometry-based derivations

The spatial reasoning needed does not conform to a computer algebra system

force diagram or free-body diagram

Centripetal forces, e.g., https://physics.info/centripetal/

Experiments

The chain of reasoning needed to justify design choices is difficult. The motivating insight is subjective.

consolidating content from Physics Derivation Graph websites

This post outlines the current resources associated with the project, and serves as a reminder to the author for what actions need to be taken.

https://derivationmap.net/ -- primary website with content

https://github.com/allofphysicsgraph/proofofconcept -- website for source code

https://physicsderivationgraph.blogspot.com/ -- blog containing ideas and design decisions

https://allofphysicsgraph.github.io/ redirects to https://derivationmap.net/

TODO

https://github.com/allofphysicsgraph/proofofconcept/wiki -- unorganized wiki; should be consolidated with https://derivationmap.net/

Specifically, https://github.com/allofphysicsgraph/proofofconcept/wiki/similar-projects needs to be merged into https://derivationmap.net/other_projects

https://sites.google.com/site/physicsderivationgraph/home -- old website, not maintained; should be consolidated with this blog.

comparison of Latex versus knowledge management efforts

In this post I assess why Latex is widely use and knowledge management efforts are not.
I'm using the spectrum documented in this previous post.

Latex for typesetting scientific documents

Latex was free, open source, well designed, solved a specific problem that had been unaddressed, the problem was felt by the content creators, is comprehensive, written by a single author, and the author was famous.

Now Latex has a user community, developers, libraries of software, reference books, support on multiple operating systems as well as the web.

Anything seeking to augment or displace Latex will need a value differentiation that is felt by content creators.

Semantic enrichment

Start with Latex and manually annotate
Start with Latex; use manually annotated corpus to train supervised machine learning model
Start with a custom domain specific language

Manual annotation or supervised machine learning may be incomplete if there are missing steps or assumptions.

"Better search" isn't a problem felt by content creators. This need is somewhat addressed by citations.

I'm not aware of any models of success for semantic enrichment.

Controlled Natural Languages

Not widely adopted: Mizar, ForTheL, Physics Derivation Graph

I'm not aware of any models of success for CNLs.

Formal Verification

Lean, Coq, Isabelle,

Burdensome to learn and to use.

Steps to convert a published scientific paper into the Physics Derivation Graph

This post documents the steps taken to get the derivations in the paper Speed of sound from fundamental physical constants into the Physics Derivation Graph.

1) Read the paper. I need to understand the point of the paper and where content is in the text.

2) Identify the primary result of the derivation. Which equation is the output? Is there more than one? In this paper, equation 1 is the primary result.

Tangent: I tested whether the most popular equation is the most important.
This is measurable using the command

cat sound1.tex | tr " " "\n" |\          # replace spaces with newlines -- word tokenization
   grep "ref{" |\                        # find internal references to labeled expressions
   sed 's/.*\\ref{//' | sed 's/}.*//' |\ # remove the irrelevant latex "\ref{}" to isolate the expression label ID
   sort | uniq -c | sort -n              # rank the popularity of expression labels
2 v001
3 abinitio
3 all
3 ratio
3 v0
3 v01
3 v1
5 elemental
6 bohr
6 rydberg
11 a
15 v00
15 v3

Decoding those latex labels to equations in the PDF, "v3" is equation 9, which is tied for most popular with "v00" -- equation 4. The second most popular label "a" is equation 10. The equation I'm claiming as the primary result, equation 1, is labeled as "v0" and is referenced 3 times.

3) What sections of the paper contain the derivation?
There are actually two derivations of the same result.

one derivation is on page 2 column 1
another derivation is on page 2 column 2

4) Determine whether the derivation already exist in the Physics Derivation Graph database. Check

5) On paper, write out equations from the paper. Here I'm limiting the scope to one derivation on page 2 column 1. Fill in additional equations that are missing from the text. Use arrows to denote flow of derivation.

Often I'll make math mistakes or go down paths that turn out to be irrelevant. If I were a typical researcher, this is where the reconstruction process would stop.

6) On a separate paper, rewrite the graph and label the edges in the directed graph to describe the operation being carried out.

7) On a separate paper, rewrite the content from the previous step to refine the "operation being carried out" to a directed graph with inference rules available in the Physics Derivation Graph.

8) Identify all symbols used in the derivation: \rho, V, x, a, m, \alpha, M, K, \hbar, c, \pi, \epsilon_0

9) Identify which symbols in the derivation already exist in the Physics Derivation Graph and which need to be added. Record the symbol ID of each for later use.

10) Add missing symbols into the the Physics Derivation Graph database using https://derivationmap.net/list_all_symbols#add%20symbol

11) Identify which inference rules the derivation uses.

12) Identify which inference rules for this derivation exist in the Physics Derivation Graph and which need to be added.

13) If novel inference rules are needed for this derivation, enter those in https://derivationmap.net/list_all_inference_rules#add_inf_rule

14) Using https://derivationmap.net/start_new_derivation/, enter the expressions and inference rules for the steps of the derivation. For each step,

ensure the symbols are referenced correctly in each expression
ensure the SymPy version of the expression is correct

The output of this step is https://derivationmap.net/review_derivation/608598/

15) Offline, in the JSON file containing the derivation, edit the symbol IDs and SymPy representations as needed to get the step validations to be correct.

Review the modifications on https://derivationmap.net/review_derivation/608598/

TODO: The extensive linking of variables to definitions is not made accessible in the output, so there is no way for the user to leverage the linked information.

expressions in the PDF/HTML output contain variables which have no link to their definitions
variables in the text of the PDF/HTML output are not linked to the expressions or their definitions

As a result of the above observation, https://github.com/allofphysicsgraph/proofofconcept/issues/217 was opened.

Saturday, August 7, 2021

spectrum of formalized scientific documentation

This post is an expansion of July 2020 post which was informed by Kaliszyk's survey.

Spectrum of formality:

category	verbal natural language	verbal presentation with aids	written text	semantic enrichment	human-oriented controlled natural languages	machine-oriented controlled natural languages	computer algebra systems	proof assistants, theorem provers
media	human speaking	human speaking; Latex containing text and equations -- Beamer	Latex containing text and equations and hyperlinks	Latex containing text and equations and hyperlinks; sTex	AIDA, ACE	text: RDF, OWL mathematical physics: Physics Derivation Graph math: ForTheL, Mizar	SymPy	Coq, Lean, Isabelle
purpose	social knowledge transfer, one-to-one	social knowledge transfer, one-to-many	human readers; text search	human readers; semantic search	consistency validation of logical claims	inference of novel knowledge	correctness of inference rules; dimensional consistency; unit consistency	mathematical correctness (consistency wrt a logical basis)
grouping	standard practice			knowledge representation			mathematical correctness
techniques				manual annotation; supervised machine learning	needs vocabulary and grammar: Mizar vocab ForTheL grammar PDG grammar

Left side of the spectrum is less rigorous, right side is more rigorous.

Left side of the spectrum has higher de Bruijn factor, right side has lower de Bruijn factor.

The Physics Derivation Graph is a machine-oriented controlled natural language that is integrated with a computer algebra system.

Semantic tagging can relate definitions for terms within a document to the associated variables used in that document.
The motive for the semantic tagging is to enhance search-ability. Since search-ability doesn't benefit the author, the author is not motivated to do the work of tagging.

Semantic tagging can identify common words across documents in a corpus. OWL is interesting conceptually but since the author doesn't benefit, the author is not motivated to do the work of linking.