Saturday, August 21, 2021

documenting my frequent pivots

  • survey all topics in Physics. [See this 2017-07 post.]
  • investigate refactoring the database from JSON to RDF. [See this 2020-12 post.]
  • goal: add a new derivation to the Physics Derivation Graph to identify bugs and inefficiencies in the workflow
  • instead of entering a new derivation, new goal is to create tutorials for an existing derivation. [See this 2021-08 post.]
  • before creating tutorials, write a "gold standard" for one derivation (then create tutorials). [See this 2021-07 post and this page.]
  • instead of fixating on a "gold standard," refocus on the original intent of the project: connecting expressions from the breadth of Physics. [See this 2021-08 post and this page.]

Friday, August 20, 2021

refocusing the motivation for the Physics Derivation Graph

The following aspects are "nice-to-have" in the Physics Derivation Graph:

  • validation of derivation steps using computer algebra systems
  • validation of dimensional consistency of expressions (a minimal sketch follows this list)
  • validation of unit consistency when units are present in expressions
  • visualization of the graph
  • code that is pretty and easy for contributors to navigate
  • documentation of the source code and design decisions
  • searching the graph (see this and this and this)
  • ability to determine whether a paper is written by a crackpot
  • ability to detect unintentional errors in an article or book
  • cross-document references
  • semantic tagging of Latex documents
  • converting Latex from arxiv into formats supporting the above objectives
  • linking Physics Derivation Graph to existing ontology databases
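
Here is a rough illustration of what the dimensional-consistency check mentioned above could look like. This is a minimal sketch, not the actual Physics Derivation Graph code; the symbol table and function names are hypothetical, and dimensions are written as exponent vectors over length (L), time (T), and mass (M), matching the notation used elsewhere on this blog.

# minimal sketch (not the PDG implementation) of a dimensional-consistency check
DIMENSIONS = {            # hypothetical symbol table: symbol -> dimension exponents
    "F": {"L": 1, "T": -2, "M": 1},   # force
    "m": {"L": 0, "T": 0, "M": 1},    # mass
    "a": {"L": 1, "T": -2, "M": 0},   # acceleration
}

def dimension_of_product(symbols):
    """Dimension of a product of symbols: exponents add."""
    total = {"L": 0, "T": 0, "M": 0}
    for sym in symbols:
        for base, exponent in DIMENSIONS[sym].items():
            total[base] += exponent
    return total

def sides_are_consistent(lhs_symbols, rhs_symbols):
    """Both sides of an equation must have the same dimension vector."""
    return dimension_of_product(lhs_symbols) == dimension_of_product(rhs_symbols)

print(sides_are_consistent(["F"], ["m", "a"]))  # True:  F = m*a is dimensionally consistent
print(sides_are_consistent(["F"], ["m"]))       # False: F = m is not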

The core objective of the Physics Derivation Graph is identifying the mathematical connectivity of the various domains of Physics. That connectivity should be able to relate "basic" Physics (e.g., F=ma) to advanced Physics (e.g., the Standard Model, string theory).

I know of two ways to document the connectivity:
  • identify symbol re-use
  • identify inter-related derivations at the level of expressions
The symbol re-use is less interesting than the re-use of expressions. 


I've previously documented a plan,
but I didn't specify what would qualify as sufficient to show completeness. 

What core expressions are representative of all of Physics? Is there a set of expressions that, if shown to be connected by derivations, would be sufficient to demonstrate the span of the concept?

I expect that, for any given subdomain of Physics, there are central expressions. The task list is then
  1. identify every named expression in Physics -- see https://derivationmap.net/central_expessions
  2. identify every variable in that list of named expressions
  3. enter all named expressions and variables into the PDG database
  4. determine which expressions are not connected to anything
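
To make step 4 concrete, here is a minimal sketch of how unconnected expressions could be detected. The filename, JSON layout, and field names below are hypothetical, not the actual Physics Derivation Graph schema.

import json

# hypothetical layout: "expressions" maps expression ID -> record;
# "derivations" maps derivation name -> steps; each step lists the
# expression IDs it consumes and produces
with open("pdg_database.json") as f:
    db = json.load(f)

all_expression_ids = set(db["expressions"].keys())

connected_ids = set()
for derivation in db["derivations"].values():
    for step in derivation["steps"]:
        connected_ids.update(step["input_expression_ids"])
        connected_ids.update(step["output_expression_ids"])

unconnected = all_expression_ids - connected_ids
print(len(unconnected), "expressions are not used in any derivation step")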

searching the Physics Derivation Graph database

The Physics Derivation Graph database is just a JSON file, so searching a plain text file is easy. However, the value of search is in the relations, not just finding a match. 

There are multiple aspects of the Physics Derivation Graph database to search: symbols, units, dimensions, inference rules, symbol names, operators. 

A workflow for search is

  1. user searches web interface for "x"
  2. the search interface returns multiple rows of a table; something like
    symbol ID   symbol   name   dimension      units
    948294      x_m      asdf   L:0,T:1,M:0
    113942      y_x      mimi   L:-1,T:0,M:0   Kg
    501901      XM       minX   L:0,T:2,M:1    seconds^2
  3. if the user selects one of the symbols, then
  4. the search interface shows which expressions contain that symbol
  5. if the user selects one of the expressions, then
  6. the search interface shows which derivations contain that expression

Currently, the search of symbols ( https://derivationmap.net/list_all_symbols ) is separate from the search of expressions ( https://derivationmap.net/list_all_expressions ) though the two are hyperlinked in both directions.
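
As a rough sketch of how the three lookups in the workflow above (steps 2, 4, and 6) could be chained on the back end: this is a sketch only, and the filename, JSON layout, and field names are hypothetical, not the actual Physics Derivation Graph schema.

import json

# hypothetical layout: "symbols" and "expressions" are dicts keyed by ID;
# "derivations" is a dict keyed by derivation name
with open("pdg_database.json") as f:
    db = json.load(f)

def find_symbols(search_text):
    """Step 2: symbols whose name or Latex matches the search text."""
    return [sym_id for sym_id, sym in db["symbols"].items()
            if search_text in sym["name"] or search_text in sym["latex"]]

def expressions_containing(symbol_id):
    """Step 4: expressions that reference the selected symbol."""
    return [expr_id for expr_id, expr in db["expressions"].items()
            if symbol_id in expr["symbol_ids"]]

def derivations_containing(expression_id):
    """Step 6: derivations with a step that uses the selected expression."""
    return [name for name, deriv in db["derivations"].items()
            if any(expression_id in step["input_expression_ids"] +
                                    step["output_expression_ids"]
                   for step in deriv["steps"])]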

Sunday, August 8, 2021

troubleshooting why docker-compose does not successfully launch locally

Normally I make changes to the repo https://github.com/allofphysicsgraph/proofofconcept/tree/gh-pages/v7_pickle_web_interface and then run git pull on my DigitalOcean VPS Droplet. It has been a long time since I tried running the web server locally on my laptop. 

Here's the process I went through to get the web server running locally.


First I had populated the "certs" directory

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/certs
on my laptop from the remote VPS. 


Then I had to create 

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/.env
with the Google variables for login authentication. 


In the directory on my laptop

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface
running the command docker-compose up --build failed. Specifically, the nginx and flask containers would start, but nginx would fail because the flask container wasn't responding. I added
restart: on-failure
to the nginx section in docker-compose.yaml
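
Roughly, the relevant part of docker-compose.yaml ends up looking like the following; only the restart line is the actual change, and the comment stands in for whatever the existing nginx service already specifies.

services:
  nginx:
    # ... existing nginx configuration (image/build, ports, volumes) unchanged ...
    restart: on-failure    # added line: retry nginx when it exits with an error (e.g., flask not yet responding)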

That didn't solve the flask issue, but it allowed the containers to persist while I inspected the logs. Even though I couldn't enter the flask container, I could review the logs produced by gunicorn by running the command

tail -f flask/logs/gunicorn_error.log
which showed the root cause:
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/appuser/app/wsgi.py", line 15, in <module>
    from controller import app
  File "/home/appuser/app/controller.py", line 63, in <module>
    from secure import SecureHeaders  # type: ignore
ImportError: cannot import name 'SecureHeaders'
[2021-08-08 19:56:37 +0000] [11] [INFO] Worker exiting (pid: 11)
[2021-08-08 19:56:37 +0000] [1] [WARNING] Worker with pid 11 was terminated due to signal 15
[2021-08-08 19:56:37 +0000] [1] [INFO] Shutting down: Master
[2021-08-08 19:56:37 +0000] [1] [INFO] Reason: Worker failed to boot.
Root cause: in my requirements.txt I hadn't pinned the version of the Python library secure. According to this issue there was a recent update. I ended up pinning secure==0.2.1 in requirements.txt.

Now I am able to run docker-compose up and get a web page at https://localhost/

domains in Physics that are difficult for the Physics Derivation Graph

Geometry-based derivations

The spatial reasoning these derivations rely on does not map cleanly onto a computer algebra system.


Experiments

The chain of reasoning needed to justify experimental design choices is difficult to formalize, and the motivating insight is often subjective.

consolidating content from Physics Derivation Graph websites

This post outlines the current resources associated with the project, and serves as a reminder to the author for what actions need to be taken. 

https://derivationmap.net/ -- primary website with content

https://github.com/allofphysicsgraph/proofofconcept -- website for source code

https://physicsderivationgraph.blogspot.com/ -- blog containing ideas and design decisions


https://allofphysicsgraph.github.io/ redirects to https://derivationmap.net/


TODO

https://github.com/allofphysicsgraph/proofofconcept/wiki -- unorganized wiki; should be consolidated with https://derivationmap.net/


https://sites.google.com/site/physicsderivationgraph/home -- old website, not maintained; should be consolidated with this blog.



comparison of Latex versus knowledge management efforts

In this post I assess why Latex is widely used while knowledge management efforts are not.
I'm using the spectrum documented in this previous post.


Latex for typesetting scientific documents

Latex was free, open source, and well designed; it solved a specific problem that had previously been unaddressed, and that problem was felt by content creators; it is comprehensive; and it was written by a single author who was already famous.

Now Latex has a user community, developers, libraries of software, reference books, and support on multiple operating systems as well as the web.

Anything seeking to augment or displace Latex will need a value differentiation that is felt by content creators. 


Semantic enrichment

  • Start with Latex and manually annotate
  • Start with Latex; use manually annotated corpus to train supervised machine learning model
  • Start with a custom domain specific language

Manual annotation or supervised machine learning may be incomplete if there are missing steps or assumptions.

"Better search" isn't a problem felt by content creators. This need is somewhat addressed by citations. 

I'm not aware of any models of success for semantic enrichment.


Controlled Natural Languages

Not widely adopted: Mizar, ForTheL, Physics Derivation Graph

I'm not aware of any models of success for CNLs.

Formal Verification

Lean, Coq, Isabelle

Burdensome to learn and to use. 


Steps to convert a published scientific paper into the Physics Derivation Graph

This post documents the steps taken to get the derivations in the paper Speed of sound from fundamental physical constants into the Physics Derivation Graph. 

1) Read the paper. I need to understand the point of the paper and where content is in the text. 


2) Identify the primary result of the derivation. Which equation is the output? Is there more than one? In this paper, equation 1 is the primary result.

Tangent: I tested whether the most popular equation is the most important.
Popularity is measurable by counting how many times each expression label is referenced:

cat sound1.tex | tr " " "\n" |           # replace spaces with newlines -- word tokenization
   grep "ref{" |                         # find internal references to labeled expressions
   sed 's/.*\\ref{//' | sed 's/}.*//' |  # remove the irrelevant latex "\ref{}" to isolate the expression label ID
   sort | uniq -c | sort -n              # rank the popularity of expression labels
2 v001
3 abinitio
3 all
3 ratio
3 v0
3 v01
3 v1
5 elemental
6 bohr
6 rydberg
11 a
15 v00
15 v3
Decoding those latex labels to equations in the PDF, "v3" is equation 9, which is tied for most popular with "v00" -- equation 4. The second most popular label "a" is equation 10. The equation I'm claiming as the primary result, equation 1, is labeled as "v0" and is referenced 3 times.


3) What sections of the paper contain the derivation?
There are actually two derivations of the same result.

  • one derivation is on page 2 column 1
  • another derivation is on page 2 column 2


4) Determine whether the derivation already exists in the Physics Derivation Graph database by checking the list of existing derivations.


5) On paper, write out equations from the paper. Here I'm limiting the scope to one derivation on page 2 column 1. Fill in additional equations that are missing from the text. Use arrows to denote flow of derivation.

Often I'll make math mistakes or go down paths that turn out to be irrelevant. If I were a typical researcher, this is where the reconstruction process would stop.


6) On a separate paper, rewrite the graph and label the edges in the directed graph to describe the operation being carried out. 


7) On a separate paper, rewrite the content from the previous step, refining each "operation being carried out" into an inference rule available in the Physics Derivation Graph.


8) Identify all symbols used in the derivation: \rho, V, x, a, m, \alpha, M, K, \hbar, c, \pi, \epsilon_0


9) Identify which symbols in the derivation already exist in the Physics Derivation Graph and which need to be added. Record the symbol ID of each for later use.


10) Add missing symbols into the Physics Derivation Graph database using https://derivationmap.net/list_all_symbols#add%20symbol


11) Identify which inference rules the derivation uses. 


12) Identify which inference rules for this derivation exist in the Physics Derivation Graph and which need to be added. 


13) If novel inference rules are needed for this derivation, enter those in https://derivationmap.net/list_all_inference_rules#add_inf_rule


14) Using https://derivationmap.net/start_new_derivation/, enter the expressions and inference rules for the steps of the derivation. For each step, 

  • ensure the symbols are referenced correctly in each expression
  • ensure the SymPy version of the expression is correct

The output of this step is https://derivationmap.net/review_derivation/608598/
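
For a sense of what "ensure the SymPy version of the expression is correct" means, here is a minimal sketch of how a single step could be validated with SymPy. The expressions and inference rule shown are illustrative only, not taken from this derivation, and this is not the PDG's actual validation code.

import sympy as sp

F, m, a = sp.symbols("F m a", positive=True)

# input expression for the step and the claimed output expression
input_expr = sp.Eq(F, m * a)      # F = m a
output_expr = sp.Eq(F / m, a)     # F / m = a

# inference rule for the step: "divide both sides by m"
derived = sp.Eq(input_expr.lhs / m, input_expr.rhs / m)

# the step is valid if the derived equation matches the claimed output
step_is_valid = (sp.simplify(derived.lhs - output_expr.lhs) == 0
                 and sp.simplify(derived.rhs - output_expr.rhs) == 0)
print(step_is_valid)  # True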


15) Offline, in the JSON file containing the derivation, edit the symbol IDs and SymPy representations as needed to get the step validations to pass.

Review the modifications on https://derivationmap.net/review_derivation/608598/



TODO: The extensive linking of variables to definitions is not made accessible in the output, so there is no way for the user to leverage the linked information. 

  • expressions in the PDF/HTML output contain variables which have no link to their definitions
  • variables in the text of the PDF/HTML output are not linked to the expressions or their definitions
As a result of the above observation, https://github.com/allofphysicsgraph/proofofconcept/issues/217 was opened. 

Saturday, August 7, 2021

spectrum of formalized scientific documentation

This post is an expansion of a July 2020 post, which was informed by Kaliszyk's survey.

Spectrum of formality:

Each category below is listed with its media, its purpose, and (where applicable) the technique it relies on:

  • verbal natural language -- media: human speaking; purpose: social knowledge transfer, one-to-one
  • verbal presentation with aids -- media: human speaking; Latex (Beamer) containing text and equations; purpose: social knowledge transfer, one-to-many
  • written text -- media: Latex containing text, equations, and hyperlinks; purpose: human readers; text search
  • semantic enrichment -- media: Latex containing text, equations, and hyperlinks; sTex; purpose: human readers; semantic search; technique: manual annotation; supervised machine learning
  • human-oriented controlled natural languages -- media: AIDA, ACE; purpose: consistency validation of logical claims; technique: needs a vocabulary and grammar
  • machine-oriented controlled natural languages -- media: RDF, OWL (text); Physics Derivation Graph (mathematical physics); ForTheL, Mizar (math); purpose: inference of novel knowledge; technique: needs a vocabulary and grammar
  • computer algebra systems -- media: SymPy; purpose: correctness of inference rules; dimensional consistency; unit consistency
  • proof assistants, theorem provers -- media: Coq, Lean, Isabelle; purpose: mathematical correctness (consistency with respect to a logical basis)

The grouping along the spectrum: standard practice, then knowledge representation, then mathematical correctness.

Left side of the spectrum is less rigorous, right side is more rigorous. 

Left side of the spectrum has higher de Bruijn factor, right side has lower de Bruijn factor.


The Physics Derivation Graph is a machine-oriented controlled natural language that is integrated with a computer algebra system.


Semantic tagging can relate definitions for terms within a document to the associated variables used in that document. 
The motive for the semantic tagging is to enhance search-ability. Since search-ability doesn't benefit the author, the author is not motivated to do the work of tagging.

Semantic tagging can identify common words across documents in a corpus. OWL is interesting conceptually but since the author doesn't benefit, the author is not motivated to do the work of linking.