Saturday, October 9, 2021

periodic renewal of https letsencrypt certificates

Confirm expiration date of current certificates
openssl x509 -dates -noout < \
/home/pdg/proofofconcept/v7_pickle_web_interface/certs/fullchain.pem

Delete existing certs
sudo rm -rf /etc/letsencrypt/live/derivationmap.net
sudo rm -rf /etc/letsencrypt/renewal/derivationmap.net
sudo rm -rf /etc/letsencrypt/archive/derivationmap.net
Confirm folders are empty
sudo ls -hal /etc/letsencrypt/live/
sudo ls -hal /etc/letsencrypt/renewal/
sudo ls -hal /etc/letsencrypt/archive/

Request new certs
sudo certbot certonly --webroot \
-w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
--server https://acme-v02.api.letsencrypt.org/directory \
-d derivationmap.net -d www.derivationmap.net
To use multiple domains and a single cert, use
sudo certbot certonly --webroot \
-w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
--server https://acme-v02.api.letsencrypt.org/directory \
-d derivationmap.net -d www.derivationmap.net \
-d allofphysics.com -d www.allofphysics.com
Output should be something like
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Obtaining a new certificate

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/derivationmap.net/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/derivationmap.net/privkey.pem
   Your cert will expire on YYYY-MM-DD. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot
   again. To non-interactively renew *all* of your certificates, run
   "certbot renew"

Copy new certs to directory that nginx mounts in Docker-compose
cd /home/pdg/proofofconcept/v7_pickle_web_interface/certs

mv dhparam.pem dhparam.pem_OLD
mv fullchain.pem fullchain.pem_OLD
mv privkey.pem privkey.pem_OLD

sudo cp /etc/letsencrypt/live/derivationmap.net/fullchain.pem .
sudo cp /etc/letsencrypt/live/derivationmap.net/privkey.pem .
sudo chown pdg:pdg privkey.pem
sudo openssl dhparam -out dhparam.pem 2048

Restart Docker-compose
cd /home/pdg/proofofconcept/v7_pickle_web_interface
docker-compose up --build --force-recreate --remove-orphans --detach
If the docker containers are not restarted, the changes made to the file on the host won't take effect.

Verify in a browser that https://derivationmap.net/ has the updated certificate. 

Set a calendar reminder to renew the certificate.

Saturday, August 21, 2021

documenting my frequent pivots

  • survey all topics in Physics. [See this 2017-07 post.]
  • investigate refactoring the database from JSON to RDF. [See this 2020-12 post.]
  • goal: add a new derivation to the Physics Derivation Graph to identify bugs and inefficiencies in the workflow
  • instead of entering a new derivation, new goal is to create tutorials for an existing derivation. [See this 2021-08 post.]
  • before creating tutorials, write a "gold standard" for one derivation (then create tutorials). [See this 2021-07 post and this page.]
  • instead of fixating on a "gold standard," refocus on the original intent of the project: connecting expressions from the breadth of Physics. [See this 2021-08 post and this page.]

Friday, August 20, 2021

refocusing the motivation for the Physics Derivation Graph

The following aspects are "nice-to-have" in the Physics Derivation Graph:

  • validation of derivation steps using computer algebra systems
  • validation of consistency expression dimensions
  • validation of consistency of units when present in expressions
  • visualization of the graph
  • make the code pretty and easy to navigate for contributors
  • document the source code and design decisions
  • searching the graph (see this and this and this)
  • ability to determine whether paper is written by a crackpot
  • ability to detect unintentional errors in an article or book
  • cross-document references
  • semantic tagging of Latex documents
  • converting Latex from arxiv into formats supporting the above objectives
  • linking Physics Derivation Graph to existing ontology databases

The core objective of the Physics Derivation Graph is identifying the mathematical connectivity of the various domains of Physics. The same connectivity should be able to relate "basic" Physics (e.g., F=ma) to advanced Physics (e.g., the theory of the Standard model, string theory). 

I know of two ways to document the connectivity:
  • identify symbol re-use
  • identify inter-related derivations at the level of expressions
The symbol re-use is less interesting than the re-use of expressions. 


I've previously documented a plan,
but I didn't specify what would qualify as sufficient to show completeness. 

What core expressions are representative of all of Physics? Are there a set of expressions that, if shown to be connected by derivations, would be sufficient to demonstrate the span of the concept?

I expect that, for any given subdomain of Physics, there are central expressions. The task list is then
  1. identify every named expression in Physics -- see https://derivationmap.net/central_expessions
  2. identify every variable in that list of named expressions
  3. enter all named expressions and variables into the PDG database
  4. determine which expressions are not connected to anything

searching the Physics Derivation Graph database

The Physics Derivation Graph database is just a JSON file, so searching a plain text file is easy. However, the value of search is in the relations, not just finding a match. 

There are multiple aspects of the Physics Derivation Graph database to search: symbols, units, dimensions, inference rules, symbol names, operators. 

A workflow for search is

  1. user searches web interface for "x"
  2. the search interface returns multiple rows of a table; something like
    symbol IDsymbolnamedimension units
    948294 x_m asdfL:0,T:1,M:0
    113942 y_x mimiL:-1,T:0,M:0Kg
    501901 XM minXL:0,T:2,M:1 seconds^2
  3. if the user selects one of the symbols, then
  4. the search interface shows which expressions contain that symbol
  5. if the user selects one of the expressions, then
  6. the search interface shows which derivations contain that expression

Currently, the search of symbols ( https://derivationmap.net/list_all_symbols ) is separate from the search of expressions ( https://derivationmap.net/list_all_expressions ) though the two are hyperlinked in both directions.

Sunday, August 8, 2021

troubleshooting why docker-compose does not successfully launch locally

Normally I make changes to the repo https://github.com/allofphysicsgraph/proofofconcept/tree/gh-pages/v7_pickle_web_interface and then run git pull on my DigitalOcean VPS Droplet. It has been a long time since I tried running the web server locally on my laptop. 

Here's the process I went through to get the web server running locally.


First I had populated the "certs" directory

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/certs
on my laptop from the remote VPS. 


Then I had to create 

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface/.env
with the Google variables for login authentication. 


In the directory on my laptop

/Users/username/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface
running the command docker-compose up --build failed. Specifically, the nginx and flask containers would start, but nginx would fail because the flask container wasn't responding. I added
restart: on-failure
to the nginx section in docker-compose.yaml

That didn't solve the flask issue, but it allowed the containers to persist while I inspected the logs. Even though I couldn't enter the flask container, I could review the logs produced by gunicorn by running the command

tail -f flask/logs/gunicorn_error.log
which showed the causal issue
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/appuser/app/wsgi.py", line 15, in <module>
    from controller import app
  File "/home/appuser/app/controller.py", line 63, in <module>
    from secure import SecureHeaders  # type: ignore
ImportError: cannot import name 'SecureHeaders'
[2021-08-08 19:56:37 +0000] [11] [INFO] Worker exiting (pid: 11)
[2021-08-08 19:56:37 +0000] [1] [WARNING] Worker with pid 11 was terminated due to signal 15
[2021-08-08 19:56:37 +0000] [1] [INFO] Shutting down: Master
[2021-08-08 19:56:37 +0000] [1] [INFO] Reason: Worker failed to boot.
Root case: In my requirements.txt I hadn't pinned the version of the Python library secure. According to this issue there was a recent update . I ended up pinning secure==0.2.1 in requirements.txt

Now I am able to run docker-compose up and get a web page at https://localhost/

domains in Physics that are difficult for the Physics Derivation Graph

Geometry-based derivations

The spatial reasoning needed does not conform to a computer algebra system


Experiments

The chain of reasoning needed to justify design choices is difficult. The motivating insight is subjective.

consolidating content from Physics Derivation Graph websites

This post outlines the current resources associated with the project, and serves as a reminder to the author for what actions need to be taken. 

https://derivationmap.net/ -- primary website with content

https://github.com/allofphysicsgraph/proofofconcept -- website for source code

https://physicsderivationgraph.blogspot.com/ -- blog containing ideas and design decisions


https://allofphysicsgraph.github.io/ redirects to https://derivationmap.net/


TODO

https://github.com/allofphysicsgraph/proofofconcept/wiki -- unorganized wiki; should be consolidated with https://derivationmap.net/


https://sites.google.com/site/physicsderivationgraph/home -- old website, not maintained; should be consolidated with this blog.



comparison of Latex versus knowledge management efforts

In this post I assess why Latex is widely use and knowledge management efforts are not. 
I'm using the spectrum documented in this previous post.


Latex for typesetting scientific documents

Latex was free, open source, well designed, solved a specific problem that had been unaddressed, the problem was felt by the content creators, is comprehensive, written by a single author, and the author was famous.

Now Latex has a user community, developers, libraries of software, reference books, support on multiple operating systems as well as the web.

Anything seeking to augment or displace Latex will need a value differentiation that is felt by content creators. 


Semantic enrichment

  • Start with Latex and manually annotate
  • Start with Latex; use manually annotated corpus to train supervised machine learning model
  • Start with a custom domain specific language

Manual annotation or supervised machine learning may be incomplete if there are missing steps or assumptions.

"Better search" isn't a problem felt by content creators. This need is somewhat addressed by citations. 

I'm not aware of any models of success for semantic enrichment.


Controlled Natural Languages

Not widely adopted: Mizar, ForTheL, Physics Derivation Graph

I'm not aware of any models of success for CNLs.

Formal Verification

Lean, Coq, Isabelle, 

Burdensome to learn and to use. 


Steps to convert a published scientific paper into the Physics Derivation Graph

This post documents the steps taken to get the derivations in the paper Speed of sound from fundamental physical constants into the Physics Derivation Graph. 

1) Read the paper. I need to understand the point of the paper and where content is in the text. 


2) Identify the primary result of the derivation. Which equation is the output? Is there more than one? In this paper, equation 1 is the primary result.

Tangent: I tested whether the most popular equation is the most important.
This is measurable using the command

cat sound1.tex | tr " " "\n" |\          # replace spaces with newlines -- word tokenization
   grep "ref{" |\                        # find internal references to labeled expressions
   sed 's/.*\\ref{//' | sed 's/}.*//' |\ # remove the irrelevant latex "\ref{}" to isolate the expression label ID
   sort | uniq -c | sort -n              # rank the popularity of expression labels
2 v001
3 abinitio
3 all
3 ratio
3 v0
3 v01
3 v1
5 elemental
6 bohr
6 rydberg
11 a
15 v00
15 v3
Decoding those latex labels to equations in the PDF, "v3" is equation 9, which is tied for most popular with "v00" -- equation 4. The second most popular label "a" is equation 10. The equation I'm claiming as the primary result, equation 1, is labeled as "v0" and is referenced 3 times.


3) What sections of the paper contain the derivation?
There are actually two derivations of the same result.

  • one derivation is on page 2 column 1
  • another derivation is on page 2 column 2


4) Determine whether the derivation already exist in the Physics Derivation Graph database. Check


5) On paper, write out equations from the paper. Here I'm limiting the scope to one derivation on page 2 column 1. Fill in additional equations that are missing from the text. Use arrows to denote flow of derivation.

Often I'll make math mistakes or go down paths that turn out to be irrelevant. If I were a typical researcher, this is where the reconstruction process would stop.


6) On a separate paper, rewrite the graph and label the edges in the directed graph to describe the operation being carried out. 


7) On a separate paper, rewrite the content from the previous step to refine the "operation being carried out" to a directed graph with inference rules available in the Physics Derivation Graph. 


8) Identify all symbols used in the derivation: \rho, V, x, a, m, \alpha, M, K, \hbar, c, \pi, \epsilon_0


9) Identify which symbols in the derivation already exist in the Physics Derivation Graph and which need to be added. Record the symbol ID of each for later use.


10) Add missing symbols into the the Physics Derivation Graph database using  https://derivationmap.net/list_all_symbols#add%20symbol 


11) Identify which inference rules the derivation uses. 


12) Identify which inference rules for this derivation exist in the Physics Derivation Graph and which need to be added. 


13) If novel inference rules are needed for this derivation, enter those in https://derivationmap.net/list_all_inference_rules#add_inf_rule


14) Using https://derivationmap.net/start_new_derivation/, enter the expressions and inference rules for the steps of the derivation. For each step, 

  • ensure the symbols are referenced correctly in each expression
  • ensure the SymPy version of the expression is correct

The output of this step is https://derivationmap.net/review_derivation/608598/


15) Offline, in the JSON file containing the derivation, edit the symbol IDs and SymPy representations as needed to get the step validations to be correct. 

Review the modifications on https://derivationmap.net/review_derivation/608598/



TODO: The extensive linking of variables to definitions is not made accessible in the output, so there is no way for the user to leverage the linked information. 

  • expressions in the PDF/HTML output contain variables which have no link to their definitions
  • variables in the text of the PDF/HTML output are not linked to the expressions or their definitions
As a result of the above observation, https://github.com/allofphysicsgraph/proofofconcept/issues/217 was opened. 

Saturday, August 7, 2021

spectrum of formalized scientific documentation

This post is an expansion of July 2020 post which was informed by Kaliszyk's survey.

Spectrum of formality:

category verbal natural language verbal presentation with aids written text semantic enrichment human-oriented controlled natural languages machine-oriented controlled natural languages computer algebra systems proof assistants, theorem provers
media human speaking human speaking; Latex containing text and equations -- BeamerLatex containing text and equations and hyperlinks Latex containing text and equations and hyperlinks;
sTex
AIDA, ACE text: RDF, OWL
mathematical physics: Physics Derivation Graph
mathForTheLMizar
SymPy Coq, Lean, Isabelle
purpose social knowledge transfer, one-to-one social knowledge transfer, one-to-many human readers; text search human readers; semantic search consistency validation of logical claims inference of novel knowledge correctness of inference rules; dimensional consistency; unit consistency mathematical correctness (consistency wrt a logical basis)
grouping standard practice knowledge representation mathematical correctness
techniques

manual annotation; supervised machine learning needs vocabulary and grammar:

Left side of the spectrum is less rigorous, right side is more rigorous. 

Left side of the spectrum has higher de Bruijn factor, right side has lower de Bruijn factor.


The Physics Derivation Graph is a machine-oriented controlled natural language that is integrated with a computer algebra system.


Semantic tagging can relate definitions for terms within a document to the associated variables used in that document. 
The motive for the semantic tagging is to enhance search-ability. Since search-ability doesn't benefit the author, the author is not motivated to do the work of tagging.

Semantic tagging can identify common words across documents in a corpus. OWL is interesting conceptually but since the author doesn't benefit, the author is not motivated to do the work of linking. 




Sunday, July 25, 2021

how I validate changes and deploy updates to derivationmap.net website

On my local server, I make changes to the local file and then validate using
cd ~/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface
docker-compose up --build --remove-orphans
Build time is 10 minutes when layers are not cached.

Check the result in a local web browser by visiting http://localhost If I'm happy with the content,
git add filename
git commit -m "a message"
git push
SSH to the DigitalOcean droplet VPS (virtual private server)
cd ~/proofofconcept/v7_pickle_web_interface
git pull
docker-compose up --build --force-recreate --remove-orphans --detach
Check the live page content at https://derivationmap.net/.

Sunday, July 11, 2021

roadmap for identifying mathematical variables in Latex documents

Context: I think there's a graph of mathematical expressions in Physics. That graph has math expressions as nodes and the nodes are related by inference rules. The graph spans textbooks and journal articles. The scope of this post is limited to the graph within one document. 

For determining consistency of mathematical content within one document, the first step is to identify mathematical variables and operators. In practice, variables commonly appear inline as $a$ or within an expression like

\begin{equation}
a = 2
\end{equation}

Another way variables appear in text is within \newcommand, like

\newcommand{\R}{\mathbb{R}}

or

\newcommand{\bb}[1]{\mathbb{#1}}
Other numerical systems have similar notations. 
The complex numbers \( \bb{C} \), the rational 
numbers \( \bb{Q} \) and the integer numbers \( \bb{Z} \).

source

For addressing the complexity of expanding newcommand macros, see https://stackoverflow.com/questions/1509799/how-to-replace-latex-macros-with-their-definitions-using-latex


Once variables and operators are identified, the next step would be to associate each symbol with the respective definition (e.g., wikipedia links) and dimensions (e.g., length, time, charge). 



Options

https://pypi.org/project/TexSoup/, https://texsoup.alvinwan.com/, https://github.com/alvinwan/TexSoup, https://stackoverflow.com/users/4855984/alvin-wan

https://pylatexenc.readthedocs.io/en/latest/latexwalker/, https://github.com/phfaist/pylatexenc/

Saturday, July 10, 2021

dhparam.pem necessary for nginix web server

This morning I was alerted by Wachete that the derivationmap.net website was unavailable. 

I logged into the digitalocean.com virtual private server (VPS) and used top to see that the container processes were running.

Normally the command I run to start the Docker containers is

docker-compose up --build --force-recreate --remove-orphans --detach

To troubleshoot, I ran

docker-compose up --build --force-recreate --remove-orphans

and the output was

Successfully built 0ffaac97e769
Successfully tagged v7_pickle_web_interface_nginx:latest
Recreating v7_pickle_web_interface_flask_1 ... done
Recreating v7_pickle_web_interface_nginx_1 ... done
Attaching to v7_pickle_web_interface_flask_1, v7_pickle_web_interface_nginx_1
nginx_1  | 2021/07/10 11:48:41 [emerg] 1#1: PEM_read_bio_DHparams("/certs/dhparam.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: DH PARAMETERS)
nginx_1  | nginx: [emerg] PEM_read_bio_DHparams("/certs/dhparam.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: DH PARAMETERS)
v7_pickle_web_interface_nginx_1 exited with code 1

The fix was to point nginix to the dhparam.pem file.

https://security.stackexchange.com/questions/94390/whats-the-purpose-of-dh-parameters

Sunday, May 16, 2021

what would create a tipping point

Every scientist coming to the website https://derivationmap.net/ is unlikely. 

Graph analysis 

This is what I've historically chased -- identifying the data structure, and the data input mechanism. 

Extracting value from staring at a visualization of the graph of equations is unlikely. I'm not clear what graph queries are relevant to run against the graph content.


Consistency within a document

The local value to both the author and the reader is in determining whether the mathematical content of the paper being read is self-consistent. Practically, that means

  • are the dimensions of variables with each expression consistent?
  • are the units used in each expression consistent?
  • are the variables clearly defined? Here "definition" means a tuple of (symbol, dimensions, definition)
    • are the variables used consistently throughout this paper?
  • are the operators well defined? Here "definition" means a tuple of (symbol, number of inputs, number of outputs, constraints per input, constraints per output). 
    • are the operators used consistently throughout this paper?
  • is the mathematical content consistent with the written text?
Some of these aspects are explored on https://derivationmap.net/clickable_layers

As an author, I want to write Latex that generates a document that is mathematically correct.

Mathematical typos should be detected (similar to spell-check). As I enter text (or as a post-processing phase), the computer should guess whether the content is math or non-math. If math, then prompt the author for relevant details. 

Options for implementation

Overleaf is open source: https://github.com/overleaf/overleaf, so modifying it could be an option.


Cross-document analysis

In a larger context, the relevant value questions include

  • how does the paper I'm currently reading (or writing) relate to other papers?
  • how does the paper I'm currently reading (or writing) build on previous work?

Rather than bibliographic citation, I care about mathematical provenance. 

The specific symbols may vary across papers, and the dimensions may vary (e.g., renormalizing the speed of light to 1), but definitions have to be shared. 

The scientific community currently resorts to bibliographic citation because that is the only provenance available, not because it is what matters or what we value.

The cross-document analysis is not feasible without semantic content. The current approach of unstructured text with few hyperlinks requires human readers. Addressing the intra-document consistency challenge might yield semantic markup that enables cross-document analysis.