Sunday, October 11, 2020

upgrading Ubuntu 18.04 to 20.04 on DigitalOcean VPS droplet

I've been running a DigitalOcean droplet for $5/month for the past 6 months. Because I was new and didn't know better, I selected the Ubuntu 18.04 droplet. 

Now I want to update to Ubuntu 20.04 LTS. 

The guide recommends starting with a fresh 20.04 image instead of upgrading. 

The following is a record of the steps I took in this process. 

Total duration: 2 hours. The process took longer than expected because I hadn't previously configured the website from a bare Ubuntu server. Also, I had made a few changes since the initial installation that weren't documented.

Step 1: collect all data prior to turning off the server

I used scp to copy data from the droplet to my Mac. Copying a directory requires the -r flag.

scp -r user@IP:/home/pdg/arxiv_rss/ .
scp user@IP:/home/pdg/arxiv_rss/.env .
scp user@IP:/home/pdg/videos/* .
scp user@IP:/home/pdg/.bash_history .
scp user@IP:/home/pdg/.bashrc .
scp user@IP:/home/pdg/.python_history .
scp user@IP:/home/pdg/.sqlite_history .
cd proofofconcept/v7_pickle_web_interface/
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/.env .
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/certs/* .
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/flask/logs/* .
scp user@IP:/home/pdg/.ssh/authorized_keys .

Grab the crontab entry

0 0 * * * /usr/bin/python3 /home/user/arxiv_rss/ >> /home/user/arxiv_rss/cron.log 2>&1

Step 2: power off the server and take a snapshot

Step 3: Start a new droplet

Selected Ubuntu 20.04

Step 4: configure accounts and access

adduser pdg
usermod -aG sudo pdg

ufw allow OpenSSH
ufw enable

Instead of creating new SSH key pairs, 
I imported my authorized_keys file to /home/pdg/.ssh/

To get the authorized_keys file onto the new droplet, I temporarily allowed password-based authentication for scp:
sudo vim /etc/ssh/sshd_config
change "PasswordAuthentication no" to "PasswordAuthentication yes"
sudo service ssh restart
While I was in that file, I also made this change:
change "PermitRootLogin yes" to "PermitRootLogin no"
Once I had transferred the authorized_keys file, I reverted to "PasswordAuthentication no" and ran
sudo service ssh restart

sudo ufw allow 443
sudo ufw allow 80

Step 5: update OS

sudo apt-get update
sudo apt-get upgrade

Step 6: install metrics

sudo apt-get purge do-agent
curl -sSL -o /tmp/
sudo bash /tmp/
/opt/digitalocean/bin/do-agent --version

Step 7: install Docker and Docker-Compose

Step 8: certs

sudo apt install certbot python3-certbot-nginx
sudo certbot certonly --webroot \
     -w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
     --server \
     -d -d

Your certificate and chain have been saved at:
   /etc/letsencrypt/live/
Your key file has been saved at:
   /etc/letsencrypt/live/
Your cert will expire on 2021-01-09.
cd /etc/ssl/certs
sudo openssl dhparam -out dhparam.pem 4096
cp dhparam.pem ~/proofofconcept/v7_pickle_web_interface/certs/

Step 9: restore data from backup

git clone
scp .env user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/
cd proofofconcept/v7_pickle_web_interface/flask
cp users_sqlite.db_TEMPLATE users_sqlite.db
cd ..
docker-compose up --build --remove-orphans --detach

Sunday, September 20, 2020

use the inputs and inference rule to generate the output

Instead of expecting the user to provide the inputs and outputs and inference rule, supplying the inputs and inference rule is sufficient to generate the output. This output is necessarily consistent with the inputs and inference rule.

>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex

Define an inference rule

def mult_both_sides_by(expr, feed):
    return Equality(expr.lhs*feed, expr.rhs*feed, evaluate=False)
>>> expr = parse_latex('a = b')
>>> feed = parse_latex('f')
>>> mult_both_sides_by(expr, feed)
Eq(a*f, b*f)

This generalizes to include the relation

def mult_both_sides_by(expr, feed, relation):
    return relation(expr.lhs*feed, expr.rhs*feed, evaluate=False)
>>> mult_both_sides_by(expr, feed, Equality)
Eq(a*f, b*f)

Other relations are available in SymPy; for example, Le (less than or equal to):
>>> mult_both_sides_by(expr, feed, Le)
a*f <= b*f

text to Latex to SymPy using frequency and period example

An illustration of the gradations from text to Latex to CAS is provided below. In the derivation, the CAS is 1-to-1 with the Latex.


Frequency and period are inversely related.

statement with mathematical notation

Frequency and period are inversely related; thus T = 1/f and f = 1/T

statement with mathematical notation and explanation of derivation

Frequency and period are inversely related; thus T = 1/f
Multiply both sides by f, then divide by T to get f = 1/T.

statement with explanation of derivation, separating expressions from text

Frequency and period are inversely related; thus 
T = 1/f.
Multiply both sides by f to get
f T=1
then divide by T to get
f = 1/T.

statement with expressions separated from text and with bindings between math and text made explicit

Frequency and period are inversely related; thus 
expression 1: T = 1/f
Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1
then divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.

statement with inference rules made explicit

claim: Frequency and period are inversely related; thus
inference rule: declare initial expression
expression 1: T = 1/f
inference rule: Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1
inference rule: divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.
inference rule: declare final expression
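
The explicit form above maps naturally onto a small data structure. The following is a minimal sketch in plain Python; the field names (rule, inputs, feed, output) and the helper undefined_inputs are hypothetical choices for illustration, not the Physics Derivation Graph's actual schema.

```python
# Each step records one inference rule, the expressions it consumes (inputs),
# any extra argument (feed), and the expression it produces (output).
expressions = {1: "T = 1/f", 2: "f T = 1", 3: "f = 1/T"}

derivation = [
    {"rule": "declare initial expression", "inputs": [], "feed": None, "output": 1},
    {"rule": "multiply both sides by", "inputs": [1], "feed": "f", "output": 2},
    {"rule": "divide both sides by", "inputs": [2], "feed": "T", "output": 3},
    {"rule": "declare final expression", "inputs": [3], "feed": None, "output": None},
]

def undefined_inputs(steps):
    """Return expression IDs used as an input before any earlier step produced them."""
    produced, missing = set(), []
    for step in steps:
        missing.extend(i for i in step["inputs"] if i not in produced)
        if step["output"] is not None:
            produced.add(step["output"])
    return missing

print(undefined_inputs(derivation))      # []
print(undefined_inputs(derivation[1:]))  # [1]
```

Dropping the first step leaves expression 1 without a source, which is the kind of gap the explicit bindings make detectable.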

use of a Computer algebra system to implement inference rules

The following expansion requires

  • conversion of Latex to SymPy
  • correctly implemented inference rules

>>> import sympy
>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex

claim: Frequency and period are inversely related; thus
inference rule: declare initial expression
expression 1: T = 1/f

To confirm consistency of representations, the input Latex expression can be converted to SymPy and then back to Latex using

>>> latex(eval(sympy.srepr(parse_latex('T = 1/f'))))
'T = \\frac{1}{f}'

The SymPy representation (srepr) of expression 1 is

>>> sympy.srepr(parse_latex('T = 1/f'))
"Equality(Symbol('T'), Pow(Symbol('f'), Integer(-1)))"

Rather than working with the srepr string, use the parsed SymPy object for expression 1

>>> expr1 = parse_latex('T = 1/f')

inference rule: Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1

Although we can multiply a variable and an expression,

>>> expr1*Symbol('f')
f*(Eq(T, 1/f))

what actually needs to happen is first split the expression, then apply the multiplication to both sides

>>> Equality(expr1.lhs*Symbol('f'), expr1.rhs*Symbol('f'))
Eq(T*f, 1)

Applying the inference rule (above) yields the desired result, so save it as the second expression (below).

>>> expr2 = Equality(expr1.lhs*Symbol('f'), expr1.rhs*Symbol('f'))

inference rule: divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.

>>> Equality(expr2.lhs/Symbol('T'), expr2.rhs/Symbol('T'))
Eq(f, 1/T)

Again, save that to a variable

>>> expr3 = Equality(expr2.lhs/Symbol('T'), expr2.rhs/Symbol('T'))

>>> latex(expr3)
'f = \\frac{1}{T}'

inference rule: declare final expression

statement with inference rules and numeric IDs for symbols

To relate the above derivation to any other content in the Physics Derivation Graph, replace T and f with numeric IDs unique to "period" and "frequency"

>>> import sympy
>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex

claim: Frequency and period are inversely related; thus
inference rule: declare initial expression
expression 1: T = 1/f

>>> expr1 = parse_latex('T = 1/f')
>>> eval(srepr(expr1).replace('T','pdg9491').replace('f','pdg4201'))
Eq(pdg9491, 1/pdg4201)

Save the result as expression 1
>>> expr1 = eval(srepr(expr1).replace('T','pdg9491').replace('f','pdg4201'))

inference rule: Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1

>>> feed = Symbol('f')
>>> feed = eval(srepr(feed).replace('f','pdg4201'))
>>> Equality(expr1.lhs*feed, expr1.rhs*feed)
Eq(pdg4201*pdg9491, 1)
>>> expr2 = Equality(expr1.lhs*feed, expr1.rhs*feed)

inference rule: divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.

>>> feed = Symbol('T')
>>> feed = eval(srepr(feed).replace('T','pdg9491'))
>>> Equality(expr2.lhs/feed, expr2.rhs/feed)
Eq(pdg4201, 1/pdg9491)
>>> expr3 = Equality(expr2.lhs/feed, expr2.rhs/feed)

Convert from numeric ID back to Latex symbols in Latex expression
>>> latex(eval(srepr(expr3).replace('pdg9491','T').replace('pdg4201','f')))
'f = \\frac{1}{T}'

inference rule: declare final expression
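
The string replacement on srepr works for T and f, but it depends on those letters not appearing in constructor names such as Symbol or Equality. The sketch below shows the failure for a hypothetical symbol named l and one alternative, SymPy's xreplace with an explicit mapping. The dictionary names symbol_to_id and id_to_symbol are assumptions for illustration; the IDs are the ones used above. The expression is built directly rather than through parse_latex so that the example does not depend on the Latex parser.

```python
from sympy import Eq, Symbol, srepr

# string replacement also rewrites the 'l' inside the constructor name "Symbol"
broken = srepr(Symbol('l')).replace('l', 'pdg1')
print(broken)  # Symbopdg1('pdg1')

# alternative: substitute Symbol objects rather than characters
symbol_to_id = {Symbol('T'): Symbol('pdg9491'), Symbol('f'): Symbol('pdg4201')}
id_to_symbol = {v: k for k, v in symbol_to_id.items()}

expr1 = Eq(Symbol('T'), 1/Symbol('f'))
expr1_ids = expr1.xreplace(symbol_to_id)       # Eq(pdg9491, 1/pdg4201)
expr1_back = expr1_ids.xreplace(id_to_symbol)  # Eq(T, 1/f)
```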

removal of text, pure Python

The above steps can be expressed as a Python script with two functions (one for each inference rule)

from sympy import *
from sympy.parsing.latex import parse_latex

# assumptions: the inference rules are correct, the conversion of symbols-to-IDs is correct, the Latex-to-SymPy parsing is correct

def mult_both_sides_by(expr, feed):
    return Equality(expr.lhs*feed, expr.rhs*feed)

def divide_both_sides_by(expr, feed):
    return Equality(expr.lhs/feed, expr.rhs/feed)

# inference rule: declare initial expression
expr1 = parse_latex('T = 1/f')
expr1 = eval(srepr(expr1).replace('T','pdg9491').replace('f','pdg4201'))

feed = Symbol('f')
feed = eval(srepr(feed).replace('f','pdg4201'))
expr2 = mult_both_sides_by(expr1, feed)

feed = Symbol('T')
feed = eval(srepr(feed).replace('T','pdg9491'))
expr3 = divide_both_sides_by(expr2, feed)

# inference rule: declare final expression
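
The script generates expr3 but does not compare it with anything. As a minimal sketch of that last step, the generated output can be checked against the expression the derivation claims. The helper step_is_consistent is a hypothetical name, and the initial expression is built directly with the numeric IDs instead of going through parse_latex.

```python
from sympy import Equality, Symbol, simplify

def mult_both_sides_by(expr, feed):
    return Equality(expr.lhs*feed, expr.rhs*feed)

def divide_both_sides_by(expr, feed):
    return Equality(expr.lhs/feed, expr.rhs/feed)

def step_is_consistent(generated, claimed):
    """True when each side of the generated expression matches the claimed one."""
    return (simplify(generated.lhs - claimed.lhs) == 0
            and simplify(generated.rhs - claimed.rhs) == 0)

period, frequency = Symbol('pdg9491'), Symbol('pdg4201')

expr1 = Equality(period, 1/frequency)
expr2 = mult_both_sides_by(expr1, frequency)
expr3 = divide_both_sides_by(expr2, period)

claimed = Equality(frequency, 1/period)
wrong_claim = Equality(frequency, period)
print(step_is_consistent(expr3, claimed))      # True
print(step_is_consistent(expr3, wrong_claim))  # False
```

Comparing side by side is a simplification: it would reject a claimed expression that is equivalent but written with the sides swapped.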

How would the rigor of the above be increased?

To get beyond what a CAS can verify, a "proof" would relate each of the two functions to a set of axioms. Given the two arguments (an expression, a "feed" value), is the returned value always consistent with some set of axioms?

The set of axioms chosen matters. For example, we could start with Zermelo–Fraenkel set theory.

That would leave a significant gap between building up addition and subtraction and getting to calculus and differential equations. "Theorems of calculus derive from the axioms of the real, rational, integer, and natural number systems, as well as set theory." (source)

statements I believe to be true

Here is what I currently understand to be true:

  • Every Physics Derivation is comprised of discrete steps.
  • Each Step in a Physics Derivation has a single Inference rule. 
  • Every mathematical expression in a Physics Derivation can be written in Latex. 
  • There are some math expressions which cannot be written in a given CAS
    • example: definite evaluation after integration, like (x+y)|_{a}^{b} in SymPy
  • There are some derivation steps that cannot be expressed in a given CAS
  • There is not a 1-to-1 correspondence between a CAS-based graph and a Latex-based graph. 
  • There are many gradations between "text" and Latex and CAS and proof. 

The consequence of the CAS-based graph not having 1-to-1 correspondence with a Latex-based graph is that the current data structure of Latex and SymPy in one graph is not suitable. 
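
As an illustration of the definite-evaluation example in the list above: the notation (x+y)|_{a}^{b} may lack an unevaluated counterpart in SymPy, but the evaluated result can still be computed by substitution. This sketch assumes the evaluation is with respect to x, which the notation alone does not state. The helper name evaluate_between is hypothetical.

```python
from sympy import symbols

def evaluate_between(expr, var, lower, upper):
    """Value of expr at var=upper minus its value at var=lower."""
    return expr.subs(var, upper) - expr.subs(var, lower)

x, y, a, b = symbols('x y a b')
difference = evaluate_between(x + y, x, a, b)
print(difference)  # -a + b
```

The Latex expression and this CAS result are not 1-to-1: the Latex keeps the unevaluated form, while the CAS only holds the value after evaluation.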

problem identification for vectors and my current responses

I encountered a few issues with SymPy today. I'll explain what I found and what I plan to do about it.

I have an expression (1158485859) that, in Latex, is

\frac{-\hbar^2}{2m} \nabla^2 = {\cal H}

The \nabla^2 is represented in SymPy as Laplacian(), though an argument is required for the operator.

My solution: leave the SymPy representation as Pow(Symbol('nabla'), Integer(2))

Also on the topic of vectors, I encountered the expression

\vec{p} \cdot \vec{F}(\vec{r}, t) = a

In order to convert that to SymPy, I'd need to specify a basis for `\vec{p}` (e.g., `from sympy.vector import CoordSys3D`). For example, 

>>> from sympy.vector import CoordSys3D
>>> N = CoordSys3D('N')
>>> p = Symbol('p_x')*N.i + Symbol('p_y')*N.j + Symbol('p_z')*N.k
>>> F = Symbol('F_x')*N.i + Symbol('F_y')*N.j + Symbol('F_z')*N.k
>>> p.dot(F)
F_x*p_x + F_y*p_y + F_z*p_z

However, that doesn't seem to extend to functions:

>>> p.dot(Function(F)(Symbol('r'), Symbol('t')))
Traceback (most recent call last):
TypeError: expecting string or Symbol for name

My solution: leave the SymPy representation as incorrect, using "multiplication" instead of "dot"
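
One possible workaround, sketched here under the assumption that a Cartesian basis is acceptable, is to put the dependence on position and time into the components rather than into the vector as a whole. The component function names (F_x, F_y, F_z) are hypothetical, and position is expressed through the base scalars N.x, N.y, N.z of the coordinate system.

```python
from sympy import Function, Symbol
from sympy.vector import CoordSys3D

N = CoordSys3D('N')
t = Symbol('t')
position = (N.x, N.y, N.z)  # base scalars stand in for \vec{r}

p_x, p_y, p_z = Symbol('p_x'), Symbol('p_y'), Symbol('p_z')
# each component of F is an undefined function of position and time
F_x = Function('F_x')(*position, t)
F_y = Function('F_y')(*position, t)
F_z = Function('F_z')(*position, t)

p = p_x*N.i + p_y*N.j + p_z*N.k
F = F_x*N.i + F_y*N.j + F_z*N.k

p_dot_F = p.dot(F)
print(p_dot_F)
```

The cost is that the SymPy form is no longer 1-to-1 with the Latex, since the Latex never names a basis or components.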

vectors in SymPy and use of dot cross and the Laplacian

Converting "\vec{\psi}(r, t)" into SymPy is feasible
Function('vecpsi')(Symbol('r'), Symbol('t'))
but I can't figure out how to apply the dot product with a vector:

>>> import sympy
>>> from sympy import *
>>> from sympy.vector import CoordSys3D, Del, Laplacian, curl, divergence, gradient
>>> Symbol('vecp').dot( Function('vecpsi')(Symbol('r'), Symbol('t')) )
Traceback (most recent call last):
AttributeError: 'Symbol' object has no attribute 'dot'

The issue is that a vector needs to be specified in a specific dimension (e.g., 3) and have specific coefficients with respect to the basis.

>>> N = CoordSys3D('N')
>>> v1 = Symbol('a')*N.i+Symbol('b')*N.j + Symbol('c')*N.k
>>> v2 = Symbol('f')*N.i+Symbol('g')*N.j + Symbol('k')*N.k
>>> v1.dot(v2)
a*f + b*g + c*k
>>> v1.cross(v2)
(b*k - c*g)*N.i + (-a*k + c*f)*N.j + (a*g - b*f)*N.k

>>> delop = Del()
>>> delop(Symbol('a'))
>>> delop(v1)
(Derivative(a*N.i + b*N.j + c*N.k, N.x))*N.i + (Derivative(a*N.i + b*N.j + c*N.k, N.y))*N.j + (Derivative(a*N.i + b*N.j + c*N.k, N.z))*N.k
>>> v1
a*N.i + b*N.j + c*N.k
>>> curl(v1)
>>> divergence(v1)
>>> Laplacian(v1)
Laplacian(a*N.i + b*N.j + c*N.k)

Also, operators can't be defined since using Laplacian requires an argument:
>>> Laplacian()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __new__() missing 1 required positional argument: 'expr'

Sunday, September 6, 2020

expanding the list of Computer Algebra Systems used by the Physics Derivation Graph

The current implementation of the Physics Derivation Graph relies on the user to 1) enter expressions as Latex and 2) assist with and review the conversion from Latex to SymPy.

The choice of Latex as the input interface is based on ease of use (conciseness, low barrier to entry), and it could be replaced by other options for specifying expressions. Entering characters and then parsing them into an AST could be applied to a variety of input methods. A second choice is what to use the AST for. Currently the output is SymPy, but a variety of output representations are available (Sage, Mathematica, Maxima, etc).
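
To make the AST idea concrete, here is a minimal sketch in plain Python of one tree rendered to two outputs. The tuple-based node format and the function names to_latex and to_sympy_source are hypothetical; this is not how the Physics Derivation Graph or parse_latex represents expressions. A renderer for another CAS would be one more function over the same tree.

```python
# node formats: ("eq", lhs, rhs), ("div", numerator, denominator),
#               ("sym", name), ("int", value)
ast = ("eq", ("sym", "T"), ("div", ("int", 1), ("sym", "f")))

def to_latex(node):
    kind = node[0]
    if kind == "eq":
        return to_latex(node[1]) + " = " + to_latex(node[2])
    if kind == "div":
        return "\\frac{" + to_latex(node[1]) + "}{" + to_latex(node[2]) + "}"
    if kind in ("sym", "int"):
        return str(node[1])
    raise ValueError("unknown node type: " + kind)

def to_sympy_source(node):
    kind = node[0]
    if kind == "eq":
        return "Equality(" + to_sympy_source(node[1]) + ", " + to_sympy_source(node[2]) + ")"
    if kind == "div":
        return ("Mul(" + to_sympy_source(node[1]) + ", Pow("
                + to_sympy_source(node[2]) + ", Integer(-1)))")
    if kind == "sym":
        return "Symbol('" + node[1] + "')"
    if kind == "int":
        return "Integer(" + str(node[1]) + ")"
    raise ValueError("unknown node type: " + kind)

print(to_latex(ast))         # T = \frac{1}{f}
print(to_sympy_source(ast))  # Equality(Symbol('T'), Mul(Integer(1), Pow(Symbol('f'), Integer(-1))))
```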

In my experience, expressions in Sage are not as compact, and there seems to be less support for features like units (feet, meters, seconds).

I'm confident there are expressions in the PDG that Sage supports and SymPy does not. Expanding the CAS used by PDG means having three representations: Latex, SymPy, and Sage. The negative consequences of this expansion would include
  • more work to input the different representations,
  • more work to verify consistency between representations,
  • extra knowledge requirements for the developer, and
  • additional dependencies.
The potential benefit would be coverage for validating some steps that cannot currently be validated with SymPy. There might also be value to other people who want to compare SymPy to Sage.