Sunday, December 27, 2020

ordered list representation in RDF

The Physics Derivation Graph depends on a data structure capable of using ordered lists. RDF's support for ordered lists is slightly convoluted. The best visualization of ordered lists in RDF I've found is https://ontola.io/blog/ordered-data-in-rdf/

I tried sketching how the "linked recursive lists" approach looks for the Physics Derivation Graph for a derivation that has a sequence of steps, and each step has an ordered list of inputs, feeds, and outputs.



[Figure: Graphviz sketch of the linked recursive lists for a derivation. Credit: dreampuf.github.io]
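A minimal sketch of that linked-lists structure using Python's rdflib (the namespace and step names here are my illustrative assumptions, not the PDG schema):

from rdflib import Graph, Namespace, BNode, RDF

PDG = Namespace("https://derivationmap.net/ns#")
g = Graph()

derivation = PDG["derivation1"]
steps = [PDG["step1"], PDG["step2"], PDG["step3"]]

# build the rdf:first / rdf:rest linked list by hand
node = BNode()
g.add((derivation, PDG.hasSteps, node))
for i, step in enumerate(steps):
    g.add((node, RDF.first, step))
    rest = BNode() if i < len(steps) - 1 else RDF.nil
    g.add((node, RDF.rest, rest))
    node = rest

print(g.serialize(format="turtle"))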

Sunday, December 13, 2020

identifying classes in the Physics Derivation Graph for OWL (Web Ontology Language)

Classes and subclasses of entities in the Physics Derivation Graph (a code sketch follows the list):

  • derivations = an ordered set of two or more steps
  • steps = a set of one or more statements related by an inference rule
  • inference rule = identifies the relation of a set of one or more statements
  • statement = two or more expressions (LHS and RHS) and a relational operator
    • expressions = an ordered set of symbols
    • symbols = a token
      • operator = applies to one or more values (aka operands). Property: number of expected values
      • value. Property: categorized as "variable" xor "constant"
        • integer = one or more digits. The set of digits depends on the base
        • float
        • complex
      • unit. Examples: "m" for meter, "kg" for kilogram
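A minimal sketch of this hierarchy as Python dataclasses (the field names are my assumptions, not the PDG schema):

from dataclasses import dataclass
from typing import List

@dataclass
class SymbolNode:
    token: str

@dataclass
class Expression:
    symbols: List[SymbolNode]        # ordered set of symbols

@dataclass
class Statement:
    relational_operator: str         # e.g. "=", "<="
    expressions: List[Expression]    # LHS, RHS, ...

@dataclass
class Step:
    inference_rule: str
    statements: List[Statement]      # one or more statements

@dataclass
class Derivation:
    steps: List[Step]                # ordered, two or more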
Some aspects of expressions and derivations I don't have names for yet:
  • binary operators {"where", "for all", "when", "for"} used to relate two expressions: the "primary expression" on the left and one or more "scope"/"definition"/"constraint" equations or inequalities on the right

Some aspects of expressions and derivations I don't need to label in the PDG:
  • terms = parts of the expression that are connected with addition and subtraction
  • factors = parts of the expression that are connected by multiplication
  • coefficients = a number that is multiplied by a variable in a mathematical expression.
  • power, base, exponent
  • base (as in decimal vs hexadecimal, etc)
  • formula
  • function

An equation is two expressions linked with an equal sign. 
What is the superclass above "equation" and "inequality"?
So far I'm settling on "statement".

I am intentionally staying out of the realm of {proofs, theorems, axioms} both because that is outside the scope of the Physics Derivation Graph and because the topic is already addressed by OMDoc. 

Suppose we have a statement like
y = x^2 + b where x = {5, 3, 1}
In that statement, 
  • "y = x^2 + b" is an equation
  • "x^2 + b" is an expression and is related to the expression "y" by equality. 
  • "x^2" is a term in the RHS expression
  • "x = {5, 3, 1}" is an equation that provides scope for the primary equation. 
What is the "where" relation in the statement? The "where" is a binary operator that relates two equations. There are other "statement operators" to relate equations, like "for all"; see the statement
a + c = 2*g + k for all g \in \Re
In that statement, "g \in \Re" is (an equation?) serving as a scope for the primary equation. 

All statements have supplemental scope/definition equations that are usually left implicit. The reader is expected to deduce the scope of the statement from the surrounding context.

The supplemental scope/definition equations describe both per-variable and inter-variable constraints. For example,
x*y + 3 = 94 where ((x \in \Re) AND (y \in \Re) AND (x<y))

More complicated statement:
f(x) = { 0 for x<0
       { 1 for 0<=x<=1
       { 0 for x>1
Here the LHS is a function and the RHS is an integer, but the value of the integer depends on x. 
Note that the "0<=x<=1" can be separated into "0<=x AND x<=1". Expanding this even more,
(f(x) = 0 for x<0) AND (f(x) = 1 for (0<=x AND x<=1)) AND (f(x) = 0 for x>1)
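A sketch of this piecewise statement in SymPy (my assumption: Piecewise and And adequately capture the "for" scopes):

>>> from sympy import Symbol, Piecewise, And
>>> x = Symbol('x')
>>> f = Piecewise((0, x < 0), (1, And(0 <= x, x <= 1)), (0, x > 1))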

Saturday, December 12, 2020

an argument in support of RDF instead of property graphs

I've wrestled with whether to use Property Graphs to store and query the Physics Derivation Graph. I see potential value, but the licensing of Neo4j keeps me from committing. I'm aware of other implementations, but I don't have confidence about either their stability or durability.

This post makes a convincing argument about both the short-comings of a property-graph-based knowledge graph and the value of an RDF-based storage method. To summarize,

  • don't be distracted by visualization capabilities; inference is more important
  • property graph IDs are local, whereas identifiers in RDF are global. 
  • Global IDs are vital for enabling federation, merge, diff

I know OWL (Web Ontology Language) is popular for knowledge representation, and this post was the first to provide a clear breakdown of the difference between property graphs, RDF, and OWL. OWL supports

  • the ability to infer that a node that is a member of a class is also a member of any of its superclasses
  • properties can have superproperties
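A minimal sketch of that subclass inference using the rdflib and owlrl Python packages (the class names here are illustrative assumptions):

from rdflib import Graph, Namespace, RDF, RDFS
from owlrl import DeductiveClosure, RDFS_Semantics

EX = Namespace("http://example.org/pdg#")
g = Graph()
g.add((EX.Equation, RDFS.subClassOf, EX.Statement))  # class hierarchy
g.add((EX.expr1, RDF.type, EX.Equation))             # instance data

DeductiveClosure(RDFS_Semantics).expand(g)  # materialize the inferences
print((EX.expr1, RDF.type, EX.Statement) in g)  # True: membership was inferred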
OWL overview:
  • https://www.cambridgesemantics.com/blog/semantic-university/learn-rdf/
  • https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/owl-101/
  • https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/

Saturday, November 21, 2020

log analysis of nginx access using Python Pandas

My first step is to review logins on the site,
https://physicsderivationgraph.blogspot.com/2020/05/inspecting-list-of-users-who-have.html

My previous post on reviewing logs
https://physicsderivationgraph.blogspot.com/2020/05/grepping-nginx-logs-to-observe-user.html
was written prior to the current nginx format I'm using.

I haven't gotten around to a deeper analysis like
https://physicsderivationgraph.blogspot.com/2020/04/analysis-of-web-logs-to-understand-how.html


First I had to install supporting software

  sudo apt install python3-pip
  pip3 install pandas

Inline Python in bash with Pandas is possible because every log line is formatted like a Python dictionary. Here I want to review which columns are present in the logs:

cat nginx_access.log | python3 -c "import sys
import pandas
pandas.options.display.max_rows = 999 # https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
list_of_lines = []
for line in sys.stdin:
    list_of_lines.append(eval(line))
df = pandas.DataFrame(list_of_lines)
print(df.columns)
"
How many of each entry for a few columns?
cat nginx_access.log | python3 -c "import sys
import pandas
pandas.options.display.max_rows = 999 # https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
list_of_lines = []
for line in sys.stdin:
    list_of_lines.append(eval(line))
df = pandas.DataFrame(list_of_lines)
threshold = 20
print('user:')
vc = df['user'].value_counts()
print(vc[vc>threshold])
print('IP:')
vc = df['ip'].value_counts()
print(vc[vc>threshold])
print('req:')
vc = df['req'].value_counts()
print(vc[vc>threshold])
#print(df.head())
"
For IPs that have made multiple (e.g., 30) requests, what pages have been accessed?
cat nginx_access.log | python3 -c "import sys
import pandas
pandas.options.display.max_rows = 999 # https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
list_of_lines = []
for line in sys.stdin:
    list_of_lines.append(eval(line))
df = pandas.DataFrame(list_of_lines)
threshold = 30
vc = df['ip'].value_counts()
for ip, number_of_requests in vc[vc>threshold].items():
    print('\nIP = ',ip, 'made',number_of_requests,'requests')
    df_this_ip = df[df['ip']==ip]
    #for request in df_this_ip['req'].values:
    #    print(request)
    print(df_this_ip['req'].value_counts())
"

Sunday, October 11, 2020

upgrading Ubuntu 18.04 to 20.04 on DigitalOcean VPS droplet

I've been running a DigitalOcean droplet for $5/month for the past 6 months. Because I was new and didn't know better, I selected the Ubuntu 18.04 droplet. 

Now I want to update to Ubuntu 20.04 LTS. 

The guide recommends starting with a fresh 20.04 image instead of upgrading. 

The following is a record of the steps I took in this process. 

Total duration: 2 hours. The process took longer than expected because I hadn't previously configured the website from a bare Ubuntu server. Also, I had made a few changes since the initial installation that weren't documented.

Step 1: collect all data prior to turning off the server

Used scp to copy data from the droplet to my mac

scp user@IP:/home/pdg/arxiv_rss/rss_filter_email.py .
scp user@IP:/home/pdg/arxiv_rss/.env .
scp user@IP:/home/pdg/videos/* .
scp user@IP:/home/pdg/.bash_history .
scp user@IP:/home/pdg/.bashrc .
scp user@IP:/home/pdg/.python_history .
scp user@IP:/home/pdg/.sqlite_history .
cd proofofconcept/v7_pickle_web_interface/
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/.env .
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/certs/* .
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/flask/logs/* .
scp user@IP:/home/pdg/.ssh/authorized_keys .

Grab the crontab entry

0 0 * * * /usr/bin/python3 /home/user/arxiv_rss/rss_filter_email.py >> /home/user/arxiv_rss/cron.log 2>&1

Step 2: power off the server and take a snapshot

https://www.digitalocean.com/docs/images/snapshots/how-to/snapshot-droplets/

Step 3: Start a new droplet

Selected Ubuntu 20.04

Step 4: configure accounts and access

adduser pdg
usermod -aG sudo pdg

ufw allow OpenSSH
ufw enable

Instead of creating new SSH key pairs, 
I imported my authorized_keys file to /home/pdg/.ssh/

To get the authorized_keys file I temporarily allowed password-based authentication for scp using
sudo vim /etc/ssh/sshd_config
change "PasswordAuthentication No" to "PasswordAuthentication Yes"
sudo service ssh restart
While I was there, I also changed "PermitRootLogin yes" to "PermitRootLogin no".
Once I had transferred the authorized_keys file, I reverted to "PasswordAuthentication No" and ran
sudo service ssh restart


sudo ufw allow 443
sudo ufw allow 80

Step 5: update OS


sudo apt-get update
sudo apt-get upgrade

Step 6: install metrics


sudo apt-get purge do-agent
curl -sSL https://repos.insights.digitalocean.com/install.sh -o /tmp/install.sh
sudo bash /tmp/install.sh
/opt/digitalocean/bin/do-agent --version

Step 7: install Docker and Docker-Compose


Step 8: certs

sudo apt install certbot python3-certbot-nginx
sudo certbot certonly --webroot \
     -w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
     --server https://acme-v02.api.letsencrypt.org/directory \
     -d derivationmap.net -d www.derivationmap.net

Your certificate and chain have been saved at:
   /etc/letsencrypt/live/derivationmap.net/fullchain.pem
Your key file has been saved at:
   /etc/letsencrypt/live/derivationmap.net/privkey.pem
Your cert will expire on 2021-01-09.
https://security.stackexchange.com/questions/94390/whats-the-purpose-of-dh-parameters
cd /etc/ssl/certs
sudo openssl dhparam -out dhparam.pem 4096
cp dhparam.pem ~/proofofconcept/v7_pickle_web_interface/certs/

Step 9: restore data from backup

git clone https://github.com/allofphysicsgraph/proofofconcept.git
scp .env user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/
cd proofofconcept/v7_pickle_web_interface/flask
cp users_sqlite.db_TEMPLATE users_sqlite.db
cd ..
docker-compose up --build --remove-orphans --detach

Sunday, September 20, 2020

use the inputs and inference rule to generate the output

Instead of expecting the user to provide the inputs, outputs, and inference rule, the inputs and inference rule alone are sufficient to generate the output. The generated output is necessarily consistent with the inputs and inference rule.

>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex

Define an inference rule

def mult_both_sides_by(expr, feed):
    return Equality(expr.lhs*feed, expr.rhs*feed, evaluate=False)
 
>>> expr = parse_latex('a = b')
>>> feed = parse_latex('f')
>>> mult_both_sides_by(expr, feed)
Eq(a*f, b*f)

This generalizes to include the relation

def mult_both_sides_by(expr, feed, relation):
    return relation(expr.lhs*feed, expr.rhs*feed, evaluate=False)
 
>>> mult_both_sides_by(expr, feed, Equality)
Eq(a*f, b*f)

Other relations are available; see https://docs.sympy.org/latest/modules/core.html
>>> mult_both_sides_by(expr, feed, Le)
a*f <= b*f

text to Latex to SymPy using frequency and period example

An illustration of the gradations from text to Latex to CAS is provided below. In the derivation, the CAS representation is 1-to-1 with the Latex.



statement

Frequency and period are inversely related.


statement with mathematical notation

Frequency and period are inversely related; thus T = 1/f and f = 1/T

statement with mathematical notation and explanation of derivation

Frequency and period are inversely related; thus T = 1/f
Multiply both sides by f, then divide by T to get f = 1/T.
statement with explanation of derivation, separating expressions from text

Frequency and period are inversely related; thus 
T = 1/f.
Multiply both sides by f to get
f T=1
then divide by T to get
f = 1/T.

statement with expressions separated from text and with bindings between math and text made explicit

Frequency and period are inversely related; thus 
expression 1: T = 1/f
Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1
then divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.

statement with inference rules made explicit

claim: Frequency and period are inversely related; thus
inference rule: declare initial expression
expression 1: T = 1/f
inference rule: Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1
then 
inference rule: divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.
inference rule: declare final expression


use of a Computer algebra system to implement inference rules

The following expansion requires

  • conversion of Latex to SymPy
  • correctly implemented inference rules

>>> import sympy
>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex

claim: Frequency and period are inversely related; thus
inference rule: declare initial expression
expression 1: T = 1/f

To confirm consistency of representations, the input Latex expression can be converted to SymPy and then back to Latex using

>>> latex(eval(sympy.srepr(parse_latex('T = 1/f'))))
'T = \\frac{1}{f}'

We'll work with the SymPy representation of expression 1,

>>> sympy.srepr(parse_latex('T = 1/f'))
"Equality(Symbol('T'), Pow(Symbol('f'), Integer(-1)))"

Rather than using the SymPy string representation, work directly with the parsed form of expression 1

>>> expr1 = parse_latex('T = 1/f')

inference rule: Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1

Although we can multiply a variable and an expression,

>>> expr1*Symbol('f')
f*(Eq(T, 1/f))

what actually needs to happen is to first split the expression, then apply the multiplication to both sides

>>> Equality(expr1.lhs*Symbol('f'), expr1.rhs*Symbol('f'))
Eq(T*f, 1)

Application of an inference rule (above) results in the desired result, so save that result as the second expression (below).

>>> expr2 = Equality(expr1.lhs*Symbol('f'), expr1.rhs*Symbol('f'))

inference rule: divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.

>>> Equality(expr2.lhs/Symbol('T'), expr2.rhs/Symbol('T'))
Eq(f, 1/T)

Again, save that to a variable

>>> expr3 = Equality(expr2.lhs/Symbol('T'), expr2.rhs/Symbol('T'))

>>> latex(expr3)
'f = \\frac{1}{T}'

inference rule: declare final expression


statement with inference rules and numeric IDs for symbols

To relate the above derivation to any other content in the Physics Derivation Graph, replace T and f with numeric IDs unique to "period" and "frequency"

>>> import sympy
>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex

claim: Frequency and period are inversely related; thus
inference rule: declare initial expression
expression 1: T = 1/f

>>> expr1 = parse_latex('T = 1/f')
>>> eval(srepr(expr1).replace('T','pdg9491').replace('f','pdg4201'))
Eq(pdg9491, 1/pdg4201)

Save the result as expression 1
>>> expr1 = eval(srepr(expr1).replace('T','pdg9491').replace('f','pdg4201'))

inference rule: Multiply both sides of expression 1 by f to get expression 2
expression 2: f T=1

>>> feed = Symbol('f')
>>> feed = eval(srepr(feed).replace('f','pdg4201'))
>>> Equality(expr1.lhs*feed, expr1.rhs*feed)
Eq(pdg4201*pdg9491, 1)
>>> expr2 = Equality(expr1.lhs*feed, expr1.rhs*feed)

inference rule: divide both sides of expression 2 by T to get expression 3
expression 3: f = 1/T.

>>> feed = Symbol('T')
>>> feed = eval(srepr(feed).replace('T','pdg9491'))
>>> Equality(expr2.lhs/feed, expr2.rhs/feed)
Eq(pdg4201, 1/pdg9491)
>>> expr3 = Equality(expr2.lhs/feed, expr2.rhs/feed)

Convert from numeric ID back to Latex symbols in Latex expression
>>> latex(eval(srepr(expr3).replace('pdg9491','T').replace('pdg4201','f')))
'f = \\frac{1}{T}'

inference rule: declare final expression

removal of text, pure Python

The above steps can be expressed as a Python script with two functions (one for each inference rule)

from sympy import *
from sympy.parsing.latex import parse_latex

# assumptions: the inference rules are correct, the conversion of symbols-to-IDs is correct, the Latex-to-SymPy parsing is correct

def mult_both_sides_by(expr, feed):
    return Equality(expr.lhs*feed, expr.rhs*feed)

def divide_both_sides_by(expr, feed):
    return Equality(expr.lhs/feed, expr.rhs/feed)

# inference rule: declare initial expression
expr1 = parse_latex('T = 1/f')
expr1 = eval(srepr(expr1).replace('T','pdg9491').replace('f','pdg4201'))

feed = Symbol('f')
feed = eval(srepr(feed).replace('f','pdg4201'))
expr2 = mult_both_sides_by(expr1, feed)

feed = Symbol('T')
feed = eval(srepr(feed).replace('T','pdg9491'))
expr3 = divide_both_sides_by(expr2, feed)

latex(eval(srepr(expr3).replace('pdg9491','T').replace('pdg4201','f')))
# inference rule: declare final expression


How would the rigor of the above be increased?

To get beyond what a CAS can verify, a "proof" would relate each of the two functions to a set of axioms. Given the two arguments (an expression, a "feed" value), is the returned value always consistent with some set of axioms?

The set of axioms chosen matters. For example, we could start with Zermelo–Fraenkel set theory.

That would leave a significant gap between building up addition and subtraction and getting to calculus and differential equations. "Theorems of calculus derive from the axioms of the real, rational, integer, and natural number systems, as well as set theory." (source)


statements I believe to be true

Here is what I currently understand to be true:

  • Every Physics Derivation is composed of discrete steps.
  • Each Step in a Physics Derivation has a single Inference rule. 
  • Every mathematical expression in a Physics Derivation can be written in Latex. 
  • There are some math expressions which cannot be written in a given CAS
    • example: definite evaluation after integration, like (x+y)|_{a}^{b} in SymPy
  • There are some derivation steps that cannot be expressed in a given CAS
  • There is not a 1-to-1 correspondence between a CAS-based graph and a Latex-based graph. 
  • There are many gradations between "text" and Latex and CAS and proof. 

The consequence of the CAS-based graph not having 1-to-1 correspondence with a Latex-based graph is that the current data structure, with Latex and SymPy in one graph, is not suitable.


problem identification for vectors and my current responses

I encountered a few issues with SymPy today. I'll explain what I found and what I plan to do about it.

I have an expression (1158485859) that, in Latex, is

\frac{-\hbar^2}{2m} \nabla^2 = {\cal H}

The \nabla^2 is represented in SymPy as Laplacian(), though an argument is required for the operator.

My solution: leave the SymPy representation as Pow(Symbol('nabla'), Integer(2))


Also on the topic of vectors, I encountered the expression

\vec{p} \cdot \vec{F}(\vec{r}, t) = a

In order to convert that to SymPy, I'd need to specify a basis for `\vec{p}` (e.g., `from sympy.vector import CoordSys3D`). For example, 

>>> N = CoordSys3D('N')
>>> p = Symbol('p_x')*N.i + Symbol('p_y')*N.j + Symbol('p_z')*N.k
>>> F = Symbol('F_x')*N.i + Symbol('F_y')*N.j + Symbol('F_z')*N.k
>>> p.dot(F)
F_x*p_x + F_y*p_y + F_z*p_z

However, that doesn't seem to extend to functions:

>>> p.dot(Function(F)(Symbol('r'), Symbol('t')))
Traceback (most recent call last):
...
TypeError: expecting string or Symbol for name

My solution: leave the SymPy representation as incorrect, using "multiplication" instead of "dot"
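A sketch of that stored form (names like 'vecp' and 'vecF' mirror the \vec{} Latex symbols and are my assumption):

from sympy import Symbol, Function, Equality
# store p-dot-F as a plain product: known to be incorrect, but representable
lhs = Symbol('vecp') * Function('vecF')(Symbol('vecr'), Symbol('t'))
expr = Equality(lhs, Symbol('a'))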

vectors in SymPy and use of dot, cross, and the Laplacian

Converting "\vec{\psi}(r, t)" into SymPy is feasible
Function('vecpsi')(Symbol('r'), Symbol('t'))
but I can't figure out how to apply the dot product with a vector:

>>> import sympy
>>> from sympy import *
>>> from sympy.vector import CoordSys3D, Del, curl, divergence, gradient
>>> Symbol('vecp').dot( Function('vecpsi')(Symbol('r'), Symbol('t')) )
Traceback (most recent call last):
AttributeError: 'Symbol' object has no attribute 'dot'

The issue is that a vector needs to be specified in a specific dimension (e.g., 3) and have specific coefficients with respect to the basis.

>>> N = CoordSys3D('N')
>>> v1 = Symbol('a')*N.i+Symbol('b')*N.j + Symbol('c')*N.k
>>> v2 = Symbol('f')*N.i+Symbol('g')*N.j + Symbol('k')*N.k
>>> v1.dot(v2)
a*f + b*g + c*k
>>> v1.cross(v2)
(b*k - c*g)*N.i + (-a*k + c*f)*N.j + (a*g - b*f)*N.k

see https://en.wikipedia.org/wiki/Del
>>> delop = Del()
>>> delop(Symbol('a'))
0
>>> delop(v1)
(Derivative(a*N.i + b*N.j + c*N.k, N.x))*N.i + (Derivative(a*N.i + b*N.j + c*N.k, N.y))*N.j + (Derivative(a*N.i + b*N.j + c*N.k, N.z))*N.k
>>> v1
a*N.i + b*N.j + c*N.k
>>> curl(v1)
0
>>> divergence(v1)
0
>>> Laplacian(v1)
Laplacian(a*N.i + b*N.j + c*N.k)

Also, a standalone operator can't be defined, since Laplacian requires an argument:
>>> Laplacian()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __new__() missing 1 required positional argument: 'expr'

Sunday, September 6, 2020

expanding the list of Computer Algebra Systems used by the Physics Derivation Graph

The current implementation of the Physics Derivation Graph relies on the user to 1) enter expressions as Latex and 2) assist/review the conversion from Latex to SymPy.

The choice of Latex as the input interface is based on ease of use (conciseness, low barrier to entry) and could be replaced by other options for specifying expressions. Entering characters and then parsing into an AST could be applied to a variety of input methods. A second choice is what to use the AST for -- currently the output is Sympy, but a variety of output representations is available (Sage, Mathematica, Maxima, etc).

In my experience with Sage, expressions in Sage are not as compact. There seems to be less support for features like units (feet, meters, seconds) -- https://ask.sagemath.org/question/49511/working-with-units/

I'm confident there are expressions in the PDG that Sage supports and SymPy does not. Expanding the set of CASs used by the PDG means having three representations -- Latex, SymPy, and Sage. The negative consequences of this expansion include
  • more work to input the different representations,
  • more work to verify consistency between representations,
  • extra knowledge requirements on the developer, and
  • additional dependencies.
The potential benefit would be coverage for validating some steps not currently possible with SymPy. There might also be value for other people who want to compare SymPy to Sage.

summary of SymPy hurdles for the Physics Derivation Graph



Friday, September 4, 2020

Latex symbols that are operators - how to create SymPy placeholder

{\cal H} in Latex is used for the Hamiltonian in the "1D particle in a box"
https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics)
Needs to be translated to a SymPy operator 
https://docs.sympy.org/latest/modules/physics/quantum/piab.html

The Laplace operator \nabla^2 in Latex is, in SymPy,
delop = Del()
delop.dot(delop())

representing the Laplace operator (nabla in Latex) in SymPy using Del

https://en.wikipedia.org/wiki/Laplace_operator
https://docs.sympy.org/latest/modules/vector/fields.html

>>> import sympy
>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex
>>> from sympy.vector import Del


Original expression as Latex converted to SymPy to Latex:
>>> latex(eval(sympy.srepr(parse_latex("\\frac{-\\hbar^2}{2m} \\nabla = {\\calH}"))))
'- \\frac{\\hbar^{2} nabla}{2 m} = calH'

The first two conversions yield SymPy:
>>> sympy.srepr(parse_latex("\\frac{-\\hbar^2}{2m} \\nabla = {\\calH}"))
"Equality(Mul(Symbol('nabla'), Mul(Mul(Integer(-1), Pow(Symbol('hbar'), Integer(2))), Pow(Mul(Integer(2), Symbol('m')), Integer(-1)))), Symbol('calH'))"

This can be successfully evaluated as SymPy because the 'nabla' is a Symbol
>>> eval("Equality(Mul(Symbol('nabla'), Mul(Mul(Integer(-1), Pow(Symbol('hbar'), Integer(2))), Pow(Mul(Integer(2), Symbol('m')), Integer(-1)))), Symbol('calH'))")
Eq(-hbar**2*nabla/(2*m), calH)

However, replacing 'nabla' with 'Del' causes the eval to fail:
>>> eval("Equality(Mul(Del, Mul(Mul(Integer(-1), Pow(Symbol('hbar'), Integer(2))), Pow(Mul(Integer(2), Symbol('m')), Integer(-1)))), Symbol('calH'))")
Traceback (most recent call last):
...
  File "/usr/local/lib/python3.6/dist-packages/sympy/core/mul.py", line 307, in flatten
    b, e = o.as_base_exp()
AttributeError: type object 'Del' has no attribute 'as_base_exp'



Original expression to convert to SymPy:
>>> sympy.srepr(parse_latex("\\nabla^2 \\psi \\left( \\vec{r},t) \\right) = \\frac{i}{\\hbar} \\vec{p} \\cdot \\left( \\vec{ \\nabla} \\psi( \\vec{r},t) \\right)"))
"Mul(Pow(Symbol('nabla'), Integer(2)), Mul(Symbol('psi'), Mul(Symbol('right'), Function('left')(Mul(Symbol('r'), Symbol('vec')), Symbol('t')))))"


The Latex "\nabla" is SymPy's "Del". However, squaring Del isn't available
>>> sympy.latex(sympy.Pow(Del, sympy.Integer(2)), sympy.Symbol('x'))
...
AttributeError: type object 'Del' has no attribute '_eval_power'


The Laplacian operator is the dot product of two Del operators, so
>>> delop = Del()
>>> sympy.latex(delop.dot(delop(sympy.Function('\psi')(sympy.Symbol('r'), sympy.Symbol('t')))))
'0'


Evaluating definite integrals for humans versus SymPy breaks the Latex-to-SymPy mapping of steps

In the process of fixing expressions and steps in the PDG database, I encountered a novel challenge.

Currently every step in the PDG has a set of Latex expressions. These expressions are provided by the user, converted to SymPy, and then the step is validated using SymPy. There is a one-to-one mapping of "what the user sees" to "what the CAS checks."

In a derivation there is a sequence like

    f = \int_a^b x dx

inference rule: carry out definite integration

    f = (x^2/2)|_a^b

inference rule: simplify

    f = 1/2 (b^2 - a^2)

It seems that SymPy doesn't support the middle expression and instead goes directly from the first to the last expression. That means there is an intermediary Latex expression that cannot be converted to SymPy, breaking the assumption that "what the human reads" is one-to-one with "what the CAS checks."
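For example, a minimal check (the variable names are mine):

>>> from sympy import symbols, integrate
>>> a, b, x = symbols('a b x')
>>> integrate(x, (x, a, b))
-a**2/2 + b**2/2

SymPy returns the final simplified form directly; the antiderivative-with-limits form never appears.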

Accounting for the mismatch between Latex steps and CAS steps would make the already-messy graph structure more complicated. Some steps that are included for human readability would not be able to be specified to the CAS, nor would those steps be checked.

ufw allow status numbered delete -- managing the firewall

$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), deny (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
22/tcp                     LIMIT IN    Anywhere                  
443/tcp                    ALLOW IN    Anywhere                  
80/tcp                     ALLOW IN    Anywhere                  
25                         ALLOW IN    Anywhere                  
22/tcp (v6)                LIMIT IN    Anywhere (v6)             
443/tcp (v6)               ALLOW IN    Anywhere (v6)             
80/tcp (v6)                ALLOW IN    Anywhere (v6)             
25 (v6)                    ALLOW IN    Anywhere (v6)             


$ sudo ufw allow 8000/tcp
Rule added
Rule added (v6)


$ sudo  ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] 22/tcp                     LIMIT IN    Anywhere     
[ 2] 443/tcp                    ALLOW IN    Anywhere  
[ 3] 80/tcp                     ALLOW IN    Anywhere  
[ 4] 22/tcp (v6)                LIMIT IN    Anywhere (v6) 
[ 5] 443/tcp (v6)               ALLOW IN    Anywhere (v6) 
[ 6] 80/tcp (v6)                ALLOW IN    Anywhere (v6) 
[ 7] 25 (v6)                    ALLOW IN    Anywhere (v6) 

$ sudo ufw delete 7
Deleting:
 allow 25
Proceed with operation (y|n)? y

Rule deleted (v6)

Saturday, August 29, 2020

visualization of step validation; 271 steps to address

I created a page listing all the steps from every derivation. The utility is that I can summarize how many steps are valid (180 out of 638, or 28%), how many are declarations and assumptions (179+8 = 187, or 29%), and how many fail or are not checked -- the remaining 271, or 42%.


My plan is to work on addressing the
638-(180+179+8)=271 
failed or unchecked expressions.

I expect SymPy doesn't support all the inference rules, so I need a way to categorize that issue.

Thursday, August 27, 2020

relation of inference rules to axioms and proofs

Axioms are a set of statements upon which other statements can be formed.

  • Lemmas are easy but irrelevant.
  • Corollaries are quick but irrelevant consequences.
  • Propositions are interesting.
  • Theorems are important and difficult.

https://www.quora.com/What-is-the-difference-between-a-lemma-theorem-corollary-and-proposition
http://blogs.scienceforums.net/ajb/2013/01/12/lemma-theorem-proposition-or-corollary/
https://math.stackexchange.com/questions/463362/whats-the-difference-between-theorem-lemma-and-corollary

A proof is about consistency of statements with respect to a set of axioms.

****************

To show the connection can be made, pick the simplest PDG inference rules:
  • "add X to both sides"
  • "multiply both sides by"

Wednesday, August 26, 2020

linear representation of a directed graph

A derivation typically does not show all steps. The steps of a derivation are typically not linear.

 
[Figure 1: left side is a directed acyclic graph where some nodes are hidden and each node can be linearly ordered; in the center, all nodes are shown in a linear presentation; on the right, all nodes in a non-linear display.]

Sunday, August 23, 2020

significant challenges feel like emotional barriers to progress

The current form of the Physics Derivation Graph has served to validate my claim of feasibility. There are a number of significant challenges that inhibit scalability, regardless of whether I generate more content or other people contribute.

  • Input complexity: the current multi-step webform process is tedious. There are many steps, resulting in a heavy workload for both the backend developer (many features are necessary) and the front-end user (using all those features). Alternative interaction mechanisms are technically feasible but not in my current skillset.
  • Display complexity: the graph with hundreds of nodes exceeds the ability to visually navigate given the current d3.js and Graphviz interfaces. Rendering the graph in 3D might help, but the readability of node labels is important. 
  • Limited ability to query: the graph is presented visually but lacks support for responding to user queries. Currently writing custom analysis scripts that read the JSON is the only access.
  • Use correct symbol IDs: the current JSON database is populated with incorrect symbol references and a mixture of SymPy+text symbols. 
  • Validation of steps: the current JSON database is populated with incorrect steps. Inference rules are used but either are not feasible in SymPy or are implemented incorrectly. 
  • Validation of dimension: after correcting the symbols and the steps, each expression needs to be verified to have the correct dimension. 
The last three are listed in dependency order. The symbols need to be fixed, then validation of steps and dimension are possible.

I don't currently have attacks planned on either display complexity or query capability. 


My current task list:
  • Correct the symbol IDs in data.json
  • Change layout of expression input table to distinguish input, output, and feed relation to inference rule
  • add capability to edit SymPy from web interface
  • functionally, separate symbol replacement from expression-as-sympy. For the web interface, insert new steps: Latex math expression -> symbols -> SymPy expression
  • web interface for reviewing correctness of Latex -> SymPy -> Latex for expressions

Sunday, August 16, 2020

how to edit the SymPy Latex parser and rebuild the antlr artifacts for a pull request

https://github.com/sympy/sympy/wiki/Development-workflow#fork-sympy-project

Go to https://github.com/sympy/sympy
Fork to https://github.com/bhpayne/sympy/
In https://github.com/bhpayne/sympy/ create a new branch, e.g. "floor-patch"
In the bhpayne/sympy:floor-patch branch, change three files


Then, in a local directory run
mkdir build_sympy
cd build_sympy
git clone https://github.com/bhpayne/sympy/

I use a Docker container to build SymPy
cat <<'EOF' > Dockerfile
FROM phusion/baseimage:0.11

RUN apt-get update && \
    apt-get install -y \
               vim \
               python3 python3-pip python3-dev \
               wget \
               default-jre \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /usr/local/lib
RUN curl -O https://www.antlr.org/download/antlr-4.7.2-complete.jar
COPY sympy/ /opt/
RUN echo "alias python=python3" > /root/.bashrc
RUN ln -s /usr/bin/python3.6 /usr/bin/python
# import the pip package for integration of grammar with Python    
RUN pip3 install antlr4-python3-runtime mpmath
# build antlr grammar
WORKDIR /opt/sympy/parsing/latex
ENV CLASSPATH=".:/usr/local/lib/antlr-4.7.2-complete.jar:$CLASSPATH"
RUN java -jar /usr/local/lib/antlr-4.7.2-complete.jar LaTeX.g4 -no-visitor -no-listener -o _antlr
# from msgoff
COPY rename.py /opt/sympy/parsing/latex
RUN python3 rename.py
# set up Sympy
WORKDIR /opt/
RUN python3 setup.py install
EOF

A second file, created by msgoff, is used for the Antlr build process
cat <<'EOF' > rename.py
import glob
import os

# 'header' is written to the top of generated .py files; defined here as a
# placeholder (assumption: msgoff's original script defines its own header)
header = "# *** generated file, do not edit ***\n"

output_dir = "_antlr"
for path in glob.glob(os.path.join(output_dir, "LaTeX*.*")) + glob.glob(
    os.path.join(output_dir, "latex*.*")):
    offset = 0
    new_path = os.path.join(output_dir, os.path.basename(path).lower())
    with open(path, "r") as f:
        lines = [line.rstrip() + "\n" for line in f.readlines()]
    os.unlink(path)
    with open(new_path, "w") as out_file:
        if path.endswith(".py"):
            offset = 2
            out_file.write(header)
        out_file.writelines(lines[offset:])
EOF

Inside the container,
cd /scratch/sympy/sympy/parsing/latex
java -jar /usr/local/lib/antlr-4.7.2-complete.jar LaTeX.g4 -no-visitor -no-listener -o _antlr
python rename.py

Now rebuild sympy
cd /scratch/sympy/
python setup.py install

leave the container
exit

On the host, add the build artifacts for Antlr
cd sympy/
git status
git add sympy/parsing/latex/_antlr/latexlexer.py sympy/parsing/latex/_antlr/latexparser.py



Testing

https://github.com/sympy/sympy/wiki/Running-tests
>>> import sympy
>>> sympy.test()
takes 2 hours on my MacBook Air

The relevant test is
>>> sympy.test("sympy/parsing/tests/test_latex.py")

transition from "validation of concept" to "usable by other people"

My initial intent with the Physics Derivation Graph was to validate the concept of using a graph as a data structure for mathematical Physics. Creating the derivationmap.net website was an important milestone -- I was proud of the content and the presentation. The previous website (https://allofphysicsgraph.github.io/proofofconcept/) felt like merely a step beyond having a public code repo since the content was limited to displaying graphs.

Now I can direct people to the derivationmap.net website and not feel embarrassed by limitations imposed by the host. The remaining limitations of derivationmap.net are due to the limits of my creativity and technical skill.

The consequence of feedback from interested people has been increased awareness of the roughness of the code. The code (HTML, Python, JSON) is not something I would find accessible if I were reviewing the project.

Actions I can take to improve the accessibility of the code:
  • better docstrings
  • use doctests
  • document the workflow

Saturday, August 15, 2020

plan of record for parsing Latex expressions

I'm assuming there's an interactive feedback loop with the user in the Physics Derivation Graph, whereas that's not the case for bulk content like arXiv. How to respond to ambiguity depends on whether we can assume the user is available for clarifications.


Given an input string to parse,
  1. Is the string valid Latex? If yes, continue; if no, return to user with complaint
  2. Is the string valid mathematical Latex? If yes, continue; if no, return to user with complaint
  3. Can the mathematical Latex be parsed without ambiguity? If yes, return SymPy to user; if no, continue
  4. If there is ambiguity, can the ambiguity be resolved by using a different flavor of the grammar? If no, return the options to the user so they can select the right parsing.

Removing markup specific to display may be relevant. For example, replacing "\ ", "\quad", and "\qquad" with " ", and replacing "\left(" with "(", would reduce the parser workload, as sketched below.
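A sketch of that preprocessing (the \right) replacement is my addition, on the assumption it always pairs with \left():

def strip_display_markup(latex_str):
    # display-only spacing commands contribute nothing to the math structure
    for display_only in [r"\qquad", r"\quad", r"\ "]:
        latex_str = latex_str.replace(display_only, " ")
    # \left( and \right) only affect glyph sizing
    return latex_str.replace(r"\left(", "(").replace(r"\right)", ")")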

Example of invalid math Latex:
\frac a b

The user probably intended 
\frac{a}{b}

Wednesday, August 12, 2020

disable UFW logging to /var/log/syslog

https://serverfault.com/questions/817565/remove-ufw-block-from-kern-log-and-sys-log
https://askubuntu.com/questions/452125/redirect-ufw-logs-to-own-file

Latex math expressions that cause Sympy's Latex parser to fail

$ git clone https://github.com/allofphysicsgraph/proofofconcept.git
$ cd proofofconcept/v7_pickle_web_interface/flask
$ make dockerlive
$ python
>>> import sympy
>>> sympy.__version__
'1.5.1'
>>> from sympy.parsing.latex import parse_latex
>>> import json
>>> with open('data.json') as json_file:
...     dat = json.load(json_file)

>>> for expr_id, expr_dict in dat['expressions'].items():
...     print(expr_dict['latex'])

>>> for expr_id, expr_dict in dat['expressions'].items():
...     try:
...         x = parse_latex(expr_dict['latex'])
...     except Exception as er:
...         print('expr ID =', expr_id)
...         print(er)

Using that approach, I found the following problems in the current (valid) Latex expressions used in the Physics Derivation Graph.


Subscripts with spaces

expr ID = 8871333437
I don't understand this
PE_{\rm Earth\ surface}
~~~~~~~~~~~~~^

expr ID = 7053449926
I don't understand this
r_{\rm geostationary\ orbit}
~~~~~~~~~~~~~~~~~~~~^


Use of "\left."

expr ID = 0439492440
I don't understand this
\frac{1}{a^2} = \frac{1}{2}W - \frac{1}{2}\left. \frac{W}{2n\pi}\sin\left(\frac{2n\pi}{W} x\right) \right|_0^W
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

Spaces in subscript

expr ID = 7575859306
I don't understand this
\left( \delta^{l}_{\ \ j} \delta^{m}_{\ \ k} - \delta^{l}_{\ \ k} \delta^{m}_{\ \ h} \right) \hat{x}_i \nabla_j \nabla^m E^n = \vec{ \nabla}( \vec{ \nabla} \cdot \vec{E} - \nabla^2 \vec{E})
~~~~~~~~~~~~~~~~~~~^

expr ID = 7575859308
I don't understand this
\left( \delta^{l}_{\ \ j} \delta^{m}_{\ \ k} \hat{x}_i \nabla_j \nabla^m E^n\right)-\left( \delta^{l}_{\ \ k} \delta^{m}_{\ \ h} \hat{x}_i \nabla_j \nabla^m E^n \right)  = \vec{ \nabla}( \vec{ \nabla} \cdot \vec{E} - \nabla^2 \vec{E})
~~~~~~~~~~~~~~~~~~~^


Apostrophe

expr ID = 4662369843
I don't understand this
x' = \gamma (x - v t)
~^

expr ID = 2983053062
I don't understand this
x = \gamma (x' + v t')
~~~~~~~~~~~~~^

expr ID = 3426941928
I don't understand this
x = \gamma ( \gamma (x - v t) + v t' )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

Comma in subscript

expr ID = 9973952056
I expected something else here
-g t = v_y - v_{0, y}
~~~~~~~~~~~~~~~~~^


expr ID = 7391837535
I expected something else here
\cos(\theta) = \frac{v_{0, x}}{v_0}
~~~~~~~~~~~~~~~~~~~~~~~~~^

expr ID = 8949329361
I expected something else here
v_0 \sin(\theta) = v_{0, y}
~~~~~~~~~~~~~~~~~~~~~~~^

Spaces

expr ID = 3920616792
I don't understand this
T_{\rm geostationary orbit} = 24\ {\rm hours}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

Much greater than


expr ID = 9674924517
I don't understand this
K >> G
~~~^

Sunday, August 9, 2020

content categories and keyword linking - what's the relation to the PDG?

Suppose an article on ArXiv had full markup with PhysML and ScienceWise. While useful in aggregate for search, it's not clear how those would relate to the Physics Derivation Graph.

The ScienceWise concept of linking keywords to a database would need to be expanded to linking variables to a database. Then one paper using "x" for position and another paper using "d" for position could link to the same variable definition (independent of symbol used).

my first certificate expiration

This morning I was greeted with a warning from Chrome when visiting https://derivationmap.net. The error message indicated my certificates had expired.

I SSH'd into my DigitalOcean node and ran a scan of the certs that certbot can find

$ sudo certbot renew
----------------------
Processing /etc/letsencrypt/renewal/derivationmap.net.conf
----------------------
Cert not yet due for renewal
----------------------
The following certs are not due for renewal yet:
  /etc/letsencrypt/live/derivationmap.net/fullchain.pem expires on 2020-10-08 (skipped)
No renewals were attempted.

However, when I run a manual scan of the certs used by my site,

$ openssl x509 -dates -noout < /home/pdg/proofofconcept/v7_pickle_web_interface/certs/fullchain.pem
notBefore=May 11 15:26:19 2020 GMT
notAfter=Aug  9 15:26:19 2020 GMT

That corresponds with the command history entry from 2020-05-11,
sudo certbot certonly --webroot \
-w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
--server https://acme-v02.api.letsencrypt.org/directory \
-d derivationmap.net -d www.derivationmap.net

Solution

Delete existing certs
sudo rm -rf /etc/letsencrypt/{live,renewal,archive}/{derivationmap.net,derivationmap.net.conf}/

Request new certs

sudo certbot certonly --webroot \
-w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
--server https://acme-v02.api.letsencrypt.org/directory \
-d derivationmap.net -d www.derivationmap.net

Copy new certs to directory that nginx mounts in Docker-compose

cd /home/pdg/proofofconcept/v7_pickle_web_interface/certs
sudo cp /etc/letsencrypt/live/derivationmap.net/fullchain.pem .
sudo cp /etc/letsencrypt/live/derivationmap.net/privkey.pem .
sudo chown pdg:pdg privkey.pem
openssl dhparam -out dhparam.pem 2048

Restart Docker-compose

docker-compose up --build --force-recreate --remove-orphans --detach

If the docker containers are not restarted, the changes made to the file on the host won't take effect.

Verify in a browser that https://derivationmap.net/ has the updated certificate.

Set a calendar reminder to renew the certificate

Friday, August 7, 2020

a web-based GUI for drawing graphs with latex

Back in 2014 the EquationMap interface featured the ability to enter latex into nodes and create graphs with edges.



Similar web interfaces for drawing graphs exist; however, none of them support rendering Latex or Mathjax.


My current approach of generating PNG images from Latex and then rendering the PNGs as a graph using d3.js seems to work sufficiently well. I could modify the interface such that the existing webform is on the same page as the rendered graph.

The user would select an inference rule, provide the expressions, and render the graph all in a single page.
What isn't clear in my mental model is how to connect edges to new steps. 

where to invest my efforts; prioritization depends on objective

Options on where to spend my attention on this project include

  • semantic decoration of text
  • trying to figure out how semantic decoration connects to the underlying inference rules
  • generating PDG Latex content using existing infrastructure
  • verifying existing PDG content by converting to SymPy
  • expanding the SymPy Latex grammar for better parsing and conversion
  • improving the front-end presentation
  • improving the data input interface
  • changing the backend database from JSON + SQL to something cleaner
I'm not completely clear on what my motives for the PDG are. 
  • I find the work interesting
  • I think other people might also be interested
  • I'm not clear what specific value the project provides (other than attempting to document the beauty of the math and Physics) 
I'm not sure whether I'm doing this for myself, for my legacy, or for other people.

My own lack of clarity translates into not being certain about where to invest my attention.
Options like more content, more integration with other aspects, better code all pull in different directions.
I think that if I better understood my own intentions I would have the ability to more easily prioritize.

Friday, July 31, 2020

conferences on formalization and cataloging math and physics

https://imkt.org/activities/events/ - International Mathematical Knowledge Trust


https://cicm-conference.org/cicm.php - Conference on Intelligent Computer Mathematics

  • CICM 2021 will be held in Timisoara, Romania during the week of July 26-30, 2021
  • CICM 2022 will be held in Tbilisi, Georgia sometime in the second half of September 2022 as a component of the Computational Logic Autumn Summit (CLAS 2022) taking place in Tbilisi.

online digital math libraries and Physics libraries

Mathematics

There are multiple formal math databases. In addition to formal math databases, there are digital math libraries, which are relevant for searching and parsing content.


"How to Build a Global Digital Mathematics Library" by S Watt
https://cs.uwaterloo.ca/~smwatt/pub/reprints/2016-synasc-gdml.pdf

Physics


Teaching resource: https://www.compadre.org/