Sunday, May 16, 2021

what would create a tipping point

It is unlikely that every scientist will come to the website.

It is also unlikely that anyone extracts value from staring at a visualization of the graph of equations.

The local value to both the author and the reader is in determining 

  • is the mathematical content of the paper I'm reading self-consistent?
  • are the expressions consistent in units?
  • are the variables clearly defined?
  • are the operators well defined?
In a larger context, the relevant value question is 
  • how does the paper I'm currently reading (or writing) relate to other papers?
  • how does the paper I'm currently reading (or writing) build on previous work?
Rather than bibliographic citation, I care about mathematical provenance. 

The scientific community currently resorts to bibliographic citation because that is the only provenance available, not because it is what matters or what we value. 

As an author, I want to write LaTeX that generates a document that is mathematically correct. Mathematical typos should be detected (similar to spell-check).
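One of the checks listed above, consistency of units, can be illustrated with a small sketch. This is only an illustration under assumptions: dimensions are represented as dictionaries of base-unit exponents, and the names `dims_multiply`, `consistent`, `LENGTH`, `TIME`, and `VELOCITY` are hypothetical, not part of the Physics Derivation Graph.

```python
def dims_multiply(*dims):
    """Multiply quantities by adding the exponents of their base units."""
    total = {}
    for d in dims:
        for base, power in d.items():
            total[base] = total.get(base, 0) + power
    # drop base units whose exponents cancelled out
    return {base: power for base, power in total.items() if power != 0}


def consistent(lhs_dims, rhs_dims):
    """Both sides of an equation must carry the same dimensions."""
    return lhs_dims == rhs_dims


# hypothetical dimension table: exponents of meter and second
LENGTH = {"m": 1}
TIME = {"s": 1}
VELOCITY = {"m": 1, "s": -1}

# distance = velocity * time is dimensionally consistent
ok = consistent(LENGTH, dims_multiply(VELOCITY, TIME))
# distance = velocity * velocity is a "mathematical typo"
typo = consistent(LENGTH, dims_multiply(VELOCITY, VELOCITY))
print(ok, typo)
```

A check like this could run while the author types, in the same way a spell-checker does.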

Options for implementation:
Overleaf is open source, so modifying it could be an option.

Sunday, December 27, 2020

ordered list representation in RDF

The Physics Derivation Graph depends on a data structure capable of using ordered lists. RDF's support for ordered lists is slightly convoluted. The best visualization of ordered lists in RDF I've found is

I tried sketching how the "linked recursive lists" approach looks for the Physics Derivation Graph for a derivation that has a sequence of steps, and each step has an ordered list of inputs, feeds, and outputs.
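To make the "linked recursive lists" idea concrete, here is a minimal sketch in plain Python (no RDF library) of how an ordered list is encoded with the RDF collection terms rdf:first, rdf:rest, and rdf:nil. The helper names, the blank-node labels, and the step names are assumptions for illustration only.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def list_to_triples(items, prefix):
    """Encode an ordered Python list as rdf:first / rdf:rest triples."""
    if not items:
        return RDF_NIL, []
    # one blank node per list cell; labels are hypothetical
    nodes = [f"_:{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        triples.append((nodes[i], RDF_FIRST, item))
        rest = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples


def triples_to_list(head, triples):
    """Recover the order by walking rdf:rest links until rdf:nil."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(first[node])
        node = rest[node]
    return items


# a derivation as an ordered list of (hypothetical) step identifiers
head, triples = list_to_triples(["step1", "step2", "step3"], "d")
for triple in triples:
    print(triple)
print(triples_to_list(head, triples))
```

Each list of n items costs 2n triples, which is why the representation feels convoluted compared with a native array.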


Sunday, December 13, 2020

identifying classes in the Physics Derivation Graph for OWL (Web Ontology Language)

Classes and subclasses of entities in the Physics Derivation Graph:

  • derivations = an ordered set of two or more steps
  • steps = a set of one or more statements related by an inference rule
  • inference rule = identifies the relation of a set of one or more statements
  • statement = two or more expressions (LHS and RHS) and a relational operator
    • expressions = an ordered set of symbols
    • symbols = a token
      • operator = applies to one or more values (aka operands). Property: number of expected values
      • value. Property: categorized as "variable" xor "constant"
        • integer = one or more digits. The set of digits depends on the base
        • float
        • complex
      • unit. Examples: "m" for meter, "kg" for kilogram
Some aspects of expressions and derivations I don't have names for yet:
  • binary operators {"where", "for all", "when", "for"} used to relate two expressions, the "primary expression" on the left and one or more "scope"/"definition"/"constraint" (equation/inequality) on the right

Some aspects of expressions and derivations I don't need to label in the PDG:
  • terms = parts of the expression that are connected with addition and subtraction
  • factors = parts of the expression that are connected by multiplication
  • coefficients = a number that is multiplied by a variable in a mathematical expression.
  • power, base, exponent
  • base (as in decimal vs hexadecimal, etc)
  • formula
  • function

An equation is two expressions linked with an equal sign. 
What is the superclass above "equation" and "inequality"?
So far I'm settling on "statement".
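The classes listed above, with "statement" as the superclass of equation and inequality, could be sketched as Python dataclasses. This is a rough sketch rather than the Physics Derivation Graph's actual data model; the field names, the `expression` helper, and the inference rule names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symbol:
    token: str  # e.g. "x", "+", "2", "kg"


@dataclass
class Expression:
    symbols: List[Symbol]  # order matters


@dataclass
class Statement:
    lhs: Expression
    rhs: Expression
    relational_operator: str  # "=" for an equation; "<", "<=", ... for an inequality

    def kind(self) -> str:
        return "equation" if self.relational_operator == "=" else "inequality"


@dataclass
class Step:
    inference_rule: str
    statements: List[Statement]  # one or more


@dataclass
class Derivation:
    steps: List[Step]  # ordered

    def __post_init__(self):
        if len(self.steps) < 2:
            raise ValueError("a derivation needs two or more steps")


def expression(tokens: str) -> Expression:
    """Hypothetical helper: build an expression from space-separated tokens."""
    return Expression([Symbol(t) for t in tokens.split()])


eq_1 = Statement(expression("a"), expression("b"), "=")
eq_2 = Statement(expression("a * f"), expression("b * f"), "=")
derivation = Derivation([
    Step("declare initial expression", [eq_1]),
    Step("multiply both sides by", [eq_1, eq_2]),
])
print(eq_1.kind(), len(derivation.steps))
```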

I am intentionally staying out of the realm of {proofs, theorems, axioms} both because that is outside the scope of the Physics Derivation Graph and because the topic is already addressed by OMDoc. 

Suppose we have a statement like
y = x^2 + b where x = {5, 3, 1}
In that statement, 
  • "y = x^2 + b" is an equation
  • "x^2 + b" is an expression and is related to the expression "y" by equality. 
  • "x^2" is a term in the RHS expression
  • "x = {5, 3, 1}" is an equation that provides scope for the primary equation. 
What is the "where" relation in the statement? The "where" is a binary operator that relates two equations. There are other "statement operators" to relate equations, like "for all"; see the statement
a + c = 2*g + k for all g \in \Re
In that statement, "g \in \Re" is (an equation?) serving as a scope for the primary equation. 

All statements have supplemental scope/definition equations that are usually left as implicit. The reader is expected to deduce the scope of the statement from the surrounding context. 

The supplemental scope/definition equations describe both per-variable and inter-variable constraints. For example,
x*y + 3 = 94 where ((x \in \Re) AND (y \in \Re) AND (x<y))

More complicated statement:
f(x) = { 0 for x<0
       { 1 for 0<=x<=1
       { 0 for x>1
Here the LHS is a function and the RHS is an integer, but the value of the integer depends on x. 
Note that the "0<=x<=1" can be separated into "0<=x AND x<=1". Expanding this even more,
(f(x) = 0 for x<0) AND (f(x) = 1 for (0<=x AND x<=1)) AND (f(x) = 0 for x>1)
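The expanded form suggests storing each "(value for condition)" pair as its own scoped statement. A minimal sketch, assuming "A for B" means "A holds whenever B is true"; the names `scoped_statements` and `evaluate_f` are hypothetical:

```python
# each entry is one clause of the expanded statement: (value of f(x), scope)
scoped_statements = [
    (0, lambda x: x < 0),
    (1, lambda x: 0 <= x and x <= 1),
    (0, lambda x: x > 1),
]


def evaluate_f(x):
    """Return the value from the single clause whose scope contains x."""
    matches = [value for value, in_scope in scoped_statements if in_scope(x)]
    if len(matches) != 1:
        raise ValueError("scopes must cover x exactly once")
    return matches[0]


for x in (-2, 0, 0.5, 1, 3):
    print(x, evaluate_f(x))
```

Requiring exactly one matching clause also detects overlapping or missing scopes, which are two ways a piecewise definition can be inconsistent.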

Saturday, December 12, 2020

an argument in support of RDF instead of property graphs

I've wrestled with whether to use Property Graphs to store and query the Physics Derivation Graph. I see potential value, but the licensing of Neo4j keeps me from committing. I'm aware of other implementations, but I don't have confidence about either their stability or durability.

This post makes a convincing argument about both the shortcomings of a property-graph-based knowledge graph and the value of an RDF-based storage method. To summarize,

  • don't be distracted by visualization capabilities; inference is more important
  • property graph IDs are local, whereas identifiers in RDF are global. 
  • Global IDs are vital for enabling federation, merge, diff
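The point about global identifiers can be illustrated with plain Python sets of triples. In this sketch the example.org IRIs and the predicate names are made up for illustration; the idea is only that when both graphs use the same global ID for the same thing, merge and diff reduce to set operations.

```python
# hypothetical global identifiers
EXPR_1 = "https://example.org/pdg/expression/1"
EXPR_2 = "https://example.org/pdg/expression/2"
HAS_LATEX = "https://example.org/pdg/hasLatex"
DERIVED_FROM = "https://example.org/pdg/derivedFrom"

# two graphs maintained independently
graph_a = {
    (EXPR_1, HAS_LATEX, "a = b"),
}
graph_b = {
    (EXPR_1, HAS_LATEX, "a = b"),
    (EXPR_2, HAS_LATEX, "a f = b f"),
    (EXPR_2, DERIVED_FROM, EXPR_1),
}

merged = graph_a | graph_b      # merge: shared triples collapse automatically
only_in_b = graph_b - graph_a   # diff: what graph_b adds to graph_a

print(len(merged), len(only_in_b))
```

With local IDs, the same merge would first need a mapping step to decide which node in one graph corresponds to which node in the other.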

I know OWL (Web Ontology Language) is popular for knowledge representation, and this post was the first to provide a clear breakdown of the difference between property graphs, RDF, and OWL. OWL supports

  • the ability to infer that a node that is a member of a class is also a member of any of its superclasses
  • properties can have superproperties
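The superclass inference can be sketched without an OWL reasoner. The class names below come from the Physics Derivation Graph class list in the December 13 post; the `SUBCLASS_OF` table and both function names are assumptions, and a real reasoner does considerably more than this.

```python
# child class -> parent class
SUBCLASS_OF = {
    "equation": "statement",
    "inequality": "statement",
    "integer": "value",
    "float": "value",
    "complex": "value",
    "value": "symbol",
    "operator": "symbol",
    "unit": "symbol",
}


def superclasses(cls):
    """Follow subclass links upward, collecting every ancestor."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def is_member_of(asserted_class, target_class):
    """A node asserted to be in a class is inferred to be in all its superclasses."""
    return target_class == asserted_class or target_class in superclasses(asserted_class)


print(superclasses("integer"))
print(is_member_of("integer", "symbol"))
```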
OWL overview:

Saturday, November 21, 2020

log analysis of nginx access using Python Pandas

My first step is to review logins on the site.

My previous post on reviewing logs
was written prior to the current nginx format I'm using.

I haven't gotten around to a deeper analysis like

First I had to install supporting software

  sudo apt install python3-pip
  pip3 install pandas

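Inline Python in bash with Pandas is possible because every line of the log is formatted like a Python dictionary. Here I want to review which columns are present in the logs.

cat nginx_access.log | python3 -c "import sys
import ast
import pandas
pandas.options.display.max_rows = 999 # show long outputs without truncation
list_of_lines = []
for line in sys.stdin:
    # assumption: each log line is a Python dict literal, so literal_eval can parse it
    list_of_lines.append(ast.literal_eval(line))
df = pandas.DataFrame(list_of_lines)
print(df.columns)"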
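How many of each entry for a few columns?

cat nginx_access.log | python3 -c "import sys
import ast
import pandas
pandas.options.display.max_rows = 999 # show long outputs without truncation
list_of_lines = []
for line in sys.stdin:
    # assumption: each log line is a Python dict literal
    list_of_lines.append(ast.literal_eval(line))
df = pandas.DataFrame(list_of_lines)
threshold = 20
# for each column, show only the values that occur more than threshold times
for column in ['user', 'ip', 'req']:
    vc = df[column].value_counts()
    print(vc[vc>threshold])"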
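For IPs that have made many (e.g., more than 30) requests, what pages have been accessed?

cat nginx_access.log | python3 -c "import sys
import ast
import pandas
pandas.options.display.max_rows = 999 # show long outputs without truncation
list_of_lines = []
for line in sys.stdin:
    # assumption: each log line is a Python dict literal
    list_of_lines.append(ast.literal_eval(line))
df = pandas.DataFrame(list_of_lines)
threshold = 30
vc = df['ip'].value_counts()
for ip, number_of_requests in vc[vc>threshold].items():
    print('\nIP = ',ip, 'made',number_of_requests,'requests')
    df_this_ip = df[df['ip']==ip]
    # alternative: list every request individually
    #for request in df_this_ip['req'].values:
    #    print(request)
    # summarize the pages requested by this IP
    print(df_this_ip['req'].value_counts())"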
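For a quick look on a machine without Pandas, the same counting can be sketched with the standard library. The sample log lines below are made up for illustration (documentation-range IP addresses, hypothetical requests); only the key names 'ip', 'user', and 'req' come from the commands above, and the dict-literal line format is an assumption.

```python
import ast
from collections import Counter

# hypothetical sample in the assumed one-dict-per-line format
sample_log = """{'ip': '203.0.113.5', 'user': '-', 'req': 'GET / HTTP/1.1'}
{'ip': '203.0.113.5', 'user': '-', 'req': 'GET /login HTTP/1.1'}
{'ip': '198.51.100.7', 'user': '-', 'req': 'GET / HTTP/1.1'}
"""


def parse_lines(text):
    """Turn each non-empty log line into a dictionary."""
    return [ast.literal_eval(line) for line in text.splitlines() if line.strip()]


def counts_above(records, column, threshold):
    """Rough equivalent of value_counts() filtered by vc > threshold."""
    counts = Counter(record[column] for record in records)
    return {value: n for value, n in counts.items() if n > threshold}


records = parse_lines(sample_log)
busy_ips = counts_above(records, "ip", 1)
for ip in busy_ips:
    pages = Counter(r["req"] for r in records if r["ip"] == ip)
    print(ip, dict(pages))
```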

Sunday, October 11, 2020

upgrading Ubuntu 18.04 to 20.04 on DigitalOcean VPS droplet

I've been running a DigitalOcean droplet for $5/month for the past 6 months. Because I was new and didn't know better, I selected the Ubuntu 18.04 droplet. 

Now I want to update to Ubuntu 20.04 LTS. 

The guide recommends starting with a fresh 20.04 image instead of upgrading. 

The following is a record of the steps I took in this process. 

Total duration: 2 hours. The process took longer than expected because I hadn't previously configured the website from a bare Ubuntu server. Also, I had made a few changes since the initial installation that weren't documented.

Step 1: collect all data prior to turning off the server

Used scp to copy data from the droplet to my mac

scp user@IP:/home/pdg/arxiv_rss/ .
scp user@IP:/home/pdg/arxiv_rss/.env .
scp user@IP:/home/pdg/videos/* .
scp user@IP:/home/pdg/.bash_history .
scp user@IP:/home/pdg/.bashrc .
scp user@IP:/home/pdg/.python_history .
scp user@IP:/home/pdg/.sqlite_history .
cd proofofconcept/v7_pickle_web_interface/
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/.env .
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/certs/* .
scp user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/flask/logs/* .
scp user@IP:/home/pdg/.ssh/authorized_keys .

Grab the crontab entry

0 0 * * * /usr/bin/python3 /home/user/arxiv_rss/ >> /home/user/arxiv_rss/cron.log 2>&1

Step 2: power off the server and take a snapshot

Step 3: Start a new droplet

Selected Ubuntu 20.04

Step 4: configure accounts and access

adduser pdg
usermod -aG sudo pdg

ufw allow OpenSSH
ufw enable

Instead of creating new SSH key pairs, 
I imported my authorized_keys file to /home/pdg/.ssh/

To get the authorized_keys file I temporarily allowed password-based authentication for scp using
sudo vim /etc/ssh/sshd_config
change "PasswordAuthentication no" to "PasswordAuthentication yes"
sudo service ssh restart
While I was there, I also changed
"PermitRootLogin yes" to "PermitRootLogin no"
Once I had transferred the authorized_keys file, I reverted to "PasswordAuthentication no" and ran
sudo service ssh restart

sudo ufw allow 443
sudo ufw allow 80

Step 5: update OS

sudo apt-get update
sudo apt-get upgrade

Step 6: install metrics

sudo apt-get purge do-agent
curl -sSL -o /tmp/
sudo bash /tmp/
/opt/digitalocean/bin/do-agent --version

Step 7: install Docker and Docker-Compose

Step 8: certs

sudo apt install certbot python3-certbot-nginx
sudo certbot certonly --webroot \
     -w /home/pdg/proofofconcept/v7_pickle_web_interface/certs \
     --server \
     -d -d

Your certificate and chain have been saved at:
   /etc/letsencrypt/live/
Your key file has been saved at:
   /etc/letsencrypt/live/
Your cert will expire on 2021-01-09.
cd /etc/ssl/certs
sudo openssl dhparam -out dhparam.pem 4096
cp dhparam.pem ~/proofofconcept/v7_pickle_web_interface/certs/

Step 9: restore data from backup

git clone
scp .env user@IP:/home/pdg/proofofconcept/v7_pickle_web_interface/
cd proofofconcept/v7_pickle_web_interface/flask
cp users_sqlite.db_TEMPLATE users_sqlite.db
cd ..
docker-compose up --build --remove-orphans --detach

Sunday, September 20, 2020

use the inputs and inference rule to generate the output

Instead of expecting the user to provide the inputs and outputs and inference rule, supplying the inputs and inference rule is sufficient to generate the output. This output is necessarily consistent with the inputs and inference rule.

>>> from sympy import *
>>> from sympy.parsing.latex import parse_latex

Define an inference rule

def mult_both_sides_by(expr, feed):
    return Equality(expr.lhs*feed, expr.rhs*feed, evaluate=False)
>>> expr = parse_latex('a = b')
>>> feed = parse_latex('f')
>>> mult_both_sides_by(expr, feed)
Eq(a*f, b*f)

This generalizes to include the relation

def mult_both_sides_by(expr, feed, relation):
    return relation(expr.lhs*feed, expr.rhs*feed, evaluate=False)
>>> mult_both_sides_by(expr, feed, Equality)
Eq(a*f, b*f)

Other relations are available; see
>>> mult_both_sides_by(expr, feed, Le)
a*f <= b*f