Sunday, July 25, 2021

how I validate changes and deploy updates to the derivationmap.net website

On my local server, I make changes to the local file and then validate using
cd ~/version_controlled/allofphysicsgraph/proofofconcept/v7_pickle_web_interface
docker-compose up --build --remove-orphans
Build time is 10 minutes when layers are not cached.

Check the result in a local web browser by visiting http://localhost. If I'm happy with the content,
git add filename
git commit -m "a message"
git push
SSH to the DigitalOcean droplet VPS (virtual private server)
cd ~/proofofconcept/v7_pickle_web_interface
git pull
docker-compose up --build --force-recreate --remove-orphans --detach
Check the live page content at https://derivationmap.net/.

Sunday, July 11, 2021

roadmap for identifying mathematical variables in LaTeX documents

Context: I think there's a graph of mathematical expressions in Physics. That graph has math expressions as nodes and the nodes are related by inference rules. The graph spans textbooks and journal articles. The scope of this post is limited to the graph within one document. 

For determining consistency of mathematical content within one document, the first step is to identify mathematical variables and operators. In practice, variables commonly appear inline as $a$ or within an expression like

\begin{equation}
a = 2
\end{equation}

Another way variables appear in text is within \newcommand, like

\newcommand{\R}{\mathbb{R}}

or

\newcommand{\bb}[1]{\mathbb{#1}}
Other numerical systems have similar notations. 
The complex numbers \( \bb{C} \), the rational 
numbers \( \bb{Q} \) and the integer numbers \( \bb{Z} \).

source

For addressing the complexity of expanding newcommand macros, see https://stackoverflow.com/questions/1509799/how-to-replace-latex-macros-with-their-definitions-using-latex
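
As a rough illustration of the first step, here is a minimal Python sketch that pulls math spans and \newcommand definitions out of a LaTeX string using only regular expressions. The function names find_math and find_newcommands are hypothetical, and the patterns are assumptions that only cover the forms shown in this post ($...$, \( ... \), the equation environment, and \newcommand bodies with at most one level of nested braces). A real parser such as the options listed below would be needed for anything more general.

```python
import re

# Patterns for the math forms shown in this post (not a full LaTeX grammar).
INLINE_DOLLAR = re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$", re.DOTALL)
INLINE_PAREN = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
EQUATION_ENV = re.compile(r"\\begin\{equation\}(.+?)\\end\{equation\}", re.DOTALL)

# \newcommand{\name}[nargs]{body}; body may contain one level of nested braces.
NEWCOMMAND = re.compile(
    r"\\newcommand\{(\\[A-Za-z]+)\}(?:\[(\d)\])?\{((?:[^{}]|\{[^{}]*\})*)\}"
)


def find_math(tex):
    """Return the contents of math spans, grouped by pattern type."""
    spans = []
    for pattern in (INLINE_DOLLAR, INLINE_PAREN, EQUATION_ENV):
        spans.extend(m.group(1).strip() for m in pattern.finditer(tex))
    return spans


def find_newcommands(tex):
    """Map each macro name to (number of arguments, body)."""
    return {
        m.group(1): (int(m.group(2) or 0), m.group(3))
        for m in NEWCOMMAND.finditer(tex)
    }


sample = r"""
\newcommand{\R}{\mathbb{R}}
\newcommand{\bb}[1]{\mathbb{#1}}
A variable can appear inline as $a$ or within
\begin{equation}
a = 2
\end{equation}
The complex numbers \( \bb{C} \).
"""

math_spans = find_math(sample)
macros = find_newcommands(sample)
```

The macro table is what a later pass would need in order to recognize that \bb{C} is notation for a number system rather than a new variable.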


Once variables and operators are identified, the next step would be to associate each symbol with the respective definition (e.g., wikipedia links) and dimensions (e.g., length, time, charge). 



Options

https://pypi.org/project/TexSoup/, https://texsoup.alvinwan.com/, https://github.com/alvinwan/TexSoup, https://stackoverflow.com/users/4855984/alvin-wan

https://pylatexenc.readthedocs.io/en/latest/latexwalker/, https://github.com/phfaist/pylatexenc/

Saturday, July 10, 2021

dhparam.pem necessary for nginx web server

This morning I was alerted by Wachete that the derivationmap.net website was unavailable. 

I logged into the digitalocean.com virtual private server (VPS) and used top to see that the container processes were running.

Normally the command I run to start the Docker containers is

docker-compose up --build --force-recreate --remove-orphans --detach

To troubleshoot, I ran

docker-compose up --build --force-recreate --remove-orphans

and the output was

Successfully built 0ffaac97e769
Successfully tagged v7_pickle_web_interface_nginx:latest
Recreating v7_pickle_web_interface_flask_1 ... done
Recreating v7_pickle_web_interface_nginx_1 ... done
Attaching to v7_pickle_web_interface_flask_1, v7_pickle_web_interface_nginx_1
nginx_1  | 2021/07/10 11:48:41 [emerg] 1#1: PEM_read_bio_DHparams("/certs/dhparam.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: DH PARAMETERS)
nginx_1  | nginx: [emerg] PEM_read_bio_DHparams("/certs/dhparam.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: DH PARAMETERS)
v7_pickle_web_interface_nginx_1 exited with code 1

The fix was to point nginx to the dhparam.pem file.

https://security.stackexchange.com/questions/94390/whats-the-purpose-of-dh-parameters
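
The "no start line ... Expecting: DH PARAMETERS" message means the PEM reader did not find a DH PARAMETERS block in the file it was given. A minimal Python sketch of a pre-deployment check follows; the function name looks_like_dhparam is hypothetical, and the check is only a text-level assumption about the PEM markers. It does not validate the parameters themselves.

```python
# PEM markers that delimit a Diffie-Hellman parameters block.
PEM_DH_HEADER = "-----BEGIN DH PARAMETERS-----"
PEM_DH_FOOTER = "-----END DH PARAMETERS-----"


def looks_like_dhparam(pem_text):
    """True when the text contains a DH PARAMETERS header followed by its footer."""
    lines = [line.strip() for line in pem_text.splitlines()]
    if PEM_DH_HEADER not in lines:
        return False
    start = lines.index(PEM_DH_HEADER)
    return PEM_DH_FOOTER in lines[start + 1:]


# Illustrative inputs (the base64 payload here is a placeholder, not real parameters).
good_text = PEM_DH_HEADER + "\nMIIBCAKCAQEA\n" + PEM_DH_FOOTER + "\n"
wrong_type = "-----BEGIN CERTIFICATE-----\nabc\n-----END CERTIFICATE-----\n"
```

Running such a check against /certs/dhparam.pem before starting the containers would catch an empty or mismatched file earlier than the nginx startup error does.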

Sunday, May 16, 2021

what would create a tipping point

It is unlikely that every scientist will come to the website https://derivationmap.net/.

Graph analysis 

This is what I've historically chased -- identifying the data structure, and the data input mechanism. 

Extracting value from staring at a visualization of the graph of equations is unlikely. It's not clear to me which graph queries are relevant to run against the graph content.


Consistency within a document

The local value to both the author and the reader is in determining whether the mathematical content of the paper being read is self-consistent. Practically, that means

  • are the dimensions of variables within each expression consistent?
  • are the units used in each expression consistent?
  • are the variables clearly defined? Here "definition" means a tuple of (symbol, dimensions, definition)
    • are the variables used consistently throughout this paper?
  • are the operators well defined? Here "definition" means a tuple of (symbol, number of inputs, number of outputs, constraints per input, constraints per output). 
    • are the operators used consistently throughout this paper?
  • is the mathematical content consistent with the written text?
Some of these aspects are explored on https://derivationmap.net/clickable_layers
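
The dimension check in the list above can be sketched in Python. This is a minimal illustration under stated assumptions: dimensions are stored as a dict of base-dimension exponents, the helper names multiply and consistent are hypothetical, and the example variables (x, v, t) are made up for the demonstration.

```python
from collections import namedtuple

# The (symbol, dimensions, definition) tuple described in the list above.
Variable = namedtuple("Variable", ["symbol", "dimensions", "definition"])


def multiply(*dims):
    """Dimensions of a product: add the exponents of each base dimension."""
    total = {}
    for d in dims:
        for base, exponent in d.items():
            total[base] = total.get(base, 0) + exponent
    # Drop base dimensions that cancel out.
    return {base: e for base, e in total.items() if e != 0}


def consistent(*dims):
    """True when all terms share the same dimensions (needed for =, +, -)."""
    return all(d == dims[0] for d in dims[1:])


# Hypothetical example: checking x = v * t
x = Variable("x", {"length": 1}, "distance")
v = Variable("v", {"length": 1, "time": -1}, "velocity")
t = Variable("t", {"time": 1}, "duration")

equation_ok = consistent(x.dimensions, multiply(v.dimensions, t.dimensions))
typo_detected = not consistent(x.dimensions, v.dimensions)  # e.g. "x = v"
```

A mismatch such as "x = v" is the kind of mathematical typo that a spell-check-like tool could flag.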

As an author, I want to write LaTeX that generates a document that is mathematically correct.

Mathematical typos should be detected (similar to spell-check). As I enter text (or as a post-processing phase), the computer should guess whether the content is math or non-math. If math, then prompt the author for relevant details. 

Options for implementation

Overleaf is open source: https://github.com/overleaf/overleaf, so modifying it could be an option.


Cross-document analysis

In a larger context, the relevant value questions include

  • how does the paper I'm currently reading (or writing) relate to other papers?
  • how does the paper I'm currently reading (or writing) build on previous work?

Rather than bibliographic citation, I care about mathematical provenance. 

The specific symbols may vary across papers, and the dimensions may vary (e.g., renormalizing the speed of light to 1), but definitions have to be shared. 

The scientific community currently resorts to bibliographic citation because that is the only provenance available, not because it is what matters or what we value.

The cross-document analysis is not feasible without semantic content. The current approach of unstructured text with few hyperlinks requires human readers. Addressing the intra-document consistency challenge might yield semantic markup that enables cross-document analysis.

Sunday, December 27, 2020

ordered list representation in RDF

The Physics Derivation Graph depends on a data structure capable of using ordered lists. RDF's support for ordered lists is slightly convoluted. The best visualization of ordered lists in RDF I've found is https://ontola.io/blog/ordered-data-in-rdf/

I tried sketching how the "linked recursive lists" approach looks for the Physics Derivation Graph for a derivation that has a sequence of steps, and each step has an ordered list of inputs, feeds, and outputs.



Credit: dreampuf.github.io
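
To make the "linked recursive lists" idea concrete, here is a minimal Python sketch that encodes an ordered list as (subject, predicate, object) triples using the RDF collection terms rdf:first, rdf:rest, and rdf:nil. Plain tuples stand in for a triple store; the function names and the "_:list" blank-node labels are hypothetical.

```python
def ordered_list_triples(items, prefix="_:list"):
    """Encode items as a linked list of triples; return (head node, triples)."""
    if not items:
        return "rdf:nil", []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # Each node points at its item and at the next node (or rdf:nil at the end).
        rest = nodes[i + 1] if i + 1 < len(items) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", item))
        triples.append((nodes[i], "rdf:rest", rest))
    return nodes[0], triples


def read_ordered_list(head, triples):
    """Recover the ordered items by following rdf:rest links from the head."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items = []
    node = head
    while node != "rdf:nil":
        items.append(first[node])
        node = rest[node]
    return items


head, triples = ordered_list_triples(["step1", "step2", "step3"])
recovered = read_ordered_list(head, triples)
```

The order lives in the rdf:rest links rather than in the order the triples are stored, which is why the same list is recovered even if the triples are shuffled. The cost is two triples per list element.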

Sunday, December 13, 2020

identifying classes in the Physics Derivation Graph for OWL (Web Ontology Language)

Classes and subclasses of entities in the Physics Derivation Graph:

  • derivations = an ordered set of two or more steps
  • steps = a set of one or more statements related by an inference rule
  • inference rule = identifies the relation of a set of one or more statements
  • statement = two or more expressions (LHS and RHS) and a relational operator
    • expressions = an ordered set of symbols
    • symbols = a token
      • operator = applies to one or more values (aka operands). Property: number of expected values
      • value. Property: categorized as "variable" xor "constant"
        • integer = one or more digits. The set of digits depends on the base
        • float
        • complex
      • unit. Examples: "m" for meter, "kg" for kilogram
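
One way to try out this hierarchy before committing to OWL is to write it as Python dataclasses. The sketch below is an assumption about how the classes might map to code, not the Physics Derivation Graph implementation; the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symbol:
    token: str


@dataclass
class Operator(Symbol):
    number_of_operands: int  # property: number of expected values


@dataclass
class Value(Symbol):
    category: str  # "variable" xor "constant"

    def __post_init__(self):
        if self.category not in ("variable", "constant"):
            raise ValueError("category must be 'variable' or 'constant'")


@dataclass
class Unit(Symbol):
    pass


@dataclass
class Expression:
    symbols: List[Symbol]  # ordered


@dataclass
class Statement:
    expressions: List[Expression]  # LHS and RHS
    relational_operator: str


@dataclass
class Step:
    inference_rule: str
    statements: List[Statement]


@dataclass
class Derivation:
    steps: List[Step]  # ordered


# Example: the statement y = x^2 + b, with symbols kept in written order
power = Operator("^", 2)
plus = Operator("+", 2)
lhs = Expression([Value("y", "variable")])
rhs = Expression([
    Value("x", "variable"), power, Value("2", "constant"),
    plus, Value("b", "variable"),
])
equation = Statement([lhs, rhs], "=")
```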
Some aspects of expressions and derivations I don't have names for yet:
  • binary operators {"where", "for all", "when", "for"} used to relate two expressions: the "primary expression" on the left and one or more "scope"/"definition"/"constraint" expressions (equation/inequality) on the right

Some aspects of expressions and derivations I don't need to label in the PDG:
  • terms = parts of the expression that are connected with addition and subtraction
  • factors = parts of the expression that are connected by multiplication
  • coefficients = a number that is multiplied by a variable in a mathematical expression.
  • power, base, exponent
  • base (as in decimal vs hexadecimal, etc)
  • formula
  • function

An equation is two expressions linked with an equal sign. 
What is the superclass above "equation" and "inequality"?
So far I'm settling on "statement".

I am intentionally staying out of the realm of {proofs, theorems, axioms} both because that is outside the scope of the Physics Derivation Graph and because the topic is already addressed by OMDoc. 

Suppose we have a statement like
y = x^2 + b where x = {5, 3, 1}
In that statement, 
  • "y = x^2 + b" is an equation
  • "x^2 + b" is an expression and is related to the expression "y" by equality. 
  • "x^2" is a term in the RHS expression
  • "x = {5, 3, 1}" is an equation that provides scope for the primary equation. 
What is the "where" relation in the statement? The "where" is a binary operator that relates two equations. There are other "statement operators" to relate equations, like "for all"; see the statement
a + c = 2*g + k for all g \in \Re
In that statement, "g \in \Re" is (an equation?) serving as a scope for the primary equation. 

All statements have supplemental scope/definition equations that are usually left implicit. The reader is expected to deduce the scope of the statement from the surrounding context. 

The supplemental scope/definition equations describe both per-variable and inter-variable constraints. For example,
x*y + 3 = 94 where ((x \in \Re) AND (y \in \Re) AND (x<y))
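
A minimal Python sketch of that example, treating the scope as a predicate that is checked alongside the primary equation. The function names are hypothetical, and membership in \Re is approximated by Python's numbers.Real, which is an assumption made for illustration.

```python
from numbers import Real


def in_scope(x, y):
    """(x in Re) AND (y in Re) AND (x < y): per-variable and inter-variable constraints."""
    return isinstance(x, Real) and isinstance(y, Real) and x < y


def statement_holds(x, y):
    """The primary equation x*y + 3 = 94, evaluated only within its scope."""
    return in_scope(x, y) and x * y + 3 == 94
```

Here statement_holds(7, 13) is true, while statement_holds(13, 7) is false even though 13*7 + 3 = 94, because the inter-variable constraint x<y is violated.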

More complicated statement:
f(x) = { 0 for x<0
       { 1 for 0<=x<=1
       { 0 for x>1
Here the LHS is a function and the RHS is an integer, but the value of the integer depends on x. 
Note that the "0<=x<=1" can be separated into "0<=x AND x<=1". Expanding this even more,
(f(x) = 0 for x<0) AND (f(x) = 1 for (0<=x AND x<=1)) AND (f(x) = 0 for x>1)
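
The equivalence between the piecewise form and the expanded conjunction can be checked with a short Python sketch. It assumes that "A for B" is read as "whenever B holds, A holds"; the names CLAUSES and expanded_form_holds are hypothetical.

```python
def f(x):
    """The piecewise definition written out directly."""
    if x < 0:
        return 0
    if 0 <= x <= 1:
        return 1
    return 0  # x > 1


# Each clause is (value, condition), with 0<=x<=1 split into "0<=x AND x<=1".
CLAUSES = [
    (0, lambda x: x < 0),
    (1, lambda x: 0 <= x and x <= 1),
    (0, lambda x: x > 1),
]


def expanded_form_holds(func, x):
    """Evaluate the conjunction of clauses for a candidate function at x."""
    return all(func(x) == value for value, condition in CLAUSES if condition(x))


sample_points = [-2, 0, 0.5, 1, 3]
all_hold = all(expanded_form_holds(f, x) for x in sample_points)
```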

Saturday, December 12, 2020

an argument in support of RDF instead of property graphs

I've wrestled with whether to use Property Graphs to store and query the Physics Derivation Graph. I see potential value, but the licensing of Neo4j keeps me from committing. I'm aware of other implementations, but I don't have confidence about either their stability or durability.

This post makes a convincing argument about both the shortcomings of a property-graph-based knowledge graph and the value of an RDF-based storage method. To summarize,

  • don't be distracted by visualization capabilities; inference is more important
  • property graph IDs are local, whereas identifiers in RDF are global. 
  • Global IDs are vital for enabling federation, merge, diff

I know OWL (Web Ontology Language) is popular for knowledge representation, and this post was the first to provide a clear breakdown of the difference between property graphs, RDF, and OWL. OWL supports

  • the ability to infer that a node that is a member of a class is also a member of any of its superclasses
  • properties can have superproperties
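
The superclass inference can be illustrated with a minimal Python sketch. It uses a plain dict rather than an OWL reasoner, assumes each class has a single parent (a simplification that OWL does not impose), and borrows the class names from the Physics Derivation Graph class list as an example.

```python
# Direct subclass relations (child -> parent); single parent assumed for simplicity.
SUBCLASS_OF = {
    "integer": "value",
    "float": "value",
    "complex": "value",
    "value": "symbol",
    "operator": "symbol",
    "unit": "symbol",
}


def superclasses(cls):
    """Follow subclass links upward, nearest superclass first."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_member(node_class, target):
    """A node of class node_class is also a member of every superclass."""
    return node_class == target or target in superclasses(node_class)
```

With this table, a node declared only as an "integer" is inferred to be a "value" and a "symbol" without those facts being stored explicitly.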
OWL overview:
  • https://www.cambridgesemantics.com/blog/semantic-university/learn-rdf/
  • https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/owl-101/
  • https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/