Saturday, September 28, 2024

Notes from Talks at Google episode 485: A Philosophy of Software Design by John Ousterhout

This podcast is based on the book (https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201/) and is available as a video: https://www.youtube.com/watch?v=bmSAYlu0NcY


claim: The primary design principle is decomposition.

The point of decomposition is to make future work easier.


principle: "classes should be deep."
Deep classes enable information hiding.
The interface to the class is the cost; the functionality of the class is the benefit.
Here "interface" includes signatures, side effects, and dependencies.
The goal of abstraction is to maximize the ratio of functionality to interface.

Sometimes the cost of an interface can exceed the benefit of its functionality. For example, the code required to call the interface may be longer than the code that would implement the functionality directly.

Smaller classes are easier to grok, but a proliferation of classes increases the number of interfaces (and thus the cost).
"Classes are good" does not mean "more classes are better."


principle: "the common case should be simple"

principle: exceptions should be defined out of existence.
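
One concrete illustration (mine, though the book uses similar examples): Python's slicing defines the out-of-range case out of existence, while indexing raises an exception the caller must handle.

text = "abc"

# Indexing past the end raises an exception the caller must handle:
try:
    ch = text[10]
except IndexError:
    ch = ""

# Slicing defines the same condition out of existence:
# an out-of-range slice simply returns an empty string.
assert text[10:20] == ""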


The tactical approach is to get some code working, accepting shortcuts to get the result. These shortcuts compound. Complexity isn't one mistake; complexity is hundreds or thousands of mistakes accumulated over a long duration. Because complexity doesn't happen all at once, the incremental mistakes are hard to notice.
Observation: Software developers who are "tactical tornados" are rewarded by management because they appear productive in the short term. There's no recognition of the technical debt being created.
Observation: "what's the smallest change I can make?" is a local optimization that often harms the global optimum.

The strategic approach makes "great design" the goal, not just at the beginning but with every change to the code. This is slower than the tactical approach but enables future productivity.

claim: The crossover in ROI might be something like 6 months for a project. The tactical approach gets results faster initially, but after about 6 months making small changes becomes more expensive than it would have been under a strategic approach.

mentally untangling the spaghetti code

I recently started exploring how to migrate the Physics Derivation Graph website from the JSON-based backend (v7) to Neo4j (v8). For v7 I have a working JSON-based backend+frontend; for v8 I have a working Neo4j backend with a minimal frontend (e.g., no nginx, no authentication, no documentation pages).

I'm aware of two ways to carry out the migration:

  • hard cut-over: port all existing front-end capabilities (nginx, authentication, documentation) and backend capabilities {render d3js, render tex, render pdf, render graphviz} to v8, then switch the public site from v7 to v8
  • slow transition: add Neo4j to existing v7 JSON-based site and then incrementally transition site features like {render d3js, render tex, render pdf, render graphviz}

The "slow transition" has the advantage of not having to port the front-end capabilities. The downside is that the two codebases (v7, v8) exist concurrently and I have to refactor each backend feature on the live site.

I don't have a good sense for how much work porting front-end capabilities (the hard cut-over option) involves. 


In the process of trying the "slow transition" I am getting lost in the code dependencies and in which complexity is essential versus accidental. Refactoring requires the ability to distinguish essential from accidental complexity.

The usual way to guide refactoring is to rely on tests of features. If I change some code and a feature breaks, then I removed essential complexity. If I change something and all tests pass, then the assumption is that the change reduced accidental complexity. (This relies on full test coverage and on the assumption that tests correlate with the necessary capabilities.)
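
A minimal sketch of such a guardrail test, assuming a Flask-style app object in webserver/app.py (the route name here is hypothetical):

# test_features.py: a feature test that encodes essential complexity
import pytest
from webserver.app import app

@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client

def test_list_of_derivations_renders(client):
    # If a refactor breaks this essential capability, the test fails;
    # if all such tests pass, the change presumably removed only
    # accidental complexity.
    response = client.get("/list_all_derivations")
    assert response.status_code == 200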

A different way to distinguish essential versus accidental complexity is to enumerate roles, use cases per role, and the requirements of each use case. The requirements are the essential complexity. Then the code I write is in response to a specific requirement. 

In practice, as a developer my natural instinct is to enact the minimum amount of change based on my local perspective. To break out of this local minimum I need the context of tracing the roles to use cases to requirements. For each change to the software! That's asking a lot.

Writing out all the roles, use cases, and requirements turns out to be pretty messy. This corresponds with the amount of code needed to support the Physics Derivation Graph.

The graph below (renderable with DreamPuf) traces roles to use cases (grey) and on to web UI pages (blue), REST API endpoints (pink), and backend functions:

digraph G {

  rankdir="LR";
  
  // see https://graphviz.org/docs/attrs/splines/
  //splines=false;
  //splines=curved;
  //splines=polyline;
  splines=ortho;

  unauthenticated_user [label="unauthenticated user", shape="rectangle"];
  authenticated_user [label="authenticated user", shape="rectangle"];
  
  log_in [label="authenticate" style=filled color=lightgrey];
  unauthenticated_user -> log_in -> authenticated_user;
  auth_google [label="authenticate using Google"];
  log_in -> auth_google;
  
  log_out [label="log out" style=filled color=lightgrey];
  authenticated_user -> log_out -> unauthenticated_user;
  
  export_database [label="export database" style=filled color=lightgrey];
  unauthenticated_user -> export_database;
  authenticated_user -> export_database;
  
  export_database_json_webUI [label="web UI: link to database JSON" style=filled color=lightblue];
  export_database -> export_database_json_webUI;
  export_database_cypher_webUI [label="web UI: link to database Cypher" style=filled color=lightblue];
  export_database -> export_database_cypher_webUI;
  
  export_database_json_rest [label="REST API: export database as JSON" style=filled color=lightpink];
  export_database -> export_database_json_rest;
  export_database_cypher_rest [label="REST API: export database as Cypher" style=filled color=lightpink];
  export_database -> export_database_cypher_rest;
  
  export_database_json_backend [label="backend: export database as JSON"];
  export_database_json_rest -> export_database_json_backend;
  export_database_json_webUI -> export_database_json_backend;
  
  export_database_cypher_backend [label="backend: export database as Cypher"];
  export_database_cypher_webUI -> export_database_cypher_backend;
  export_database_cypher_rest -> export_database_cypher_backend;
  
  import_database_as_cypher [label="import database as Cypher" style=filled color=lightgrey];
  authenticated_user -> import_database_as_cypher;

  upload_database_as_cypher_webUI [label="web UI: upload database as Cypher" style=filled color=lightblue];
  import_database_as_cypher -> upload_database_as_cypher_webUI;
  
  upload_database_as_cypher_backend [label="backend: upload database as Cypher"];
  upload_database_as_cypher_webUI -> upload_database_as_cypher_backend;
  
  read_FAQ [label="read FAQ" style=filled color=lightgrey];
  unauthenticated_user -> read_FAQ;
  authenticated_user -> read_FAQ;
  render_FAQ [label="web UI: render FAQ" style=filled color=lightblue];
  read_FAQ -> render_FAQ;
  
  query_graph [label="query graph" style=filled color=lightgrey];
  unauthenticated_user -> query_graph;
  authenticated_user -> query_graph;
  
  query_graph_webUI [label="web UI: query graph" style=filled color=lightblue];
  query_graph -> query_graph_webUI;

  query_graph_rest [label="REST API: query graph" style=filled color=lightpink];
  query_graph -> query_graph_rest;
  
  query_graph_backend [label="backend: query graph"];
  query_graph_webUI -> query_graph_backend;
  query_graph_rest -> query_graph_backend;
  
  enter_new_operator [label="enter new operator" style=filled color=lightgrey];
  authenticated_user -> enter_new_operator;
  
  create_new_operator_webUI [label="web UI: create new operator" style=filled color=lightblue];
  enter_new_operator -> create_new_operator_webUI;
  
  enter_new_expression [label="enter new expression" style=filled color=lightgrey];
  authenticated_user -> enter_new_expression;

  create_new_expression_webUI [label="web UI: create new expression" style=filled color=lightblue];
  enter_new_expression -> create_new_expression_webUI;

  add_new_expression [label="backend: add new expression"];
  create_new_expression_webUI -> add_new_expression;
  
  verify_expr_dimensional_consistency [label="backend: verify expression dimensional consistency"];
  create_new_expression_webUI -> verify_expr_dimensional_consistency;

  enter_new_derivation [label="enter new derivation" style=filled color=lightgrey];
  authenticated_user -> enter_new_derivation;
  
  create_new_derivation_webUI [label="web UI: create new derivation" style=filled color=lightblue];
  enter_new_derivation -> create_new_derivation_webUI;

  add_new_derivation [label="backend: add new derivation"];
  create_new_derivation_webUI -> add_new_derivation;

  enter_new_symbol [label="enter new symbol" style=filled color=lightgrey];
  authenticated_user -> enter_new_symbol;
  
  create_new_symbol_webUI [label="web UI: create new symbol" style=filled color=lightblue];
  enter_new_symbol -> create_new_symbol_webUI;

  add_new_symbol [label="backend: add new symbol"];
  create_new_symbol_webUI -> add_new_symbol;

  read_list_of_derivations [label="read list of derivations" style=filled color=lightgrey];
  unauthenticated_user -> read_list_of_derivations;
  authenticated_user -> read_list_of_derivations;

  list_of_derivations_html [label="web UI: render list of derivations" style=filled color=lightblue];
  read_list_of_derivations -> list_of_derivations_html;

  list_of_derivations_json [label="REST API: JSON list of derivations" style=filled color=lightpink];
  read_list_of_derivations -> list_of_derivations_json;

  list_of_derivations [label="backend: get list of derivations"];
  list_of_derivations_json -> list_of_derivations;
  list_of_derivations_html -> list_of_derivations;

  read_list_of_symbols [label="read list of symbols" style=filled color=lightgrey];
  unauthenticated_user -> read_list_of_symbols;
  authenticated_user -> read_list_of_symbols;

  list_of_symbols_html [label="web UI: show list of symbols" style=filled color=lightblue];
  read_list_of_symbols -> list_of_symbols_html;

  list_of_symbols_json [label="REST API: JSON list of symbols" style=filled color=lightpink];
  read_list_of_symbols -> list_of_symbols_json;

  list_of_symbols [label="backend: get list of symbols"];
  list_of_symbols_html -> list_of_symbols;
  list_of_symbols_json -> list_of_symbols;

  read_list_of_expressions [label="read list of expressions" style=filled color=lightgrey];
  unauthenticated_user -> read_list_of_expressions;
  authenticated_user -> read_list_of_expressions;

  list_of_expressions_html [label="web UI: show list of expressions" style=filled color=lightblue];
  read_list_of_expressions -> list_of_expressions_html;
  list_of_expressions_html -> verify_expr_dimensional_consistency;

  list_of_expressions_json [label="REST API: JSON list of expressions" style=filled color=lightpink];
  read_list_of_expressions -> list_of_expressions_json;

  list_of_expressions [label="backend: get list of expressions"];
  list_of_expressions_json -> list_of_expressions;
  list_of_expressions_html -> list_of_expressions;

  read_derivation [label="read derivation" style=filled color=lightgrey];
  unauthenticated_user -> read_derivation;
  authenticated_user -> read_derivation;
  
  derivation_steps_json [label="REST API: JSON list of steps" style=filled color=lightpink];
  read_derivation -> derivation_steps_json;
  derivation_table [label="web UI: show table of steps" style=filled color=lightblue];
  read_derivation -> derivation_table;
  derivation_table -> verify_expr_dimensional_consistency;
  
  verify_step_consistency [label="backend: verify step consistency"];
  derivation_table -> verify_step_consistency;
  
  list_of_steps [label="backend: get list of steps"];
  derivation_steps_json -> list_of_steps;
  derivation_table -> list_of_steps;
  
  derivation_tex [label="web UI: link to derivation .tex file" style=filled color=lightblue];
  read_derivation -> derivation_tex;
  derivation_pdf [label="web UI: link to derivation .pdf file" style=filled color=lightblue];
  read_derivation -> derivation_pdf;
  derivation_d3js [label="web UI: show derivation d3js" style=filled color=lightblue];
  read_derivation -> derivation_d3js;
  derivation_graphviz_png [label="web UI: show derivation graphviz PNG" style=filled color=lightblue];
  read_derivation -> derivation_graphviz_png;

  derivation_d3js -> list_of_steps;
  derivation_graphviz_png  -> list_of_steps;
  
  get_expression_for_step [label="backend: get expressions per step"];
  derivation_d3js -> get_expression_for_step;
  derivation_table -> get_expression_for_step;
  get_infrule_for_step [label="backend: get infrule per step"];
  derivation_graphviz_png -> get_infrule_for_step;
  
  generate_png_for_expression [label="backend: generate png for expression"];
  derivation_d3js -> generate_png_for_expression;
  derivation_graphviz_png -> generate_png_for_expression;
  generate_png_for_infrule [label="backend: generate png for infrule"];
  derivation_d3js -> generate_png_for_infrule;
  derivation_graphviz_png -> generate_png_for_infrule;
  
  generate_derivation_tex [label="backend: generate tex for derivation"];
  derivation_tex -> generate_derivation_tex;
  generate_derivation_tex -> list_of_steps;
  generate_derivation_tex -> get_infrule_for_step;
  generate_derivation_tex -> get_expression_for_step;
  
  generate_derivation_pdf_from_tex [label="backend: generate derivation PDF from .tex"];
  derivation_pdf -> generate_derivation_pdf_from_tex -> generate_derivation_tex;
  
  generate_pdf_from_tex [label="backend: generate PDF from .tex"];
  generate_derivation_pdf_from_tex -> generate_pdf_from_tex;
}

Monday, September 2, 2024

new droplet for Ubuntu 24 LTS

My upgrade from Ubuntu 20 to 22 resulted in an inability to SSH to the server; see https://physicsderivationgraph.blogspot.com/2024/09/unable-to-ssh-into-vps-after-upgrade-of.html
I decided to get a new droplet and start from scratch with Ubuntu 24 LTS. Logged in as root via the web console:
adduser pdg
usermod -aG sudo pdg
ufw allow OpenSSH
ufw enable
Then I was able to SSH from my laptop to the VPS as "pdg". Open the web ports:
sudo ufw allow 443
sudo ufw allow 80

edit ~/.bashrc to include

alias ..='cd ..'

Install Docker

sudo apt update
sudo apt upgrade
sudo apt install apt-transport-https curl
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl is-active docker
as per https://linuxiac.com/how-to-install-docker-on-ubuntu-24-04-lts/

Cloning the GitHub repos over SSH requires keys.

ssh-keygen
and upload the newly generated public key to https://github.com/settings/keys

Ubuntu 24 LTS doesn't come with make by default, so

sudo apt install make

By default the user pdg can't launch Docker, so

sudo usermod -a -G docker $USER
newgrp docker
as per https://stackoverflow.com/a/48450294

Certs need to be loaded; see https://physicsderivationgraph.blogspot.com/2021/10/periodic-renewal-of-https-letsencrypt.html

sudo apt -y install certbot

Certbot requires a running webservice on port 80 to create new certificates.
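
One way around that chicken-and-egg problem is certbot's standalone mode, which runs its own temporary webserver on port 80 (the domain below is a placeholder):

sudo certbot certonly --standalone -d example.com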



Reference:
https://physicsderivationgraph.blogspot.com/2020/10/upgrading-ubuntu-1804-to-2004-on.html

unable to SSH into VPS after upgrade of Ubuntu from 20 to 22 LTS

Prior to upgrading from Ubuntu 20 to 22 LTS I was able to SSH from my local laptop to a remote VPS using the command

ssh -v username@IPaddress

After the upgrade I got

ssh -v username@IPaddress
OpenSSH_9.7p1, LibreSSL 3.3.6
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to IPaddress [IPaddress] port 22.
debug1: connect to address IPaddress port 22: Operation timed out
ssh: connect to host IPaddress port 22: Operation timed out

Sunday, May 19, 2024

notes on status and near-term plans

Status update after ~30 hours of work over the course of one weekend.

I'm happy with the rewrite of the back-end since it makes the design more robust and flexible. 

I've learned enough Cypher to feel comfortable continuing down this path. I'm also somewhat confident (with no basis in experience) that switching to a different property graph database would be feasible without having to rewrite the front-end and controller.


Developer workflow

My "how do I develop the code" steps have evolved to include Black and mypy. 

After making changes to the code I format using Black and then run mypy type checking:
make black_out
docker run --rm -v`pwd`:/scratch --entrypoint='' -w /scratch/ property_graph_webserver mypy --check-untyped-defs webserver/app.py
To launch the web UI,
make black_out; docker ps | grep property | cut -d' ' -f1 | xargs docker kill; date; make up

I added Selenium tests to speed up the UI validation.


Near-term plans

There's a lot to do, so in roughly priority order (from most important near-term to will-get-to-later):
  1. Convert Latex expressions to Sympy
  2. Check the dimensionality and consistency of expressions using sympy
  3. Provide feedback to user on invalid inputs using dimensionality checks and symbol consistency
  4. Check steps using SymPy. Currently in the v7 code base the file "validate_steps_sympy.py" takes Latex as input, converts it to SymPy, then validates the step. I think the step validation should be done in pure SymPy (rather than taking Latex as input); a sketch follows this list.
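
A sketch of what pure-SymPy step validation could look like, assuming the inference rule "add X to both sides" (function and variable names here are illustrative):

# validate a step without any Latex parsing
import sympy as sp

def valid_add_to_both_sides(input_eq, x, output_eq):
    """Check that output_eq follows from input_eq by adding x to both sides."""
    expected = sp.Eq(input_eq.lhs + x, input_eq.rhs + x)
    # the step is valid if both sides match after simplification
    return (sp.simplify(expected.lhs - output_eq.lhs) == 0
            and sp.simplify(expected.rhs - output_eq.rhs) == 0)

a, b, c = sp.symbols("a b c")
assert valid_add_to_both_sides(sp.Eq(a, b), c, sp.Eq(a + c, b + c))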

Lean: Explore how to include Lean derivations per step. Can I get Lean to run on my computer?

Add API support to enable curl interactions.
This would improve command-line testability of workflows.

Write more Selenium tests.
This assumes the web UI is stable.

Make the HTML tables sortable (as already exists on https://derivationmap.net/)

Support rendering latex in HTML (as already exists on https://derivationmap.net/)

Migrating existing content into the new back end

Convert latex strings into PNG files for visualization (as already exists on https://derivationmap.net/)

Render derivations as PDF (as already exists on https://derivationmap.net/)

Saturday, May 18, 2024

distinguishing scalars, vectors, and matrices as operators or symbols

A Physics derivation has steps and expressions. Expressions are composed of symbols and operations. 

Symbols (e.g., x) can be constant or variable, real or complex. Symbols do not have arguments.

Operations are distinct from symbols because they require one or more symbols to operate upon. For example, +, ^, determinant(), etc.


Within the concept of symbols there are distinct categories: scalar, vector, matrix. The distinction is that they have different properties: a vector has one dimension, a matrix has two.

Since vectors can contain variables (e.g., \vec{a} = [x, y, z]) and matrices can contain scalars, we need a new boolean property for symbols, is_composite, and a new numeric property, dimension.

When "dimension"=0, the symbol is a scalar. Then is_composite is false.
When "dimension"=1, the symbol is a vector. Then is_composite is either false or true. If true then the symbol has two or more edges with other (scalar) symbols.
When "dimension"=2, the symbol is a matrix. Then is_composite is either false or true. If true then the symbol has four or more edges with other (scalar) symbols.


However, this accommodation isn't sufficient. We can say "matrix A = matrix B" and consider the matrices as symbols. But a quantum operator (e.g., bit flip) is a matrix that operates on a state -- an argument is required. There is an equivalence between the Hamiltonian ("H") and its matrix representation. Therefore the matrix is not a symbol as previously defined.


While the expression "matrix A = matrix B" is valid, " + = + " is not. The difference is that "+" is not a composite symbol.

"H = matrix B" is a valid expression even though "H" is an operator. 

Therefore the distinction between symbols and operators is misleading. The schema for symbols should be

  • latex
  • name_latex
  • description_latex
  • requires_arguments (boolean)
    • if requires_arguments=true (operator: +, /, ^, determinant), then 
      • argument_count (e.g., 1,2,3)
    • else requires_arguments=false (variable or constant or operator: a, x, H), then 
      • dimension (e.g., 0,1,2)
        • if dimension = 0 (aka scalar: "size=1x1" and "is_composite=false") then
          • dimension_length
          • dimension_time
          • dimension_luminosity
          • dimension_mass
        • if dimension = 1 (aka vector) then
          • is_composite = false or true
          • orientation is row xor column xor arbitrary
            • if row or column, 
              • size is arbitrary xor definite
                • if definite, "2x1" xor "1x2" xor "3x1" xor "1x3" ...
        • if dimension = 2 (aka matrix) then
          • is_composite = false or true
          • size arbitrary xor definite
            • if definite, "2x2" xor "2x3" xor "3x2" xor "3x3" xor ...

That means the major distinct subtypes of symbols are
  • symbol that requires arguments
  • symbol that does not require arguments, dimension 0
  • symbol that does not require arguments, dimension 1
  • symbol that does not require arguments, dimension 2
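
A sketch of that schema as Python dataclasses (field names follow the bullet list above; the example instances are illustrative):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Symbol:
    latex: str
    name_latex: str
    description_latex: str
    requires_arguments: bool
    argument_count: Optional[int] = None  # only when requires_arguments=True
    dimension: Optional[int] = None       # 0=scalar, 1=vector, 2=matrix
    is_composite: Optional[bool] = None   # only meaningful for dimension >= 1
    orientation: Optional[str] = None     # "row" | "column" | "arbitrary"
    size: Optional[str] = None            # e.g., "2x1", "3x3", or "arbitrary"

# "+" requires arguments, so it gets argument_count instead of dimension:
plus = Symbol(latex="+", name_latex="addition", description_latex="binary addition",
              requires_arguments=True, argument_count=2)

# \vec{a} = [x, y, z] is a composite dimension-1 symbol:
vec_a = Symbol(latex=r"\vec{a}", name_latex="a", description_latex="a vector",
               requires_arguments=False, dimension=1, is_composite=True,
               orientation="column", size="3x1")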

Saturday, March 9, 2024

Derivations, CAS, Lean, and Assumptions in Physics

Initially the Physics Derivation Graph documented expressions as Latex. Then SymPy was added to support validation of steps (is the step self-consistent?) and dimensional analysis (is the expression dimensionally consistent?).

Recently I learned that Lean could be used to prove each step in a derivation. The difference between a Computer Algebra System (e.g., SymPy) and Lean is whether "a = b  --> a/b = 1" is a valid step -- it isn't when b is zero. Lean catches that; SymPy does not. 
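
A minimal Lean 4 sketch (assuming Mathlib) of that step; the proof cannot be completed without the hypothesis hb, which is exactly the side condition SymPy silently ignores:

import Mathlib

example (a b : ℝ) (hab : a = b) (hb : b ≠ 0) : a / b = 1 := by
  rw [hab]           -- rewrite the goal to b / b = 1
  exact div_self hb  -- dividing a nonzero value by itself gives 1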

While Lean proofs sound like the last possible refinement, there are two additional complications that Lean does not address.

Challenge: Bounded ranges of applicability

In classical mechanics the relation between momentum, mass, and velocity is "p = m v". That holds when "v << c". Near the speed of light we need to switch to relativistic mass,

m = m_{rest} / \sqrt{1 - v^2/c^2}.

The boundary between "v << c" and "v ~ c" is usually set by the context being considered. 

One response for users of Lean would be to always use the "correct" relativistic equation, even when "v << c". A more conventional approach used by physicists is to write

p = m v, where v << c

then drop the "v << c" clause and rely on context.


Challenge: Real versus Float versus experimental characterization

Lean forces you to characterize numbers as Real or Integer or Complex. This presents a problem for numerical simulations that have something like a 64 bit float representation.

In thermodynamics we assume the number of particles involved is sufficiently large that we focus on the behavior of the ensemble rather than individual particles. The imprecision of floats is not correct, but neither is the infinite precision assumed by Real numbers. 
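
A quick Python illustration of the mismatch:

# 64-bit floats round; Real numbers in a proof assistant do not.
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004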


Example applications of Lean proofs needing bounds on values

Math doesn't have convenient ways of indicating "finite precision, as set by the Planck scale."  The differential element used in calculus cannot actually go to zero, but we use that concept because it works at the scales we are used to.

Physicists make simplifying assumptions that sometimes ignore reality (e.g., assuming continuous media when particles are discrete). Then again the assumption that particles are discrete is also a convenient fiction that ignores the wavefunction of quantum mechanics. 

Lean can be used to prove derivations in classical mechanics, but to be explicit about the bounds of those proofs we'd also need to indicate "v << c" and "assume space is Euclidean." 

For molecular dynamics, another constraint to account for is "temperature << 1E10 Kelvin", or whatever temperature at which atoms break down into a plasma.

Distinguishing the contexts of classical versus quantum mechanics, classical versus relativistic mechanics, and conventional gas versus plasma seems important so that we know when a claim proven in Lean is applicable.