Sunday, December 15, 2024

Characterizing complexity of software and social systems

BLUF: Definitions of distance and modularity enable characterization of complexity in technical and social systems. However, I am unable to find a simple way to quantify complexity or to rank systems by complexity.

This post is a mixture of notes from a YouTube video (linked at the bottom of this post) plus some analysis. 


Characterizing Complexity

The scope of this post covers both software complexity and social complexity. 

Software complexity is characterized by

  • how many layers of abstraction: namespace/package, classes, functions?
  • the number of modules at each level of abstraction: how many of each for namespaces, classes per namespace, functions per class?
  • number of interfaces per module at each level of abstraction
  • the distance between interdependent modules
  • the amount of information transmitted between modules

Social complexity is characterized by

  • how many layers in the hierarchy? 
  • the number of modules at each level of hierarchy: how many of each for organizations, teams-per-organization, and people-per-team
  • the number of interfaces at each level of hierarchy. A person talks to how many other people? A team coordinates with how many other teams?
  • the distance between interdependent modules. When two teams have to coordinate, are they in the same parent organization? Or separate organizations?
  • the amount of information transmitted between modules. A short email, a long telephone call, or multiple in-person meetings?
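
Taken together, the two lists above suggest that a single measurement of a socio-technical system is a small record: layer count, module counts, interface counts, dependency distances, and information volumes. Below is a minimal Python sketch of that record (the names are mine, not from the talk); it captures the characterization without claiming to reduce it to a single score.

from dataclasses import dataclass

@dataclass
class Level:
    name: str                     # e.g., "namespace", "class" or "organization", "team"
    module_count: int             # modules at this layer of abstraction/hierarchy
    interfaces_per_module: float  # average interfaces exposed per module

@dataclass
class SystemCharacterization:
    levels: list[Level]               # ordered outermost to innermost; len() = layer count
    dependency_distances: list[int]   # hierarchy hops between interdependent modules
    interaction_volumes: list[float]  # information per interaction (bytes, minutes, ...)

# a software system and a social system described with the same record
software = SystemCharacterization(
    levels=[Level("namespace", 4, 2.0), Level("class", 40, 5.5), Level("function", 400, 1.0)],
    dependency_distances=[1, 1, 3],
    interaction_volumes=[120.0, 80.0],
)
org = SystemCharacterization(
    levels=[Level("organization", 2, 3.0), Level("team", 12, 4.0), Level("person", 70, 8.0)],
    dependency_distances=[2, 4],
    interaction_volumes=[15.0, 90.0],
)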

[Claim] When refactoring a system that lacks redundancies (whether redesigning software or re-organizing people's roles and titles), the complexity of the combined software+social system is conserved if the functionality remains the same. Here conservation means productivity and efficiency are unchanged.

Even with the characterizations above and the definitions below, I don't see how to quantify complexity in software-social systems as a scalar or a vector. Nor do I see how to rank complexity to enable comparison of different systems, or how to measure whether the complexity of a system changes over time.


Definitions

In the characterizations above I used words that need explanation.

Definition of System

System: "an interconnected set of elements that is clearly organized in a way that achieves something"
source: "Thinking in Systems" by D. Meadows

In other words, a set of components + interactions that accomplish a goal. This applies to software, hardware, and social settings.


Definition of Distance within a System

Distance within a system of software is characterized by
  • services, which are decomposed into
  • namespace or package, which are decomposed into
  • class, which are decomposed into
  • methods or functions, which are decomposed into
  • lines
Distance in hierarchical social systems is characterized by
  • organization
  • team
  • person
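
One way to make "distance between interdependent modules" concrete for both hierarchies is to count hops through the containment tree: up from one module to the lowest common ancestor, then back down to the other. This is my own formulation, not from the talk; a minimal Python sketch:

def distance(path_a: list[str], path_b: list[str]) -> int:
    """Hierarchy distance: hops from a up to the lowest common
    ancestor and back down to b. Paths are outermost-first, e.g.
    ["service", "namespace", "class", "function"] or
    ["organization", "team", "person"]."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

# same team: distance 2; different organizations: distance 6
print(distance(["orgA", "team1", "alice"], ["orgA", "team1", "bob"]))    # 2
print(distance(["orgA", "team1", "alice"], ["orgB", "team9", "carol"]))  # 6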

Definition of Modularity

Modularity is defined as having these characteristics
  • enacts self-contained capability
  • capability can be called by other modules
  • capability can operate independently
source: "Composite/structured design" by G. Myers

In other words, modules + interactions accomplish future goals (aka flexibility).

There's a 2018 review that surveys the various types of coupling in software and the tooling capable of measuring the coupling. 

From "A Survey on Software Coupling Relations and Tools" by Fregnan et al, 2018 https://fpalomba.github.io/pdf/Journals/J16.pdf

From "A Survey on Software Coupling Relations and Tools" by Fregnan et al, 2018 https://fpalomba.github.io/pdf/Journals/J16.pdf


The tables above illustrate that there are nuances to what "modularity" means in software. Modularity can improve flexibility for re-use but incurs the cognitive load of tracking relationships and dependencies.



Definition of Complexity

Once you write a line of software it becomes legacy code and technical debt is incurred. Similarly in social settings, as soon as one person engages with another (cooperatively or competitively) there are decisions to be made about the interaction. The best you can aim for is to be aware of both the available trade-offs and the consequences of your actions when altering the system.


Complexity can be characterized by the relationship between cause and effect. In the https://en.wikipedia.org/wiki/Cynefin_framework the "complex" category sits in contrast with the other categories:

  • Obvious = relationship between cause and effect is obvious to system participants
  • Complicated = relationship between cause and effect requires analysis
  • Complex = relationship between cause and effect is clear only in hindsight
  • Chaotic = no relationship between cause and effect

When a software developer is exploring a codebase that is novel to them, the code can seem complex or complicated. After they have internalized the design and the reasoning behind the design, the same code base can seem obvious. Complexity is not a static characterization.

Similarly for interactions of people: a person new to an organization may initially feel overwhelmed, but after forming relationships and learning the history they may come to view the organization as obvious.

Ways to measure complexity of software

Cyclomatic complexity - measures the control flow (number of independent paths) of software
http://www.literateprogramming.com/mccabe.pdf
https://en.wikipedia.org/wiki/Cyclomatic_complexity

Halstead complexity -- measures based on countable aspects of software (counts of operators and operands)
https://en.wikipedia.org/wiki/Halstead_complexity_measures
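
As a rough illustration of McCabe's idea, cyclomatic complexity can be approximated by counting branch points in the abstract syntax tree. The sketch below uses Python's ast module; it is a simplification of the full control-flow-graph definition (production tools such as radon or lizard do this properly).

import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp, ast.comprehension)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(tree))

print(cyclomatic_complexity("""
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(x):
        pass
    return "non-negative"
"""))  # 3: one `if`, one `for`, plus 1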


Why measure complexity

  • To find heterogeneous distribution of complexity within a system (and then re-distribute)
  • To compare one system to another
  • To compare changes across time
    • historical: has the system become more complex? If yes, how?
    • forecasting: would this design change make a future system more or less complex?



Sources for the ideas in this post

This post is primarily informed by two recent sources. The first is a superb talk by Sonya Natanzon, https://www.youtube.com/watch?v=Ukv0JVDh_BI titled "Complexity & Modularity: the Yin and Yang of Socio-Technical Design". This post is a summary of that talk plus an extension of the ideas.  The second is a blog post that qualitatively ranks social distance, https://tej.as/blog/how-to-grow-professional-relationships-tjs-model

References

A Philosophy of Software Design | John Ousterhout | Talks at Google
https://www.youtube.com/watch?v=bmSAYlu0NcY


"A Survey on Software Coupling Relations and Tools" by Fregnan et al, 2018 
https://fpalomba.github.io/pdf/Journals/J16.pdf

Saturday, September 28, 2024

Notes from Talks at Google episode 485: A Philosophy of Software Design by John Ousterhout

This podcast is based on the book https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201/ and is available as a video, https://www.youtube.com/watch?v=bmSAYlu0NcY


claim: The primary design principle is decomposition. 

The point of decomposition is to make future work easier.


principle: "classes should be deep."
Deep classes enable information hiding.
The interface to the class is the cost; the functionality of the class is the benefit.
Here "interface" includes signatures, side effects, and dependencies.
The goal of abstraction is to maximize the ratio of functionality to interface.

Sometimes the cost of an interface can exceed the benefit of the functionality. For example, the number of characters for calling the interface may exceed the character count of enacting the function.

Smaller classes are easier to grok, but proliferation of classes increases the interfaces (and thus cost).
"Classes are good" does not mean "more classes are better."


principle: "the common case should be simple"

principle: exceptions should be defined out of existence.


The tactical approach is to get some code working, accepting shortcuts to get the result. These shortcuts compound. Complexity isn't one mistake; complexity is hundreds or thousands of mistakes over a long duration. Because complexity doesn't happen all at once, the incremental mistakes are hard to notice. 
Observation: Software developers who are "tactical tornados" are rewarded by management because they appear productive in the short term. There's no recognition of the technical debt being created.
Observation: "what's the smallest change I can make?" is a local optimization that often harms the global optimum.

The strategic approach is to make "great design" the goal. Not just at the beginning, but with every change to the code. This is slower than the tactical approach but enables future productivity. 

claim: The crossover in ROI might be something like 6 months for a project. Tactical can get results faster initially, but after 6 months making small changes gets more expensive than having used a strategic approach.

mentally untangling the spaghetti code

I recently started exploring how to migrate the Physics Derivation Graph website from the JSON-based backend (v7) to Neo4j (v8). For v7 I have a working JSON-based backend+frontend, and for v8 I have a working Neo4j-backend with minimal frontend (e.g., no nginx, no authentication, no documentation pages).

I'm aware of two ways to carry out the migration:

  • hard cut-over: port all existing front-end capabilities (nginx, authentication, documentation) and backend capabilities {render d3js, render tex, render pdf, render graphviz} to v8, then switch public site from v7 to v8
  • slow transition: add Neo4j to existing v7 JSON-based site and then incrementally transition site features like {render d3js, render tex, render pdf, render graphviz}

The "slow transition" has the advantage of not having to port the front-end capabilities. The downside is that the two codebases (v7, v8) exist concurrently and I have to refactor each backend feature on the live site.

I don't have a good sense for how much work porting front-end capabilities (the hard cut-over option) involves. 


In the process of trying the "slow transition" I am getting lost in the code dependencies and in determining which complexity is essential versus accidental. Refactoring requires the ability to distinguish essential from accidental complexity. 

The usual way to guide refactoring is to rely on tests of features. If I change some code and a feature breaks, then the code I altered carried essential complexity. If I change something and all tests pass, then the assumption is that my change reduced accidental complexity. (This relies on full test coverage and on the assumption that tests are correlated with necessary capabilities.)
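
A minimal sketch of that workflow (the import path matches this project's webserver/app.py; the function name is hypothetical, chosen to match the use-case graph below): pin the requirement down with a feature test, refactor, and treat a passing suite as weak evidence that only accidental complexity changed.

# test_symbols.py -- hypothetical pytest feature test guarding a refactor
from webserver.app import list_of_symbols  # hypothetical function name

def test_list_of_symbols_is_unique_and_sorted():
    # the requirement (essential complexity): every symbol listed once, in order
    symbols = list_of_symbols()
    assert symbols == sorted(set(symbols))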

A different way to distinguish essential versus accidental complexity is to enumerate roles, use cases per role, and the requirements of each use case. The requirements are the essential complexity. Then the code I write is in response to a specific requirement. 

In practice, as a developer my natural instinct is to enact the minimum amount of change based on my local perspective. To break out of this local minimum I need the context of tracing the roles to use cases to requirements. For each change to the software! That's asking a lot.

Writing out all the roles, use cases, and requirements turns out to be pretty messy. This corresponds with the amount of code needed to support the Physics Derivation Graph.

The use-case graph below is in Graphviz DOT format and can be rendered with DreamPuf, an online Graphviz editor.

digraph G {

  rankdir="LR";
  
  // see https://graphviz.org/docs/attrs/splines/
  //splines=false;
  //splines=curved;
  //splines=polyline;
  splines=ortho;

  unauthenticated_user [label="unauthenticated user", shape="rectangle"];
  authenticated_user [label="authenticated user", shape="rectangle"];
  
  log_in [label="authenticate" style=filled color=lightgrey];
  unauthenticated_user -> log_in -> authenticated_user;
  auth_google [label="authenticate using Google"];
  log_in -> auth_google;
  
  log_out [label="log out" style=filled color=lightgrey];
  authenticated_user -> log_out -> unauthenticated_user;
  
  export_database [label="export database" style=filled color=lightgrey];
  unauthenticated_user -> export_database;
  authenticated_user -> export_database;
  
  export_database_json_webUI [label="web UI: link to database JSON" style=filled color=lightblue];
  export_database -> export_database_json_webUI;
  export_database_cypher_webUI [label="web UI: link to database Cypher" style=filled color=lightblue];
  export_database -> export_database_cypher_webUI;
  
  export_database_json_rest [label="REST API: export database as JSON" style=filled color=lightpink];
  export_database -> export_database_json_rest;
  export_database_cypher_rest [label="REST API: export database as Cypher" style=filled color=lightpink];
  export_database -> export_database_cypher_rest;
  
  export_database_json_backend [label="backend: export database as JSON"];
  export_database_json_rest -> export_database_json_backend;
  export_database_json_webUI -> export_database_json_backend;
  
  export_database_cypher_backend [label="backend: export database as Cypher"];
  export_database_cypher_webUI -> export_database_cypher_backend;
  export_database_cypher_rest -> export_database_cypher_backend;
  
  import_database_as_cypher [label="import database as Cypher" style=filled color=lightgrey];
  authenticated_user -> import_database_as_cypher;

  upload_database_as_cypher_webUI [label="web UI: upload database as Cypher" style=filled color=lightblue];
  import_database_as_cypher -> upload_database_as_cypher_webUI;
  
  upload_database_as_cypher_backend [label="backend: upload database as Cypher"];
  upload_database_as_cypher_webUI -> upload_database_as_cypher_backend;
  
  read_FAQ [label="read FAQ" style=filled color=lightgrey];
  unauthenticated_user -> read_FAQ;
  authenticated_user -> read_FAQ;
  render_FAQ [label="web UI: render FAQ" style=filled color=lightblue];
  read_FAQ -> render_FAQ;
  
  query_graph [label="query graph" style=filled color=lightgrey];
  unauthenticated_user -> query_graph;
  authenticated_user -> query_graph;
  
  query_graph_webUI [label="web UI: query graph" style=filled color=lightblue];
  query_graph -> query_graph_webUI;

  query_graph_rest [label="REST API: query graph" style=filled color=lightpink];
  query_graph -> query_graph_rest;
  
  query_graph_backend [label="backend: query graph"];
  query_graph_webUI -> query_graph_backend;
  query_graph_rest -> query_graph_backend;
  
  enter_new_operator [label="enter new operator" style=filled color=lightgrey];
  authenticated_user -> enter_new_operator;
  
  create_new_operator_webUI [label="web UI: create new operator" style=filled color=lightblue];
  enter_new_operator -> create_new_operator_webUI;
  
  enter_new_expression [label="enter new expression" style=filled color=lightgrey];
  authenticated_user -> enter_new_expression;

  create_new_expression_webUI [label="web UI: create new expression" style=filled color=lightblue];
  enter_new_expression -> create_new_expression_webUI;

  add_new_expression [label="backend: add new expression"];
  create_new_expression_webUI -> add_new_expression;
  
  verify_expr_dimensional_consistency [label="backend: verify expression dimensional consistency"];
  create_new_expression_webUI -> verify_expr_dimensional_consistency;

  enter_new_derivation [label="enter new derivation" style=filled color=lightgrey];
  authenticated_user -> enter_new_derivation;
  
  create_new_derivation_webUI [label="web UI: create new derivation" style=filled color=lightblue];
  enter_new_derivation -> create_new_derivation_webUI;

  add_new_derivation [label="backend: add new derivation"];
  create_new_derivation_webUI -> add_new_derivation;

  enter_new_symbol [label="enter new symbol" style=filled color=lightgrey];
  authenticated_user -> enter_new_symbol;
  
  create_new_symbol_webUI [label="web UI: create new symbol" style=filled color=lightblue];
  enter_new_symbol -> create_new_symbol_webUI;

  add_new_symbol [label="backend: add new symbol"];
  create_new_symbol_webUI -> add_new_symbol;

  read_list_of_derivations [label="read list of derivations" style=filled color=lightgrey];
  unauthenticated_user -> read_list_of_derivations;
  authenticated_user -> read_list_of_derivations;

  list_of_derivations_html [label="web UI: render list of derivations" style=filled color=lightblue];
  read_list_of_derivations -> list_of_derivations_html;

  list_of_derivations_json [label="REST API: JSON list of derivations" style=filled color=lightpink];
  read_list_of_derivations -> list_of_derivations_json;

  list_of_derivations [label="backend: get list of derivations"];
  list_of_derivations_json -> list_of_derivations;
  list_of_derivations_html -> list_of_derivations;

  read_list_of_symbols [label="read list of symbols" style=filled color=lightgrey];
  unauthenticated_user -> read_list_of_symbols;
  authenticated_user -> read_list_of_symbols;

  list_of_symbols_html [label="web UI: show list of symbols" style=filled color=lightblue];
  read_list_of_symbols -> list_of_symbols_html;

  list_of_symbols_json [label="REST API: JSON list of symbols" style=filled color=lightpink];
  read_list_of_symbols -> list_of_symbols_json;

  list_of_symbols [label="backend: get list of symbols"]
  list_of_symbols_html -> list_of_symbols;
  list_of_symbols_json -> list_of_symbols;

  read_list_of_expressions [label="read list of expressions" style=filled color=lightgrey];
  unauthenticated_user -> read_list_of_expressions;
  authenticated_user -> read_list_of_expressions;

  list_of_expressions_html [label="web UI: show list of expressions" style=filled color=lightblue];
  read_list_of_expressions -> list_of_expressions_html;
  list_of_expressions_html -> verify_expr_dimensional_consistency;

  list_of_expressions_json [label="REST API: JSON list of expressions" style=filled color=lightpink];
  read_list_of_expressions -> list_of_expressions_json;

  list_of_expressions [label="backend: get list of expressions"];
  list_of_expressions_json -> list_of_expressions;
  list_of_expressions_html -> list_of_expressions;

  read_derivation [label="read derivation" style=filled color=lightgrey];
  unauthenticated_user -> read_derivation;
  authenticated_user -> read_derivation;
  
  derivation_steps_json [label="REST API: JSON list of steps" style=filled color=lightpink];
  read_derivation -> derivation_steps_json;
  derivation_table [label="web UI: show table of steps" style=filled color=lightblue];
  read_derivation -> derivation_table;
  derivation_table -> verify_expr_dimensional_consistency;
  
  verify_step_consistency [label="backend: verify step consistency"];
  derivation_table -> verify_step_consistency;
  
  list_of_steps [label="backend: get list of steps"];
  derivation_steps_json -> list_of_steps;
  derivation_table -> list_of_steps;
  
  derivation_tex [label="web UI: link to derivation .tex file" style=filled color=lightblue];
  read_derivation -> derivation_tex;
  derivation_pdf [label="web UI: link to derivation .pdf file" style=filled color=lightblue];
  read_derivation -> derivation_pdf;
  derivation_d3js [label="web UI: show derivation d3js" style=filled color=lightblue];
  read_derivation -> derivation_d3js;
  derivation_graphviz_png [label="web UI: show derivation graphviz PNG" style=filled color=lightblue];
  read_derivation -> derivation_graphviz_png;

  derivation_d3js -> list_of_steps;
  derivation_graphviz_png  -> list_of_steps;
  
  get_expression_for_step [label="backend: get expressions per step"];
  derivation_d3js -> get_expression_for_step;
  derivation_table -> get_expression_for_step;
  get_infrule_for_step [label="backend: get infrule per step"];
  derivation_graphviz_png -> get_infrule_for_step;
  
  generate_png_for_expression [label="backend: generate png for expression"];
  derivation_d3js -> generate_png_for_expression;
  derivation_graphviz_png -> generate_png_for_expression;
  generate_png_for_infrule [label="backend: generate png for infrule"];
  derivation_d3js -> generate_png_for_infrule;
  derivation_graphviz_png -> generate_png_for_infrule;
  
  generate_derivation_tex [label="backend: generate tex for derivation"];
  derivation_tex -> generate_derivation_tex;
  generate_derivation_tex -> list_of_steps;
  generate_derivation_tex -> get_infrule_for_step;
  generate_derivation_tex -> get_expression_for_step;
  
  generate_derivation_pdf_from_tex [label="backend: generate derivation PDF from .tex"];
  derivation_pdf -> generate_derivation_pdf_from_tex -> generate_derivation_tex;
  
  generate_pdf_from_tex [label="backend: generate PDF from .tex"];
  generate_derivation_pdf_from_tex -> generate_pdf_from_tex;
}

Monday, September 2, 2024

new droplet for Ubuntu 24 LTS

My upgrade from Ubuntu 20 to 22 resulted in an inability to SSH to the server; see https://physicsderivationgraph.blogspot.com/2024/09/unable-to-ssh-into-vps-after-upgrade-of.html
I decided to get a new droplet and start from scratch with Ubuntu 24 LTS. Logged in as root via the web console,
adduser pdg
usermod -aG sudo pdg
ufw allow OpenSSH
ufw enable
Then I was able to SSH from my laptop to the VPS as "pdg"
sudo ufw allow 443
sudo ufw allow 80

edit ~/.bashrc to include

alias ..='cd ..'

Install Docker

sudo apt update
sudo apt upgrade
sudo apt install apt-transport-https curl
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
(docker-ce comes from Docker's repository, not stock Ubuntu, so the repository must be added before the install.)
sudo systemctl is-active docker
as per https://linuxiac.com/how-to-install-docker-on-ubuntu-24-04-lts/

Cloning the GitHub repos over SSH requires keys.

ssh-keygen
and upload the newly generated public key to https://github.com/settings/keys

Ubuntu 24 LTS doesn't come with make by default, so

sudo apt install make

By default the user pdg can't launch Docker, so

sudo usermod -a -G docker $USER
newgrp docker
as per https://stackoverflow.com/a/48450294

Certs need to be loaded; see https://physicsderivationgraph.blogspot.com/2021/10/periodic-renewal-of-https-letsencrypt.html

sudo apt -y install certbot

Certbot requires a running webservice on port 80 to create new certificates.
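
If nothing is serving port 80 yet, certbot's standalone plugin can run its own temporary webserver during issuance. A sketch (the domain shown is this project's public site, used as an example; stop anything already bound to port 80 first):

sudo certbot certonly --standalone -d derivationmap.net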



Reference:
https://physicsderivationgraph.blogspot.com/2020/10/upgrading-ubuntu-1804-to-2004-on.html

unable to SSH into VPS after upgrade of Ubuntu from 20 to 22 LTS

Prior to upgrading from Ubuntu 20 to 22 LTS I was able to SSH from my local laptop to a remote VPS using the command

ssh -v username@IPaddress

After the upgrade I got

ssh -v username@IPaddress
OpenSSH_9.7p1, LibreSSL 3.3.6
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to IPaddress [IPaddress] port 22.
debug1: connect to address IPaddress port 22: Operation timed out
ssh: connect to host IPaddress port 22: Operation timed out

Sunday, May 19, 2024

notes on status and near-term plans

status updates after ~30 hours of work over the course of 1 weekend.

I'm happy with the rewrite of the back-end since it makes the design more robust and flexible. 

I've learned enough Cypher to feel comfortable continuing with this path. I'm also somewhat confident (with no basis in experience) that switching to a different property graph database would be feasible without having to rewrite the front-end and controller.


Developer workflow

My "how do I develop the code" steps have evolved to include Black and mypy. 

After making changes to the code I format using Black and then run the mypy type checker:
make black_out
docker run --rm -v`pwd`:/scratch --entrypoint='' -w /scratch/ property_graph_webserver mypy --check-untyped-defs webserver/app.py
To launch the web UI,
make black_out; docker ps | grep property | cut -d' ' -f1 | xargs docker kill; date; make up

I added Selenium tests to speed up the UI validation.


Near-term plans

There's a lot to do, so in roughly priority order (from most important near-term to will-get-to-later)
  1. Convert Latex expressions to SymPy (see the sketch after this list)
  2. Check the dimensionality and consistency of expressions using sympy
  3. Provide feedback to user on invalid inputs using dimensionality checks and symbol consistency
  4. Check steps using SymPy. Currently in the v7 code base the file "validate_steps_sympy.py" takes Latex as input, converts to SymPy, then validates the step. I think the step validation should be done using pure SymPy (rather than taking Latex as input).
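
For item 1, SymPy ships a LaTeX parser (it requires the separate antlr4-python3-runtime package, version-matched to SymPy's grammar). A minimal sketch:

from sympy.parsing.latex import parse_latex

expr = parse_latex(r"\frac{1}{2} m v^{2}")
print(expr)               # m*v**2/2
print(expr.free_symbols)  # {m, v}

Items 2 and 3 can then operate on the resulting SymPy expression rather than on Latex strings.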

Lean: Explore how to include Lean derivations per step. Can I get Lean to run on my computer?

Add API support to enable curl interactions.
This would improve command-line testability of workflows.

Write more selenium tests.
This assumes the web UI is stable.

Make the HTML tables sortable (as already exists on https://derivationmap.net/)

Support rendering latex in HTML (as already exists on https://derivationmap.net/)

Migrating existing content into the new back end

Convert latex strings into PNG files for visualization (as already exists on https://derivationmap.net/)

Render derivations as PDF (as already exists on https://derivationmap.net/)

Saturday, May 18, 2024

distinguishing scalars, vectors, and matrices as operators or symbols

A Physics derivation has steps and expressions. Expressions are composed of symbols and operations. 

Symbols (e.g., x) can be constant or variable, real or complex. Symbols do not have arguments.

Operations are distinct from symbols because they require one or more symbols to operate upon. For example, +, ^, determinant(), etc.


Within the concept of symbols there are distinct categories: scalar, vector, matrix. The distinction is that they have different properties: a vector has one dimension, a matrix has two. 

Since vectors can contain variables (e.g., \vec{a} = [x,y,z] ) and matrices can contain scalars, then we need a new boolean property for symbols: is_composite, and a new numeric property for symbols: dimensions. 

When "dimension"=0, the symbol is a scalar. Then is_composite is false.
When "dimension"=1, the symbol is a vector. Then is_composite is either false or true. If true then the symbol has two or more edges with other (scalar) symbols.
When "dimension"=2, the symbol is a matrix. Then is_composite is either false or true. If true then the symbol has four or more edges with other (scalar) symbols.


However, this accommodation isn't sufficient. We can say "matrix A = matrix B" and consider the matrices as symbols. Yet a quantum operator (e.g., bit flip) is a matrix that operates on a state -- an argument is required. There is an equivalence between the Hamiltonian ("H") and its matrix representation. Therefore the matrix is not a symbol as previously defined.


While the expression "matrix A = matrix B" is valid, " + = + " is not. The difference is that "+" is not a composite symbol.

"H = matrix B" is a valid expression even though "H" is an operator. 

Therefore the distinction between symbols and operators is misleading. The schema for symbols should be

  • latex
  • name_latex
  • description_latex
  • requires_arguments (boolean)
    • if requires_arguments=true (operator: +, /, ^, determinant), then 
      • argument_count (e.g., 1,2,3)
    • else requires_arguments=false (variable or constant or operator: a, x, H), then 
      • dimension (e.g., 0,1,2)
        • if dimension = 0 (aka scalar: "size=1x1" and "is_composite=false") then
          • dimension_length
          • dimension_time
          • dimension_luminosity
          • dimension_mass
        • if dimension = 1 (aka vector) then
          • is_composite = false or true
          • orientation is row xor column xor arbitrary
            • if row or column, 
              • size is arbitrary xor definite
                • if definite, "2x1" xor "1x2" xor "3x1" xor "1x3" ...
        • if dimension = 2 (aka matrix) then
          • is_composite = false or true
          • size arbitrary xor definite
            • if definite, "2x2" xor "2x3" xor "3x2" xor "3x3" xor ...

That means the major distinct subtypes of symbols are
  • symbol that requires arguments
  • symbol that does not require arguments, dimension 0
  • symbol that does not require arguments, dimension 1
  • symbol that does not require arguments, dimension 2
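
A sketch of this schema as Python dataclasses (field names follow the bullets above; the per-dimension SI fields and the validation are abbreviated, so this is illustrative rather than exhaustive):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Symbol:
    latex: str
    name_latex: str
    description_latex: str
    requires_arguments: bool
    argument_count: Optional[int] = None  # only when requires_arguments=true
    dimension: Optional[int] = None       # 0=scalar, 1=vector, 2=matrix
    is_composite: Optional[bool] = None   # only meaningful when dimension >= 1
    orientation: Optional[str] = None     # "row" | "column" | "arbitrary" (vectors)
    size: Optional[str] = None            # "arbitrary" or definite, e.g. "2x2", "3x1"

    def __post_init__(self):
        if self.requires_arguments:
            assert self.argument_count is not None and self.argument_count >= 1
        else:
            assert self.dimension in (0, 1, 2)
            if self.dimension == 0:
                assert not self.is_composite  # scalars are never composite

# the four major subtypes
plus = Symbol("+", "plus", "addition", requires_arguments=True, argument_count=2)
x = Symbol("x", "x", "position", requires_arguments=False, dimension=0)
a_vec = Symbol(r"\vec{a}", "a", "a vector", requires_arguments=False,
               dimension=1, is_composite=True, orientation="column", size="3x1")
B = Symbol("B", "B", "a matrix", requires_arguments=False,
           dimension=2, is_composite=False, size="arbitrary")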

Saturday, March 9, 2024

Derivations, CAS, Lean, and Assumptions in Physics

Initially the Physics Derivation Graph documented expressions as Latex. Then SymPy was added to support validation of steps (is the step self-consistent?) and dimensionality (is the expression dimensionally consistent?). 

Recently I learned that Lean could be used to prove each step in a derivation. The difference between a Computer Algebra System (e.g., SymPy) and Lean is whether "a = b  --> a/b = 1" is a valid step -- it isn't when b is zero. Lean catches that; SymPy does not. 
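
A minimal Lean 4 sketch of that gap (assuming Mathlib, where div_self is the lemma a ≠ 0 → a / a = 1): the step only closes once the b ≠ 0 hypothesis is supplied.

import Mathlib

-- `a = b` justifies `a / b = 1` only together with `b ≠ 0`;
-- remove `hb` and the proof no longer goes through.
example (a b : ℝ) (h : a = b) (hb : b ≠ 0) : a / b = 1 := by
  rw [h]
  exact div_self hb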

While Lean proofs sound like the last possible refinement, there are two additional complications that Lean does not address. 

Challenge: Bounded ranges of applicability

In classical mechanics the relation between momentum, mass, and velocity is "p = m v". That holds when "v << c". Near the speed of light we need to switch to relativistic mass, 

m = m_{rest} / \sqrt{1 - (v^2/c^2)}.

The boundary between "v << c" and "v ~ c" is usually set by the context being considered. 

One response for users of Lean would be to always use the "correct" relativistic equation, even when "v << c."  A more conventional approach used by Physicists is to use

p = m v, where v << c

then drop the "v << c" clause and rely on context.


Challenge: Real versus Float versus experimental characterization

Lean forces you to characterize numbers as Real or Integer or Complex. This presents a problem for numerical simulations that have something like a 64 bit float representation.

In thermodynamics we assume the number of particles involved is sufficiently large that we focus on the behavior of the ensemble rather than individual particles. The imprecision of floats is not correct, but neither is the infinite precision assumed by Real numbers. 


Example applications of Lean proofs needing bounds on values

Math doesn't have convenient ways of indicating "finite precision, as set by the Planck scale."  The differential element used in calculus cannot actually go to zero, but we use that concept because it works at the scales we are used to. 

Physicists make simplifying assumptions that sometimes ignore reality (e.g., assuming continuous media when particles are discrete). Then again the assumption that particles are discrete is also a convenient fiction that ignores the wavefunction of quantum mechanics. 

Lean can be used to prove derivations in classical mechanics, but to be explicit about the bounds of those proofs we'd also need to indicate "v << c" and "assume space is Euclidean." 

For molecular dynamics, another constraint to account for is "temperature << 1E10 Kelvin," or whatever temperature at which the atoms break down into plasma. 

Distinguishing the context of (classical mechanics from quantum) and (classical from relativistic) and (conventional gas versus plasma) seems important so that we know when a claim proven in Lean is applicable. 

Saturday, March 2, 2024

dichotomy of assumptions

In Physics there are some assumptions that form a dichotomy:

  • is the speed of light constant or variable?
  • is the measure of energy discrete or continuous?

In the dichotomy of assumptions, one of the two assumptions is reflective of reality, and the other is an oversimplification. The oversimplification is related to reality by assumptions, constraints, and limits. 

(I define "oversimplification" as the extension of useful assumptions to incorrect regions.)

Another case where oversimplification is the link between domains is quantum physics and (classical) statistical physics. Quantum particles are either Fermions (half-integer spin) or Bosons (integer spin), but that is practically irrelevant for large ensembles of particles at room temperature. The aspects that get measured at one scale (e.g., particle velocity) are related to but separate from metrics at another scale (e.g., temperature, entropy). Mathematically this transition manifests as the switch from summation to integration.
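
For example, the single-particle partition function over discrete states becomes, in the classical limit, an integral over phase space (a standard textbook form, quoted here from memory):

Z = \sum_i e^{-E_i / (k_B T)}
  \quad\longrightarrow\quad
Z = \frac{1}{h^3} \int e^{-H(p,q) / (k_B T)} \, d^3p \, d^3q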


So what? 
This is a new-to-me category of derivations: those that span domains. What constitutes a domain is set by the assumptions that form the boundaries, and oversimplification is how to cross the boundaries. 

What boundaries should the Physics Derivation Graph transgress? What oversimplifications are adjacent?

The evidences of dissonance (e.g., Mercury’s perihelion based on Newtonian gravitation versus relativity, the deflection of starlight) are not relevant for bridging domains. They are illustrations of the oversimplification.

Update 2024-03-10: on the page https://en.wikipedia.org/wiki/Phase_space#Quantum_mechanics

"by expressing quantum mechanics in phase space (the same ambit as for classical mechanics), the Weyl map facilitates recognition of quantum mechanics as a deformation (generalization) of classical mechanics, with deformation parameter ħ/S, where S is the action of the relevant process. (Other familiar deformations in physics involve the deformation of classical Newtonian into relativistic mechanics, with deformation parameter v/c; or the deformation of Newtonian gravity into general relativity, with deformation parameter Schwarzschild radius/characteristic dimension.)
 
Classical expressions, observables, and operations (such as Poisson brackets) are modified by ħ-dependent quantum corrections, as the conventional commutative multiplication applying in classical mechanics is generalized to the noncommutative star-multiplication characterizing quantum mechanics and underlying its uncertainty principle."
See also https://en.wikipedia.org/wiki/Wigner%E2%80%93Weyl_transform#Deformation_quantization

Sunday, February 25, 2024

derivations, identities, and other categories of mathematical physics

Transforming from one expression to another is carried out with respect to a goal. There are different categories of goals (listed below). I don't yet know what distinguishes one category from another.

My motivation for the Physics Derivation Graph is exclusively the first category -- derivations, specifically across domains. This motivation is more specific than a broader question, "what is the relation between every expression in mathematical physics and every other expression?" because the answer might be "symbols and inference rules are common."

The other categories are neat but not as rewarding for me. These other categories are included in the PDG because the infrastructure for software and UI and data format are the same.


Examples of Derivations

Relation between two disparate descriptions of nature:

Derivation of expression from simpler observations plus assumptions:
An expression in one domain simplifies to another expression in a different domain under modified assumptions

Examples of Identities

An equation can be expressed differently but be equivalent:
Show that two sides of an equation are, in fact, the same:

Examples of Use Case

Start from one or more expressions to derive a conclusion with practical utility
This is the focus of the ATOMS lab at UMBC, specifically enacted using Lean.

Sunday, January 21, 2024

Python dependencies visualization using pydeps

To see what Python modules are available on my system I can run

pydoc3 modules

One of the modules is pandas. Here's a script that just loads the pandas module:

cat load_pandas.py 
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
import pandas as pd 

When the script is run, nothing happens:

python3 load_pandas.py
I have pydeps installed:
pydeps --version
pydeps v1.12.17

Output of analysis by pydeps:

pydeps -o myfile.png --show-dot -T png --noshow load_pandas.py 

digraph G {
    concentrate = true;

    rankdir = TB;
    node [style=filled,fillcolor="#ffffff",fontcolor="#000000",fontname=Helvetica,fontsize=10];

    load_pandas_py [fillcolor="#a65959",fontcolor="#ffffff",label="load_pandas.py"];
    pandas [fillcolor="#039595",fontcolor="#ffffff"];
    pandas__config [fillcolor="#24d0d0",label="pandas._config"];
    pandas__version [fillcolor="#47c2c2",label="pandas\.\n_version"];
    pandas__version_meson [fillcolor="#47c2c2",label="pandas\.\n_version_meson"];
    pandas_api [fillcolor="#3db8b8",label="pandas.api"];
    pandas_arrays [fillcolor="#46a4a4",label="pandas.arrays"];
    pandas_compat [fillcolor="#17d3d3",label="pandas.compat"];
    pandas_compat__optional [fillcolor="#31c4c4",label="pandas\.\ncompat\.\n_optional"];
    pandas_compat_pyarrow [fillcolor="#3db8b8",label="pandas\.\ncompat\.\npyarrow"];
    pandas_core [fillcolor="#10f9f9",label="pandas.core"];
    pandas_core_api [fillcolor="#3a8888",fontcolor="#ffffff",label="pandas\.\ncore\.\napi"];
    pandas_core_computation [fillcolor="#3bcece",label="pandas\.\ncore\.\ncomputation"];
    pandas_core_computation_api [fillcolor="#46a4a4",label="pandas\.\ncore\.\ncomputation\.\napi"];
    pandas_core_config_init [fillcolor="#3a8888",fontcolor="#ffffff",label="pandas\.\ncore\.\nconfig_init"];
    pandas_core_dtypes [fillcolor="#2fdbdb",label="pandas\.\ncore\.\ndtypes"];
    pandas_core_dtypes_dtypes [fillcolor="#339999",fontcolor="#ffffff",label="pandas\.\ncore\.\ndtypes\.\ndtypes"];
    pandas_core_reshape [fillcolor="#47c2c2",label="pandas\.\ncore\.\nreshape"];
    pandas_core_reshape_api [fillcolor="#46a4a4",label="pandas\.\ncore\.\nreshape\.\napi"];
    pandas_errors [fillcolor="#3ab0b0",label="pandas.errors"];
    pandas_io [fillcolor="#0bdfdf",label="pandas.io"];
    pandas_io_api [fillcolor="#46a4a4",label="pandas.io.api"];
    pandas_io_json [fillcolor="#23c8c8",label="pandas.io.json"];
    pandas_io_json__normalize [fillcolor="#4cb3b3",label="pandas\.\nio\.\njson\.\n_normalize"];
    pandas_plotting [fillcolor="#31c4c4",label="pandas\.\nplotting"];
    pandas_testing [fillcolor="#4cb3b3",label="pandas.testing"];
    pandas_tseries [fillcolor="#23c8c8",label="pandas.tseries"];
    pandas_tseries_api [fillcolor="#46a4a4",label="pandas\.\ntseries\.\napi"];
    pandas_tseries_offsets [fillcolor="#26d9d9",label="pandas\.\ntseries\.\noffsets"];
    pandas_util [fillcolor="#05e5e5",label="pandas.util"];
    pandas_util__print_versions [fillcolor="#409696",fontcolor="#ffffff",label="pandas\.\nutil\.\n_print_versions"];
    pandas_util__tester [fillcolor="#46a4a4",label="pandas\.\nutil\.\n_tester"];
    pandas -> load_pandas_py [fillcolor="#039595",minlen="2"];
    pandas__config -> pandas [fillcolor="#24d0d0"];
    pandas__config -> pandas_core_config_init [fillcolor="#24d0d0",minlen="2"];
    pandas__config -> pandas_errors [fillcolor="#24d0d0"];
    pandas__version -> pandas [fillcolor="#47c2c2"];
    pandas__version -> pandas_util__print_versions [fillcolor="#47c2c2",minlen="2"];
    pandas__version_meson -> pandas [fillcolor="#47c2c2"];
    pandas__version_meson -> pandas_util__print_versions [fillcolor="#47c2c2",minlen="2"];
    pandas_api -> pandas [fillcolor="#3db8b8"];
    pandas_arrays -> pandas [fillcolor="#46a4a4"];
    pandas_compat -> pandas [fillcolor="#17d3d3"];
    pandas_compat -> pandas_core_dtypes_dtypes [fillcolor="#17d3d3",minlen="3"];
    pandas_compat -> pandas_util__print_versions [fillcolor="#17d3d3",minlen="2"];
    pandas_compat -> pandas_util__tester [fillcolor="#17d3d3",minlen="2"];
    pandas_compat__optional -> pandas [fillcolor="#31c4c4",minlen="2"];
    pandas_compat__optional -> pandas_util__print_versions [fillcolor="#31c4c4",minlen="2"];
    pandas_compat__optional -> pandas_util__tester [fillcolor="#31c4c4",minlen="2"];
    pandas_compat_pyarrow -> pandas [fillcolor="#3db8b8",minlen="2"];
    pandas_compat_pyarrow -> pandas_compat [fillcolor="#3db8b8",weight="2"];
    pandas_core -> pandas [fillcolor="#10f9f9"];
    pandas_core -> pandas_arrays [fillcolor="#10f9f9"];
    pandas_core -> pandas_util [fillcolor="#10f9f9"];
    pandas_core_api -> pandas [fillcolor="#3a8888",minlen="2"];
    pandas_core_computation -> pandas [fillcolor="#3bcece",minlen="2"];
    pandas_core_computation -> pandas_core_config_init [fillcolor="#3bcece",weight="2"];
    pandas_core_computation_api -> pandas [fillcolor="#46a4a4",minlen="3"];
    pandas_core_config_init -> pandas [fillcolor="#3a8888",minlen="2"];
    pandas_core_dtypes -> pandas [fillcolor="#2fdbdb",minlen="2"];
    pandas_core_dtypes -> pandas_core_api [fillcolor="#2fdbdb",weight="2"];
    pandas_core_dtypes -> pandas_core_config_init [fillcolor="#2fdbdb",weight="2"];
    pandas_core_dtypes_dtypes -> pandas [fillcolor="#339999",minlen="3"];
    pandas_core_dtypes_dtypes -> pandas_core_api [fillcolor="#339999",minlen="2",weight="2"];
    pandas_core_reshape -> pandas [fillcolor="#47c2c2",minlen="2"];
    pandas_core_reshape_api -> pandas [fillcolor="#46a4a4",minlen="3"];
    pandas_errors -> pandas [fillcolor="#3ab0b0"];
    pandas_errors -> pandas_core_dtypes_dtypes [fillcolor="#3ab0b0",minlen="3"];
    pandas_io -> pandas [fillcolor="#0bdfdf"];
    pandas_io -> pandas_core_api [fillcolor="#0bdfdf",minlen="2"];
    pandas_io -> pandas_core_config_init [fillcolor="#0bdfdf",minlen="2"];
    pandas_io_api -> pandas [fillcolor="#46a4a4",minlen="2"];
    pandas_io_json -> pandas [fillcolor="#23c8c8",minlen="2"];
    pandas_io_json -> pandas_io [fillcolor="#23c8c8",weight="2"];
    pandas_io_json -> pandas_io_api [fillcolor="#23c8c8",weight="2"];
    pandas_io_json__normalize -> pandas [fillcolor="#4cb3b3",minlen="3"];
    pandas_plotting -> pandas [fillcolor="#31c4c4"];
    pandas_plotting -> pandas_core_config_init [fillcolor="#31c4c4",minlen="2"];
    pandas_testing -> pandas [fillcolor="#4cb3b3"];
    pandas_tseries -> pandas [fillcolor="#23c8c8"];
    pandas_tseries -> pandas_core_api [fillcolor="#23c8c8",minlen="2"];
    pandas_tseries_api -> pandas [fillcolor="#46a4a4",minlen="2"];
    pandas_tseries_offsets -> pandas [fillcolor="#26d9d9",minlen="2"];
    pandas_tseries_offsets -> pandas_core_api [fillcolor="#26d9d9",minlen="2"];
    pandas_tseries_offsets -> pandas_tseries [fillcolor="#26d9d9",weight="2"];
    pandas_tseries_offsets -> pandas_tseries_api [fillcolor="#26d9d9",weight="2"];
    pandas_util -> pandas [fillcolor="#05e5e5"];
    pandas_util -> pandas_arrays [fillcolor="#05e5e5"];
    pandas_util -> pandas_compat__optional [fillcolor="#05e5e5",minlen="2"];
    pandas_util -> pandas_compat_pyarrow [fillcolor="#05e5e5",minlen="2"];
    pandas_util -> pandas_core_dtypes_dtypes [fillcolor="#05e5e5",minlen="3"];
    pandas_util -> pandas_errors [fillcolor="#05e5e5"];
    pandas_util__print_versions -> pandas [fillcolor="#409696",minlen="2"];
    pandas_util__tester -> pandas [fillcolor="#46a4a4",minlen="2"];
}