Thursday, December 4, 2025

website unresponsive: diagnostic steps and blocking the AI crawlers

I received an email alert this morning with subject "Wachete error notification" which indicated my website wasn't responsive. 

In a browser I verified that https://allofphysics.com/ was hanging with no immediate error response. After a few minutes I got "504 Gateway Time-out nginx/1.17.9".

I ssh'd into the VPS (virtual private server) and ran 
docker ps
to verify the containers were running.

I used
top
to verify the CPU load and memory load. Two instances of gunicorn are using 2% of the CPU each and 10% of the RAM each. That's expected.

Next I logged into the VPS web portal to review system usage for the past 7 days. There is certainly a noticeable change of metrics that started suddenly yesterday:



The last interaction I had with the server (more than a week ago) was updating the HTTPS certificates using Let's Encrypt. Although the website returned a 504 error, I could still check the certificate expiration in the browser. The certs were valid.

In the logs directory on the server:

-rwxrwxrwx 1 usr  usr          0 Sep  3  2024 auth.log*
-rw-r--r-- 1 usr  usr    6194913 Dec  3 14:38 flask_critical_and_error_and_warning.log
-rw-r--r-- 1 usr  usr     383596 Nov 26 06:27 flask_critical_and_error_and_warning.log.1
-rw-r--r-- 1 usr  usr    9999931 Nov 23 18:02 flask_critical_and_error_and_warning.log.2
-rw-r--r-- 1 usr  usr    9945758 Dec  3 14:42 flask_critical_and_error_and_warning_and_info.log
-rw-r--r-- 1 usr  usr    9999983 Dec  2 16:14 flask_critical_and_error_and_warning_and_info.log.1
-rw-r--r-- 1 usr  usr    9999938 Dec  1 16:40 flask_critical_and_error_and_warning_and_info.log.2
-rw-r--r-- 1 usr  usr    1206714 Dec  3 14:42 flask_critical_and_error_and_warning_and_info_and_debug.log
-rw-r--r-- 1 usr  usr    9999916 Dec  3 12:46 flask_critical_and_error_and_warning_and_info_and_debug.log.1
-rw-r--r-- 1 usr  usr    9999926 Dec  3 00:53 flask_critical_and_error_and_warning_and_info_and_debug.log.2
-rw-r--r-- 1 usr  usr  125459598 Dec  3 14:42 gunicorn_access.log
-rw-r--r-- 1 usr  usr  166722892 Dec  3 14:42 gunicorn_error.log
-rw-r--r-- 1 root root 126147128 Dec  4 11:01 nginx_access.log
-rw-r--r-- 1 root root  28785863 Dec  4 11:01 nginx_error.log
Only the nginx logs have today's date. That's consistent with requests reaching nginx but not making it through to the Flask app. Using
tail -f nginx_access.log
I see the latest entries are associated with https://webmaster.petalsearch.com/site/petalbot which says the crawler will
"establish an index database which enables users to search the content of your site in Petal search engine and present content recommendations for the user in Huawei Assistant and AI Search services"

Using Gemini 2.5 Flash from https://aistudio.google.com/ I ask

I'm running a webserver that uses nginx and runs on linux. I am interested in blocking certain IP address ranges. Should I configure nginx to filter IP ranges or should I filter using the linux firewall? I want to use the software I already have rather than add yet another tool for this blocking.
and learned that the Linux firewall is recommended over nginx filtering: the firewall drops packets earlier in the stack, before they consume nginx resources.

Next question for Gemini 2.5 Flash LLM is

I'm using Ubuntu for a webserver. How do I determine what firewall is being used from the command line?

I then run the following on my VPS:

$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), deny (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
22/tcp (OpenSSH)           ALLOW IN    Anywhere                  
443                        ALLOW IN    Anywhere                  
80                         ALLOW IN    Anywhere                  
22/tcp (OpenSSH (v6))      ALLOW IN    Anywhere (v6)             
443 (v6)                   ALLOW IN    Anywhere (v6)             
80 (v6)                    ALLOW IN    Anywhere (v6)             
I then also verify that nft exists using
$ sudo nft list ruleset
# Warning: table ip filter is managed by iptables-nft, do not touch!
table ip filter {
	chain ufw-before-logging-input {
	}
...long output, snipped...
From a few minutes of reviewing tail -f nginx_access.log, the major offenders in this denial-of-service (DoS) attack appear to be the following (a tallying script is sketched after the list):
104.210.140.141 = OpenAI, observed 2025-12-04
114.119.147.137 = PetalBot (for Huawei), observed 2025-12-04
156.59.198.136 = bytedance, observed 2025-12-04
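
That offender list came from eyeballing the tail output. A minimal sketch for making the tally systematic (my own script, assuming the default nginx "combined" log format where the client IP is the first whitespace-delimited field):

# top_clients.py -- tally client IPs in nginx_access.log
from collections import Counter

counts = Counter()
with open("nginx_access.log") as log_file:
    for line in log_file:
        if line.strip():
            # first field of the default log format is the client IP
            counts[line.split()[0]] += 1

# print the ten busiest client IPs
for ip_address, hit_count in counts.most_common(10):
    print(f"{hit_count:8d}  {ip_address}")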

LLM query:

ufw block IP address range for web server
followed by
how to pick the correct CIDR value for IP blocking?
from which I learn that /24 keeps the first 24 bits (the first three octets) as the network prefix, so the last octet ranges over 0 to 255 (256 addresses).
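
As a quick sanity check (my aside, not part of the original workflow), Python's standard-library ipaddress module can confirm what a /24 covers before handing it to ufw:

>>> import ipaddress
>>> ipaddress.ip_network("156.59.198.136/24", strict=False)
IPv4Network('156.59.198.0/24')
>>> ipaddress.ip_network("156.59.198.0/24").num_addresses
256

Note that ipaddress performs the same normalization (zeroing the host bits) that ufw warns about below.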

I then run

$ sudo ufw deny from 156.59.198.136/24
WARN: Rule changed after normalization
Rule added
$ sudo ufw deny from 114.119.147.0/24
Rule added
$ sudo ufw deny from 104.210.140.0/24
Rule added
The WARN appeared because 156.59.198.136/24 has nonzero host bits; ufw normalized the rule to 156.59.198.0/24. Check the results:
$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere                  
443                        ALLOW       Anywhere                  
80                         ALLOW       Anywhere                  
Anywhere                   DENY        156.59.198.0/24           
Anywhere                   DENY        114.119.147.0/24          
Anywhere                   DENY        104.210.140.0/24          
OpenSSH (v6)               ALLOW       Anywhere (v6)             
443 (v6)                   ALLOW       Anywhere (v6)             
80 (v6)                    ALLOW       Anywhere (v6)             
The LLM had warned me that "If you find that the new deny rules are at the bottom of the list, you may need to use the insert function to put them at the top (e.g., position 1 and 2)." 
Gemini 2.5 says the general best practice for firewall rules is:
  1. Specific DENY rules (blocking known bad actors).
  2. Specific ALLOW rules (allowing trusted hosts/networks).
  3. General ALLOW rules (allowing public services).
  4. General DENY rules (the default policy, often implied).
Gemini 2.5's advice was almost correct. The LLM got the rule indices wrong. Here are the commands I ran:
$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere                  
443                        ALLOW       Anywhere                  
80                         ALLOW       Anywhere                  
Anywhere                   DENY        156.59.198.0/24           
Anywhere                   DENY        114.119.147.0/24          
Anywhere                   DENY        104.210.140.0/24          
OpenSSH (v6)               ALLOW       Anywhere (v6)             
443 (v6)                   ALLOW       Anywhere (v6)             
80 (v6)                    ALLOW       Anywhere (v6)             

$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] OpenSSH                    ALLOW IN    Anywhere                  
[ 2] 443                        ALLOW IN    Anywhere                  
[ 3] 80                         ALLOW IN    Anywhere                  
[ 4] Anywhere                   DENY IN     156.59.198.0/24           
[ 5] Anywhere                   DENY IN     114.119.147.0/24          
[ 6] Anywhere                   DENY IN     104.210.140.0/24          
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)             

$ sudo ufw delete 4
Deleting:
 deny from 156.59.198.0/24
Proceed with operation (y|n)? y
Rule deleted
$ sudo ufw insert 1 deny from 156.59.198.0/24 to any
Rule inserted
$ sudo ufw delete 4
Deleting:
 allow 80
Proceed with operation (y|n)? n
Aborted
$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     156.59.198.0/24           
[ 2] OpenSSH                    ALLOW IN    Anywhere                  
[ 3] 443                        ALLOW IN    Anywhere                  
[ 4] 80                         ALLOW IN    Anywhere                  
[ 5] Anywhere                   DENY IN     114.119.147.0/24          
[ 6] Anywhere                   DENY IN     104.210.140.0/24          
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)             

$ sudo ufw delete 5
Deleting:
 deny from 114.119.147.0/24
Proceed with operation (y|n)? y
Rule deleted
$ sudo ufw insert 2 deny from 114.119.147.0/24 to any
Rule inserted
$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     156.59.198.0/24           
[ 2] Anywhere                   DENY IN     114.119.147.0/24          
[ 3] OpenSSH                    ALLOW IN    Anywhere                  
[ 4] 443                        ALLOW IN    Anywhere                  
[ 5] 80                         ALLOW IN    Anywhere                  
[ 6] Anywhere                   DENY IN     104.210.140.0/24          
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)             

$ sudo ufw delete 6
Deleting:
 deny from 104.210.140.0/24
Proceed with operation (y|n)? y
Rule deleted
$ sudo ufw insert 3 deny from 104.210.140.0/24 to any
Rule inserted
$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     156.59.198.0/24           
[ 2] Anywhere                   DENY IN     114.119.147.0/24          
[ 3] Anywhere                   DENY IN     104.210.140.0/24          
[ 4] OpenSSH                    ALLOW IN    Anywhere                  
[ 5] 443                        ALLOW IN    Anywhere                  
[ 6] 80                         ALLOW IN    Anywhere                  
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)            


In a web browser I visited https://allofphysics.com/ and the page loaded immediately. Yay!

Sunday, November 23, 2025

a single step fully verified using SymPy and Lean

As a very simple example, start with the equation

T = 1/f 
where T is period of oscillation and f is frequency of oscillation. A transformation would be to multiply both sides by f to get
f T = 1

Verification of a step using a Computer Algebra System like SymPy

The single step above could be verified using a Computer Algebra System like SymPy. The generic form of the inference rule is "multiply both sides of (LHS=RHS) by feed to get (LHS*feed=RHS*feed)". To show the inference rule was correctly applied, we want to show that

LHS_in*feed == LHS_out
and
RHS_in*feed == RHS_out
Another way to describe the equivalence is that the difference should be zero:
LHS_in*feed - LHS_out = 0
and
RHS_in*feed - RHS_out = 0
That's the generic formulation of the inference rule check. In this step,
LHS_in = T
RHS_in = 1/f
feed = f
LHS_out = f T
RHS_out = 1

This check is implemented in line 496 of validate_steps_sympy.py as

import sympy
def multiply_both_sides_by(LHS_in, RHS_in, feed, LHS_out, RHS_out):
    # the step is valid if LHS_in*feed == LHS_out and RHS_in*feed == RHS_out,
    # i.e., both differences simplify to zero
    diff1 = sympy.simplify(sympy.Mul(LHS_in, feed) - LHS_out)
    diff2 = sympy.simplify(sympy.Mul(RHS_in, feed) - RHS_out)
    if (diff1 == 0) and (diff2 == 0):
        return "valid"
    else:
        return "LHS diff is " + str(diff1) + "\n" + "RHS diff is " + str(diff2)
We can run that using
>>> import sympy
>>> print(sympy.__version__)
1.11.1
>>> f, T = sympy.symbols('f T')
>>> multiply_both_sides_by(T, 1/f, f, f*T, 1)
'valid'
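
To see the failure branch, feed the checker a deliberately wrong output (my own example, not from the derivation):

>>> multiply_both_sides_by(T, 1/f, f, f*T, 2)
'LHS diff is 0\nRHS diff is -1'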

Wahoo! The step has been validated using SymPy to show the inference rule is applied correctly.

The cool part is that the "multiply_both_sides_by()" Python function is generic to any input expressions. The same check can be reused by every step that invokes this inference rule. Using SymPy we can gain confidence that the expressions associated with a step were correctly transformed. 


Consistency of dimensionality using SymPy

In addition to evaluating the transformation of symbols in a step, we can verify the consistency of dimensions for each expression. That requires more than just the symbols -- the user will have to specify the dimensions of each symbol.

For example, in JSON for period T we have

        "9491": {
            "category": "variable",
            "dimensions": {
                "amount of substance": 0,
                "electric charge": 0,
                "length": 0,
                "luminous intensity": 0,
                "mass": 0,
                "temperature": 0,
                "time": 1
            },
            "latex": "T",
            "name": "period",
            "scope": [
                "real"
            ]
        },

The script validate_dimensions_sympy.py is a work in progress:

>>> import sympy
>>> from sympy.physics.units import mass, length, time, temperature, luminous_intensity, amount_of_substance, charge  # type: ignore
>>> from sympy.physics.units.systems.si import dimsys_SI

>>> from sympy.parsing.latex import parse_latex
>>> sympy.srepr(parse_latex('T = 1/f'))

TODO
>>> determine_consistency = dimsys_SI.equivalent_dims( eval(str(LHS)), eval(str(RHS)) )

See https://physicsderivationgraph.blogspot.com/2020/07/function-latex-for-sympy.html
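
Until that script is complete, here is a minimal self-contained sketch (my own, using only base dimensions rather than parsing the JSON) of what the check concludes for T = 1/f:

from sympy.physics.units import time
from sympy.physics.units.systems.si import dimsys_SI

dim_T = time      # period T has dimensions of time^1
dim_f = 1 / time  # frequency f has dimensions of time^-1

# both sides of T = 1/f should carry equivalent dimensions
print(dimsys_SI.equivalent_dims(dim_T, 1 / dim_f))  # True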

Verification of the step using Lean

To prove
(T=1/f) -> (f*T=1)

additional constraints are needed for reasoning about division by 0. If you expect to avoid negative or zero frequency or period, you could define f and T to be "positive real numbers" (which exclude zero; non-negative reals include zero). This defines the context more precisely, but there is a price: we won't have nearly as many proofs (Mathlib lemmas) for positive reals as we have for reals. The alternative is to add the constraints as additional hypotheses. The latter approach (additional hypotheses) is favored.

import Mathlib.Data.Real.Basic
import Mathlib.Tactic -- Import standard tactics, specifically import Mathlib.Tactic.FieldSimp

theorem inversion_amultbeq1
  (a b : Real)
  (hb_ne_zero : b ≠ 0) :
  a = 1 / b ↔ a * b = 1 := by
  -- field_simp clears the denominator 'b' on the LHS,
  -- turning (a = 1 / b) into (a * b = 1) automatically.
  field_simp [hb_ne_zero]
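
As a usage sketch (my own example, assuming the theorem above compiles), the lemma applies to the period/frequency step directly:

example (T f : Real) (hf : f ≠ 0) (h : T = 1 / f) : T * f = 1 :=
  (inversion_amultbeq1 T f hf).mp h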

Sunday, November 2, 2025

finding an alternative Neo4j; a schema for SQL

I had been using Neo4j's property graph for the Physics Derivation Graph for a few years and recently stopped due to the 1GB of memory used by the server that is constantly running. (That 1GB of RAM is unrelated to the size of the underlying database, which in my case was a few hundred nodes.) 

I realized that what I want is a SQLite-like "offline" database that is noticeable in CPU and RAM use only when there's a query. I'm OK with the database's initialization cost resulting in slower queries.

Search for Alternatives

https://jgrapht.org/ and NetworkX are irrelevant to the needs of the Physics Derivation Graph

https://en.wikipedia.org/wiki/Blazegraph is unsupported since 2018

KuzuDB was exactly what I was interested in -- an embedded offline property-graph database compatible with Cypher. Coincidentally, https://kuzudb.github.io/ announced Oct 10 (3 weeks ago) that the project was being terminated immediately.
https://news.ycombinator.com/item?id=45560036
https://www.reddit.com/r/Database/comments/1o3w73z/status_of_kuzudb_from_kuzu_inc/
https://www.theregister.com/2025/10/14/kuzudb_abandoned/
A fork: https://github.com/Kineviz/bighorn

Possible contenders:

However, the most reliable approach may be to switch to SQLite and use LLMs to craft Cypher-like queries. This is inelegant (I had been exposing read-only Cypher capabilities on the web) and requires coming up with a schema (something I have no experience with). 

Convert Property Graph to Relational Database for SQL

Here's the prompt I provided Gemini 2.5 Pro:
I have a complicated property graph in Neo4j and want to convert to SQLite. Design a schema for the relational database.

The nodes present in the property graph representation are below.


node type: 'derivation'
properties={'author_name_latex': str,  
  'name_latex': str,     
  'id': str,  
  'abstract_latex': str,  
  'reference_latex': str, 
  'created_datetime': str} 

node type: 'step'
properties={'author_name_latex': str,   
'note_before_step_latex': str, 
'note_after_step_latex': str, 
'id': str,   
 'created_datetime': str} 

node type: 'feed'
properties={'author_name_latex': str,  
'id': str,  
'sympy': str, 
'lean': str,   
 'latex': str} 

node type: 'expression'
properties={'author_name_latex': str,   
 'name_latex': str, 
 'description_latex': str,  
 'reference_latex': str,   
 'id': str,         
 'latex': str,  
 'lean': str,   
 'sympy': str}  

node type: 'operation'
properties={'author_name_latex': str, 
'name_latex': str,  'argument_count': int,  
'description_latex': str,     
 'reference_latex': str, 
   'id': str,  
   'latex': str} 

node type: 'scalar'
properties={'author_name_latex': str,
 'name_latex': str, 
 'description_latex': str,  
 'id': str,         
 'reference_latex': str,   
 'latex': str,     
 'scope': str, 
 'variable_or_constant': str,  
 'domain': str,   
 'dimension_length': int,      
 'dimension_time': int,        
 'dimension_mass': int,        
 'dimension_temperature': int, 
 'dimension_electric_charge': int,     
 'dimension_amount_of_substance': int, 
 'dimension_luminous_intensity': int}  

node type: 'value_and_units'
properties={'author_name_latex': str,
 'id': str,
 'dimension_length_unit': int,
 'dimension_time_unit': int,
 'dimension_mass_unit': int,
 'dimension_temperature_unit': int,
 'dimension_electric_charge_unit': int,
 'dimension_amount_of_substance_unit': int,
 'dimension_luminous_intensity_unit': int}

node type: 'vector'
properties={'author_name_latex': str,
 'name_latex': str, 
 'description_latex': str,  
 'reference_latex': str,   
 'id': str,         
 'latex': str}     

node type: 'matrix'
properties={'author_name_latex': str,
 'name_latex': str, 
 'description_latex': str,  
 'id': str,         
 'reference_latex': str,   
 'latex': str}     

node type: 'quantum_operator'
properties={'author_name_latex': str,
 'name_latex': str, 
 'description_latex': str,  
 'requires_arguments': bool,
 'reference_latex': str,   
 'id': str,         
 'latex': str}

node type: 'inference_rule'
properties={'author_name_latex': str,  
 'number_of_inputs': int,     
 'number_of_outputs': int,    
 'number_of_feeds': int,      
 'name_latex': str, 
 'id': str,         
 'latex': str}      





The nodes and edges occur only in certain combinations described below.


edge 'HAS_STEP' properties={'sequence_index': int}
connects node type "derivation"
to node type "step"

edge 'HAS_INPUT' properties={'sequence_index': int}
connects node type "step"
to node type "expression"

edge 'HAS_FEED' properties={'sequence_index': int}
connects node type "step"
to node type "feed"

edge 'HAS_OUTPUT' properties={'sequence_index': int}
connects node type "step"
to node type "expression"

edge 'HAS_INFERENCE_RULE'
connects node type "step"
to node type "inference_rule"


edge 'HAS_SYMBOL'
connects node type "expression"
to node type "scalar"

edge 'HAS_SYMBOL'
connects node type "expression"
to node type "vector"

edge 'HAS_SYMBOL'
connects node type "expression"
to node type "matrix"

edge 'HAS_SYMBOL'
connects node type "expression"
to node type "operation"

edge 'HAS_SYMBOL'
connects node type "expression"
to node type "quantum_operator"

edge 'HAS_SYMBOL'
connects node type "feed"
to node type "scalar"

edge 'HAS_SYMBOL'
connects node type "feed"
to node type "vector"

edge 'HAS_SYMBOL'
connects node type "feed"
to node type "matrix"

edge 'HAS_SYMBOL'
connects node type "feed"
to node type "operation"

edge 'HAS_SYMBOL'
connects node type "feed"
to node type "quantum_operator"


edge 'HAS_VALUE'
connects node type "scalar"
to node type "value_and_units"


Response from Gemini:

Converting a property graph to a relational model involves translating nodes into tables and edges into relationships (either through foreign keys or dedicated join tables). Here is a comprehensive SQLite schema designed to represent your Neo4j graph structure, balancing relational purity with the flexibility of the original graph.

Design Principles

  • Nodes as Tables: Each distinct node type (`derivation`, `step`, `expression`, etc.) will become its own table.
  • Properties as Columns: The properties of each node will become columns in the corresponding table. The node's `id` will serve as the `PRIMARY KEY`.
  • Edges as Relationships:
    • One-to-Many / One-to-One Edges: These are modeled using a foreign key column in the "many" or "child" table. For example, since a `step` belongs to one `derivation`, the `steps` table will have a `derivation_id` column.
    • Many-to-Many Edges: These are modeled using a separate "join table" (or "link table") that contains foreign keys to the two connected tables. For example, a `step` can have many `expression` inputs, and an `expression` can be an input to many `steps`. This requires a `step_inputs` join table.
  • Polymorphic Edges: The `HAS_SYMBOL` edge is polymorphic because it can connect to multiple different types of symbol nodes (`scalar`, `vector`, etc.). To handle this cleanly without losing foreign key integrity, we will create a dedicated join table for each specific relationship (e.g., `expression_scalars`, `expression_vectors`). A more "graph-like" but less relationally pure alternative is discussed in the rationale section.

SQLite Schema Definition

Here are the `CREATE TABLE` statements for the SQLite database.

Node Tables: these tables store the data for each type of node.
-- The top-level container for a mathematical derivation.
CREATE TABLE derivations (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    abstract_latex TEXT,
    reference_latex TEXT,
    created_datetime TEXT NOT NULL
);

-- Represents a single logical step in a derivation.
-- It connects to a derivation (one-to-many) and an inference_rule (many-to-one).
CREATE TABLE steps (
    id TEXT PRIMARY KEY,
    derivation_id TEXT NOT NULL,
    inference_rule_id TEXT NOT NULL,
    sequence_index INTEGER NOT NULL, -- From the 'HAS_STEP' edge property
    author_name_latex TEXT,
    note_before_step_latex TEXT,
    note_after_step_latex TEXT,
    created_datetime TEXT NOT NULL,
    FOREIGN KEY (derivation_id) REFERENCES derivations(id) ON DELETE CASCADE,
    FOREIGN KEY (inference_rule_id) REFERENCES inference_rules(id) ON DELETE RESTRICT,
    UNIQUE (derivation_id, sequence_index) -- A derivation cannot have two steps at the same index
);

-- Represents the mathematical content of an expression.
CREATE TABLE expressions (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    description_latex TEXT,
    reference_latex TEXT,
    latex TEXT,
    lean TEXT,
    sympy TEXT
);

-- Represents feed-in values or assumptions for a step.
CREATE TABLE feeds (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    sympy TEXT,
    lean TEXT,
    latex TEXT
);

-- Represents a rule of logic or mathematics used in a step.
CREATE TABLE inference_rules (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    latex TEXT,
    number_of_inputs INTEGER,
    number_of_outputs INTEGER,
    number_of_feeds INTEGER
);

-- Represents a mathematical operation like addition or differentiation.
CREATE TABLE operations (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    description_latex TEXT,
    reference_latex TEXT,
    latex TEXT,
    argument_count INTEGER
);

-- Represents a scalar value (a single number).
CREATE TABLE scalars (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    description_latex TEXT,
    reference_latex TEXT,
    latex TEXT,
    scope TEXT,
    variable_or_constant TEXT,
    domain TEXT,
    dimension_length INTEGER,
    dimension_time INTEGER,
    dimension_mass INTEGER,
    dimension_temperature INTEGER,
    dimension_electric_charge INTEGER,
    dimension_amount_of_substance INTEGER,
    dimension_luminous_intensity INTEGER
);

-- Represents a vector.
CREATE TABLE vectors (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    description_latex TEXT,
    reference_latex TEXT,
    latex TEXT
);

-- Represents a matrix.
CREATE TABLE matrices (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    description_latex TEXT,
    reference_latex TEXT,
    latex TEXT
);

-- Represents a quantum mechanical operator.
CREATE TABLE quantum_operators (
    id TEXT PRIMARY KEY,
    author_name_latex TEXT,
    name_latex TEXT,
    description_latex TEXT,
    reference_latex TEXT,
    latex TEXT,
    requires_arguments INTEGER -- Using INTEGER for BOOLEAN (0=false, 1=true)
);

-- Represents the numerical value and units for a scalar.
-- This has a one-to-one relationship with a scalar.
CREATE TABLE values_and_units (
    id TEXT PRIMARY KEY,
    scalar_id TEXT NOT NULL UNIQUE, -- The UNIQUE constraint enforces the 1-to-1 relationship
    author_name_latex TEXT,
    dimension_length_unit INTEGER,
    dimension_time_unit INTEGER,
    dimension_mass_unit INTEGER,
    dimension_temperature_unit INTEGER,
    dimension_electric_charge_unit INTEGER,
    dimension_amount_of_substance_unit INTEGER,
    dimension_luminous_intensity_unit INTEGER,
    FOREIGN KEY (scalar_id) REFERENCES scalars(id) ON DELETE CASCADE
);

Relationship (Join) Tables: these tables model the many-to-many relationships defined by the edges.
-- Links steps to their input expressions.
CREATE TABLE step_inputs (
    step_id TEXT NOT NULL,
    expression_id TEXT NOT NULL,
    sequence_index INTEGER NOT NULL,
    PRIMARY KEY (step_id, sequence_index),
    FOREIGN KEY (step_id) REFERENCES steps(id) ON DELETE CASCADE,
    FOREIGN KEY (expression_id) REFERENCES expressions(id) ON DELETE CASCADE
);

-- Links steps to their output expressions.
CREATE TABLE step_outputs (
    step_id TEXT NOT NULL,
    expression_id TEXT NOT NULL,
    sequence_index INTEGER NOT NULL,
    PRIMARY KEY (step_id, sequence_index),
    FOREIGN KEY (step_id) REFERENCES steps(id) ON DELETE CASCADE,
    FOREIGN KEY (expression_id) REFERENCES expressions(id) ON DELETE CASCADE
);

-- Links steps to their feed-in values.
CREATE TABLE step_feeds (
    step_id TEXT NOT NULL,
    feed_id TEXT NOT NULL,
    sequence_index INTEGER NOT NULL,
    PRIMARY KEY (step_id, sequence_index),
    FOREIGN KEY (step_id) REFERENCES steps(id) ON DELETE CASCADE,
    FOREIGN KEY (feed_id) REFERENCES feeds(id) ON DELETE CASCADE
);

-- Join tables for the polymorphic 'HAS_SYMBOL' relationships
-- These tables link expressions and feeds to the various symbol types they contain.

-- Expression-to-Symbol Links
CREATE TABLE expression_symbols (
    expression_id TEXT NOT NULL,
    symbol_id TEXT NOT NULL,
    symbol_type TEXT NOT NULL, -- e.g., 'scalar', 'vector', 'matrix', 'operation', 'quantum_operator'
    PRIMARY KEY (expression_id, symbol_id),
    FOREIGN KEY (expression_id) REFERENCES expressions(id) ON DELETE CASCADE
);

-- Feed-to-Symbol Links
CREATE TABLE feed_symbols (
    feed_id TEXT NOT NULL,
    symbol_id TEXT NOT NULL,
    symbol_type TEXT NOT NULL, -- e.g., 'scalar', 'vector', 'matrix', 'operation', 'quantum_operator'
    PRIMARY KEY (feed_id, symbol_id),
    FOREIGN KEY (feed_id) REFERENCES feeds(id) ON DELETE CASCADE
);

Rationale and Explanation

Direct Relationships (`derivation` -> `step`, `step` -> `inference_rule`): The `HAS_STEP` edge connects one derivation to many steps. This is a classic one-to-many relationship, best modeled by adding a `derivation_id` foreign key to the `steps` table. The `sequence_index` from the edge property is also placed here. The same logic applies to the `HAS_INFERENCE_RULE` edge.

Step Inputs/Outputs/Feeds: A `step` can have multiple ordered inputs, outputs, and feeds. These are many-to-many relationships modeled with the join tables `step_inputs`, `step_outputs`, and `step_feeds`. The `sequence_index` property from the edge is a crucial part of the primary key to maintain order and uniqueness.

One-to-One Relationship (`scalar` -> `value_and_units`): The `HAS_VALUE` edge is modeled by adding a `scalar_id` foreign key to the `values_and_units` table. Adding a `UNIQUE` constraint on `scalar_id` ensures that each scalar can have at most one `value_and_units` entry, perfectly modeling the one-to-one relationship.

Handling Polymorphism (`HAS_SYMBOL`): This is the most complex relationship to model. An `expression` or a `feed` can contain many symbols of different types.

  • Chosen Approach: The schema uses two tables, `expression_symbols` and `feed_symbols`. Each table stores the ID of the source (`expression_id` or `feed_id`), the ID of the target symbol (`symbol_id`), and a `symbol_type` column to tell you which table to join with to get the symbol's details (e.g., `scalars`, `vectors`).
  • Advantage: This approach is pragmatic. It keeps the number of tables manageable.
  • Trade-off: It sacrifices full relational integrity because you cannot create a native `FOREIGN KEY` on `symbol_id` that points to five different tables. You would enforce this relationship at the application layer (e.g., when you retrieve an `expression`'s symbols, you read the `symbol_type` and then query the appropriate table).

In my assessment, the SQL schema is inelegant compared to the property graph representation.

The use of `ON DELETE CASCADE` is a guess by the LLM and is not always the desired behavior. (`ON DELETE CASCADE` means that if the parent row is deleted, the linked child rows are automatically deleted.)
The alternative is `ON DELETE RESTRICT` (or `NO ACTION`), which prevents deletion of the parent row as long as any child rows exist; the database returns an error instead. Use that when you want to protect the child data.


The above tailored-to-PDG schema is complicated. A simpler, more generic schema (like the one sketched below) incurs a higher query cost.

From Gemini I learned that the EAV (Entity-Attribute-Value) model is a normalized and traditional approach that is portable across almost any relational database.
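
As an illustration (my own sketch, not from the Gemini transcript; all table and column names are hypothetical), an EAV-style layout for a property graph needs only three tables:

# eav_sketch.py -- a minimal EAV-style schema for a property graph
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- one row per node, regardless of node type
CREATE TABLE nodes (
    id        TEXT PRIMARY KEY,
    node_type TEXT NOT NULL  -- 'derivation', 'step', 'expression', ...
);
-- one row per (node, property) pair
CREATE TABLE node_properties (
    node_id   TEXT NOT NULL REFERENCES nodes(id),
    attribute TEXT NOT NULL,  -- 'latex', 'author_name_latex', ...
    value     TEXT,
    PRIMARY KEY (node_id, attribute)
);
-- one row per edge
CREATE TABLE edges (
    source_id      TEXT NOT NULL REFERENCES nodes(id),
    target_id      TEXT NOT NULL REFERENCES nodes(id),
    edge_type      TEXT NOT NULL,  -- 'HAS_STEP', 'HAS_SYMBOL', ...
    sequence_index INTEGER         -- NULL for unordered edges
);
""")

# the higher query cost: every property lookup is an extra join or query
conn.execute("INSERT INTO nodes VALUES ('9491', 'scalar')")
conn.execute("INSERT INTO node_properties VALUES ('9491', 'latex', 'T')")
print(conn.execute(
    "SELECT value FROM node_properties"
    " WHERE node_id = '9491' AND attribute = 'latex'").fetchone())  # ('T',)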

Saturday, November 1, 2025

links to papers on Hilbert's Sixth problem

https://en.wikipedia.org/wiki/Hilbert%27s_sixth_problem

"A Field Guide to Recent Work on the Foundations of Statistical Mechanics" by R. Frigg
arxiv.org/abs/0804.0399 and
https://romanfrigg.org/wp-content/uploads/writings/SM_Fieldguide.pdf

"Hilbert's Sixth Problem: the endless road to rigour" by A. N. Gorban
https://arxiv.org/abs/1803.03599


"Long time derivation of the Boltzmann equation from hard sphere dynamics" by Yu Deng, Zaher Hani, Xiao Ma
https://arxiv.org/abs/2408.07818

"Hilbert's sixth problem: derivation of fluid equations via Boltzmann's kinetic theory" by Yu Deng, Zaher Hani, Xiao Ma
https://arxiv.org/abs/2503.01800

Friday, October 17, 2025

I know more now than I did a few years ago

Why do I think I know more now than I did a few years ago?

  • Some of my assumptions have been validated or invalidated by experience of implementation
  • Some things I thought I knew have gained more nuance
  • I have gained insight on what needs to be learned - new domains, additional details
  • I better understand how previously disparate concepts are related
  • Implementation forces decisions about design that a mental model is not confronted with
  • Experience from implementing provides nuance and validation
  • Sometimes I have to undo previously made decisions due to new insights
  • I better understand which paths are dead ends and which are worth investment
Interestingly, even though I know that I need to learn more, I cannot know a priori which nuance is relevant or which path is a dead end.

Saturday, October 4, 2025

resizing to smaller VPS

I'm currently paying $12/month for my VPS (virtual private server) that has 2GB of RAM and 25GB of storage. I had been paying $6/month previously, but including Neo4j bumped the memory usage to nearly 100%. (CPU usage is averaging around 10% and spikes to 30%.)

I have a few iterations of the source code for the webserver so I'll need to figure out which is actually in use. After logging into the VPS, I see I have two folders:

allofphysics.com
ui_v7_website_flask_json

Using docker images I see the latest image (from 7 months ago) is allofphysicscom-flask. That doesn't help me figure out which repo is in use.

allofphysics.com/docker-compose.yaml has neo4j, whereas ui_v7_website_flask_json/docker-compose.yaml does not. Therefore I'm currently operating out of allofphysics.com/docker-compose.yaml

I have two options: either revert to the "v7" or disable neo4j in "allofphysics.com".

Going with option 1,

~/allofphysics.com$ docker compose down
~/allofphysics.com$ cd ~/ui_v7_website_flask_json
~/ui_v7_website_flask_json$ git pull
~/ui_v7_website_flask_json$ make
Turns out the packages are pretty old. The opencv package wasn't compatible. After I reverted all pip packages to known-good version numbers, I found the crypto libraries weren't happy. Eventually I was able to get a Docker image built.

Next the certs weren't present so I copied those from ~/allofphysics.com/certs/ and that worked.

To free up disk space I removed the dangling (<none>-tagged) Docker images:

docker images | grep none | tr -s ' ' | cut -d' ' -f3 | xargs docker rmi

I powered down the VPS, changed to a smaller size, powered on, and then ran make to get a running webserver.

Tuesday, June 17, 2025

best practices for use of LLMs

I've previously written about best practices for prompts. This post is more abstract.

 

Avoid asking factual questions

The LLM is not a database of facts. Historical events, dates, places are not stored as exact references. LLMs generate their response based on statistical probabilities derived from patterns.

The more widely documented something is, the better the LLM knows it

The LLM's training is roughly proportional to the representation of the information on the Internet. An LLM is more reliable and detailed when discussing common knowledge.

Precise questions using relevant jargon with context yield useful output

Poorly worded questions that do not use domain-specific terminology are less likely to produce clear answers.

Do not trust citations

The LLM does not have citations hard-coded into the network. Citations are most likely to be hallucinations.

Decompose complex tasks and questions into a sequence of iterative prompts

There is a limited amount of "thinking" by the LLM per prompt, so simpler tasks are more likely to produce relevant answers.

Structure your question to produce a page or less of output

Producing a 200 page book from a single prompt devolves into hallucinations after a few pages. Shorter answers are more likely to remain lucid, so phrase your question in a way that can be answered with a small amount of text.

LLMs default to the average

While LLM output can be creative (in unexpected ways), seeking exceptional insight yields the mundane.

Simplify your question to a one-shot prompt

Long chains of follow-up questions within a single conversation are more likely to yield hallucinations.


Delegation to an intern who doesn't learn

Working with an LLM is like delegating to an intern who doesn't learn from feedback. This can be confusing, as the LLM occasionally knows more than you do.