I previously considered the Physics Derivation Graph an unlikely dream: a complete representation of physics is infeasible, and even a breadth-first approach seemed infeasible. That expectation rested on two constraints. Data entry would require a significant number of person-hours, and it would have to be done by highly qualified people with training tailored to the Physics Derivation Graph.
If the workflow can be segmented into small tasks, LLMs might automate some of the complex tasks that require semantic context.
Overview of workflow
For example,
Given a corpus of papers on arXiv, identify whether each paper contains a derivation.
Given a paper with a derivation, can the steps be identified, with a specific inference rule associated with each step?
Can the steps be verified (e.g., with SymPy or Lean)?
For derivations with identified steps, what variables and constants are used in the paper?
Do these variables and constants appear in other derivations?
These steps combine text search with semantic awareness of the text content. Not every step has to be "just a human" or "just an LLM" -- an augmented capability, with an LLM assisting a human, is a reasonable workflow.
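The verification step can be made concrete. Below is a minimal sketch that spot-checks a "multiply both sides by" step numerically at random points rather than symbolically. The helper name `multiply_both_sides_is_consistent` and the representation of each equation side as a Python function are hypothetical. A pass is evidence, not proof; SymPy or Lean would give a stronger guarantee.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Hypothetical representation: each side of an equation is a function of the variable values.
Side = Callable[[Dict[str, float]], float]
Equation = Tuple[Side, Side]  # (left-hand side, right-hand side)


def multiply_both_sides_is_consistent(
    before: Equation,
    after: Equation,
    factor: Side,
    variables: Tuple[str, ...],
    trials: int = 50,
    rel_tol: float = 1e-9,
) -> bool:
    """Spot-check a 'multiply both sides by factor' step at random points.

    A pass is evidence, not proof; SymPy or Lean would be needed for a proof.
    """
    rng = random.Random(0)  # fixed seed so the check is repeatable
    for _ in range(trials):
        # Sample away from zero so expressions like d / t stay defined.
        values = {name: rng.uniform(0.5, 2.0) for name in variables}
        k = factor(values)
        for side_before, side_after in zip(before, after):
            if not math.isclose(side_after(values), side_before(values) * k, rel_tol=rel_tol):
                return False
    return True


# Example step: v = d / t  --(multiply both sides by t)-->  v t = d
before = (lambda s: s["v"], lambda s: s["d"] / s["t"])
after = (lambda s: s["v"] * s["t"], lambda s: s["d"])
print(multiply_both_sides_is_consistent(before, after, lambda s: s["t"], ("v", "d", "t")))  # prints True
```

A wrong step such as `v t = d t` fails the check. This sketch compares sides in the same order, so a real checker would also need to accept rearranged but equivalent forms.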
Another enabling aspect is that arXiv provides the LaTeX source for many papers, which makes parsing (by regex or LLM) easier.
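As an illustration of the regex route, here is a minimal sketch that pulls display-math environments out of a .tex source and applies a crude "several equations suggests a derivation" heuristic. The environment list and the `min_equations` threshold are assumptions. Inline math, `$$...$$`, and `\[...\]` are not handled.

```python
import re
from typing import List

# Display-math environments treated as candidate derivation steps (an assumed list).
# \1 and \2 make the \end{...} match the same environment name and optional star.
DISPLAY_MATH = re.compile(
    r"\\begin\{(equation|align|eqnarray|gather)(\*?)\}(.*?)\\end\{\1\2\}",
    re.DOTALL,
)


def extract_display_math(tex: str) -> List[str]:
    """Return the bodies of display-math environments in document order."""
    return [m.group(3).strip() for m in DISPLAY_MATH.finditer(tex)]


def looks_like_derivation(tex: str, min_equations: int = 3) -> bool:
    """Crude heuristic: several display equations hint at a derivation."""
    return len(extract_display_math(tex)) >= min_equations
```

A cheap filter like this could pre-screen papers before spending tokens on an LLM, leaving the harder judgment calls to the LLM.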
$ tar xzvf arXiv-2004.04818v2.tar.gz
x elemental.pdf
x fig3.pdf
x sound1.tex
x sound-speeds.pdf
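The same extraction can be done from Python without writing files to disk. Below is a minimal sketch using the standard-library `tarfile` module. The function name `read_tex_sources` is hypothetical, and the sketch assumes the download is a gzipped tarball. Some arXiv source downloads may be a single gzipped file instead, which this does not handle.

```python
import tarfile
from typing import Dict


def read_tex_sources(archive_path: str) -> Dict[str, str]:
    """Return {member name: LaTeX text} for each .tex file in a .tar.gz archive."""
    sources: Dict[str, str] = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".tex"):
                # extractfile() reads the member in memory; nothing is written to disk.
                data = tar.extractfile(member).read()
                sources[member.name] = data.decode("utf-8", errors="replace")
    return sources
```

For the archive above, `read_tex_sources("arXiv-2004.04818v2.tar.gz")` would return only the `sound1.tex` entry and skip the PDF figures.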
The file sound1.tex is 37,110 bytes (36 kB), 302 lines, 37,108 characters, and 5,134 words.
Loading the .tex file into a Gemini 2.0 prompt uses 10,518 tokens (the maximum for Gemini 2.0 is 1,048,574 tokens).
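The counts above resemble `wc` output, and the token figure invites a quick ratio. Below is a sketch, with hypothetical helper names, that reproduces `wc`-style counts and expresses a token count as a fraction of the context window. Because the byte count exceeds the character count by 2, sound1.tex contains at least one multi-byte character, assuming it is UTF-8 encoded.

```python
from typing import Dict

CONTEXT_LIMIT = 1_048_574  # Gemini 2.0 maximum, as reported above


def text_stats(text: str) -> Dict[str, int]:
    """Approximate `wc` counts for a UTF-8 text."""
    return {
        "bytes": len(text.encode("utf-8")),
        "lines": text.count("\n"),   # wc -l counts newline characters
        "characters": len(text),
        "words": len(text.split()),  # whitespace-separated tokens
    }


def context_fraction(tokens: int, limit: int = CONTEXT_LIMIT) -> float:
    """Share of the context window used by a prompt of `tokens` tokens."""
    return tokens / limit


print(f"{37_108 / 10_518:.2f} characters per token")            # prints 3.53 characters per token
print(f"{context_fraction(10_518):.2%} of the context window")  # prints 1.00% of the context window
```

So this paper uses roughly 1% of the context window, at about 3.5 characters per token.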
Gemini 2.0 (on 2025-01-07) says
Yes, the article contains a derivation. It presents two approaches to derive the speed of sound, starting from system elasticity and vibrational properties, respectively. Both approaches lead to the same result, which is then used to find the upper bound for the speed of sound.
I ask a follow-up question,
Gemini's response:
I follow with another prompt,
Gemini's response:
Let's see if Gemini can extract symbol definitions:
Gemini's response:
Unicode output isn't desirable. Let's see if we can force the use of LaTeX using the prompt
Gemini's response:
Now let's see if Gemini can identify the steps of the derivation.
Gemini's response:
Next step: extract the variables used in the above derivation and check whether they are present in the Physics Derivation Graph using the PDG API.
Gemini's response (2025-01-07):
TODO: the PDG API needs to be able to handle tasks associated with a workflow that uses an LLM.
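Until the PDG API can handle this, the lookup can be sketched locally. In this minimal sketch, `KNOWN_SYMBOLS` is a hypothetical stand-in for the symbols already in the Physics Derivation Graph, not the real PDG API or data. Matching uses crudely normalized LaTeX only.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for symbols already in the Physics Derivation Graph;
# the real PDG API and data model may look quite different.
KNOWN_SYMBOLS: Dict[str, str] = {
    r"\rho": "mass density",
    r"c": "speed of light in vacuum",
}


def normalize(latex: str) -> str:
    """Crude normalization: drop all whitespace so "\\rho " and "\\rho" compare equal."""
    return "".join(latex.split())


def split_known_unknown(
    extracted: Dict[str, str], known: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition LLM-extracted {LaTeX: description} pairs into (found, missing)."""
    known_keys = {normalize(k) for k in known}
    found: Dict[str, str] = {}
    missing: Dict[str, str] = {}
    for latex, description in extracted.items():
        target = found if normalize(latex) in known_keys else missing
        target[latex] = description
    return found, missing
```

Matching on LaTeX alone is a weak test. The same symbol (say, `c`) can mean different things in different papers, so the descriptions would still need a human or LLM check.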
tl;dr: I was unsuccessful at getting Gemini 2.0 (on 2025-01-07) to relate Brewster's angle to the Law of Refraction. However, Gemini is able to provide the steps for simple but non-trivial derivations.
Reformatting HTML table
The site https://aistudio.google.com/ provides free access to the Gemini 2.0 LLM.
To experiment with it, I wanted to provide the inference rules. There are 94 inference rules listed on https://allofphysics.com/list_all_inference_rules in an 8-column table.
After removing JavaScript, headers, and footers, the HTML file has 3,869 lines, which takes 32,164 tokens of the prompt.
Gemini provided the correct 5-column HTML file. Huzzah!
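Gemini did the 8-to-5-column reformatting from a single prompt. A deterministic alternative takes a few dozen lines of standard-library Python. Below is a minimal sketch, assuming the page holds one simple table where every cell has explicit `<td>`/`<th>` tags (no `colspan`, no omitted end tags). `TableRows` and `keep_columns` are hypothetical names, and any column indices you pass are placeholders.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text fragments of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace, e.g. from line-wrapped HTML.
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside cells
            self._cell.append(data)


def keep_columns(html_text: str, keep: List[int]) -> str:
    """Rebuild a plain HTML table containing only the column indices in `keep`."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    lines = ["<table>"]
    for row in parser.rows:
        cells = "".join(f"<td>{escape(row[i])}</td>" for i in keep if i < len(row))
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Links inside cells are reduced to their text. That costs some information but keeps the output compact for a prompt.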
Brewster's angle and Law of Refraction
The next step is to see whether Gemini can use the inference rules in a derivation.
To which Gemini replied (on 2025-01-07),
"Unsure about answer"
That's reasonable, so I removed the inference rules and tried again.
Again Gemini replied (on 2025-01-07),
"Unsure about answer"
I then relaxed the constraint of giving only right answers and encouraged step-by-step reasoning.
Gemini's response (on 2025-01-07):
It's not possible to directly derive Brewster's angle from Snell's Law, or vice versa, using standard mathematical manipulations. Brewster's angle is a specific condition related to polarization, while Snell's Law describes refraction in general. Therefore, no sequence of inference rules can relate them directly. They are independent physical principles governing light behavior under different conditions.
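Gemini's answer can be compared against a short derivation that does relate the two. It assumes the standard condition that defines Brewster's angle: the reflected and refracted rays are perpendicular, which is where reflection of light polarized in the plane of incidence vanishes. Each line below could be tagged with an inference rule:

```latex
% Law of Refraction
n_1 \sin\theta_1 = n_2 \sin\theta_2
% Angle of reflection equals angle of incidence; at Brewster's angle (\theta_1 = \theta_B)
% the reflected and refracted rays are perpendicular
\theta_B + \theta_2 = \frac{\pi}{2} \quad\Rightarrow\quad \sin\theta_2 = \cos\theta_B
% Substitute into the Law of Refraction
n_1 \sin\theta_B = n_2 \cos\theta_B
% Divide both sides by n_1 \cos\theta_B
\tan\theta_B = \frac{n_2}{n_1}
```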
Simple derivation
Gemini 2.0 on 2025-01-07 responds with
That's "good enough" in my view to be useful -- associating the relevant inference rule with specific steps in a multi-step derivation!
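Once an LLM associates inference rules with steps, two cheap checks can catch some mistakes. The first flags rule names that aren't in the list of 94. The second flags step inputs that no earlier step produced. Below is a minimal sketch. The `Step` class and the rule names in the example are hypothetical, not the PDG's actual schema, and expressions are compared as exact LaTeX strings.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Step:
    inference_rule: str
    inputs: List[str] = field(default_factory=list)   # LaTeX of input expressions
    outputs: List[str] = field(default_factory=list)  # LaTeX of output expressions


def unknown_rules(steps: List[Step], known_rules: Set[str]) -> List[str]:
    """Rule names not in `known_rules`, in order of first use."""
    unknown: List[str] = []
    for step in steps:
        if step.inference_rule not in known_rules and step.inference_rule not in unknown:
            unknown.append(step.inference_rule)
    return unknown


def dangling_inputs(steps: List[Step]) -> List[str]:
    """Inputs that no earlier step produced as an output."""
    produced: Set[str] = set()
    dangling: List[str] = []
    for step in steps:
        dangling.extend(expr for expr in step.inputs if expr not in produced)
        produced.update(step.outputs)
    return dangling


# Hypothetical example: v = d/t, then multiply both sides by t.
steps = [
    Step("declare initial expression", outputs=[r"v = \frac{d}{t}"]),
    Step("multiply both sides by", inputs=[r"v = \frac{d}{t}"], outputs=[r"v t = d"]),
]
rules = {"declare initial expression", "multiply both sides by"}
print(unknown_rules(steps, rules), dangling_inputs(steps))  # prints [] []
```

Neither check says whether a step is mathematically valid. They only flag hallucinated rule names and broken links between steps, which is the structure a derivation graph needs.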
Running as root doesn't help
As per https://stackoverflow.com/a/71576242 I tried
As a consequence, while I can run the container, I am unable to mount the local directory.
separate the property graph backend from an API and have the web interface call the API
find a lighter weight alternative to neo4j backend
get a minimum viable product working using the existing property graph backend
separate static documentation pages from the dynamic paths
get docker working on my new Mac
That last item is the blocker for everything else, so getting Docker working on my new Mac is the priority. The good news is that the code does run on my old MacBook Air.
Next, I'll work on getting a minimum viable product working using the existing property graph backend.
This post defines discrete levels of software quality. There are many dimensions of what "software quality" means, so the discrete levels described here are an oversimplification. The attributes clustered under each level are typically observed together.
software quality level 1
"Worked for me at one point."
Code does not compile
No documentation
Required dependencies not included
Purpose of code is unclear
Hard coded numerical values
Code is wrong: severe design and implementation flaws