The LLM is not a database of facts. Historical events, dates, and places are not stored as exact references. LLMs generate their responses based on statistical probabilities derived from patterns in the training data.
The more widely documented something is, the better the LLM knows it
An LLM's familiarity with a topic is roughly proportional to how well that topic is represented on the Internet. An LLM is more reliable and detailed when discussing common knowledge.
Precise questions that use relevant jargon and provide context yield useful output
Poorly worded questions that do not use domain-specific terminology are less likely to produce clear answers.
Do not trust citations
The LLM does not have citations hard-coded into the network, so the citations it produces are likely to be hallucinations.
Decompose complex tasks and questions into a sequence of iterative prompts
There is a limited amount of "thinking" by the LLM per prompt, so simpler tasks are more likely to produce relevant answers.
Structure your question to produce a page or less of output
Attempting to produce a 200-page book from a single prompt devolves into hallucinations after a few pages. Shorter answers are more likely to remain lucid, so phrase your question so that it can be answered with a small amount of text.
LLMs default to the average
While LLM output can be creative (in unexpected ways), seeking exceptional insight usually yields the mundane.
Simplify your question to a one-shot prompt
Questions that require iterative refinement are more likely to yield hallucinations.
Delegation to an intern who doesn't learn
This can be confusing, as the LLM occasionally knows more than you do.
Previously I considered the Physics Derivation Graph to be an unlikely dream: a complete representation of Physics is infeasible, and even a breadth-first approach is infeasible. My expectation of infeasibility was based on the constraint that data entry would require not just a significant number of person-hours, but also highly qualified people with training tailored to the Physics Derivation Graph.
If the workflow can be segmented into small tasks, LLMs might provide relevant automation of complex tasks that require semantic context.
Overview of workflow
For example,
Given a corpus of papers on arXiv,
identify whether a paper contains a derivation.
Given a paper with a derivation, can the steps be identified? (i.e., can a specific inference rule be associated with each step?)
Can the steps be verified (e.g., SymPy, Lean)?
For derivations with steps, what variables and constants are used in the paper?
Do these variables and constants appear in other derivations?
That requires a combination of text search and semantic awareness of the text content. Not every step has to be "just a human" or "just an LLM" -- a workflow in which the LLM augments the human is reasonable.
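To make the segmentation concrete, here is a minimal Python sketch of that workflow. Every helper function in it (contains_derivation, extract_steps, verify_step) is a hypothetical placeholder for either an LLM prompt or a human review step, not an existing API.

# Hypothetical pipeline sketch; each helper is a placeholder for an LLM prompt or a human check.
from typing import List

def contains_derivation(latex_source: str) -> bool:
    """Ask an LLM (or a human) whether the paper contains a derivation."""
    raise NotImplementedError  # placeholder

def extract_steps(latex_source: str) -> List[dict]:
    """Ask an LLM to list the derivation steps, each with a candidate inference rule."""
    raise NotImplementedError  # placeholder

def verify_step(step: dict) -> bool:
    """Check a single step with a computer algebra system such as SymPy."""
    raise NotImplementedError  # placeholder

def process_paper(latex_source: str) -> dict:
    """Run one arXiv paper through the segmented workflow."""
    if not contains_derivation(latex_source):
        return {"has_derivation": False}
    steps = extract_steps(latex_source)
    return {"has_derivation": True,
            "steps": steps,
            "verified": [verify_step(step) for step in steps]}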
Another enabling aspect is that arXiv papers are available as LaTeX source. That makes parsing (by regex or by LLM) easier.
$ tar xzvf arXiv-2004.04818v2.tar.gz
x elemental.pdf
x fig3.pdf
x sound1.tex
x sound-speeds.pdf
The file sound1.tex is 37110 bytes (36kB), 302 lines, 37108 characters, and 5134 words.
Loading the .tex into Gemini 2.0's prompt uses 10,518 tokens (with the max for Gemini 2.0 being 1,048,574 tokens).
Gemini 2.0 (on 2025-01-07) says
Yes, the article contains a derivation. It presents two approaches to derive the speed of sound, starting from system elasticity and vibrational properties, respectively. Both approaches lead to the same result, which is then used to find the upper bound for the speed of sound.
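As an aside, the same yes/no question can be scripted rather than pasted into the aistudio.google.com web UI. A rough sketch using the google-generativeai Python package follows; the model name, the prompt wording, and the API key handling are assumptions on my part, not what I actually ran.

# Sketch of sending sound1.tex to Gemini programmatically.
# The model name "gemini-2.0-flash-exp" and the prompt wording are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply your own key
model = genai.GenerativeModel("gemini-2.0-flash-exp")

with open("sound1.tex") as file_handle:
    tex_source = file_handle.read()

prompt = ("Does the following LaTeX article contain a derivation? "
          "Answer yes or no, then briefly explain.\n\n" + tex_source)
response = model.generate_content(prompt)
print(response.text)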
I ask a follow-up question,
Gemini's response:
I follow with another prompt,
Gemini's response:
Let's see if Gemini can extract symbol definitions:
Gemini's response:
The use of Unicode isn't desirable. Let's see if we can force the use of LaTeX using the prompt
Gemini's response:
Now let's see if Gemini can identify the steps of the derivation.
Gemini's response:
Next step: extract the variables used in the above derivation to see if they are present in the Physics Derivation Graph by using the PDG API.
Gemini's response (2025-01-07):
TODO: the PDG API needs to be able to handle tasks associated with a workflow that uses an LLM.
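To illustrate what that could look like, here is a purely hypothetical sketch: the endpoint /api/v1/symbols and the response format are inventions for illustration, not the current PDG API.

# Hypothetical sketch: check whether symbols extracted by the LLM already exist in the PDG.
# The endpoint "/api/v1/symbols" and the response shape are assumptions, not the real PDG API.
import requests

extracted_symbols = ["v", "m", "\\omega", "E"]  # example symbols reported by the LLM

response = requests.get("https://allofphysics.com/api/v1/symbols", timeout=30)
response.raise_for_status()
known_symbols = {entry["latex"] for entry in response.json()}  # assumed response shape

for symbol in extracted_symbols:
    status = "already in" if symbol in known_symbols else "missing from"
    print(symbol, "is", status, "the Physics Derivation Graph")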
tl;dr: I'm unsuccessful at getting Gemini 2.0 (on 2025-01-07) to relate Brewster's angle to the Law of Refraction. However, Gemini is able to provide the steps for simple (non-trivial) derivations.
Reformatting HTML table
The site https://aistudio.google.com/ provides Gemini 2.0 LLM for free.
To experiment with it, I wanted to provide the inference rules. There are 94 inference rules listed on https://allofphysics.com/list_all_inference_rules in an 8-column table.
After removing JavaScript, headers, and footers, the HTML file has 3869 lines. That takes 32,164 tokens for the prompt.
Gemini provided the correct 5-column HTML file. Huzzah!
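Removing the JavaScript, headers, and footers was done by hand; a rough sketch of automating that cleanup with BeautifulSoup before building the prompt is below. The assumption that the page uses <script>, <style>, <header>, <footer>, and <nav> tags may not match the actual page structure.

# Sketch: shrink the saved HTML page down to the inference-rule table before prompting the LLM.
from bs4 import BeautifulSoup

with open("list_all_inference_rules.html") as file_handle:
    soup = BeautifulSoup(file_handle.read(), "html.parser")

# drop scripts, styles, and page chrome (tag names are assumptions about the page)
for tag_name in ["script", "style", "header", "footer", "nav"]:
    for tag in soup.find_all(tag_name):
        tag.decompose()

table = soup.find("table")  # keep only the 8-column inference-rule table
print(table.prettify() if table else "no table found")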
Brewster's angle and Law of Refraction
Next step is to see if Gemini can use the inference rules for a derivation.
To which Gemini replied (on 2025-01-07),
"Unsure about answer"
That's reasonable, so I removed the inference rules and tried again.
Again Gemini replied (on 2025-01-07),
"Unsure about answer"
I then relaxed the constraint about right answers only and encouraged step-by-step reasoning.
Gemini's response (on 2025-01-07):
It's not possible to directly derive Brewster's angle from Snell's Law, or vice versa, using standard mathematical manipulations. Brewster's angle is a specific condition related to polarization, while Snell's Law describes refraction in general. Therefore, no sequence of inference rules can relate them directly. They are independent physical principles governing light behavior under different conditions.
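For context, Gemini's claim of independence is only partly right: Snell's Law alone is indeed insufficient, but the textbook derivation of Brewster's angle needs just one additional physical condition, namely that at the polarization angle the reflected and refracted rays are perpendicular. A sketch in LaTeX, where \theta_B is Brewster's angle, \theta_t is the refraction angle, and n_1, n_2 are the refractive indices:

\begin{align*}
n_1 \sin\theta_B &= n_2 \sin\theta_t && \text{(Snell's Law)} \\
\theta_B + \theta_t &= 90^{\circ} && \text{(reflected and refracted rays perpendicular)} \\
n_1 \sin\theta_B &= n_2 \sin(90^{\circ} - \theta_B) = n_2 \cos\theta_B \\
\tan\theta_B &= \frac{n_2}{n_1}
\end{align*}

A possible follow-up experiment is to supply that perpendicularity condition as an extra premise in the prompt and see whether Gemini can then connect the two expressions.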
Simple derivation
Gemini 2.0 on 2025-01-07 responds with
That's "good enough" in my view to be useful -- associating the relevant inference rule with specific steps in a multi-step derivation!
docker run -it --rm -v `pwd`:/scratch llama-cpp-with-mistral-7b-v0.1.q6_k:2023-12-22 /bin/bash
root@dc98ac4a23d5:/opt/llama.cpp# ./main -h
usage: ./main [options]
options:
-h, --help show this help message and exit
--version show version and build info
-i, --interactive run in interactive mode
--interactive-first run in interactive mode and wait for input right away
-ins, --instruct run in instruction mode (use with Alpaca models)
-cml, --chatml run in chatml mode (use with ChatML-compatible models)
--multiline-input allows you to write or paste multiple lines without ending each in '\'
-r PROMPT, --reverse-prompt PROMPT
halt generation at PROMPT, return control in interactive mode
(can be specified more than once for multiple prompts).
--color colorise output to distinguish prompt and user input from generations
-s SEED, --seed SEED RNG seed (default: -1, use random seed for < 0)
-t N, --threads N number of threads to use during generation (default: 20)
-tb N, --threads-batch N
number of threads to use during batch and prompt processing (default: same as --threads)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: empty)
-e, --escape process prompt escapes sequences (\n, \r, \t, \', \", \\)
--prompt-cache FNAME file to cache prompt state for faster startup (default: none)
--prompt-cache-all if specified, saves user input and generations to cache as well.
not supported with --interactive or other interactive options
--prompt-cache-ro if specified, uses the prompt cache but does not update it.
--random-prompt start with a randomized prompt.
--in-prefix-bos prefix BOS to user inputs, preceding the `--in-prefix` string
--in-prefix STRING string to prefix user inputs with (default: empty)
--in-suffix STRING string to suffix after user inputs with (default: empty)
-f FNAME, --file FNAME
prompt file to start generation.
-n N, --n-predict N number of tokens to predict (default: -1, -1 = infinity, -2 = until context filled)
-c N, --ctx-size N size of the prompt context (default: 512, 0 = loaded from model)
-b N, --batch-size N batch size for prompt processing (default: 512)
--samplers samplers that will be used for generation in the order, separated by ';', for example: "top_k;tfs;typical;top_p;min_p;temp"
--sampling-seq simplified sequence for samplers that will be used (default: kfypmt)
--top-k N top-k sampling (default: 40, 0 = disabled)
--top-p N top-p sampling (default: 0.9, 1.0 = disabled)
--min-p N min-p sampling (default: 0.1, 0.0 = disabled)
--tfs N tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
--typical N locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
--repeat-last-n N last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
--repeat-penalty N penalize repeat sequence of tokens (default: 1.1, 1.0 = disabled)
--presence-penalty N repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
--frequency-penalty N repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
--mirostat N use Mirostat sampling.
Top K, Nucleus, Tail Free and Locally Typical samplers are ignored if used.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
--mirostat-lr N Mirostat learning rate, parameter eta (default: 0.1)
--mirostat-ent N Mirostat target entropy, parameter tau (default: 5.0)
-l TOKEN_ID(+/-)BIAS, --logit-bias TOKEN_ID(+/-)BIAS
modifies the likelihood of token appearing in the completion,
i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
or `--logit-bias 15043-1` to decrease likelihood of token ' Hello'
--grammar GRAMMAR BNF-like grammar to constrain generations (see samples in grammars/ dir)
--grammar-file FNAME file to read grammar from
--cfg-negative-prompt PROMPT
negative prompt to use for guidance. (default: empty)
--cfg-negative-prompt-file FNAME
negative prompt file to use for guidance. (default: empty)
--cfg-scale N strength of guidance (default: 1.000000, 1.0 = disable)
--rope-scaling {none,linear,yarn}
RoPE frequency scaling method, defaults to linear unless specified by the model
--rope-scale N RoPE context scaling factor, expands context by a factor of N
--rope-freq-base N RoPE base frequency, used by NTK-aware scaling (default: loaded from model)
--rope-freq-scale N RoPE frequency scaling factor, expands context by a factor of 1/N
--yarn-orig-ctx N YaRN: original context size of model (default: 0 = model training context size)
--yarn-ext-factor N YaRN: extrapolation mix factor (default: 1.0, 0.0 = full interpolation)
--yarn-attn-factor N YaRN: scale sqrt(t) or attention magnitude (default: 1.0)
--yarn-beta-slow N YaRN: high correction dim or alpha (default: 1.0)
--yarn-beta-fast N YaRN: low correction dim or beta (default: 32.0)
--ignore-eos ignore end of stream token and continue generating (implies --logit-bias 2-inf)
--no-penalize-nl do not penalize newline token
--temp N temperature (default: 0.8)
--logits-all return logits for all tokens in the batch (default: disabled)
--hellaswag compute HellaSwag score over random tasks from datafile supplied with -f
--hellaswag-tasks N number of tasks to use when computing the HellaSwag score (default: 400)
--keep N number of tokens to keep from the initial prompt (default: 0, -1 = all)
--draft N number of tokens to draft for speculative decoding (default: 8)
--chunks N max number of chunks to process (default: -1, -1 = all)
-np N, --parallel N number of parallel sequences to decode (default: 1)
-ns N, --sequences N number of sequences to decode (default: 1)
-pa N, --p-accept N speculative decoding accept probability (default: 0.5)
-ps N, --p-split N speculative decoding split probability (default: 0.1)
-cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)
--mmproj MMPROJ_FILE path to a multimodal projector file for LLaVA. see examples/llava/README.md
--image IMAGE_FILE path to an image file. use with multimodal models
--mlock force system to keep model in RAM rather than swapping or compressing
--no-mmap do not memory-map model (slower load but may reduce pageouts if not using mlock)
--numa attempt optimizations that help on some NUMA systems
if run without this previously, it is recommended to drop the system page cache before using this
see https://github.com/ggerganov/llama.cpp/issues/1437
--verbose-prompt print prompt before generation
-dkvc, --dump-kv-cache
verbose print of the KV cache
-nkvo, --no-kv-offload
disable KV offload
-ctk TYPE, --cache-type-k TYPE
KV cache data type for K (default: f16)
-ctv TYPE, --cache-type-v TYPE
KV cache data type for V (default: f16)
--simple-io use basic IO for better compatibility in subprocesses and limited consoles
--lora FNAME apply LoRA adapter (implies --no-mmap)
--lora-scaled FNAME S apply LoRA adapter with user defined scaling S (implies --no-mmap)
--lora-base FNAME optional model to use as a base for the layers modified by the LoRA adapter
-m FNAME, --model FNAME
model path (default: models/7B/ggml-model-f16.gguf)
-md FNAME, --model-draft FNAME
draft model for speculative decoding
-ld LOGDIR, --logdir LOGDIR
path under which to save YAML logs (no logging if unset)
--override-kv KEY=TYPE:VALUE
advanced option to override model metadata by key. may be specified multiple times.
types: int, float, bool. example: --override-kv tokenizer.ggml.add_bos_token=bool:false
log options:
--log-test Run simple logging test
--log-disable Disable trace logs
--log-enable Enable trace logs
--log-file Specify a log filename (without extension)
--log-new Create a separate new log file on start. Each log file will have unique name: "<name>.<ID>.log"
--log-append Don't truncate the old log file.
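As a local (non-Gemini) point of comparison, a question can be posed to the Mistral 7B model inside that container with a command like the one below. It uses only flags documented above; the model path inside the image is a guess on my part.

root@dc98ac4a23d5:/opt/llama.cpp# ./main -m models/mistral-7b-v0.1.Q6_K.gguf \
    -c 4096 --temp 0.2 -n 256 \
    -p "Does the following LaTeX excerpt contain a derivation? Answer yes or no."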
Large Language Models like ChatGPT are a hot topic due to the novelty of results in multiple application domains. Stepping back from the hype, the central capabilities seem to include summarization of content, information retrieval, and creative synthesis. Unfortunately those are not cleanly separated categories -- the summarization or information retrieval can contain hallucinations (unintended creative synthesis) that get stated confidently.
Focusing on the topic of information retrieval and setting aside hallucinations, let's consider alternative mechanisms for search:
plain text search, like what Google supports
boolean logic, i.e., AND/OR/NOT
use of special indicators like wild cards, quotes for exact search
regular expressions
graph queries for inference engines that support inductive, deductive, and abductive reasoning
Except for the last, those search mechanisms all return specific results from a previously collected set of sources.
--> I expect conventional search to remain important. There are cases where I really am looking for a specific document and not a summarization.
--> Specialized search capabilities like regular expressions and wild cards will remain relevant for matching specific text strings. An LLM might help with designing the regex; see the sketch after this list.
--> Graph queries rely on bespoke databases that LLMs are not trained on currently. I'm not aware of any reason these can't be combined.
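As a concrete example of combining the two, the sketch below uses a regular expression (the kind of pattern an LLM could be asked to draft) to pull candidate equation environments out of a .tex file, so that a follow-up LLM prompt only needs to see the extracted math rather than the whole paper. The filename reuses the sound1.tex example from above.

# Sketch: regex extraction of LaTeX math environments so an LLM prompt can focus on equations.
import re

with open("sound1.tex") as file_handle:
    tex_source = file_handle.read()

# match \begin{equation}...\end{equation} and \begin{align}...\end{align}, starred or not
pattern = re.compile(r"\\begin\{(equation\*?|align\*?)\}(.*?)\\end\{\1\}", re.DOTALL)

for match in pattern.finditer(tex_source):
    environment, body = match.group(1), match.group(2)
    print("---", environment, "---")
    print(body.strip())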
The Physics Derivation Graph effectively provides a knowledge graph for mathematical Physics. Combining this with machine learning is feasible.