Saturday, May 28, 2022

Next steps once math expressions are tokenized

In my previous post I outlined a sequence of steps, with a negative framing of how difficult each step would be. A positive framing of the same sequence is

  1. Analyze .tex files from arxiv, accounting for issues like encoding, misspellings, malformed LaTeX, and macro expansion.
  2. Once the math (e.g., $x$) and expressions are separated from the text, tokenize variables within expressions.
  3. Once variables are tokenized within expressions, identify the concept (e.g., names of constants) each token refers to, based on the text in the paper.
  4. Reconcile variables across different .tex files in arxiv.
  5. Create an interface providing semantically enriched arxiv content that is indexed for user search queries.
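Step 2 above can be sketched in a few lines. This is a minimal, hedged illustration: the regex and the token categories are my own assumptions, and real arxiv content would need a proper grammar (or LaTeXML) rather than a regex.

```python
import re

# Tokenize a LaTeX math expression into commands, variables, numbers,
# and operators. The categories below are assumptions for illustration.
TOKEN_PATTERN = re.compile(
    r"(\\[A-Za-z]+)"      # LaTeX commands, e.g. \alpha, \frac
    r"|([A-Za-z])"        # single-letter variables
    r"|(\d+\.?\d*)"       # numbers
    r"|([=+\-*/^_{}()])"  # operators and grouping
)

def tokenize(expression: str) -> list[str]:
    """Return the non-whitespace tokens of a LaTeX expression."""
    return [m.group(0) for m in TOKEN_PATTERN.finditer(expression)]

print(tokenize(r"F = m a"))    # ['F', '=', 'm', 'a']
print(tokenize(r"E = m c^2"))  # ['E', '=', 'm', 'c', '^', '2']
```

Anything this pattern can't classify (e.g., multi-letter identifiers like "sin" written without a backslash) is silently dropped, which is exactly the kind of gap a real grammar would have to close.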

Suppose we are at step 2 and everything in a document is correctly tokenized (or even if just a fraction of the content is tokenized). The follow-on step (3) would be to detect the definitions of the tokens from the text. For example, if the variable "a" shows up in an expression, and $a$ shows up in the text, and the text is something like

"where $a$ is the number of cats in the house"

then we can deduce that "a" is defined as "number of cats in the house".
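A simple version of that deduction can be done with pattern matching. The "where $x$ is ..." pattern below is an assumption; real papers use many phrasings ("denotes", "represents", "stands for", ...), so this is a starting point, not a complete solution.

```python
import re

# Match sentences of the form: where $VAR$ is [the] DEFINITION
# The pattern is an illustrative assumption, not a general solution.
DEFINITION_PATTERN = re.compile(
    r"where \$(?P<var>[A-Za-z])\$ is (?:the )?(?P<definition>[^.,;]+)"
)

def extract_definition(sentence: str):
    """Return (variable, definition) if the sentence matches, else None."""
    match = DEFINITION_PATTERN.search(sentence)
    if match is None:
        return None
    return match.group("var"), match.group("definition").strip()

print(extract_definition("where $a$ is the number of cats in the house"))
# ('a', 'number of cats in the house')
```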

Step 4 would be to figure out if "a" is used similarly in other papers. That would indicate a relation of the papers based on the topic of the content. See for example https://arxiv.org/pdf/1902.00027.pdf


Another use case for tokenized expressions (step 2) with some semantic meaning (step 3) would be to validate the expressions. If the expression is "a = b" and the two variables have different units, then the expression is dimensionally inconsistent and therefore wrong.
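As a sketch of that validation, suppose step 3 produced a table of per-variable dimensions. The dimension vectors (length, mass, time) and the variable table below are invented for illustration.

```python
# Hypothetical per-variable dimensions, as (length, mass, time) exponents.
DIMENSIONS = {
    "a": (1, 0, 0),   # a has dimensions of length
    "b": (1, 0, 0),   # b also has dimensions of length
    "t": (0, 0, 1),   # t has dimensions of time
}

def equality_is_dimensionally_consistent(lhs: str, rhs: str) -> bool:
    """Both sides of an equality must have the same dimensions."""
    return DIMENSIONS[lhs] == DIMENSIONS[rhs]

print(equality_is_dimensionally_consistent("a", "b"))  # True
print(equality_is_dimensionally_consistent("a", "t"))  # False
```

A full checker would have to propagate dimensions through compound expressions (products multiply the exponent vectors, sums require matching vectors), but the single-comparison case already catches "a = t" style errors.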

Searchable Latex, semantic enrichment, and reconciling variables in arxiv

Inspired by searchonmath.com, I spent time attempting to recreate and then extend that effort.
  1. Analyzing .tex files from arxiv is hard due to "minor" issues like encoding, misspellings, malformed LaTeX, and macro expansion. (LaTeXML may help with macro expansion?)
  2. Once the math (e.g., $x$) and expressions are separated from the text, I don't have a good way of separating variables within expressions. (I think this is where grammar explorations are helpful. LaTeXML may also be able to do this, but I haven't gotten it working yet.)
  3. Once variables are tokenized within expressions, identifying the concepts (e.g., names of constants) is burdensome. (Needs manual annotation -- MioGatto -- or NLP or both.)
  4. Reconcile variables across different .tex files in arxiv.
  5. Create an interface providing semantically enriched arxiv content that is indexed for user search queries. (Something like searchonmath.com but with semantic enrichment of variables.)

Getting to 5 is still a long way from the feature set I'm trying to demonstrate with the Physics Derivation Graph! To be specific, filling in missing derivation steps and checking the consistency of expressions and the correctness of derivation steps would be additional work.

That leads me to the conclusion that I should focus my PDG efforts on building an exemplar destination rather than spend my time implementing steps 1-5. Both are relevant, and both require a lot of hard labor and creative work. My plan is to continue work on the Neo4j-based property graph.

Friday, May 27, 2022

PDG as dedicated website, and PDG-as-a-service, and PDG as an overlay for arxiv

Two ways to present the Physics Derivation Graph are as a website (currently https://derivationmap.net/ ) and as an API (see https://derivationmap.net/api/v1/resources/derivations/list as an example from https://derivationmap.net/api/v1/documentation ).

A third way to present the content would be as an overlay for existing content, e.g. https://arxiv.org/ . 

Related: Comments on papers

The content overlay concept has been explored primarily for comments. For example, active efforts include

For more related projects, see https://reimaginereview.asapbio.org/

Illustration of the comment section for research papers: https://phdcomics.com/comics.php?f=1178

Inactive efforts:


Another overlay: variable identification

  • Find the same variable referenced in multiple papers
  • Find the same expression referenced in multiple papers
Having all of the metadata for every arxiv paper would be a good starting point for cross-referencing the variables.
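Given per-paper variable definitions (the output of step 3 in the list above), finding the same variable across papers reduces to building an inverted index. The paper IDs and definitions below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical step-3 output: per-paper variable definitions.
definitions_per_paper = {
    "arxiv:1111.0001": {"c": "speed of light", "m": "mass"},
    "arxiv:2222.0002": {"c": "speed of light", "x": "position"},
}

# Invert: (variable, meaning) -> set of papers using that definition.
index = defaultdict(set)
for paper_id, definitions in definitions_per_paper.items():
    for variable, meaning in definitions.items():
        index[(variable, meaning)].add(paper_id)

# Papers that define $c$ the same way are candidates for being related.
print(sorted(index[("c", "speed of light")]))
# ['arxiv:1111.0001', 'arxiv:2222.0002']
```

The hard part is not the index but normalizing the definitions, since "speed of light" and "light speed" should land in the same bucket.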

Example of active effort:
  • Subscription-based LaTeX search of equations: https://www.searchonmath.com/ ; $0.99 for the first month as of 2022-05-26, then $4.50/month after. Can search either web content (stackoverflow, wikipedia) or arxiv content, but not both at once.
Inactive effort:

Friday, May 20, 2022

what is using disk space on the web server?

https://stackoverflow.com/a/15142053/1164295

$ du -cBM --max-depth=1 2> >(grep -v 'Permission denied') | sort -n 
0M	./dev
0M	./proc
0M	./sys
1M	./lost+found
1M	./media
1M	./mnt
1M	./root
1M	./srv
2M	./run
7M	./etc
11M	./opt
74M	./tmp
151M	./boot
1329M	./home
1497M	./snap
2535M	./usr
3820M	./var
9421M	.
9421M	total
Confusingly, that total doesn't seem consistent with the output of df. (One likely cause: du was run without sudo, so it skips directories it can't read; df also counts space held by deleted-but-still-open files.)
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            474M     0  474M   0% /dev
tmpfs            99M  1.2M   97M   2% /run
/dev/vda1        25G   20G  4.4G  82% /
tmpfs           491M     0  491M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           491M     0  491M   0% /sys/fs/cgroup
/dev/vda15      105M  9.2M   96M   9% /boot/efi
tmpfs            99M     0   99M   0% /run/user/0
tmpfs            99M     0   99M   0% /run/user/1000
/dev/loop4       68M   68M     0 100% /snap/lxd/22526
/dev/loop2       44M   44M     0 100% /snap/snapd/15177
/dev/loop3       56M   56M     0 100% /snap/core18/2344
/dev/loop5       68M   68M     0 100% /snap/lxd/22753
/dev/loop0       62M   62M     0 100% /snap/core20/1405
/dev/loop6       45M   45M     0 100% /snap/snapd/15534
/dev/loop7       62M   62M     0 100% /snap/core20/1434
overlay          25G   20G  4.4G  82% /var/lib/docker/overlay2/b1e93808993411941a56eeab3447a9620dabf64956633befd4f4997c00d3bfea/merged
shm              64M     0   64M   0% /var/lib/docker/containers/dd7ef352d6ba8fa022bde66cc083c81c868ecc492b41eb31725cbd3d44e41297/mounts/shm
overlay          25G   20G  4.4G  82% /var/lib/docker/overlay2/37c02acbf47a52998e26eb679988396a263c4b2bc723435a7e185d999adb3554/merged
shm              64M     0   64M   0% /var/lib/docker/containers/4ca979c1faea9fee6b29a1bcebbea5b1897aabcb8f5e6b4e3844b52a90f481e7/mounts/shm
/dev/loop8       56M   56M     0 100% /snap/core18/2409

Disk usage savings #1: shrink the systemd journal

https://askubuntu.com/a/1238221 and https://unix.stackexchange.com/a/130802/431711 and https://wiki.archlinux.org/title/Systemd/Journal
cd /var/log/journal
sudo journalctl --vacuum-time=10d

Disk usage savings #2: remove unused Docker images

docker images | tr -s " " | grep "<none> <none>" | cut -d' ' -f3 | xargs docker rmi

(The tr has to run before the grep, since docker images pads its columns with multiple spaces. The built-in docker image prune accomplishes the same removal of dangling images.)

Tuesday, May 3, 2022

observations on the conversion of the backend from JSON to property graph (Neo4j)

The JSON backend for the Physics Derivation Graph 

  • is concise -- only the fields necessary are present 
  • is easily readable -- plain text and not much nesting
  • requires significant investment to construct queries
  • is static in terms of dependencies; unlikely to degrade or require maintenance
The property graph (in Neo4j) backend
  • supports user-provided queries
  • adds maintenance risk of keeping up with changes to Cypher and Neo4j

Sunday, March 6, 2022

Changing from JSON to property graph + SQL backend

A few recent conversations with scientists about the Physics Derivation Graph have led me to think about different queries (247, 243, 241, 240, 239, 238) that could be of value and can be extracted from the current content.

In coming up with ways to query the graph, I realized a property graph is useful for supporting the queries. That is in contrast to writing a custom query capability against my existing JSON format. I'm already embarrassed by the JSON/SQL implementation, so having specific queries of interest provided me sufficient motivation to investigate implementing a Neo4j backend.

Transitioning to a property graph (specifically Neo4j) means losing fine-grained control over the mechanics of the graph. However, the trade-off is well worth the increased development speed. Having a Cypher query interface via the web GUI is very powerful.


With the property graph representation, supporting information is needed in tabular form:

  • all possible inference rules, along with the CAS implementation per rule
  • all variable definitions, along with dimensions, constant or variable, scope, and reference URLs
  • units, along with dimensions, and reference URLs

Those three tables could be stored in an SQL database. 
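As a sketch, the three tables could look like the following in SQLite. The table and column names are my own assumptions based on the list above, not an existing PDG schema.

```python
import sqlite3

# Three supporting tables, using SQLite as an illustrative SQL backend.
connection = sqlite3.connect(":memory:")
connection.executescript("""
CREATE TABLE inference_rule (
    name TEXT PRIMARY KEY,
    cas_implementation TEXT        -- e.g., the CAS call per rule
);
CREATE TABLE variable_definition (
    symbol TEXT,
    definition TEXT,
    dimensions TEXT,
    is_constant INTEGER,           -- 0 = variable, 1 = constant
    scope TEXT,
    reference_url TEXT
);
CREATE TABLE unit (
    name TEXT PRIMARY KEY,
    dimensions TEXT,
    reference_url TEXT
);
""")
connection.execute(
    "INSERT INTO unit VALUES (?, ?, ?)",
    ("meter", "length", "https://en.wikipedia.org/wiki/Metre"),
)
rows = connection.execute("SELECT name, dimensions FROM unit").fetchall()
print(rows)  # [('meter', 'length')]
```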

I'm replacing a single plaintext JSON file with two non-plaintext data formats -- SQL and Neo4j. 

Monday, February 21, 2022

Lots of tasks for 2022; what are the priorities

With the JSON/SQL implementation, I showed myself that what I was imagining (Latex entry, CAS integration, symbol tracking, Latex/PDF output) was in fact feasible. However, the JSON/SQL backend and the forms-based web front-end were sufficiently embarrassing that I wasn't interested in showing off the idea. 

Now, with the Neo4j/SQL backend, my goal is 1) to provide query capability and 2) to not be embarrassed.


High priority:

Low priority: 

  • analysis of server logs -- https://github.com/allofphysicsgraph/proofofconcept/issues/246