The LLM is not a database of facts. Historical events, dates, and places are not stored as exact references. LLMs generate their responses based on statistical probabilities derived from patterns in the training data.
The more widely documented something is, the better the LLM knows it
An LLM's familiarity with a topic is roughly proportional to how well that topic is represented on the Internet. An LLM is more reliable and detailed when discussing common knowledge.
Precise questions that use relevant jargon and provide context yield useful output
Poorly worded questions that do not use domain-specific terminology are less likely to produce clear answers.
Do not trust citations
The LLM does not have citations hard-coded into the network, so the citations it produces are likely to be hallucinations.
Decompose complex tasks and questions into a sequence of iterative prompts
There is a limited amount of "thinking" by the LLM per prompt, so simpler tasks are more likely to produce relevant answers.
Structure your question to produce a page or less of output
Attempting to produce a 200-page book from a single prompt devolves into hallucinations after a few pages. Shorter answers are more likely to remain lucid, so phrase your question so that it can be answered with a small amount of text.
LLMs default to the average
While LLM output can be creative (in unexpected ways), seeking exceptional insight usually yields the mundane.
Simplify your question to a one-shot prompt
Questions that require iterative refinement are more likely to yield hallucinations.
Delegation to an intern who doesn't learn
This can be confusing, as the LLM occasionally knows more than you do.
Previously I considered the Physics Derivation Graph to be an unlikely dream: a complete representation of Physics is infeasible, and even a breadth-first approach is infeasible. My expectation of infeasibility was based on the constraint that data entry would require not just a significant number of person-hours, but also highly qualified people with training tailored to the Physics Derivation Graph.
If the workflow can be segmented into small tasks, LLMs might provide relevant automation of complex tasks that require semantic context.
Overview of workflow
For example,
Given a corpus of papers on arXiv,
identify whether a paper contains a derivation.
Given a paper with a derivation, can the steps be identified? (i.e., can a specific inference rule be associated with each step?)
Can the steps be verified (e.g., SymPy, Lean)?
For derivations with steps, what variables and constants are used in the paper?
Do these variables and constants appear in other derivations?
That requires a combination of text search and semantic awareness of the text content. Not every step has to be "just a human" or "just an LLM" -- a workflow in which the LLM augments the human is reasonable.
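To make the segmentation concrete, here is a minimal Python sketch of that workflow. Every helper function in it (contains_derivation, extract_steps, verify_step) is a hypothetical placeholder for either an LLM prompt or a human review step, not an existing API.

# Hypothetical pipeline sketch; each helper is a placeholder for an LLM prompt or a human check.
from typing import List

def contains_derivation(latex_source: str) -> bool:
    """Ask an LLM (or a human) whether the paper contains a derivation."""
    raise NotImplementedError  # placeholder

def extract_steps(latex_source: str) -> List[dict]:
    """Ask an LLM to list the derivation steps, each with a candidate inference rule."""
    raise NotImplementedError  # placeholder

def verify_step(step: dict) -> bool:
    """Check a single step with a computer algebra system such as SymPy."""
    raise NotImplementedError  # placeholder

def process_paper(latex_source: str) -> dict:
    """Run one arXiv paper through the segmented workflow."""
    if not contains_derivation(latex_source):
        return {"has_derivation": False}
    steps = extract_steps(latex_source)
    return {"has_derivation": True,
            "steps": steps,
            "verified": [verify_step(step) for step in steps]}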
Another enabling aspect is that arXiv papers are available as LaTeX source. That makes parsing (by regex or by LLM) easier.
$ tar xzvf arXiv-2004.04818v2.tar.gz
x elemental.pdf
x fig3.pdf
x sound1.tex
x sound-speeds.pdf
The file sound1.tex is 37110 bytes (36kB), 302 lines, 37108 characters, and 5134 words.
Loading the .tex into Gemini 2.0's prompt uses 10,518 tokens (with the max for Gemini 2.0 being 1,048,574 tokens).
Gemini 2.0 (on 2025-01-07) says
Yes, the article contains a derivation. It presents two approaches to derive the speed of sound, starting from system elasticity and vibrational properties, respectively. Both approaches lead to the same result, which is then used to find the upper bound for the speed of sound.
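As an aside, the same yes/no question can be scripted rather than pasted into the aistudio.google.com web UI. A rough sketch using the google-generativeai Python package follows; the model name, the prompt wording, and the API key handling are assumptions on my part, not what I actually ran.

# Sketch of sending sound1.tex to Gemini programmatically.
# The model name "gemini-2.0-flash-exp" and the prompt wording are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply your own key
model = genai.GenerativeModel("gemini-2.0-flash-exp")

with open("sound1.tex") as file_handle:
    tex_source = file_handle.read()

prompt = ("Does the following LaTeX article contain a derivation? "
          "Answer yes or no, then briefly explain.\n\n" + tex_source)
response = model.generate_content(prompt)
print(response.text)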
I ask a follow-up question,
Gemini's response:
I follow with another prompt,
Gemini's response:
Let's see if Gemini can extract symbol definitions:
Gemini's response:
The use of Unicode isn't desirable. Let's see if we can force the use of LaTeX using the prompt
Gemini's response:
Now let's see if Gemini can identify the steps of the derivation.
Gemini's response:
Next step: extract the variables used in the above derivation to see if they are present in the Physics Derivation Graph by using the PDG API.
Gemini's response (2025-01-07):
TODO: the PDG API needs to be able to handle tasks associated with a workflow that uses an LLM.
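To illustrate what that could look like, here is a purely hypothetical sketch: the endpoint /api/v1/symbols and the response format are inventions for illustration, not the current PDG API.

# Hypothetical sketch: check whether symbols extracted by the LLM already exist in the PDG.
# The endpoint "/api/v1/symbols" and the response shape are assumptions, not the real PDG API.
import requests

extracted_symbols = ["v", "m", "\\omega", "E"]  # example symbols reported by the LLM

response = requests.get("https://allofphysics.com/api/v1/symbols", timeout=30)
response.raise_for_status()
known_symbols = {entry["latex"] for entry in response.json()}  # assumed response shape

for symbol in extracted_symbols:
    status = "already in" if symbol in known_symbols else "missing from"
    print(symbol, "is", status, "the Physics Derivation Graph")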
tl;dr: I'm unsuccessful at getting Gemini 2.0 (on 2025-01-07) to relate Brewster's angle to the Law of Refraction. However, Gemini is able to provide the steps for simple (non-trivial) derivations.
Reformatting HTML table
The site https://aistudio.google.com/ provides Gemini 2.0 LLM for free.
To experiment with it, I wanted to provide the inference rules. There are 94 inference rules listed on https://allofphysics.com/list_all_inference_rules in an 8-column table.
After removing JavaScript, headers, and footers, the HTML file has 3869 lines. That takes 32,164 tokens for the prompt.
Gemini provided the correct 5-column HTML file. Huzzah!
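Removing the JavaScript, headers, and footers was done by hand; a rough sketch of automating that cleanup with BeautifulSoup before building the prompt is below. The assumption that the page uses <script>, <style>, <header>, <footer>, and <nav> tags may not match the actual page structure.

# Sketch: shrink the saved HTML page down to the inference-rule table before prompting the LLM.
from bs4 import BeautifulSoup

with open("list_all_inference_rules.html") as file_handle:
    soup = BeautifulSoup(file_handle.read(), "html.parser")

# drop scripts, styles, and page chrome (tag names are assumptions about the page)
for tag_name in ["script", "style", "header", "footer", "nav"]:
    for tag in soup.find_all(tag_name):
        tag.decompose()

table = soup.find("table")  # keep only the 8-column inference-rule table
print(table.prettify() if table else "no table found")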
Brewster's angle and Law of Refraction
Next step is to see if Gemini can use the inference rules for a derivation.
To which Gemini replied (on 2025-01-07),
"Unsure about answer"
That's reasonable, so I removed the inference rules and tried again.
Again Gemini replied (on 2025-01-07),
"Unsure about answer"
I then relaxed the constraint about right answers only and encouraged step-by-step reasoning.
Gemini's response (on 2025-01-07):
It's not possible to directly derive Brewster's angle from Snell's Law, or vice versa, using standard mathematical manipulations. Brewster's angle is a specific condition related to polarization, while Snell's Law describes refraction in general. Therefore, no sequence of inference rules can relate them directly. They are independent physical principles governing light behavior under different conditions.
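For context, Gemini's claim of independence is only partly right: Snell's Law alone is indeed insufficient, but the textbook derivation of Brewster's angle needs just one additional physical condition, namely that at the polarization angle the reflected and refracted rays are perpendicular. A sketch in LaTeX, where \theta_B is Brewster's angle, \theta_t is the refraction angle, and n_1, n_2 are the refractive indices:

\begin{align*}
n_1 \sin\theta_B &= n_2 \sin\theta_t && \text{(Snell's Law)} \\
\theta_B + \theta_t &= 90^{\circ} && \text{(reflected and refracted rays perpendicular)} \\
n_1 \sin\theta_B &= n_2 \sin(90^{\circ} - \theta_B) = n_2 \cos\theta_B \\
\tan\theta_B &= \frac{n_2}{n_1}
\end{align*}

A possible follow-up experiment is to supply that perpendicularity condition as an extra premise in the prompt and see whether Gemini can then connect the two expressions.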
Simple derivation
Gemini 2.0 on 2025-01-07 responds with
That's "good enough" in my view to be useful -- associating the relevant inference rule with specific steps in a multi-step derivation!
docker run -it --rm -v `pwd`:/scratch llama-cpp-with-mistral-7b-v0.1.q6_k:2023-12-22 /bin/bash
root@dc98ac4a23d5:/opt/llama.cpp# ./main -h
usage: ./main [options]
options:
-h, --help show this help message and exit
--version show version and build info
-i, --interactive run in interactive mode
--interactive-first run in interactive mode and wait for input right away
-ins, --instruct run in instruction mode (use with Alpaca models)
-cml, --chatml run in chatml mode (use with ChatML-compatible models)
--multiline-input allows you to write or paste multiple lines without ending each in '\'
-r PROMPT, --reverse-prompt PROMPT
halt generation at PROMPT, return control in interactive mode
(can be specified more than once for multiple prompts).
--color colorise output to distinguish prompt and user input from generations
-s SEED, --seed SEED RNG seed (default: -1, use random seed for < 0)
-t N, --threads N number of threads to use during generation (default: 20)
-tb N, --threads-batch N
number of threads to use during batch and prompt processing (default: same as --threads)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: empty)
-e, --escape process prompt escapes sequences (\n, \r, \t, \', \", \\)
--prompt-cache FNAME file to cache prompt state for faster startup (default: none)
--prompt-cache-all if specified, saves user input and generations to cache as well.
not supported with --interactive or other interactive options
--prompt-cache-ro if specified, uses the prompt cache but does not update it.
--random-prompt start with a randomized prompt.
--in-prefix-bos prefix BOS to user inputs, preceding the `--in-prefix` string
--in-prefix STRING string to prefix user inputs with (default: empty)
--in-suffix STRING string to suffix after user inputs with (default: empty)
-f FNAME, --file FNAME
prompt file to start generation.
-n N, --n-predict N number of tokens to predict (default: -1, -1 = infinity, -2 = until context filled)
-c N, --ctx-size N size of the prompt context (default: 512, 0 = loaded from model)
-b N, --batch-size N batch size for prompt processing (default: 512)
--samplers samplers that will be used for generation in the order, separated by ';', for example: "top_k;tfs;typical;top_p;min_p;temp"
--sampling-seq simplified sequence for samplers that will be used (default: kfypmt)
--top-k N top-k sampling (default: 40, 0 = disabled)
--top-p N top-p sampling (default: 0.9, 1.0 = disabled)
--min-p N min-p sampling (default: 0.1, 0.0 = disabled)
--tfs N tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
--typical N locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
--repeat-last-n N last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
--repeat-penalty N penalize repeat sequence of tokens (default: 1.1, 1.0 = disabled)
--presence-penalty N repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
--frequency-penalty N repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
--mirostat N use Mirostat sampling.
Top K, Nucleus, Tail Free and Locally Typical samplers are ignored if used.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
--mirostat-lr N Mirostat learning rate, parameter eta (default: 0.1)
--mirostat-ent N Mirostat target entropy, parameter tau (default: 5.0)
-l TOKEN_ID(+/-)BIAS, --logit-bias TOKEN_ID(+/-)BIAS
modifies the likelihood of token appearing in the completion,
i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
or `--logit-bias 15043-1` to decrease likelihood of token ' Hello'
--grammar GRAMMAR BNF-like grammar to constrain generations (see samples in grammars/ dir)
--grammar-file FNAME file to read grammar from
--cfg-negative-prompt PROMPT
negative prompt to use for guidance. (default: empty)
--cfg-negative-prompt-file FNAME
negative prompt file to use for guidance. (default: empty)
--cfg-scale N strength of guidance (default: 1.000000, 1.0 = disable)
--rope-scaling {none,linear,yarn}
RoPE frequency scaling method, defaults to linear unless specified by the model
--rope-scale N RoPE context scaling factor, expands context by a factor of N
--rope-freq-base N RoPE base frequency, used by NTK-aware scaling (default: loaded from model)
--rope-freq-scale N RoPE frequency scaling factor, expands context by a factor of 1/N
--yarn-orig-ctx N YaRN: original context size of model (default: 0 = model training context size)
--yarn-ext-factor N YaRN: extrapolation mix factor (default: 1.0, 0.0 = full interpolation)
--yarn-attn-factor N YaRN: scale sqrt(t) or attention magnitude (default: 1.0)
--yarn-beta-slow N YaRN: high correction dim or alpha (default: 1.0)
--yarn-beta-fast N YaRN: low correction dim or beta (default: 32.0)
--ignore-eos ignore end of stream token and continue generating (implies --logit-bias 2-inf)
--no-penalize-nl do not penalize newline token
--temp N temperature (default: 0.8)
--logits-all return logits for all tokens in the batch (default: disabled)
--hellaswag compute HellaSwag score over random tasks from datafile supplied with -f
--hellaswag-tasks N number of tasks to use when computing the HellaSwag score (default: 400)
--keep N number of tokens to keep from the initial prompt (default: 0, -1 = all)
--draft N number of tokens to draft for speculative decoding (default: 8)
--chunks N max number of chunks to process (default: -1, -1 = all)
-np N, --parallel N number of parallel sequences to decode (default: 1)
-ns N, --sequences N number of sequences to decode (default: 1)
-pa N, --p-accept N speculative decoding accept probability (default: 0.5)
-ps N, --p-split N speculative decoding split probability (default: 0.1)
-cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)
--mmproj MMPROJ_FILE path to a multimodal projector file for LLaVA. see examples/llava/README.md
--image IMAGE_FILE path to an image file. use with multimodal models
--mlock force system to keep model in RAM rather than swapping or compressing
--no-mmap do not memory-map model (slower load but may reduce pageouts if not using mlock)
--numa attempt optimizations that help on some NUMA systems
if run without this previously, it is recommended to drop the system page cache before using this
see https://github.com/ggerganov/llama.cpp/issues/1437
--verbose-prompt print prompt before generation
-dkvc, --dump-kv-cache
verbose print of the KV cache
-nkvo, --no-kv-offload
disable KV offload
-ctk TYPE, --cache-type-k TYPE
KV cache data type for K (default: f16)
-ctv TYPE, --cache-type-v TYPE
KV cache data type for V (default: f16)
--simple-io use basic IO for better compatibility in subprocesses and limited consoles
--lora FNAME apply LoRA adapter (implies --no-mmap)
--lora-scaled FNAME S apply LoRA adapter with user defined scaling S (implies --no-mmap)
--lora-base FNAME optional model to use as a base for the layers modified by the LoRA adapter
-m FNAME, --model FNAME
model path (default: models/7B/ggml-model-f16.gguf)
-md FNAME, --model-draft FNAME
draft model for speculative decoding
-ld LOGDIR, --logdir LOGDIR
path under which to save YAML logs (no logging if unset)
--override-kv KEY=TYPE:VALUE
advanced option to override model metadata by key. may be specified multiple times.
types: int, float, bool. example: --override-kv tokenizer.ggml.add_bos_token=bool:false
log options:
--log-test Run simple logging test
--log-disable Disable trace logs
--log-enable Enable trace logs
--log-file Specify a log filename (without extension)
--log-new Create a separate new log file on start. Each log file will have unique name: "<name>.<ID>.log"
--log-append Don't truncate the old log file.
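As a local (non-Gemini) point of comparison, a question can be posed to the Mistral 7B model inside that container with a command like the one below. It uses only flags documented above; the model path inside the image is a guess on my part.

root@dc98ac4a23d5:/opt/llama.cpp# ./main -m models/mistral-7b-v0.1.Q6_K.gguf \
    -c 4096 --temp 0.2 -n 256 \
    -p "Does the following LaTeX excerpt contain a derivation? Answer yes or no."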
Large Language Models like ChatGPT are a hot topic due to the novelty of results in multiple application domains. Stepping back from the hype, the central capabilities seem to include summarization of content, information retrieval, and creative synthesis. Unfortunately those are not cleanly separated categories -- the summarization or information retrieval can contain hallucinations (unintended creative synthesis) that get stated confidently.
Focusing on the topic of information retrieval and setting aside hallucinations, let's consider alternative mechanisms for search:
plain text search, like what Google supports
boolean logic, i.e., AND/OR/NOT
use of special indicators like wild cards, quotes for exact search
regular expressions
graph queries for inference engines that support inductive, deductive, and abductive reasoning
Except for the last, those search mechanisms all return specific results from a previously collected set of sources.
--> I expect conventional search to remain important. There are cases where I really am looking for a specific document and not a summarization.
--> Specialized search capabilities like regular expressions and wild cards will remain relevant for matching specific text strings. An LLM might help with designing the regex; see the sketch after this list.
--> Graph queries rely on bespoke databases that LLMs are not trained on currently. I'm not aware of any reason these can't be combined.
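As a concrete example of combining the two, the sketch below uses a regular expression (the kind of pattern an LLM could be asked to draft) to pull candidate equation environments out of a .tex file, so that a follow-up LLM prompt only needs to see the extracted math rather than the whole paper. The filename reuses the sound1.tex example from above.

# Sketch: regex extraction of LaTeX math environments so an LLM prompt can focus on equations.
import re

with open("sound1.tex") as file_handle:
    tex_source = file_handle.read()

# match \begin{equation}...\end{equation} and \begin{align}...\end{align}, starred or not
pattern = re.compile(r"\\begin\{(equation\*?|align\*?)\}(.*?)\\end\{\1\}", re.DOTALL)

for match in pattern.finditer(tex_source):
    environment, body = match.group(1), match.group(2)
    print("---", environment, "---")
    print(body.strip())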
The Physics Derivation Graph effectively provides a knowledge graph for mathematical Physics. Combining this with machine learning is feasible.