Sunday, June 11, 2023

chain of custody for formally-verified software

Relating disparate sources of knowledge in science involves:

  • Formally-verified Mathematical Physics (e.g., in Lean). Purpose: verify the mathematical model is consistent with specified axioms.
  • Formally-verified software; see https://physicsderivationgraph.blogspot.com/2022/10/tla-comparison.html for a list.
  • Performant parallelized software (e.g., MPI + OpenMP + Fortran or C/C++) for High Performance Computing (HPC).
  • Peer-reviewed journal articles, e.g., on arXiv; rendered using LaTeX. [Of the four, this has historically been closest to the focus of the Physics Derivation Graph.]

Each of these content sources is independent and involves expressing the same mathematical model in a different way. 


While kernels of mathematical models can be implemented in formally-verified software, a lot of scientific computing uses High Performance Computing (HPC). In the HPC realm the compilers, libraries, and runtimes can all be sources of bugs. 

The software also needs to be compiled or interpreted with a verified compiler or interpreter (e.g., CompCert for C); otherwise the compiler itself re-enters the trusted base. 


How can the consistency of content across sources be ensured? 

There are 6 bilateral transitions for the 4 sources. For example, how does the model transition between theorem prover (e.g., Lean) and formally-verified software implementation?
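The count comes from choosing unordered pairs of the four sources, C(4,2) = 6. A quick sketch enumerating the pairs (the short source names here are my own shorthand):

```python
from itertools import combinations

# The four expressions of the same mathematical model, per the list above.
sources = [
    "theorem prover (Lean)",
    "formally-verified software",
    "HPC implementation",
    "journal article (LaTeX)",
]

# Each unordered pair is one bilateral transition whose consistency
# needs to be maintained.
pairs = list(combinations(sources, 2))
for a, b in pairs:
    print(f"{a} <-> {b}")

print(len(pairs))  # 6
```

Every edit to the model in one representation implies a consistency check against the other three, so the maintenance cost grows with the number of pairs.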


Another possible issue: provenance

May need to track provenance from the source code to the binaries. Is the verified algorithm what was actually provided to the compiler? Is the binary I'm currently running the compiler's output? Need to be able to distinguish a "binary I don't know the origin of" from a "binary that has a known chain of custody." 

This could be as simple as a log of hashes coupled with metadata recording where each artifact came from. The provenance needs to include versions for compilers, libraries, and runtimes, and the binary should be reproducible. See https://en.wikipedia.org/wiki/Reproducible_builds
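A minimal sketch of such a log, assuming a JSON-lines file of SHA-256 hashes plus build metadata; the field names and function names here are hypothetical, not an existing tool's API.

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_of_file(path: str) -> str:
    """Hash a file in chunks so large binaries don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_provenance(binary_path: str, source_sha256: str,
                      compiler: str, libraries: dict,
                      log_path: str = "provenance.jsonl") -> dict:
    """Append one chain-of-custody entry linking a binary to its inputs."""
    entry = {
        "binary_sha256": sha256_of_file(binary_path),
        "source_sha256": source_sha256,        # hash of the verified source
        "compiler": compiler,                  # e.g., "gfortran 13.2.0"
        "libraries": libraries,                # e.g., {"openmpi": "4.1.6"}
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

With reproducible builds, anyone holding the same source hash and the same compiler/library versions can rebuild the binary and confirm that its hash matches the logged `binary_sha256`, which is exactly the chain-of-custody check.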
