Tuesday, April 22, 2025

formal evaluation of adding one in Python

Here's what I want to accomplish:
  
program requirements, v1

objective: add one to a user-provided value


And here's an implementation in Python that I think provides that capability:

program implementation, v1
#!/usr/bin/env python3
import sys
"""
how hard can it be to add one to a user-provided value?
"""

def add_one_to_arg(number_str: str):
    """
    do the addition
    """
    try:
        number = int(number_str)
    except ValueError:
        try:
            number = float(number_str)
        except ValueError:
            print(f"Error: '{number_str}' is not a valid float.")
            print(f"Usage: python {sys.argv[0]} <number>")
            sys.exit(1) # Exit with an error status code

    return number + 1
    
if __name__ == "__main__":

    if len(sys.argv) < 2:
        print("Error: Please provide a number as a command-line argument.")
        print(f"Usage: python {sys.argv[0]} <number>")
        sys.exit(1) # Exit with a non-zero status code to indicate an error

    # Get the first argument (index 1)
    number_str = sys.argv[1]

    result = add_one_to_arg(number_str)

    print(f"{number_str} plus one is {result}")

#EOF

When I run the above script I get

$ python3 add_one.py 4
4 plus one is 5
$ python3 add_one.py 4.2
4.2 plus one is 5.2
$ python3 add_one.py cat
Error: 'cat' is not a valid float.
Usage: python add_one.py <number>


Next I'm going to intentionally add a few bugs and then ask how to prove the implementation has no bugs:

program implementation, v2 (with bugs)
#!/usr/bin/env python3

import sys
import random
import os
"""
how hard can it be to add one to a user-provided value?
"""

def add_one_to_arg(number_str: str):
    """
    do the addition
    """
    try:
        number = int(number_str)
    except ValueError:
        try:
            number = float(number_str)
        except ValueError:
            print(f"Error: '{number_str}' is not a valid float.")
            print(f"Usage: python {sys.argv[0]} <number>")
            sys.exit(1) # Exit with an error status code

    # deterministic bug
    if number == 3.14159265359:
        return 6

    # random bug
    if random.random()<0.001:
        return number+2

    # bug that is specific to user environment
    my_var = os.environ.get("ASDF")
    if my_var:
        return number+3

    return number + 1

if __name__ == "__main__":

    if len(sys.argv) < 2:
        print("Error: Please provide a number as a command-line argument.")
        print(f"Usage: python {sys.argv[0]} <number>")
        sys.exit(1) # Exit with a non-zero status code to indicate an error

    # Get the first argument (index 1)
    number_str = sys.argv[1]

    result = add_one_to_arg(number_str)

    print(f"{number_str} plus one is {result}")

#EOF

I've added three bugs in v2: a deterministic bug, a random bug, and bug that depends on the user's environment. A brute force test would be expensive but could identify the first two bugs.

There are a couple problems with v1 of the program requirements to "add one to a user-provided value."

  • The input range (unstated) is negative infinity to positive infinity.
    • Python does not have a built-in limit for the size of integers. The maximum integer value is restricted only by the available memory of the system.
    • For floats there is an upper bound:
      >>> import sys
      >>> sys.float_info.max
      1.7976931348623157e+308
  • Time-out conditions are unspecified. So if the program doesn't respond for 5 minutes, the requirements have nothing to say about that.

Rewrite the program requirements to be more specific:

program requirements, v2

objective: add one to a user-provided value.
constraint: User-provided value must be between -1E100 and +1E100.
constraint: Response to user must be provided less than 1 minute after user input.

If the "computer" doing this calculation has a 1Hz CPU clock frequency with 1byte of RAM, that might result in the Python program being "right" but the hardware being inadequate.

Also, let's make explicit the assumption that we are operating in base 10 numbers.

To be safe with the input string, let's bound that to be less than 1000 characters.

program assumptions and requirements, v3

objective: add one to a user-provided value.
constraint: User-provided value must be between -1E100 and +1E100.
constraint: Response to user must be provided less than 1 minute after user input.
constraint: user-provided input and the result are both base 10 numbers.
constraint: user-provided input is less than 1000 characters.
assumption: the program is run on a MacBook Air M2 with 8GB of RAM with factory settings.

The revised implementation is

program implementation, v3 -- with bugs and constraints
#!/usr/bin/env python3

import sys
import random
import os
"""
how hard can it be to add one to a user-provided value?
"""

def add_one_to_arg(number_str: str):
    """
    do the addition
    """
    try:
        number = int(number_str)
    except ValueError:
        try:
            number = float(number_str)
        except ValueError:
            print(f"Error: '{number_str}' is not a valid float.")
            print(f"Usage: python {sys.argv[0]} <number>")
            sys.exit(1) # Exit with an error status code

    assert(number<1E100)
    assert(number>-1E100)

    # deterministic bug
    if number == 3.14159265359:
        return 6

    # random bug
    if random.random()<0.001:
        return number+2

    # bug that is specific to user environment
    my_var = os.environ.get("ASDF")
    if my_var:
        return number+3

    return number + 1

if __name__ == "__main__":

    if len(sys.argv) < 2:
        print("Error: Please provide a number as a command-line argument.")
        print(f"Usage: python {sys.argv[0]} <number>")
        sys.exit(1) # Exit with a non-zero status code to indicate an error

    # Get the first argument (index 1)
    number_str = sys.argv[1]

    assert(len(number_str)<1000)

    result = add_one_to_arg(number_str)

    print(f"{number_str} plus one is {result}")

#EOF

Normal testing involves evaluating pre-determined cases, like "input 5, get 6" and "input 5.4, get 6.4" and "input 'cat', get error" and "input (nothing), get error."

program implementation, v4 -- with pytest
#!/usr/bin/env python3

import sys
import random
import os
"""
how hard can it be to add one to a user-provided value?
"""

def add_one(number_str: str):
    """
    do the addition
    """
    try:
        number = int(number_str)
    except ValueError:
        try:
            number = float(number_str)
        except ValueError:
            print(f"Error: '{number_str}' is not a valid float.")
            print(f"Usage: python {sys.argv[0]} <number>")
            sys.exit(1) # Exit with an error status code

    assert(number<1E100)
    assert(number>-1E100)

    # deterministic bug
    if number == 3.14159265359:
        return 6

    # random bug
    if random.random()<0.001:
        return number+2

    # bug that is specific to user environment
    my_var = os.environ.get("ASDF")
    if my_var:
        return number+3

    return number + 1

def test_add_one_to_int():
    result = add_one(number_str="5")
    assert result == 6

def test_add_one_to_float():
    result = add_one(number_str="5.3")
    assert result == 6.3
    
def test_add_one_to_nuthin():
    with pytest.raises(SystemExit):
        result = add_one(number_str="")    

#EOF


Property-based testing (e.g., https://hypothesis.readthedocs.io/en/latest/) is where you "write tests which should pass for all inputs in whatever range you describe, and let Hypothesis randomly choose which of those inputs to check - including edge cases you might not have thought about." 

program implementation, v4 -- with hypothesis
#!/usr/bin/env python3

import sys
import random
import os
import pytest
import hypothesis
"""
how hard can it be to add one to a user-provided value?
"""

def add_one(number_str: str):
    """
    do the addition
    """
    try:
        number = int(number_str)
    except ValueError:
        try:
            number = float(number_str)
        except ValueError:
            print(f"Error: '{number_str}' is not a valid float.")
            print(f"Usage: python {sys.argv[0]} <number>")
            sys.exit(1) # Exit with an error status code

    assert(number<1E100)
    assert(number>-1E100)

    # deterministic bug
    if number == 3.14159265359:
        return 6

    # random bug
    if random.random()<0.001:
        return number+2
        
    # bug that is specific to user environment
    my_var = os.environ.get("ASDF")
    if my_var:
        return number+3

    return number + 1

@hypothesis.given(number=(hypothesis.strategies.integers(-1E10, 1E10) | hypothesis.strategies.floats(-1E10, 1E10, allow_nan=False)))
@hypothesis.settings(max_examples=1000) # default is 200
def test_add_one_properties(number):
    result = add_one(number_str=str(number))
    assert result == number+1

#EOF
To run the above script, use
  pytest name_of_file.py
  

Similarly, https://github.com/zlorb/PyModel is a model checker that generates test cases based on constraints.

This blog post is about formal methods. There are a few options:



Design by Contract (https://en.wikipedia.org/wiki/Design_by_contract) approaches for Python include Dafny, Deal, and icontract.
For Dafny you write the program in Dafny and compile to Python.
For Deal you write Python and provide decorators.

Dafny


"Dafny lifts the burden of writing bug-free code into that of writing bug-free annotations." 
Dafny was created by Rustan Leino at Microsoft Research.
Dafny uses the Z3 automated theorem prover and Boogie.
Boogie is a simple programming language that is meant to be
  • an easy compile target (think "like JVM bytecode, but for proving code correct")
  • easy to analyze soundly
  • not actually intended to be executable

Instead of running Boogie programs, the Boogie compiler looks through the Boogie code to find assertions. For each assertion, the compiler generates a "verification condition", which is a formula based on a (symbolic) analysis of the program; the verification condition formula is constructed so that if the verification condition is true, the assertion holds.

It then hands those verification conditions, along with annotations in the program like assumptions, preconditions, postconditions, and loop invariants, to an SMT solver (Boogie uses Z3 by default). The SMT solver determines whether or not the assumptions definitely ensure the verification condition holds; Boogie complains about the assertions whose verification-conditions haven't been shown to hold.

source: https://www.reddit.com/r/ProgrammingLanguages/comments/tc55ld/how_does_the_dafny_programming_language/



Since "add 1" doesn't have a loop, the main aspects we'll need in Dafny are
  • @requires(...): Preconditions -- what must be true before the function is called. 
  • @ensures(...): Postconditions -- what must be true after the function returns normally.
Requires input is a number (int or float)
Ensures input < output

Deal

Integrates with Z3 prover
https://deal.readthedocs.io/basic/verification.html#background

As of April 2025, Deal doesn't have an issue tracker and doesn't seem to be active. 
deal example
#!/usr/bin/env python3
import deal
from typing import Union

@deal.pre(lambda number: number<1E100) # Precondition must be true before the function is executed.
@deal.pre(lambda number: number>-1E100)
@deal.ensure(lambda number, result: result==number+1)
def add_one_to_arg(number: Union[int, float]) -> Union[int, float]:
    """
    do the addition
    """
    return number + 1
#EOF
which can be run using
 python3 -m deal prove name_of_file.py

iContract

No prover, just consistency of conditions within Python using decorators.


This report found lack of coverage

Thursday, April 17, 2025

criteria for an impactful and efficient prototype

Prototypes are impactful and efficient when they feature only the essential features. The consequence of that claim is that the prototype is janky (not easy to use), fragile (not robust), shared prematurely (not "professional" looking). For software, a prototype might act on fake data and produce incorrect results.

After stakeholders provide feedback, then the RoI has been confirmed and the direction for where to invest more effort is clarified -- what else is essential? Correctness is typically of interest, but that competes with ease-of-use, speed, and robustness.