Wednesday, May 31, 2023

OpenAI's process supervision for math problems and relevance to the Physics Derivation Graph

OpenAI just announced (see https://openai.com/research/improving-mathematical-reasoning-with-process-supervision) progress on solving math problems using process supervision during training.

The data at https://github.com/openai/prm800k/tree/main comes from https://github.com/hendrycks/math (the dataset for https://arxiv.org/pdf/2103.03874.pdf). Some examples in that data come from Art of Problem Solving (AoPS), such as https://artofproblemsolving.com/wiki/index.php/2015_AIME_II_Problems/Problem_6

AoPS describes itself as "Math texts, online classes, and more for students in grades 5-12."

The problems are constrained and feel very artificial. See for example https://artofproblemsolving.com/wiki/index.php/Mock_AIME_1_Pre_2005_Problems/Problem_4

The training data doesn't have inference rules, so the output from the LLM doesn't have inference rules. As a consequence, the output of the LLM cannot be confirmed by a Computer Algebra System; the output text needs to be validated by a human. Because LLMs hallucinate answers that sound reasonable, checking each step is still vital.

The ability to resolve distinct variables across all of Mathematical Physics is beyond the scope of the training data. 

On a positive note, if the Physics Derivation Graph content existed, I now think an LLM-based approach could be used to make progress in Mathematical Physics.

Saturday, May 27, 2023

tracing Python in JSON-based workflow is untenable

I added

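import sys

def mytrace(frame, event, arg):
    # the interpreter calls this for each traced event
    if event == "call":
        # a function is being entered: show its name and local variables
        print("call", frame.f_code.co_name, frame.f_locals)
    elif event == "return":
        # a function is returning: arg holds the return value
        print("return", frame.f_code.co_name, arg)
    # returning the function keeps tracing active in the new scope
    return mytrace

sys.settrace(mytrace)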

to https://github.com/allofphysicsgraph/proofofconcept/blob/gh-pages/v7_pickle_web_interface/flask/controller.py but the output wasn't that useful since I'm passing the entire database as JSON. The full JSON shows up in almost every argument and return value, making the output of the trace unreadable.

When I switch to the Neo4j/Cypher-based approach, the trace might be more usable.
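
One possible mitigation, which was not tried in the workflow above, is to abbreviate every value before printing it. The sketch below uses the standard library's reprlib module for that. The names short_trace, count_entries, and big_database are hypothetical stand-ins, and the size limits are arbitrary choices.

```python
import reprlib
import sys

# Abbreviating repr; the limits are arbitrary values chosen for illustration.
short = reprlib.Repr()
short.maxdict = 2      # show at most 2 dictionary entries
short.maxstring = 30   # clip long strings to 30 characters
short.maxother = 30    # clip other long reprs to 30 characters

def short_trace(frame, event, arg):
    """Same idea as mytrace, but every printed value is abbreviated."""
    if event == "call":
        local_vars = {name: short.repr(value)
                      for name, value in frame.f_locals.items()}
        print("call", frame.f_code.co_name, local_vars)
    elif event == "return":
        print("return", frame.f_code.co_name, short.repr(arg))
    return short_trace

def count_entries(database):
    # hypothetical stand-in for a function that receives the whole database
    return len(database)

# hypothetical stand-in for the JSON database: 1000 long entries
big_database = {"expr%d" % i: "x" * 500 for i in range(1000)}

sys.settrace(short_trace)
count_entries(big_database)
sys.settrace(None)  # switch tracing off again
```

Whether the abbreviated output is useful depends on whether the interesting part of each argument survives the truncation, so this only reduces the noise rather than removing the underlying problem of passing the whole database around.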

Monday, May 1, 2023

Omniscient project management

The Physics Derivation Graph relies on a diverse set of skills. To speed up the development process we could identify separable tasks and then spread the tasks among a team of contributors. Collaboration requires coordination, and that coordination can be top-down, organic, or a mixture of the two.

This post focuses on the top-down approach and assumes an omniscient view. 

A standard data structure in project management is the Gantt chart. A Gantt chart uses information about tasks, task dependencies, task durations, and dates to create a visualization associated with a project.

task ID | task description | task duration [days] | depends on tasks | earliest start date
14235 | something useful | 3 | N/A | 2022-03-01
25532 | hard work | 2 | [14235] | N/A
3456252 | keeping busy | 3 | [25532] | N/A

That table can be visualized with tasks versus time:

[Figure: visualization of a Gantt chart with four tasks. Tasks 2 and 3 depend on task 1 being completed. Task 4 depends on task 2 being completed.]

That data structure doesn't account for staffing, skills, equipment, or budget. The Gantt chart also doesn't account for uncertainty in task duration or for alternative paths.
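
The dates that a Gantt chart displays can be derived from the task table. Here is a minimal sketch, assuming calendar days (no weekends or holidays), no circular dependencies, and that every chain of tasks begins with a task that has an earliest start date. The function name schedule and the tuple layout are illustrative choices.

```python
from datetime import date, timedelta

# Task table: ID -> (duration in days, prerequisite IDs, earliest start or None)
tasks = {
    14235: (3, [], date(2022, 3, 1)),
    25532: (2, [14235], None),
    3456252: (3, [25532], None),
}

def schedule(tasks):
    """Return {task ID: (start, finish)}.

    "finish" is the first day after the task ends, which is also
    the first day on which a dependent task can start.
    """
    planned = {}

    def plan(task_id):
        if task_id not in planned:
            duration, prerequisites, earliest = tasks[task_id]
            # a task can start once all of its prerequisites have finished
            candidates = [plan(p)[1] for p in prerequisites]
            if earliest is not None:
                candidates.append(earliest)
            start = max(candidates)
            planned[task_id] = (start, start + timedelta(days=duration))
        return planned[task_id]

    for task_id in tasks:
        plan(task_id)
    return planned

for task_id, (start, finish) in schedule(tasks).items():
    print(task_id, start, finish)
```

For the three tasks in the table this gives a start of 2022-03-01 for the first task and a finish of 2022-03-09 for the last one.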

Gantt charts present a single path

Project management involves contingency planning.  

IF this THEN
   that 
ELSE
   other

Every conditional statement is a branching of possible paths, each a separate Gantt chart.

A single Gantt chart is a snapshot of a single path.
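
To see how quickly the number of paths grows, here is a small sketch that enumerates every combination of outcomes. The three decision points are hypothetical, and they are assumed to be independent of each other. In a real project some decisions only arise on certain branches, so the count below is an upper bound.

```python
from itertools import product

# Hypothetical decision points; each has two possible outcomes.
decisions = {
    "vendor delivers on time": [True, False],
    "prototype passes review": [True, False],
    "budget is renewed": [True, False],
}

# Each combination of outcomes is one path, and so one Gantt chart.
paths = [dict(zip(decisions, outcomes))
         for outcomes in product(*decisions.values())]

print(len(paths), "possible Gantt charts")
for path in paths:
    print(path)
```

With n independent two-way decisions there are 2**n paths, so three decisions already produce eight separate charts.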


Staffing, budget, equipment, skills, uncertainty

Augmenting the basic Gantt chart means extending the table data structure to something like:

task ID | task description | task duration [days] | depends on tasks | earliest start date | depends on equipment | minimum skill set and level | uncertainty in duration [days]
14235 | something useful | 3 | N/A | 2022-03-01 | [Photoshop] | photo editing, intermediate | +/-1
25532 | hard work | 2 | [14235] | N/A | [Excel] | math, beginner; text editing, beginner | +2
3456252 | keeping busy | 3 | [25532] | N/A | [Chrome browser] | clicking on links, beginner | 0

That information needs to be augmented with a cost table for equipment:

equipment | cost per day [USD] | acquisition cost [USD]
Photoshop | 0 | 100
Excel | 0 | 20
Chrome browser | 0 | 0

Lastly, we need people who can do the tasks.

person name | hourly cost [USD] | skill and level | dates available
Alice | 20 | Excel, beginner; text editing, intermediate | [2022-01-02, 2022-01-03, 2022-01-04]
Bob | 15 | Excel, intermediate; Math, beginner | [2022-02-01, 2022-02-15, 2022-02-24]
Charlie | 24 | photo editing, beginner | [2022-01-12, 2022-01-23, 2022-01-24]
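
The three tables can be held in simple data structures. The following is a minimal sketch rather than a definitive schema. The class names, field names, the is_qualified helper, and the numeric ordering of skill levels are all assumptions made for illustration, and dates are left out to keep it short.

```python
from dataclasses import dataclass, field

# Ordering of skill levels; "expert" is an assumed level not used in the tables.
LEVELS = {"beginner": 1, "intermediate": 2, "expert": 3}

@dataclass
class Task:
    task_id: int
    description: str
    duration_days: int
    min_skills: dict                      # skill name -> minimum level
    depends_on: list = field(default_factory=list)
    equipment: list = field(default_factory=list)

@dataclass
class Person:
    name: str
    hourly_cost_usd: float
    skills: dict                          # skill name -> level

# Acquisition cost in USD, from the equipment table.
acquisition_cost = {"Photoshop": 100, "Excel": 20, "Chrome browser": 0}

tasks = [
    Task(14235, "something useful", 3, {"photo editing": "intermediate"},
         equipment=["Photoshop"]),
    Task(25532, "hard work", 2, {"math": "beginner", "text editing": "beginner"},
         depends_on=[14235], equipment=["Excel"]),
    Task(3456252, "keeping busy", 3, {"clicking on links": "beginner"},
         depends_on=[25532], equipment=["Chrome browser"]),
]

# Skill names are lowercased here so that "Math" and "math" match.
people = [
    Person("Alice", 20, {"excel": "beginner", "text editing": "intermediate"}),
    Person("Bob", 15, {"excel": "intermediate", "math": "beginner"}),
    Person("Charlie", 24, {"photo editing": "beginner"}),
]

def is_qualified(person, task):
    """True if the person meets every minimum skill level of the task."""
    return all(
        LEVELS.get(person.skills.get(skill), 0) >= LEVELS[level]
        for skill, level in task.min_skills.items()
    )

equipment_needed = {item for task in tasks for item in task.equipment}
total_acquisition = sum(acquisition_cost[item] for item in equipment_needed)
print("equipment acquisition cost [USD]:", total_acquisition)

for task in tasks:
    names = [p.name for p in people if is_qualified(p, task)]
    print(task.task_id, "can be done by", names)
```

With the sample rows exactly as written, the equipment acquisition cost is 120 USD and no single person meets all the minimum skills of any task (for example, Charlie has photo editing at beginner level while task 14235 asks for intermediate). A real data set would need a shared skill vocabulary and enough coverage before a schedule could be derived from it.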

Caveat: the above model focuses exclusively on experts doing tasks using equipment. It does not account for managers or support staff. The model could include any scope of tasks, but a boundary needs to be drawn somewhere to avoid becoming Seldon's psychohistory. The risk of leaving tasks outside the model is that those tasks block project progress or alter the project cost. Anything outside the model could be an invisible dependency.

Derived project views

The following views can be derived from the three tables above:
  • standard task Gantt chart
  • project duration histogram (each task duration has an uncertainty; aggregating those uncertainties gives a distribution of total project duration)
  • per-person activity schedule -- who works on what, and when. This is a prioritization based on task dependencies and on when people with the required skills are available
  • cost per day -- the spend rate
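
As an illustration of the project duration histogram, here is a minimal Monte Carlo sketch. It assumes that each task's duration is a whole number of days drawn uniformly from its uncertainty range, and it reads "+/-1" as one day early or late, "+2" as up to two days late, and "0" as no uncertainty. Both the distribution and that reading are assumptions, not part of the tables.

```python
import random
from collections import Counter

# (nominal duration, days it may finish early, days it may run late)
tasks = [
    (3, 1, 1),   # 14235: 3 days, +/-1
    (2, 0, 2),   # 25532: 2 days, +2
    (3, 0, 0),   # 3456252: 3 days, no uncertainty
]

def sample_project_duration(rng):
    # The tasks form a single chain, so the project duration is the sum.
    return sum(rng.randint(d - early, d + late) for d, early, late in tasks)

rng = random.Random(0)  # fixed seed so that reruns give the same histogram
histogram = Counter(sample_project_duration(rng) for _ in range(10_000))

for days in sorted(histogram):
    print(days, "days:", histogram[days], "samples")
```

Under these assumptions the project takes between 7 and 11 days. Tasks that run in parallel would need the longest path through the dependency graph instead of a plain sum.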

Blockers inhibiting the omniscient project management view

Even though the problem can be formulated as a set of data structures, several blockers prevent the omniscient view from being realized.

Time-based blockers are the amount of time needed to

  • gather the relevant information and keep the information up-to-date as the situation evolves
  • document the information and tune the model
  • survey the skills of the workforce
  • have the workforce track their time spent on a task
  • define each task, along with a "definition of done"
  • track costs of the workforce and equipment
  • identify conditional paths and track which alternative was selected

Blockers that aren't time-based:

  • inability to break the project into atomic tasks (where "atomic" is based on skill set and skill level)
  • ability to break the project into tasks, but inability to identify the relevant skills for each task

The model is centralized here only to construct a narrative. Software like Atlassian's Jira distributes the task tracking rather than trying to administer the project with a centralized top-down approach.

Conclusion

The top-down omniscient view of project management is an unrealistic fantasy. However, it might be a helpful artifact for negotiation among decision makers. The alternative (where decision makers don't have a quantitative model to argue about) devolves into reliance on personal relationships, turf battles, and political factions. Bureaucratic processes evolve as a substitute for the missing top-down omniscient view of project management.