Monday, February 21, 2022

Lots of tasks for 2022; what are the priorities

With the JSON/SQL implementation, I showed myself that what I was imagining (Latex entry, CAS integration, symbol tracking, Latex/PDF output) was in fact feasible. However, the JSON/SQL backend and the forms-based web front-end were sufficiently embarrassing that I wasn't interested in showing off the idea. 

Now, with the Neo4j/SQL backend, my goals are 1) to provide query capability and 2) to not be embarrassed. 


High priority:

Low priority: 

  • analysis of server logs -- https://github.com/allofphysicsgraph/proofofconcept/issues/246

Tuesday, February 8, 2022

old undated project goals

This page's content has been migrated twice, most recently from https://sites.google.com/site/physicsderivationgraph/goals

I don't know the original date of this post.


Objectives

  • Create a framework capable of describing all mathematics needed for physics derivations. I use Latex as the syntax in the framework because Latex is how I think of equations. However, Latex is insufficient for processing by computer algebra systems. Status: proof of concept exists

  • Create machine-readable databases which use the above framework to capture the mathematical derivations in physics. To hold the content of the databases I'm using custom XML. Status: proof of concept exists

  • Create a graphical representation of the relations in the databases. I'm using GraphViz to render the visualization. Status: proof of concept works

  • Use a computer algebra system to verify the relations in the databases. I'm using SymPy as the CAS (see also a list of candidates). Status: proof of concept works.

  • Create a web browser-based viewer for the generated graph. HTML5 seems capable. Status: investigated, not started

  • Create a web browser-based graph input tool. Status: not started


The above workflow applies to both the CLI and the web-based GUI. Implementing a web-based GUI involves its own learning curve, so I separate the above workflow diagram into two sets of tasks. The following tasks apply to both the web-based GUI and the CLI.

Tasks

  • user adds content to internal data structure (user --> IDS)

  • write content to external database from internal data structure (IDS --> DB)

  • read content from external database into internal data structure (IDS <-- DB)

  • render internal data structure as visual graph (IDS --> graph)

  • check internal data structure content using CAS (IDS --> CAS)
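The first four tasks can be sketched around a small in-memory IDS. This is a minimal illustration under stated assumptions, not the project's code: the dict layout, the XML element and attribute names, and the function names (`new_ids`, `add_statement`, `add_step`, `write_db`, `read_db`, `to_dot`) are all hypothetical. The "external database" here is just an XML string, in the spirit of the custom XML mentioned in the objectives, and the CAS check is left out.

```python
"""Sketch of the IDS tasks; all names and the XML layout are hypothetical."""
import xml.etree.ElementTree as ET


def new_ids():
    """Create an empty internal data structure (IDS)."""
    return {"statements": {}, "steps": []}


def add_statement(ids, stmt_id, latex):
    """user --> IDS: store one statement's Latex under its ID."""
    ids["statements"][stmt_id] = latex


def add_step(ids, inputs, rule, output):
    """user --> IDS: record an inference rule linking input and output statements."""
    ids["steps"].append({"inputs": list(inputs), "rule": rule, "output": output})


def write_db(ids):
    """IDS --> DB: serialize the IDS to an XML string."""
    root = ET.Element("database")
    for stmt_id, latex in ids["statements"].items():
        ET.SubElement(root, "statement", id=stmt_id).text = latex
    for step in ids["steps"]:
        el = ET.SubElement(root, "step", rule=step["rule"], output=step["output"])
        for inp in step["inputs"]:
            ET.SubElement(el, "input", id=inp)
    return ET.tostring(root, encoding="unicode")


def read_db(xml_text):
    """IDS <-- DB: rebuild the IDS from the XML string."""
    root = ET.fromstring(xml_text)
    ids = new_ids()
    for el in root.findall("statement"):
        add_statement(ids, el.get("id"), el.text or "")
    for el in root.findall("step"):
        inputs = [inp.get("id") for inp in el.findall("input")]
        add_step(ids, inputs, el.get("rule"), el.get("output"))
    return ids


def to_dot(ids):
    """IDS --> graph: emit Graphviz DOT; each inference step becomes a box node."""
    lines = ["digraph derivation {"]
    for i, step in enumerate(ids["steps"]):
        rule_node = f"rule{i}"
        lines.append(f'  {rule_node} [shape=box, label="{step["rule"]}"];')
        for inp in step["inputs"]:
            lines.append(f'  "{inp}" -> {rule_node};')
        lines.append(f'  {rule_node} -> "{step["output"]}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    ids = new_ids()
    add_statement(ids, "1", "E = m c^2")
    add_statement(ids, "2", "2 E = 2 m c^2")
    add_step(ids, ["1"], "multiplyBothSidesBy", "2")
    # The XML round trip should give back an identical IDS.
    assert read_db(write_db(ids)) == ids
    print(to_dot(ids))
```

Keeping every conversion routed through the IDS means the CLI and a later web GUI only have to implement the "user --> IDS" step themselves.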

Currently (20140527) what I'm actually doing is (user --> DB) and (DB --> graph)

Rant 1: historical progress

The way we do math and physics has gone through several historical transitions. 

Initially, research was done by individuals with paper and pen, then communicated via letters and later via journals. <claim>Mechanical computers were initially used to carry out computations people already knew how to do, rather than to enable novel research. </claim> 

In the past 50 years, electronic computers have enabled numeric and symbolic computation. Large-scale high-performance computing allows research at an unprecedented pace. Results in article form are still communicated via printed journals and, more recently, electronically. Sharing data and algorithms in electronic form is the current revolution. 

The tools we use have expanded from paper and pen to Computer Algebra Systems, databases, and programming languages.

old log entries from 2014

20140927 syntax := ABNF

An issue I've previously encountered is how to store the information associated with the graph. I started with plain text, and then moved to XML (the current state). The content stored in the XML is Latex. Latex renders well, but lacks context (meaning). I am now investigating Content MathML as a stricter (but less widely adopted) syntax. Although Content MathML is better, it doesn't entirely capture the use I intend for this project.

The core of the project centers on the following two conventions:

<statement> = <left hand side> <relation> <right hand side>
<statement> --> <inference rule> --> <statement>
The relation is typically equality or inequality, though it could include union and other operators.

The inference rule acts on one or more statements.

A statement is a relation between two mathematical expressions.

This is outside the scope of Content MathML, XML, and Latex. I think the appropriate capture is to use something like Augmented Backus-Naur Form (ABNF). Here are a few possible rules:

derivation = 1*2statement inferenceRule statement; some derivations involve multiple input statements, here we allow 1 or 2

statement = mathExpression relation mathExpression

mathExpression = *( ALPHA / DIGIT / SP / operator / "\" / "." / "," / "'" ) ; any mix of these, in any order

relation = ( "=" / "<=" / "<" / ">" / ">=" / "U" )

operator = ( "+" / "-" / "/" / "*" / "^" / "!" / ( "[" "]" ) / ( "(" ")" ) / ( "{" "}" ) / "|" )
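Read literally, the statement and derivation rules above can be turned into a small validator. This sketch makes several assumptions: it reads `mathExpression` as any mix of the listed characters in any order, and it accepts each bracket character on its own without checking that brackets balance. Because the union relation "U" is also an ALPHA, the grammar is ambiguous; here regex backtracking simply picks one reading, so `parse_statement("FUN")` comes back as F U N. A dedicated union token would remove that ambiguity. The names `Statement`, `Derivation`, and `parse_statement` are hypothetical.

```python
"""Sketch of the statement/derivation grammar above; names are hypothetical."""
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# mathExpression: any mix of the allowed characters. Each bracket character
# is accepted on its own; bracket balance is NOT checked.
EXPR = r"[A-Za-z0-9 +\-/*^!\[\](){}|\\.,']*"
# Longer relations come first so "<=" is not split into "<" and "=".
RELATION = r"(?:<=|>=|=|<|>|U)"
STATEMENT = re.compile(rf"({EXPR})({RELATION})({EXPR})")


@dataclass(frozen=True)
class Statement:
    """statement = mathExpression relation mathExpression"""
    lhs: str
    relation: str
    rhs: str


@dataclass(frozen=True)
class Derivation:
    """derivation = 1*2statement inferenceRule statement"""
    inputs: Tuple[Statement, ...]
    rule: str
    output: Statement

    def __post_init__(self):
        if not 1 <= len(self.inputs) <= 2:
            raise ValueError("the rule above allows 1 or 2 input statements")


def parse_statement(text: str) -> Optional[Statement]:
    """Return a Statement if text matches the grammar, else None."""
    match = STATEMENT.fullmatch(text)
    if match is None:
        return None
    lhs, relation, rhs = match.groups()
    # Spaces are legal inside mathExpression; trim the ones around the relation.
    return Statement(lhs.strip(), relation, rhs.strip())
```

For example, `parse_statement("E = m c^2")` gives `Statement("E", "=", "m c^2")`, while `parse_statement("x == y")` returns `None` because "=" is not an allowed expression character.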

Looking at the code, I've already adopted the following rules:
infrule_name = 1*ALPHA

statement_punid = 10DIGIT

statement_tunid = 7DIGIT

symbol_punid = 15DIGIT

From my notes, it appears I first figured out the requirement for the statement label restrictions on 20110406.
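The adopted rules map directly onto a validator. The ID generator is an assumption on my part, not something these notes describe: it draws random fixed-width digit strings (keeping leading zeros, which `10DIGIT` allows) and retries on collision, which avoids needing a central counter. The function names and the `ID_WIDTHS` table are hypothetical.

```python
"""Sketch of the adopted label rules; names are hypothetical."""
import random
import re

# Digit widths from the rules above (statement_punid = 10DIGIT, etc.).
ID_WIDTHS = {"statement_punid": 10, "statement_tunid": 7, "symbol_punid": 15}


def is_valid_infrule_name(name):
    """infrule_name = 1*ALPHA: one or more ASCII letters."""
    return re.fullmatch(r"[A-Za-z]+", name) is not None


def is_valid_id(kind, value):
    """True if value is exactly ID_WIDTHS[kind] ASCII digits."""
    return re.fullmatch(rf"[0-9]{{{ID_WIDTHS[kind]}}}", value) is not None


def new_id(kind, existing, rng=random):
    """Draw random fixed-width digit strings until one is unused, then reserve it."""
    width = ID_WIDTHS[kind]
    while True:
        candidate = str(rng.randrange(10**width)).zfill(width)
        if candidate not in existing:
            existing.add(candidate)
            return candidate
```

Passing a seeded `random.Random` as `rng` makes the generated IDs reproducible in tests.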

20140527 commercial projects

In the past week I've become aware of three commercial efforts similar to this project.

SymboLab and Formula-Database are similar in that both focus on searching for equations. In addition to search, SymboLab provides derivations and plots; it's not clear whether these are generated dynamically or hand-coded. SymboLab appears to be more mathematics-oriented, whereas Formula-Database has descriptions of the physics and symbol definitions. EquationMap is about user-generated graphs of derivations. Compared with these, my project has a poorer user interface and a grander objective (capturing all of physics derivations).

None of these three is in direct competition with this open source project, though each overlaps with some of its subtasks. I don't see a clear use case for any of the three -- if it were that useful as a product, I would have expected someone else to already be working in this area.

I'm not clear on why there aren't other open source projects in this area. Proofwiki.org is not graph-based, and PlanetPhysics disappeared.

20140527 task prioritization

Task prioritization:

  • navigable interface. The current graph is large enough that intuitive navigation is an issue that needs to be addressed. The graph is currently rendered as a static PNG file. Rendering the graph in a web browser might be more accessible (allowing interactive navigation), and would create a new set of issues to address (e.g., reading XML databases into a new data structure). By "navigable interface" I mean something distinct from an interface for entering new data to be stored in the database
  • ability to highlight subsections of the total graph related to a specific derivation
  • ability to highlight the symbol use graph within the statement graph
  • add content (E=mc^2, Maxwell's equations)
  • fully check graph using CAS (currently SymPy)
  • user input through CLI
  • user input through web browser GUI
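For the highlighting items, one minimal approach, assuming each inference step is stored as an `(inputs, rule, output)` tuple of statement IDs, is to walk backwards from a chosen statement to everything it was derived from and color those nodes and edges in the Graphviz output. The function names and the tuple layout are illustrative only.

```python
"""Sketch: highlight one statement's derivation; names are hypothetical."""
from collections import deque


def ancestors(steps, target):
    """Return the target plus every statement it was (transitively) derived from."""
    producers = {}  # output statement -> list of input tuples that produce it
    for inputs, _rule, output in steps:
        producers.setdefault(output, []).append(inputs)
    seen, queue = {target}, deque([target])
    while queue:
        node = queue.popleft()
        for inputs in producers.get(node, []):
            for inp in inputs:
                if inp not in seen:
                    seen.add(inp)
                    queue.append(inp)
    return seen


def highlighted_dot(steps, target, color="red"):
    """Graphviz DOT for the statement graph, highlighting the target's derivation."""
    keep = ancestors(steps, target)
    lines = ["digraph statements {"]
    for node in sorted(keep):
        lines.append(f'  "{node}" [color={color}, penwidth=2];')
    for inputs, rule, output in steps:
        # An edge belongs to the derivation exactly when its output does.
        style = f", color={color}, penwidth=2" if output in keep else ""
        for inp in inputs:
            lines.append(f'  "{inp}" -> "{output}" [label="{rule}"{style}];')
    lines.append("}")
    return "\n".join(lines)
```

The rest of the graph is still drawn, so the highlighted derivation stays in context; filtering to `keep` instead would give a standalone view of just that derivation.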
My first guess at creating a navigable interface is to start from an example of an HTML5/JavaScript directed graph that uses PNG/SVG files as nodes. I think rendering the Latex statements directly in a browser within a dynamic graph is probably unrealistic.
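One way to prepare the PNG/SVG node files is to compile each statement on its own with the LaTeX `standalone` document class and convert the result to an image (for example with `dvisvgm`). The sketch below only produces the `.tex` sources and a JSON node/edge manifest; the file-naming scheme and the manifest's field names are hypothetical and would need to match whatever JavaScript graph library is chosen.

```python
"""Sketch: per-statement .tex sources plus a node/edge manifest (names hypothetical)."""
import json

# Minimal standalone document: the page is cropped to the equation plus a border.
TEMPLATE = r"""\documentclass[border=2pt]{standalone}
\begin{document}
$%s$
\end{document}
"""


def statement_sources(statements):
    """Map each statement ID to (tex filename, tex source) for pre-rendering."""
    return {sid: (f"statement_{sid}.tex", TEMPLATE % latex)
            for sid, latex in statements.items()}


def node_manifest(statements, steps, image_ext="svg"):
    """JSON listing nodes (pointing at pre-rendered images) and edges for a browser graph."""
    nodes = [{"id": sid, "image": f"statement_{sid}.{image_ext}"} for sid in statements]
    edges = [{"source": inp, "target": out, "rule": rule}
             for inputs, rule, out in steps for inp in inputs]
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2)
```

Because the images are rendered ahead of time, the browser only has to lay out and pan/zoom ordinary image nodes.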

As for an intuitive interface, I like Google Maps and its ability to zoom in and out when visualizing large spatial data. EquationMap is a good start but isn't open source.

20140526 computer algebra system (CAS) inputs

Having reviewed the candidate computer algebra systems, I still think SymPy is the best option compared to the other free and open source CASs listed at https://en.wikipedia.org/wiki/List_of_computer_algebra_systems

Currently I expect the user to provide two inputs for each statement: Latex for visual rendering, and the SymPy equivalent for checking with the CAS. This is redundant, since SymPy can render Latex.

I let the user supply Latex because, as a physicist, I think in terms of Latex (not SymPy). I treat adding the SymPy equivalent as a second step.
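Since SymPy can render Latex, one way to remove the redundancy is to keep only the SymPy form and generate the Latex from it, then check each step against its inference rule. The sketch below assumes a hypothetical "multiply both sides by a factor" rule; the function name is illustrative. Note that `sympy.simplify` returning something other than 0 is inconclusive rather than proof that a step is wrong.

```python
"""Sketch: one SymPy form gives both the Latex and a CAS check (rule is hypothetical)."""
import sympy as sp

E, m, c, k = sp.symbols("E m c k")


def check_multiply_both_sides(before, after, factor):
    """Hypothetical rule: after.lhs == factor*before.lhs and after.rhs == factor*before.rhs."""
    lhs_diff = sp.simplify(after.lhs - factor * before.lhs)
    rhs_diff = sp.simplify(after.rhs - factor * before.rhs)
    return lhs_diff == 0 and rhs_diff == 0


before = sp.Eq(E, m * c**2)
after = sp.Eq(k * E, k * m * c**2)

print(sp.latex(before))  # Latex generated from the SymPy form, not typed twice
print(check_multiply_both_sides(before, after, k))
```

The catch, as the next paragraph notes, is that the user would then be typing SymPy rather than Latex.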

I have a fundamental conflict between (1) wanting the input format to be easy (i.e., Latex) and (2) wanting the content to be checkable by a CAS (i.e., MathML). I don't think there is an easy-to-input format which is also easily checkable by a CAS.

Does the database need to be checked by a CAS?

There are two reasons to build this database:

  • as a notepad for current research, possibly to be used in a publication. Assumptions: dynamic; written to by only a few people; may contain mistakes. Thus, checking by a CAS would be helpful
  • to store relations between all accumulated knowledge. Assumptions: static database; written to by many people; read by many people. Thus, mistakes are likely to be found whether or not there is a CAS

20140510 graph vs relational database

I need to better understand the differences between graph, relational, and object databases

  • https://en.wikipedia.org/wiki/Graph_database
  • https://en.wikipedia.org/wiki/Relational_databases
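To make the comparison concrete: a relational database can store the derivation graph as tables and still answer a graph question, such as "which statements does this one depend on?", by using a recursive common table expression. The schema below is hypothetical and uses SQLite only because it ships with Python. A graph database such as Neo4j would instead express the same traversal as a variable-length path pattern in Cypher.

```python
"""Sketch: derivation graph in relational tables, traversed with a recursive CTE."""
import sqlite3

SCHEMA = """
CREATE TABLE statement (id TEXT PRIMARY KEY, latex TEXT NOT NULL);
CREATE TABLE step (id INTEGER PRIMARY KEY, rule TEXT NOT NULL,
                   output TEXT NOT NULL REFERENCES statement(id));
CREATE TABLE step_input (step_id INTEGER NOT NULL REFERENCES step(id),
                         input TEXT NOT NULL REFERENCES statement(id));
"""

# Start from the target, then repeatedly add the inputs of any step producing
# a statement already found. UNION (not UNION ALL) drops duplicates.
UPSTREAM = """
WITH RECURSIVE upstream(id) AS (
    SELECT ?
    UNION
    SELECT si.input
    FROM upstream AS u
    JOIN step AS s ON s.output = u.id
    JOIN step_input AS si ON si.step_id = s.id
)
SELECT id FROM upstream ORDER BY id
"""


def build_db(statements, steps):
    """Load {id: latex} statements and (inputs, rule, output) steps into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO statement VALUES (?, ?)", statements.items())
    for step_id, (inputs, rule, output) in enumerate(steps):
        conn.execute("INSERT INTO step VALUES (?, ?, ?)", (step_id, rule, output))
        conn.executemany("INSERT INTO step_input VALUES (?, ?)",
                         [(step_id, inp) for inp in inputs])
    return conn


def upstream(conn, target):
    """IDs of the target and every statement it was derived from."""
    return [row[0] for row in conn.execute(UPSTREAM, (target,))]
```

The relational version works, but the traversal has to be spelled out as joins; that extra effort for path-style queries is one of the practical differences to weigh between the two kinds of database.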