Friday, March 20, 2020

data structure continues to evolve

I've experimented with seven different data structures for the Physics Derivation Graph:

  • v1_plain_text
  • v2_XML
  • v3_CSV
  • v4_file_per_expression
  • v5_property_graph
  • v6_sqlite
  • v7_pickle_web_interface

Each of these has required rewriting the code from scratch, as well as writing transfer code (to migrate data from version n to version n+1).

These changes have tracked the growth of my knowledge of data structures. I didn't know about property graphs when I was implementing v1, v2, and v3. I wasn't comfortable with SQL when I implemented v4. I didn't know about Tidy data when I implemented v1 through v6. The data structures used in the PDG slightly lag my understanding of data structures.

Within a given implementation, there are design decisions with trade-offs to evaluate. I typically don't know all the options or their consequences until I implement one of them and then find where the inefficiencies are. Knowledge gained through this evolutionary iteration is expensive and slow.

Here's an example of two "minor" tweaks that incur a rewrite of all the code. My current data structure in v7 is

dat['derivations'] = {
  'fun deriv': { # name of derivation
     '4928482': {    # key is "step ID"
          'inf rule': 'declare initial expr',
          'inputs':  {},
          'feeds':   {},
          'outputs': {'9428': '4928923942'}, # key is "expr local ID", value is "expr global ID"
          'linear index': 1}, # linear index for PDF and for graph orientation
     '2948592': {
          'inf rule': 'add X to both sides',
          'inputs':  {'9428': '4928923942'},
          'feeds':   {'3190': '9494829190'},
          'outputs': {'3921': '9499959299'},
          'linear index': 2}}}

A better data structure would be

dat['derivations'] = {
  'fun deriv': { # name of derivation
     '4928482': {    # key is "step ID"
          'inf rule': 'declare initial expr',
          'inputs':  {},
          'feeds':   {},
          'outputs': {1: '9428'}, # key is index, value is "expr local ID"
          'linear index': 1}, # linear index for PDF and for graph orientation
     '2948592': {
          'inf rule': 'add X to both sides',
          'inputs':  {1: '9428'},
          'feeds':   {1: '3190'},
          'outputs': {1: '3921'},
          'linear index': 2}}}

dat['expr local to global'] = {
        '9428': '4928923942',
        '3190': '9494829190',
        '3921': '9499959299',
        '9128': '1492842000'}
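
One benefit of the split is that a step's global expression IDs can be resolved on demand from the lookup table. A minimal sketch (the helper function name is mine, not from the PDG code):

```python
# Hypothetical helper: resolve a step's local expression IDs to global IDs
# via the separate 'expr local to global' lookup table.

dat = {
    'derivations': {
        'fun deriv': {
            '2948592': {
                'inf rule': 'add X to both sides',
                'inputs':  {1: '9428'},
                'feeds':   {1: '3190'},
                'outputs': {1: '3921'},
                'linear index': 2}}},
    'expr local to global': {
        '9428': '4928923942',
        '3190': '9494829190',
        '3921': '9499959299'}}

def global_ids_for_step(dat, deriv_name, step_id, category):
    """Return expr global IDs for a step's inputs/feeds/outputs, in index order."""
    step = dat['derivations'][deriv_name][step_id]
    return [dat['expr local to global'][local_id]
            for _, local_id in sorted(step[category].items())]

print(global_ids_for_step(dat, 'fun deriv', '2948592', 'inputs'))
# ['4928923942']
```

Because the global IDs live in exactly one place, renumbering a global expression ID touches one entry instead of every step that references it.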

The reasons this second data structure is an improvement are
  1. the global expression ID does not appear in the 'derivations' dict
  2. the inputs, feeds, and outputs have an index. The index is relevant both for printing to PDF and for use in inference rules.

I'm slowly coming to the conclusion that there will be a "v8" based on tables. The backend database would be something like SQLite3, and the internal representation in Python would be dataframes.

I'm not going to switch to v8 yet; I'll continue to invest effort in v7 for a bit longer to explore a few challenges (like implementation of inference rules). 
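
If v8 does go table-based, the nested dict might flatten into rows like the following sketch (the schema and column names here are my guesses, not a committed design):

```python
# Hypothetical sketch of a table-based "v8": each derivation step becomes a row,
# and the local-to-global expression mapping becomes its own table.
import sqlite3

conn = sqlite3.connect(':memory:')  # stand-in for a file-backed SQLite3 database
cur = conn.cursor()
cur.execute("""CREATE TABLE step
               (deriv_name TEXT, step_id TEXT, inf_rule TEXT, linear_index INTEGER)""")
cur.execute("""CREATE TABLE step_expr
               (step_id TEXT, role TEXT, idx INTEGER, expr_local_id TEXT)""")
cur.execute("""CREATE TABLE expr_local_to_global
               (expr_local_id TEXT PRIMARY KEY, expr_global_id TEXT)""")

cur.execute("INSERT INTO step VALUES ('fun deriv', '2948592', 'add X to both sides', 2)")
cur.execute("INSERT INTO step_expr VALUES ('2948592', 'input', 1, '9428')")
cur.execute("INSERT INTO expr_local_to_global VALUES ('9428', '4928923942')")

# resolve a step's input expressions to global IDs with a join
cur.execute("""SELECT g.expr_global_id FROM step_expr s
               JOIN expr_local_to_global g USING (expr_local_id)
               WHERE s.step_id = '2948592' AND s.role = 'input'
               ORDER BY s.idx""")
print(cur.fetchall())  # [('4928923942',)]
```

The dict lookups from v7 become joins, which the database can index and validate.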

Friday, March 13, 2020

notes from reading "A Step-by-Step Solution Methodology for Mathematical Expressions"

A Step-by-Step Solution Methodology for Mathematical Expressions
by Sahereh Hosseinpour, Mir Mohammad Reza Alavi Milani and Hüseyin Pehlivan
Symmetry 2018, 10(7), 285; https://doi.org/10.3390/sym10070285
https://www.mdpi.com/2073-8994/10/7/285/htm


The following sites parse input, take a specified action, and show the steps
(I'm not sure whether these two sites are related or clones of each other)
https://www.softmath.com/math-com-calculator/quadratic-equations/myalgebra.com.html
https://www.mathsite.org/maths-factors/perpendicular-lines/myalgebra.com.html
Also see https://www.mathway.com/Algebra and http://www.webmath.com/

I find it fascinating to read about someone else having an idea similar to mine:
"Our methodology uses a grammar-based approach to convert mathematical expressions into abstract syntax trees (AST) on which new methods can be developed for different evaluation or interpretation requirements. In this way, users can dynamically enter expressions and all the intermediate operations on them can successfully be achieved through AST nodes." ... "It can also be a good platform for the development of educational products in the field of mathematics, which can display all problem-solving steps."

I'm not clear on the distinctions among the three approaches to solvers, but it's helpful to know there's more than one approach:

  • computer algebra system
  • rule based: "Starting with a mathematical expression, this approach transforms it into another equivalent expression and compares it with the intended target expression for equivalence verification."
  • structural approach: "two mathematical expressions are first represented as two mathematical expression trees. Then, tree matching algorithm is used to compare the two trees using dynamic programming for equivalence verification."

Section 2.3 walks through the entire process of getting from BNF to an AST. 
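
To make the BNF-to-AST idea concrete, here's a toy recursive-descent parser for a three-rule grammar (this is my own minimal sketch, not the grammar or code from the paper):

```python
# Toy grammar-to-AST illustration for the BNF
#   expr   ::= term (('+'|'-') term)*
#   term   ::= factor (('*'|'/') factor)*
#   factor ::= NUMBER | NAME | '(' expr ')'
# ASTs are nested tuples: (operator, left, right).
import re

def tokenize(s):
    return re.findall(r'\d+|\w+|[()+\-*/]', s)

def parse(tokens):
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] in '+-':
            op, (rhs, i2) = tokens[i], term(i + 1)
            node, i = (op, node, rhs), i2
        return node, i
    def term(i):
        node, i = factor(i)
        while i < len(tokens) and tokens[i] in '*/':
            op, (rhs, i2) = tokens[i], factor(i + 1)
            node, i = (op, node, rhs), i2
        return node, i
    def factor(i):
        if tokens[i] == '(':
            node, i = expr(i + 1)
            return node, i + 1  # skip the closing ')'
        return tokens[i], i + 1
    node, _ = expr(0)
    return node

print(parse(tokenize('a + b * c')))  # ('+', 'a', ('*', 'b', 'c'))
```

Each AST node is then a natural place to attach a solution step, which is what the paper's "intermediate operations on AST nodes" suggests.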

avoiding the need for logins in the Physics Derivation Graph website

Now that the PDG interface is a webpage, and users (not just developers) are the audience, there is a question of whether users of the website need accounts.

User accounts

Having an account would enable persistence of data specific to one user.
Downsides:

  • Password management and security
  • user account shenanigans


No users accounts

Live use without an account, modeled after services like https://repl.it/ and https://www.onlinecharttool.com/graph, enables users to explore the capabilities without registering and creating a password.

Data could be uploaded and downloaded.

Source code

Historically the Physics Derivation Graph was developed on the command line using bash and Python. This implied a single user.
The source code was available, so other developers could pull a local copy to make edits.

API calls

This would allow interfaces to be developed that are independent of the backend.

Sunday, March 8, 2020

why I'm excited about the Physics Derivation Graph

A few months ago I realized that rather than try to figure out what the best storage format was (MathML, LaTeX, SymPy, etc.) and what database should store that representation (CSV, SQL, XML, etc.), the "easy" solution was to simply store strings in dictionaries as a Python pickle. No translation needed -- simply save the internal representation to disk and read it back in as a Python variable.
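
The whole persistence layer then fits in a few lines; a minimal sketch (the filename and dict contents are placeholders):

```python
# Minimal sketch of the "strings in dicts, pickled" approach:
# the in-memory Python dict is the entire database.
import pickle

dat = {'expressions': {'4928923942': {'latex': 'a = b'}},
       'inference rules': {'add X to both sides': {'number of inputs': 1}}}

with open('data.pkl', 'wb') as fil:   # save the internal representation to disk
    pickle.dump(dat, fil)

with open('data.pkl', 'rb') as fil:   # read it back in as a Python variable
    dat_loaded = pickle.load(fil)

print(dat_loaded == dat)  # True
```

The trade-off is that the pickle is opaque to anything that isn't Python, which is part of what motivates the table-based "v8" idea above.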

Then I had an insight about how the front-end was supposed to work using the Model-View-Controller (MVC) approach. Now I felt comfortable about both the back-end and front-end aspects of the PDG! I had a backlog of features which were now easy and intuitive to implement. However, that didn't result in the excitement and motivation I now feel.

In the process of reviewing my hand-written notes from graduate school, I realized I now have actual hope of converting the notes to an electronic format. My reinvigorated interest in implementing the Physics Derivation Graph is because I now have the relevant aspects figured out and see a well-defined end point.

That shifted my view of what I should be working on in the PDG code. Rather than working on arbitrary features, I am now focused on addressing aspects that are blocking me from converting my paper notes into PDG content.