Physics Derivation Graph: data structure continues to evolve

I've experimented with seven different data structures for the Physics Derivation Graph:

v1_plain_text
v2_XML
v3_CSV
v4_file_per_expression
v5_property_graph
v6_sqlite
v7_pickle_web_interface

Each of these have required a rewrite of the code from scratch, as well as transfer code (to move from n to n+1).

These changes progress concurrently with my knowledge of data structures. I didn't know about property graphs when I was implementing v1, v2, and v3. I wasn't comfortable with SQL when I implemented v4. I didn't know about Tidy data when I implemented v1 to v6. The data structures used in the PDG slightly lag my understanding of data structures.

Within a given implementation, there are design decisions with trade-offs to evaluate. I typically don't know all the options or consequences until I implement one of them and then determine what inefficiencies exist. Knowledge gained through evolutionary iteration is expensive and takes a lot of time.

Here's an example of two "minor" tweaks that incur a rewrite of all the code. My current data structure in v7 is

dat['derivations'] = {

'fun deriv': { # name of derivation

'4928482': { # key is "step ID"

'inf rule': 'declare initial expr',

'inputs': {},

'feeds': {},

'outputs': {'9428': '4928923942'}, # key is "expr local ID", value is "expr global ID"

'linear index': 1}, # linear index for PDF and for graph orientation

'2948592': {

'inf rule': 'add X to both sides',

'inputs': {'9428': '4928923942'},

'feeds': {'3190': '9494829190'},

'outputs': {'3921': '9499959299'},

'linear index': 2},

A better data structure would be

dat['derivations'] = {

'fun deriv': { # name of derivation

'4928482': { # key is "step ID"

'inf rule': 'declare initial expr',

'inputs': {},

'feeds': {},

'outputs': {1: '9428'}, # key is index, value is "expr local ID"

'linear index': 1}, # linear index for PDF and for graph orientation

'2948592': {

'inf rule': 'add X to both sides',

'inputs': {1: '9428'},

'feeds': {1: '3190'},

'outputs': {1: '3921'},

'linear index': 2},

dat['expr local to global'] = {

'9428': '4928923942',

'3190': '9494829190',

'3921': '9499959299',

'9128': '1492842000'}

The reasons this second data structure is an improvement is

the global expression ID does not appear in the 'derivations' dict
the inputs, feeds, and outputs have an index. The index is relevant for both printing in a PDF and use in inference rules.

I'm slowly evolving towards the likelihood that there will be a "v8" based on tables. The backend database would be something like SQLite3, and the internal representation in Python would be dataframes.

I'm not going to switch to v8 yet; I'll continue to invest effort in v7 for a bit longer to explore a few challenges (like implementation of inference rules).

Physics Derivation Graph

Friday, March 20, 2020

data structure continues to evolve

No comments:

Post a Comment