- v1_plain_text
- v2_XML
- v3_CSV
- v4_file_per_expression
- v5_property_graph
- v6_sqlite
- v7_pickle_web_interface
Each of these has required a rewrite of the code from scratch, as well as transfer code (to migrate data from version n to version n+1).
These changes track my evolving knowledge of data structures. I didn't know about property graphs when I was implementing v1, v2, and v3. I wasn't comfortable with SQL when I implemented v4. I didn't know about Tidy data when I implemented v1 through v6. In short, the data structures used in the PDG slightly lag my understanding of data structures.
Within a given implementation, there are design decisions with trade-offs to evaluate. I typically don't know all the options, or their consequences, until I've implemented one of them and found its inefficiencies. Knowledge gained through this kind of evolutionary iteration is expensive and slow.
Here's an example of two "minor" tweaks that incur a rewrite of all the code. My current data structure in v7 is
dat['derivations'] = {
    'fun deriv': {        # name of derivation
        '4928482': {      # key is "step ID"
            'inf rule': 'declare initial expr',
            'inputs':  {},
            'feeds':   {},
            'outputs': {'9428': '4928923942'},  # key is "expr local ID", value is "expr global ID"
            'linear index': 1},                 # linear index for PDF and for graph orientation
        '2948592': {
            'inf rule': 'add X to both sides',
            'inputs':  {'9428': '4928923942'},
            'feeds':   {'3190': '9494829190'},
            'outputs': {'3921': '9499959299'},
            'linear index': 2},
        # ... remaining steps elided
    }
}
A better data structure would be
dat['derivations'] = {
    'fun deriv': {        # name of derivation
        '4928482': {      # key is "step ID"
            'inf rule': 'declare initial expr',
            'inputs':  {},
            'feeds':   {},
            'outputs': {1: '9428'},  # key is index, value is "expr local ID"
            'linear index': 1},      # linear index for PDF and for graph orientation
        '2948592': {
            'inf rule': 'add X to both sides',
            'inputs':  {1: '9428'},
            'feeds':   {1: '3190'},
            'outputs': {1: '3921'},
            'linear index': 2},
        # ... remaining steps elided
    }
}
dat['expr local to global'] = {
    '9428': '4928923942',
    '3190': '9494829190',
    '3921': '9499959299',
    '9128': '1492842000'}
The reasons this second data structure is an improvement are
- the global expression ID no longer appears in the 'derivations' dict; it lives only in the 'expr local to global' mapping
- the inputs, feeds, and outputs each carry an index. The index is relevant both for printing to a PDF and for use in inference rules (see the sketch below).
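To make the second point concrete, here is a minimal sketch of how the index and the separate mapping might be used together. The dict layout is the one shown above, but the helper function names are hypothetical and not taken from the PDG code.

    # 'dat' is assumed to be populated as in the example above
    def global_expr_id(local_id: str, dat: dict) -> str:
        """Resolve an expression's local ID to its global ID via the single mapping."""
        return dat['expr local to global'][local_id]

    def ordered_output_local_ids(step: dict) -> list:
        """Return a step's output expression local IDs in index order,
        e.g. for printing the derivation step to PDF."""
        return [step['outputs'][idx] for idx in sorted(step['outputs'])]

    # example usage with step '2948592' of 'fun deriv' from above
    step = dat['derivations']['fun deriv']['2948592']
    for local_id in ordered_output_local_ids(step):
        print(local_id, '->', global_expr_id(local_id, dat))
    # prints: 3921 -> 9499959299

Because the global ID is stored exactly once, renaming or re-deriving a global expression touches only the mapping rather than every step that references it.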
I'm slowly coming around to the likelihood that there will be a "v8" based on tables. The backend database would be something like SQLite3, and the internal representation in Python would be dataframes.
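As a speculative sketch of what that table-oriented approach might look like, here is one possible Tidy layout with one row per (step, role, expression) relationship. The table name, column names, and filename are assumptions for illustration, not a committed v8 design; the values are taken from the example derivation above.

    import sqlite3
    import pandas as pd

    # one row per (derivation, step, role, expression) relationship, in Tidy form
    rows = [
        # derivation,  step ID,   inference rule,         role,     index, expr local ID
        ('fun deriv', '4928482', 'declare initial expr',  'output', 1,     '9428'),
        ('fun deriv', '2948592', 'add X to both sides',   'input',  1,     '9428'),
        ('fun deriv', '2948592', 'add X to both sides',   'feed',   1,     '3190'),
        ('fun deriv', '2948592', 'add X to both sides',   'output', 1,     '3921'),
    ]
    df = pd.DataFrame(rows, columns=['derivation', 'step_id', 'inf_rule',
                                     'role', 'expr_index', 'expr_local_id'])

    # persist to SQLite3 as the backend, then query back into a dataframe
    with sqlite3.connect('pdg_v8.sqlite3') as conn:
        df.to_sql('step_expressions', conn, if_exists='replace', index=False)
        outputs = pd.read_sql_query(
            "SELECT * FROM step_expressions WHERE role = 'output'", conn)

The local-to-global expression mapping would become a second two-column table, and joins would replace the nested-dict lookups.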
I'm not going to switch to v8 yet; I'll continue to invest effort in v7 for a bit longer to explore a few challenges (like the implementation of inference rules).