Saturday, September 19, 2015

converting the Physics Derivation Graph from CSV to a property graph

I started the Physics Derivation Graph as a set of plain-text CSV files. I then converted the content to XML files. The current instance of the Physics Derivation Graph is back to a set of CSVs. CSV is attractive because it can be parsed almost anywhere and is human-readable.

Recently I learned about property graphs. Expanding on the node/edge idea of graphs, a property graph adds types and properties:

Nodes have
  • node id
  • node type
  • node property (key-value pair)
Edges have
  • edge id
  • edge type
  • edge property (key-value pair)
Using syntax from Neo4j for a simple example,

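// Cypher variable names cannot start with a digit, so each variable gets an "e" prefix
// and the numeric ID is stored as a property.
CREATE (e149832:Expression { id: 149832, latex: "k=m j" })
CREATE (e119831:Expression { id: 119831, latex: "k/j=m" })
CREATE (e149832)-[:DIVIDEBOTHSIDESBY { feed_1: "j" }]->(e119831)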

Each expression is a node, and inference rules are directed edges. 
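
To make the node/edge structure concrete outside of any particular database, here is a minimal Python sketch of the same two-expression example. The class names, the field names, the `successors` helper, and the edge ID of 1 are all assumptions made for illustration; none of them come from the Physics Derivation Graph code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    node_type: str
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    edge_id: int
    edge_type: str
    source: int  # node_id of the expression the inference rule starts from
    target: int  # node_id of the expression the inference rule produces
    properties: dict = field(default_factory=dict)  # key-value pairs


# Each expression is a node, keyed by its ID.
nodes = {
    149832: Node(149832, "Expression", {"latex": "k=m j"}),
    119831: Node(119831, "Expression", {"latex": "k/j=m"}),
}

# Each inference rule is a directed edge (the edge ID 1 is made up).
edges = [Edge(1, "DIVIDEBOTHSIDESBY", 149832, 119831, {"feed_1": "j"})]


def successors(node_id):
    """Return (edge type, target node) pairs for edges leaving node_id."""
    return [(e.edge_type, nodes[e.target]) for e in edges if e.source == node_id]


for rule, expression in successors(149832):
    print(rule, "->", expression.properties["latex"])
```

Following the edge out of node 149832 walks one step of the derivation, which is roughly what a graph query would do.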

Complications in translating: 
  • within each derivation, each expression has a local ID used for LaTeX labels
  • each expression belongs to one or more derivations
  • each expression has a different representation in each computer algebra system (CAS)
  • LaTeX can contain a double quote ("), which would end a quoted string early, so parsing may break. Reference pictures instead?
Folding those into the properties gives something like
CREATE (e149832:Expression { id: 149832, picture: "149832.png", sympy: "k=m*j", local_tex_id: "59589", in_derivations: ["maxwell", "funky"] })
CREATE (e119831:Expression { id: 119831, picture: "119831.png", sympy: "k/j=m", local_tex_id: "58584", in_derivations: ["maxwell"] })
CREATE (e149832)-[:DIVIDEBOTHSIDESBY { feed_1_picture: "4958.png" }]->(e119831)
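
An alternative to pictures for the double-quote problem is to escape the LaTeX before writing it into a statement. The sketch below relies on Cypher string literals accepting `\\` and `\"` as escape sequences. The function names `cypher_string` and `create_expression` are hypothetical, and the sample LaTeX string is made up. When going through a database driver, passing the LaTeX as a query parameter is probably the safer route.

```python
def cypher_string(text):
    """Return text as a double-quoted Cypher string literal."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def create_expression(expr_id, latex):
    """Build a CREATE statement for one expression node (hypothetical helper)."""
    # The variable is prefixed with a letter because Cypher variable names
    # cannot start with a digit.
    return "CREATE (e%d:Expression { latex: %s })" % (expr_id, cypher_string(latex))


# Made-up LaTeX containing both a backslash and double quotes.
print(create_expression(149832, r'\vec{E} = "x"'))
```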

Although a property graph can hold the same information as the current CSV format, it is no less complex than the CSV approach. Also, as far as I know, only Neo4j supports this syntax. Thus, I'm not currently motivated to switch to a property graph representation.
