Saturday, August 11, 2018

connecting Jupyter and Neo4j

Jupyter

$ cd v5_property_graph
$ jupyter notebook

A web browser opens to the URL
http://localhost:8888/tree
Then open a new Python notebook.

Neo4j

I'm running Neo4j Community version 3.2.3 on a Mac. I start the client GUI and then open a browser window to
http://127.0.0.1:7474/browser/

Connect Jupyter to Neo4j

from py2neo import authenticate, Graph, Node, Relationship
authenticate("127.0.0.1:7474", "neo4j", "asdf")
# note: the Graph URI is the REST endpoint, not the /browser/ UI URL
graph = Graph("http://127.0.0.1:7474/db/data/")
graph.delete_all()

For more, see this notebook.

Saturday, August 4, 2018

Neo4j for the Physics Derivation Graph

I've been focusing my efforts on the interactive user prompt, a Python-based CLI for the Physics Derivation Graph. Effectively, I'm working through a finite state machine with associated actions for each option. (Tangential task: a pictorial representation of the state machine would be useful.)

I've used Neo4j for other tasks associated with knowledge representation, so I'm surprised I hadn't considered property graphs for storing the PDG (there's no mention in my old notes or issues, aside from a generic link on the wiki).

One of the potential benefits of using a property graph over a plain graph is the labeling of edges. Currently, when there are multiple input expressions or feeds to an inference rule, it's not clear which input fills which slot. For example, consider "IntOverFromTo", which has the LaTeX expansion "Integrate Eq.~\ref{eq:#4} over $#1$ from lower limit $#2$ to upper limit $#3$." There are three feeds. Without labeling which feed is which, the substitution is ambiguous.

With a property graph, the inference rule would have pre-defined labeled edges, e.g. "lower_limit", "upper_limit", and "integrate_wrt".
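As a minimal sketch (plain Python, no Neo4j required; the edge labels and node IDs are hypothetical), labeled edges turn the substitution into an unambiguous lookup by label:

```python
# Each edge from a feed/expression node to the inference-rule node carries
# a label naming the argument slot it fills (hypothetical labels and IDs).
edges = {
    "integrate_wrt": "x",       # $#1$
    "lower_limit": "0",         # $#2$
    "upper_limit": "\\infty",   # $#3$
    "expr": "1492",             # Eq.~\ref{eq:#4}
}

# the LaTeX expansion, rewritten with named placeholders instead of #1..#4
template = ("Integrate Eq.~\\ref{eq:%(expr)s} over $%(integrate_wrt)s$ "
            "from lower limit $%(lower_limit)s$ to upper limit $%(upper_limit)s$")

print(template % edges)
```

Swapping any two feed values now produces a visibly different (and detectably wrong) expansion, instead of a silently incorrect substitution.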

Benefits of using the property graph include:
  • Visualization tools are more likely to exist, rather than me having to code up a d3js-based web display.
  • Querying and editing the graph uses a standard syntax (Cypher), rather than relying on me creating a Python-based CLI with pre-set abilities.
  • The current data structure is a list of dictionaries in memory and a set of CSV files in directories; with Neo4j I wouldn't need to manage the data structure and could still translate back to plain text.
  • Adding additional properties (e.g. LaTeX versus SymPy for expressions, comments, weblinks) would be more scalable than the current manually crafted data structure and schema.
  • Cross-platform compatibility is not lost.
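The plain-text round trip mentioned above can be sketched with the stdlib csv module (the column names and edge labels here are hypothetical):

```python
import csv
import io

# labeled edges serialized to CSV, so the graph remains plain text
edges = [
    ("IntOverFromTo", "integrate_wrt", "feed_1"),
    ("IntOverFromTo", "lower_limit", "feed_2"),
    ("IntOverFromTo", "upper_limit", "feed_3"),
]

buf = io.StringIO()  # stand-in for a file on disk
writer = csv.writer(buf)
writer.writerow(["inference_rule", "edge_label", "node_id"])
writer.writerows(edges)
print(buf.getvalue())
```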



Thursday, August 2, 2018

cleaning up the code using pylint and flake8 and bandit

I realized that with so much Python code, there's a need to clean it up.
https://www.youtube.com/watch?v=G1lDk_WKXvY
In this post I document a few software tools I used.

Pylint

$ pylint interactive_user_prompt.py --disable bad-whitespace,missing-docstring,superfluous-parens,bad-indentation,line-too-long,trailing-whitespace,len-as-condition,too-many-locals,invalid-name,too-many-branches,too-many-return-statements,too-many-statements --reports=n

and flake8

$ flake8 --ignore=E111,E225,E231,E501,E226,W291,E221,E115,E201,W293,E261,E302,E265 interactive_user_prompt.py
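Rather than repeating the ignore list on every invocation, flake8 can read it from a project config file; a sketch of the equivalent `setup.cfg` section (flake8 also reads `tox.ini` and `.flake8`):

```ini
[flake8]
ignore = E111,E225,E231,E501,E226,W291,E221,E115,E201,W293,E261,E302,E265
```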

Not surprisingly, some of my functions are complicated (a cyclomatic complexity score greater than 10 is frowned upon):
$ python -m mccabe --min 9 interactive_user_prompt.py | wc -l
      15
$ python -m mccabe --min 15 interactive_user_prompt.py | wc -l
       4
So 15 functions scored 9 or greater; 4 functions were 15 or higher!

That's out of 50 functions and 1946 lines of Python (including comments and blank lines):
$ cat interactive_user_prompt.py | wc -l
    1946
$ cat interactive_user_prompt.py | grep "^def " | wc -l
      50

Although I'm not concerned about the security of a locally run Python script, I also tried bandit:
$ bandit -r interactive_user_prompt.py
which complained about my use of the shell.
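For context, bandit's shell complaint fires on `subprocess` calls that go through the shell (e.g. `shell=True`); a minimal sketch of the list-argument form, which avoids invoking the shell (bandit may still note subprocess use at low severity; the command here is just an illustration):

```python
import subprocess

# passing an argument list instead of shell=True means no shell is
# invoked, which addresses the shell-injection class of warnings
result = subprocess.run(["echo", "hello"], stdout=subprocess.PIPE,
                        universal_newlines=True)
print(result.stdout.strip())
```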

I'm aware of autopep8 but haven't used it yet.