Thursday, April 30, 2020

prioritization of tasks: who is the audience? Answer: me in the role of a user

The Physics Derivation Graph has a backlog of tasks, including bugs and feature requests.
Having an objective would facilitate prioritization of the tasks.

I grappled with two candidate objectives:
  • The Physics Derivation Graph is a vanity project. I am the sole user, and I don't expect the project to ever have other users.
  • The Physics Derivation Graph is intended for use by other contributors.
These two objectives lead to different prioritizations. For example, implementing the login capability is vital for the second objective but not for the first. Under the first objective, I would be better off just running a local Docker image and posting static content to the website.

I realized I could unify these two objectives as
  • I want to use the Physics Derivation Graph website as a "typical" user. 
I don't care whether there are other users at this point, but I do want to use the website as though I were a user.

This objective is written as a story in which value is provided to the user. Here I am the user and the value is documenting derivations on the web. 

Tuesday, April 28, 2020

noteworthy milestones in the website timeline


  • April 6, 2020: server was provisioned by DigitalOcean
  • April 26, 2020: turned on web server (nginx, gunicorn, flask)
  • April 28, 2020: purchased derivationmap.net from Google Domains
  • May 11, 2020: updated gunicorn logs to include IP forwarded by nginx
  • May 12, 2020: formatted gunicorn and nginx logs such that each line is a Python dictionary; also added timing entries in support of profiling page load times
  • May 19, 2020: first post to Reddit/r/Physics - https://www.reddit.com/r/Physics/comments/gmqc2l/visualizing_derivations/

Monday, April 27, 2020

analysis of web logs to understand how users use the website

Understanding how users use the website is important for improving the ease of use.

Suppose I have 3 website users (1, 2, 3), three webpages ("A", "B", and "C"), and each user visits three pages.

In this post I outline different data structures available for capturing user activity.

A matrix of pages and page visit order

              page A    page B    page C
first page    1, 2                3
second page             3, 1      2
third page    1, 3      2

Here time moves from the top row towards the bottom row.
The steps taken by a single user are easy to trace.

A Markov model

Each cell counts transitions from one page (row) to the next page visited (column); the counts below come from the three user traces listed under "A list per user" below. The diagonal is zero because no user visits the same page twice in a row.

          to A    to B    to C
from A     0       1       1
from B     2       0       0
from C     0       2       0

A list per user

user 1: [A, B, A]
user 2: [A, C, B]
user 3: [C, B, A]

This is the same information as in the matrix above.
Here time moves left to right.
The list length can vary per user.

List of tuples per user

As a modification to the list, each element could include the page name, the render time, and the dwell time:

user 1: [(A, 0.2, 55), (B, 0.3, 20), (A, 0.4, 126)]
user 2: [(A, 0.1, 65), (C, 0.2, 234), (B, 0.4, 23)]
user 3: [(C, 0.3, 15), (B, 0.1, 53), (A, 0.3, 45)]
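
As a minimal sketch of how the list-of-tuples structure might be built from parsed log records (the field names "user", "page", "render_s", and "ts" are hypothetical placeholders, not the actual gunicorn/nginx log fields):

from collections import defaultdict

# each record is assumed to be a dict parsed from one log line;
# the field names are hypothetical, not the actual log format
records = [
    {"user": 1, "page": "A", "render_s": 0.2, "ts": 100},
    {"user": 1, "page": "B", "render_s": 0.3, "ts": 155},
    {"user": 1, "page": "A", "render_s": 0.4, "ts": 175},
]

# group records per user, ordered by timestamp
per_user = defaultdict(list)
for rec in sorted(records, key=lambda r: r["ts"]):
    per_user[rec["user"]].append(rec)

# build (page, render time, dwell time) tuples; dwell time is the gap until
# the same user's next page visit (None for that user's last recorded page)
visits = {}
for user, recs in per_user.items():
    tuples = []
    for this_rec, next_rec in zip(recs, recs[1:] + [None]):
        dwell = None if next_rec is None else next_rec["ts"] - this_rec["ts"]
        tuples.append((this_rec["page"], this_rec["render_s"], dwell))
    visits[user] = tuples

print(visits)  # {1: [('A', 0.2, 55), ('B', 0.3, 20), ('A', 0.4, None)]}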


Saturday, April 25, 2020

disable DigitalOcean default gunicorn and nginx; start docker-compose

See this post
https://www.digitalocean.com/community/questions/how-to-stop-gunicorn-nginx-from-serving-up-the-default-project-on-ubuntu-django-droplet

$ service gunicorn stop
$ service nginx stop
$ sudo systemctl disable nginx.service
$ sudo systemctl disable gunicorn.service

Then I added "restart: always" to the docker-compose.yaml file
https://docs.docker.com/compose/compose-file/
based on this post:
https://stackoverflow.com/a/52955638/1164295

docker-compose version

At home on my Mac I use Docker Compose to build the combined nginx + gunicorn containers.
In the file proofofconcept/v7_pickle_web_interface/docker-compose.yaml
I had
version: "3.7"
which worked for me.
(The compatibility matrix is here: https://docs.docker.com/compose/compose-file/)

I ran docker-compose on DigitalOcean's Ubuntu 18.04 droplet and got this error:

$ docker-compose up --build --remove-orphans
ERROR: Version in "./docker-compose.yaml" is unsupported. 

The versions on DigitalOcean's Ubuntu 18.04 are

$ docker version
Client: Docker Engine - Community
 Version:           19.03.8

$ docker-compose version
docker-compose version 1.21.2, build a133471
docker-py version: 3.3.0
CPython version: 3.6.5
OpenSSL version: OpenSSL 1.0.1t  3 May 2016

while at home I have

$ docker version
Client: Docker Engine - Community
 Version:           19.03.8

$ docker-compose version
docker-compose version 1.25.4, build 8d51620a
docker-py version: 4.1.0
CPython version: 3.7.5
OpenSSL version: OpenSSL 1.1.1d  10 Sep 2019

The fix was to lower the version line in docker-compose.yaml to a compose file format supported by docker-compose 1.21, for example
version: "3.6"

Friday, April 24, 2020

priorities for transitioning from a minimum viable product to an alpha version

What are the necessary aspects for creating the Physics Derivation Graph, and in what order?
  1. database structure
  2. MVC implementation
  3. features
  4. ease of use, intuitiveness
  5. https://en.wikipedia.org/wiki/Poka-yoke
  6. security
  7. latency
  8. presentation

Thursday, April 23, 2020

initial setup of docker and docker-compose on DigitalOcean

I registered an account and paid $5 using PayPal to get a droplet.

I registered my SSH key.

I wanted to run docker-compose. The instructions for
https://www.digitalocean.com/community/tutorials/how-to-install-docker-compose-on-ubuntu-18-04
led me to
https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-18-04
where I logged in as root and set up a new user. I was then able to SSH in as the new user.

Then I completed the instructions for the Docker installation, including adding the non-root user to the docker group.
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04

Finally I was able to complete the docker-compose instructions
https://www.digitalocean.com/community/tutorials/how-to-install-docker-compose-on-ubuntu-18-04

Total time from initial registration to having docker-compose working: 25 minutes.

table or property graph?

Currently in v7 I'm using JSON to store a dictionary of nested dictionaries and lists. That design is somewhat fragile in that 1) it doesn't allow atomic operations the way a database does and 2) it is only an approximation of a property graph.

Because what I care about is nodes and edges with attributes, my data structure and code are essentially a DSL for a property graph. I previously recognized this (see v5) but didn't go very deep because I wasn't satisfied with Neo4j's rendering of nodes in a web interface. Now that I've built a web interface, I am wondering whether a property graph would be a better backend.

Neo4j is open source (https://github.com/neo4j/neo4j) so I shouldn't be any more reluctant to use it than I would be with SQLite3 (https://sqlite.org/src/doc/trunk/README.md).

I've been thinking that the next iteration (v8) would be table based. The blocker for that approach is my lack of knowledge of SQL queries needed to replace the current functions that use nested dictionaries. If I were to use a property graph in v8, I'd need to learn Cypher.
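
As a rough illustration of "nodes and edges with attributes" (the node and edge attributes below are hypothetical examples, not the PDG's actual schema), a property graph can be sketched directly in Python:

# minimal property-graph sketch: nodes and edges, each with a dict of attributes;
# the attribute names here are hypothetical, not the PDG schema
nodes = {
    "expr_001": {"type": "expression", "latex": "a = b"},
    "infrule_001": {"type": "inference rule", "name": "declare initial expr"},
}
edges = [
    {"source": "infrule_001", "target": "expr_001", "relation": "outputs"},
]

# a query is then just a filter over the edge list
outputs_of = [e["target"] for e in edges
              if e["source"] == "infrule_001" and e["relation"] == "outputs"]
print(outputs_of)  # ['expr_001']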

Sunday, April 19, 2020

CSRF

https://flask-wtf.readthedocs.io/en/v0.12/csrf.html


If the template has a form, you don't need to do anything. It is the same as before:

<form method="post" action="/">
{{ form.csrf_token }}
</form>

But if the template has no forms, you still need a csrf token:

<form method="post" action="/">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}" />
</form>
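
On the Flask side, a minimal sketch of enabling CSRF protection application-wide (the secret key below is a placeholder; in practice it would be loaded from the environment):

from flask import Flask
from flask_wtf.csrf import CSRFProtect

app = Flask(__name__)
app.config["SECRET_KEY"] = "change-me"  # placeholder secret
csrf = CSRFProtect(app)  # now {{ csrf_token() }} is available in templates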

useful git commands

I've started linking GitHub issues with commits. To do that, paste the hash of the commit into the GitHub issue.

If you want to know the hash of the current git HEAD

$ git rev-parse HEAD

Saturday, April 18, 2020

milestones on the path to a live website

For the past few weeks I've always had "just a few more things" to finish before making the Physics Derivation Graph website live. As I progress, I keep adding more tasks that seem necessary. This post is an effort to at least measure the scope creep.

For me to feel comfortable with making the website live, the ordered list of tasks is

  1. input multiple novel derivations to ensure the workflow is sensible, bug-free, and efficient
  2. aggregate logs from Nginx, gunicorn, flask
  3. implement sign-in capability
  4. perform security checks and web form fuzzing
  5. buy a domain
  6. sign in with Google
I don't have a timeline for these milestones.


Update 20200512: all these steps are completed

lesson learned about triggers for when to save and commit to git repo

Historically I've been committing changes to git whenever I complete a feature. That has worked because 1) I haven't lost any data and 2) the window for losing data was small because the features were minor.

Earlier this week I was working on implementing logins. That feature is complicated and involves lots of copy-pasting code from various sites to see what works. After a few hours I somehow typed a command in vim that deleted the first 100 lines of a 1000 line Python script. I didn't notice my mistake until I re-opened the file, at which point the backup file had already been removed, so that recovery path was unavailable.

Because the login feature hadn't been completed, I lost about 4 hours of work. I was able to recover most of the missing 100 lines from the previous git commit, but all the changes associated with logins had to be recreated.

The lesson I learned is that my trigger condition for committing to the git repo needed to be adapted to the change in feature size. Previously a feature took no more than an hour, so committing per-feature made sense. In the future I will try to commit every 20 to 30 minutes, even if the feature is incomplete.

Monday, April 13, 2020

analyze each commit in a repo

$ git --no-pager log --pretty=format:"%H %ad" > hash_and_date.log
$ cat hash_and_date.log | wc -l
     538

$ mkdir -p /tmp/my_bundle
$ cp hash_and_date.log /tmp/my_bundle/



$ git branch -a
* gh-pages
  master
  remotes/origin/HEAD -> origin/master
  remotes/origin/gh-pages
  remotes/origin/haskell
  remotes/origin/master
  remotes/origin/mike
  remotes/origin/test_branch

$ git bundle create pdg.bundle gh-pages
Enumerating objects: 6317, done.
Counting objects: 100% (6317/6317), done.
Delta compression using up to 4 threads
Compressing objects: 100% (5936/5936), done.
Writing objects: 100% (6317/6317), 92.21 MiB | 11.46 MiB/s, done.
Total 6317 (delta 3369), reused 150 (delta 68)
$ ls -hal pdg.bundle 
    92M Apr 13 11:44 pdg.bundle

$ cp pdg.bundle /tmp/my_bundle/
$ cd /tmp/my_bundle/
$ git clone --no-checkout pdg.bundle 
Cloning into 'pdg'...
Receiving objects: 100% (6317/6317), 92.21 MiB | 51.80 MiB/s, done.

Resolving deltas: 100% (3369/3369), done.

see https://gist.github.com/bhpayne/fb63fa0816be63733488162baebf9b14
for lcount function

$ while read -r line; do
    this_hash=`echo $line | cut -d' ' -f1`
    this_date=`echo $line | cut -d' ' -f3-`
    rm -rf *
    git checkout $this_hash
    lcount $this_hash "$this_date" >> ../record.log
done < ../hash_and_date.log

schema for tables

I previously proposed a schema for a relational database to hold the Physics Derivation Graph content but ended up using nested dictionaries and lists in v7.

One of the tasks associated with using the nested dictionaries has been converting the content to a table design. I came up with two potential schemas for derivations but wasn't comfortable with either of them. In this post I outline a schema for tables that should work. The consequence is an improved JSON-to-tables export in v7 as well as enabling the transition from v7 to v8.

Columns per table:
  • "step ID", "inference rule", "derivation name", "linear index"
  • "step ID", "input local ID", "input index"
  • "step ID", "output local ID", "output index"
  • "step ID", "feed local ID", "feed index"
  • "local ID", "global ID"
  • "global ID", "expr latex", "AST as JSON"
  • "symbol ID", "symbol latex"
  • "symbol ID", "reference URL"
  • "operator name", "operator latex"
  • "operator name", "reference URL"
What's not clear to me is how to store an AST. The AST relates the use of symbols and operators in a tree structure. 
--> store AST as JSON
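
A minimal sketch of a few of these tables in SQLite via Python's built-in sqlite3 module (the table names and column types are my guesses, not a settled schema):

import sqlite3

conn = sqlite3.connect("pdg.sqlite3")  # placeholder filename
cur = conn.cursor()

# a few of the tables from the proposed schema; names and types are assumptions
cur.execute("""CREATE TABLE IF NOT EXISTS steps
               (step_id TEXT, inference_rule TEXT,
                derivation_name TEXT, linear_index INTEGER)""")
cur.execute("""CREATE TABLE IF NOT EXISTS step_inputs
               (step_id TEXT, input_local_id TEXT, input_index INTEGER)""")
cur.execute("""CREATE TABLE IF NOT EXISTS expressions
               (global_id TEXT, expr_latex TEXT, ast_as_json TEXT)""")
conn.commit()
conn.close()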

insertions and deletions in git versus time

I wanted to plot the changes in the code base with more detail than is shown on
https://github.com/allofphysicsgraph/proofofconcept/graphs/contributors

My first attempt was to use git log and grab the hash and date:

$ git --no-pager log --pretty=format:"%H %ad"
....
6cf2a0255e4e8ac5db4eabf086f119717e650306 Sun Jan 4 11:23:28 2015 -0500
db738d9b246a9592c9b5dc89407d7b2587df5b6f Fri Jan 2 09:06:13 2015 -0500
282a80b8b346294ef1c986d7c98f02daa3b2283d Fri Jan 2 08:58:41 2015 -0500
....

I'll save that for later,
$ git --no-pager log --pretty=format:"%H %ad" > hash_and_date.log

Those two columns (hash, date) are necessary but not sufficient -- I also need the number of lines changed.

I saw that "git show" produces the specific changes per commit, so I could combine that with grep

$ git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^+++ | grep ^+ | wc -l
     130
$ git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^--- | grep ^- | wc -l
      20

It would be better to put the two numbers on the same line, something like

$ removed=`git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^--- | grep ^- | wc -l`; added=`git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^+++ | grep ^+ | wc -l`; echo $removed $added
20 130

Then I stumbled onto https://stackoverflow.com/a/53127502/1164295 which has almost what I wanted.  

I ran into a problem,
git diff --shortstat d2d48dcde6e04306d79f2270cdefbb846b0c6a4b | sed -E 's/[^[:digit:][:space:]]//g'

warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 2154 and retry the command.
91015

I found the fix on https://stackoverflow.com/a/28064699/1164295 and ran
git config  diff.renameLimit 2154

I made some alterations since I care about both the additions and removals
$ git diff --shortstat d2d48dcde6e04306d79f2270cdefbb846b0c6a4b | sed -E 's/[^[:digit:][:space:]]//g' | awk '{print $2 " " $3}'
66283 19430


Which can be written as a function,
$ function gcount() {
    git diff --shortstat $1 | sed -E 's/[^[:digit:][:space:]]//g' | awk '{ print $2 " " $3 }'
}

Then I ran this loop:
$ git log --pretty=format:"%H %ad" | while read -r line
do
    this_hash=`echo $line | cut -d' ' -f1`
    this_date=`echo $line | cut -d' ' -f3-`
    echo "$(gcount $this_hash)" $this_date
done > insertions_deletions_date.log
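
To turn that log into a plot, a minimal matplotlib sketch might look like the following; it assumes each line of insertions_deletions_date.log has the form "66283 19430 Jan 2 08:58:41 2015 -0500" (two counts followed by the date fields produced by the loop above):

# plot insertions and deletions per commit over time
from datetime import datetime
import matplotlib.pyplot as plt

dates, insertions, deletions = [], [], []
with open("insertions_deletions_date.log") as fil:
    for line in fil:
        tokens = line.split()
        if len(tokens) != 7:  # skip commits where shortstat produced fewer numbers
            continue
        insertions.append(int(tokens[0]))
        deletions.append(int(tokens[1]))
        dates.append(datetime.strptime(" ".join(tokens[2:6]), "%b %d %H:%M:%S %Y"))

plt.plot(dates, insertions, label="insertions")
plt.plot(dates, deletions, label="deletions")
plt.xlabel("commit date")
plt.ylabel("lines changed")
plt.legend()
plt.savefig("insertions_deletions_vs_time.png")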

Sunday, April 12, 2020

validating that a user is human by validating steps

The Physics Derivation Graph doesn't currently support user accounts, but I expect they may be needed in the future.

There will be multiple problems to address associated with having users, and one of them is figuring out whether a user is human or not. There are many CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) methods to choose from; in this post I'll outline a CAPTCHA specific to the Physics Derivation Graph.

A challenge in the Physics Derivation Graph is to determine whether a step in a derivation is valid or not. Using a computer algebra system (CAS) like Sympy or Sage is viable for simple inference rules and simple expressions. No one CAS is capable of supporting all the PDG content, so manual intervention is necessary.

Idea: use the task of validating steps to measure whether a user is human or not.
This relies on the task of validation being challenging.

Roles:
  • step with known validity (either true or false)
  • step with unknown validity (either true or false)
  • known human user
  • computer algebra system (e.g., Sympy) capable of determining step validity
  • candidate user (either human or machine)
The steps that are validated by both a CAS and a known human will be referred to as "steps that are true," and steps that a known human has verified to be invalid will be referred to as "steps that are false." Both the CAS and the known human are fallible, but I'm going to assume a binary outcome. 

Similarly, the candidate user has been forced into a binary category of machine or human. There are gradients here (a good algorithm may be more effective than a dumb human), but I am going to focus on the humans that are smarter than algorithms. 

As with other Turing tests, a single binary question is insufficient because I need to be able to distinguish a thoughtful candidate user from one who merely flips a coin to answer the question.
The challenge relevant for the use of step validation can be reduced to the following:
Given N questions with a binary outcome, how certain can I be that the coin is biased?
The bias of the coin in this situation is the intelligence of the candidate user. A machine algorithm or a dumb user should have results similar to an unbiased coin, while a smart user should get more answers correct than incorrect. 

Instead of focusing on the binary question of "is the step valid or not," attention should be on "did the candidate user get the response correct or not?" with respect to a step where the outcome is known.

Coin flips are modeled by the https://en.wikipedia.org/wiki/Binomial_distribution, and the number of outcomes for N coin flips is given by https://en.wikipedia.org/wiki/Pascal%27s_triangle

Suppose I have N=3 coin flips. The unbiased coin with sides "correct" and "incorrect" will yield "correct, correct, correct" in 1/8th of outcomes, just as the outcome "incorrect, incorrect, incorrect" occurs 1/8th of the time. The other two outcomes, (incor, incor, cor) and (cor, cor, incor), have three permutations each. This distribution corresponds to the "1 3 3 1" row in Pascal's triangle.

Now consider N=4 coin flips. The unbiased coin will yield "cor, cor, cor, cor" 1/16th of the time. There are 6 permutations of "cor, cor, incor, incor", which corresponds to a 50% success rate -- the most common outcome for an unbiased coin. The "1 4 6 4 1" row of the triangle tells us how many permutations of each outcome there are.

Observations:
  • The "number of flips" corresponds to the second diagonal
  • There is always one permutation of "all incorrect" and one permutation of "all correct" -- these are the outermost "1" in the triangle
  • For an even number of flips, the middle number in the triangle's row counts the permutations of the most common outcome for an unbiased coin: half correct, half incorrect. 
For the Physics Derivation Graph, if we provide a candidate user with N questions and they answer N-1 of them correctly, the probability that an unbiased coin would produce exactly that outcome (N-1 of N correct) is:
  • N=2 steps to validate, N-1=1 steps validated correctly: 2/4 = 50%
  • N=3 steps to validate, N-1=2 steps validated correctly: 3/8 = 37.5%
  • N=4 steps to validate, N-1=3 steps validated correctly: 4/16 = 25%
  • N=5 steps to validate, N-1=4 steps validated correctly: 5/32 ≈ 15.6%
The smaller this probability, the more confident we can be that the candidate user is not simply guessing.
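
Those percentages can be checked with a few lines of Python (a minimal sketch; the unbiased coin is modeled as a 50/50 chance of answering each question correctly):

from math import factorial

def prob_exactly_k_of_n(n, k):
    """Probability that an unbiased coin yields exactly k correct answers out of n."""
    n_choose_k = factorial(n) // (factorial(k) * factorial(n - k))
    return n_choose_k / 2 ** n

for n in range(2, 6):
    print(n, prob_exactly_k_of_n(n, n - 1))
# 2 0.5
# 3 0.375
# 4 0.25
# 5 0.15625
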
The motivation for using this approach is to support including an additional step for which the validation is unknown. If we have 4 steps for which the validation is known and 1 step for which the validation is unknown, then we can include the extra step and build a profile of whether candidate users think the step is valid or invalid. This extra step would need to be reviewed by many candidate users in order to build up a statistically significant ratio of votes as to the validity. 

Thursday, April 9, 2020

background tasks - considering celery+rabbitmq, rq with redis

Currently the first access to the "editor" page incurs a significant delay while pictures are generated in the background. While this task only runs the first time and could be moved elsewhere in the code to distribute the latency more evenly, I expect the need for a task queue to arise.
https://www.fullstackpython.com/task-queues.html

Celery has workers. RQ has both workers and a queue; RabbitMQ has just the queueing system. Source

Celery versus RQ:
RQ is simpler and Celery has more features. RQ only works with Redis.

Migrating from Celery to RQ: https://frappe.io/blog/technology/why-we-moved-from-celery-to-rq
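
If RQ ends up being the choice, a minimal sketch of enqueueing the picture-generation work looks roughly like the following; generate_images is a hypothetical stand-in for the actual function:

from redis import Redis
from rq import Queue

# hypothetical stand-in for the slow picture-generation task;
# the real function would need to be importable by the worker process
def generate_images(expr_global_id):
    ...

# web side: push the job onto a Redis-backed queue instead of blocking the page load;
# a separate process started with `rq worker` executes the job
q = Queue(connection=Redis(host="localhost", port=6379))
job = q.enqueue(generate_images, "000001")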



migrating to tables in version 8

Currently the Physics Derivation Graph is "version 7: pkl and web interface". While v7 started as a Python Pickle data file, it then moved to JSON, and is now a JSON file stored as a string in Redis.

While I could rewrite the "JSON as string in Redis" into a proper Redis-based interface, my plan is to rewrite the code to support an SQLite3 backend. This would mean rewriting all the functions to use tables rather than nested dictionaries and lists.

While the in-memory data of Redis sounds attractive for low-latency, the downside is that the Redis server needs to be running in order to query the content. An SQLite3 database is available offline.

Three issues have held me back from implementing the database as tables. Two of the issues are about translating the nested dictionaries and lists to tables. I want the translation to be to a schema design that is compact (not too many tables) and tidy (no element should contain lists).

Issue: a symbol can be a constant or a variable; for constants there may be multiple values. Should this be one table with multiple rows per value for constants, or a table of symbols plus a table of values?
Multiple tables are the better schema but are not as intuitive for users.
Resolution: The HTML table displayed in the web interface doesn't have to be the same as the backend schema. I will use a single table for the web frontend and multiple tables in SQLite3.

Issue: the derivation table columns could be
['step id', 'inference rule', 'input expr1', 'input expr 2', 'input expr 3', 'feed 1', 'feed 2', 'feed 3', 'output expr 1', 'output expr 2', 'output expr 3']
which is one row per step and not tidy
or
['step id', 'in connection type', 'in id', 'out connection type', 'out id']
which has multiple rows per step and is tidy.
Resolution: use the tidy table schema and write a converter to the dictionary with lists?
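
For example, under one reading of that tidy column layout (the IDs below are hypothetical), a step with two inputs and one output becomes three rows:

# columns: step id, in connection type, in id, out connection type, out id
# hypothetical IDs; empty cells are None
rows = [
    ("step_0001", "input expr", "expr_0042", None, None),
    ("step_0001", "input expr", "expr_0043", None, None),
    ("step_0001", None, None, "output expr", "expr_0044"),
]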

Issue: I'm not comfortable with SQL
Resolution: learn SQL.

Wednesday, April 8, 2020

notes on learning redis

I'm running redis in a docker container and connecting to it from Python using
https://github.com/andymccurdy/redis-py

>>> from redis import Redis

I initially wasn't able to connect until I found this
https://stackoverflow.com/a/57681086/1164295

>>> rd = Redis(host='docker.for.mac.localhost', port=6379)
>>> rd.ping()
True

Then I used https://realpython.com/python-redis/#ten-or-so-minutes-to-redis

What keys exist?

>>> rd.keys()
[b'hits']

Look for a key that doesn't exist:

>>> print(rd.get('mykey'))
None
>>> rd.get('mykey')
>>>

Better method for looking for keys:

>>> rd.exists('mykey')
0
>>> rd.exists('hits')
1


a terrible hack to get JSON into a database

I've been using JSON to store Physics Derivation Graph content. The motive is that JSON is capable of storing data in a way that most closely reflects how I think of the data structure in Python (nested dictionaries and lists).

A JSON file doesn't work for supporting multiple concurrent users: concurrent writes would require locks to ensure changes are not lost.
Migrating from JSON to a table-based data structure (e.g., MySQL, PostgreSQL, SQLite) would incur a significant rewrite. Another option would be to use Redis, specifically the ReJSON plugin that alters the flat hashes in Redis to a nested structure closer to JSON.

I'm wary of using a plugin for data storage, and I'm reluctant to rewrite the PDG as tables.
There is a terrible hack that allows me to stick with JSON while also resolving the concurrency issue that doesn't require a significant rewrite: I could serialize the JSON and store it in Redis as a very long string.

Redis has a maximum string length of 512 MB (!) according to
https://redis.io/topics/data-types

What I'm currently doing:
>>> import json
>>> path_to_db = 'data.json'
>>> with open(path_to_db) as json_file:
     dat = json.load(json_file)

Terrible hack:

Read the content as text, then save to redis
>>> with open(path_to_db) as jfil:
    jcontent = jfil.read()
>>> rd.set(name='data.json', value=jcontent)
True

which can be simplified to

>>> with open(path_to_db) as jfil:
    rd.set(name='data.json', value=jfil.read())

Then, to read the file back in, use

>>> file_content = rd.get('data.json')
>>> dat = json.loads(file_content)

Monday, April 6, 2020

data in JSON does not scale to multiple users

In version 7 of the Physics Derivation Graph I realized that I could use Python's Pickle format to serialize the data stored in memory without having to decide what storage format (CSV, XML, SQLite) is best.  That insight led to the use of JSON because everything needed fits in dictionaries and lists.

The use of Pickle and then JSON enabled development of many features, so it was a worthwhile investment. However, some operations are not well suited to the nested dictionaries and lists. A set of tables might be better for some operations. Converting from the current dictionaries and lists to tables would be a big rewrite, so I haven't started that yet.

If I move away from JSON for storage, the current candidates are Redis, PostgreSQL, and SQLite3.

Use of a relational database would require a significant rewrite since most of the functions in the PDG rely on the nested dictionaries.

As a potentially easier transition, Redis has a plugin that supports JSON:
https://redislabs.com/blog/redis-as-a-json-store/
https://redislabs.com/redis-best-practices/data-storage-patterns/json-storage/
However, I'm not comfortable with the PDG being dependent on a plugin.

If I go with a relational database, I'll need to choose which one.
MySQL or PostgreSQL versus SQLite
https://stackoverflow.com/a/5102105/1164295
"SQLite can support multiple users at once. It does however lock the whole database when writing, so if you have lots of concurrent writes it is not the database you want (usually the time the database is locked is a few milliseconds - so for most uses this does not matter)."
https://www.sqlite.org/whentouse.html
"Any site that gets fewer than 100K hits/day should work fine with SQLite."
To improve concurrency, reads can happen without blocking writes: https://www.sqlite.org/wal.html
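
A minimal sketch of turning on write-ahead logging from Python's built-in sqlite3 module (the database filename is a placeholder):

import sqlite3

conn = sqlite3.connect("pdg.sqlite3")  # placeholder filename
conn.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer
conn.close()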

Saturday, April 4, 2020

why implementing a single feature took 12 hours

Yesterday I started investigating how to get d3.js working for the Physics Derivation Graph.  I already had an implementation working on the live website, so I didn't expect the update to take too much effort or time.

Below is the sequence of challenges I encountered for this feature update.
  1. I learned that I had used v3; the current version is v5
  2. v5 doesn't support the .force() used in v3
  3. I found a v5-based force directed graph on https://observablehq.com/@d3/force-directed-graph
  4. Although I was able to get the code running locally, I found the files seemed to depend on remote resources. 
  5. I found a better instance that was pure d3.v5.js instead of relying on observable code: https://bl.ocks.org/mapio/53fed7d84cd1812d6a6639ed7aa83868
  6. Figured out how to get images associated with nodes
  7. The JSON file needs images to have distinct and consistent names
  8. Instead of temporary image file names, use expr_global_id and expr_name
  9. The functions using "return False, error_message" meant the errors didn't propagate to the web interface. The "right" method is to use "raise Exception" 
  10. With exceptions raised in compute, needed to add "try/except" in controller.py
  11. With Exceptions caught in controller.py, use flash() to tell the user there was a problem
  12. With Exceptions now sent to user via web interface, I learned that the PNG wasn't being created due to a missing command, "braket"
  13. I found that "braket" is a latex package available from CTAN
  14. I tried to install "braket" using "tlmgr install"; see https://tex.stackexchange.com/questions/73016/how-do-i-install-an-individual-package-on-a-linux-system
  15. I wasn't able to run "tlmgr" in Docker due to not having wget
  16. I wasn't able to install wget in Docker using "apt-get install -y wget", possibly due to using phusion as a base image?
  17. Looked up instructions on installing packages manually; opened https://github.com/allofphysicsgraph/proofofconcept/issues/82
  18. In the process of debugging the PDF generation (notice that I strayed from the d3js effort), realized the migration of inference rules was incomplete -- new style is to have words separated by spaces in create_tmp_db.py
  19. Added an exception in compute.py to identify inconsistent inference rule names
  20. Manually fixed inference rule entries in create_tmp_db.py
  21. Altered the inference rule schema in compute.py -- use feeds+inputs+outputs
  22. Manually updated inference rules in create_tmp_db.py to reflect revised schema
  23. Compiling derivation PDF failed due to incorrect implementation of inference rule
  24. Realized that the "braket" issue wasn't a missing package; it was custom macros defined in an old version of the PDG
  25. Wrote function to generate JSON needed for d3js
  26. In the process of iterating that, added page latency measurement
Lessons learned:
  • In the process of implementing a new feature or updating a feature, I uncovered a few bugs and a lot of technical debt that led to the implementation taking longer than expected
  • Some of the bugs were easy to fix (aka buy down the tech debt) as I discovered them, while others warranted a new ticket. 
  • Some bugs were blockers -- I couldn't proceed with the desired work until I resolved architecture flaws; other issues were tangential and could be delayed.
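
Regarding item 25 above, the JSON consumed by the d3 v5 force-directed-graph examples is a dict with "nodes" and "links" lists; a minimal sketch of generating it (the node IDs and image filenames here are hypothetical, not the PDG's actual naming):

import json

# hypothetical node IDs and image filenames, not the PDG's actual naming scheme
nodes = [
    {"id": "expr_000001", "img": "expr_000001.png"},
    {"id": "infrule_declare_initial_expr", "img": "infrule_declare_initial_expr.png"},
]
links = [
    {"source": "infrule_declare_initial_expr", "target": "expr_000001"},
]

with open("graph_for_d3js.json", "w") as fil:
    json.dump({"nodes": nodes, "links": links}, fil)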