Tuesday, July 31, 2018

mathjax for github.io site

For the project website hosted at http://allofphysicsgraph.github.io/proofofconcept/ I've been using static PNGs, generated off-line, to render the expressions in the d3js graph.

To support entering content dynamically on a webpage without resorting to off-line rendering, I used MathJax to display the content.

The javascript for MathJax is at
https://github.com/mathjax/MathJax/blob/master/MathJax.js
with instructions for use here
https://docs.mathjax.org/en/latest/configuration.html

I was able to get a page that accepts LaTeX input and renders it:
http://allofphysicsgraph.github.io/proofofconcept/site/mjtest
Source code for the page is here:
https://github.com/allofphysicsgraph/proofofconcept/blob/gh-pages/site/mjtest.html

Next I ran scaling tests for latency as a function of the number of rendered expressions in Chrome.

25 expressions:
  • http://allofphysicsgraph.github.io/proofofconcept/site/mjtest_scaling_25
  • DOMContentLoaded: 203 ms; Load: 516 ms; Finish: 835 ms
50 expressions:
  • http://allofphysicsgraph.github.io/proofofconcept/site/mjtest_scaling_50
  • DOMContentLoaded: 202 ms; Load: 548 ms; Finish: 977 ms
100 expressions:
  • http://allofphysicsgraph.github.io/proofofconcept/site/mjtest_scaling_100
  • DOMContentLoaded: 220 ms; Load: 538 ms; Finish: 1140 ms

The finish time grows roughly linearly with the number of expressions: about 4 ms per additional expression on top of a roughly 730 ms baseline.

Sunday, July 22, 2018

Python: convert XML to dictionary

#https://stackoverflow.com/questions/13101653/python-convert-complex-dictionary-of-strings-from-unicode-to-ascii
# Note: Python 2 only -- relies on dict.iteritems() and the unicode type
def convert(input):
    if isinstance(input, dict):
        return {convert(key): convert(value) for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [convert(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

#https://docs.python-guide.org/scenarios/xml/
import xmltodict
with open('sample.xml') as fd:
  doc = xmltodict.parse(fd.read())
#print(doc)

# doc is an OrderedDict containing unicode strings.

#https://stackoverflow.com/questions/3860813/recursively-traverse-multidimensional-dictionary-dimension-unknown
import pprint
#pprint.pprint(doc) # expects dict, not ordered dict

#https://stackoverflow.com/questions/20166749/how-to-convert-an-ordereddict-into-a-regular-dict-in-python3
import json
output_dict = json.loads(json.dumps(doc))

# remove the unicode from keys and values
doc = convert(output_dict)

pprint.pprint(doc)
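The convert() function above is Python 2 specific (dict.iteritems() and the unicode type don't exist in Python 3). In Python 3, strings are already unicode, so the only remaining task is turning nested OrderedDicts into plain dicts. A minimal sketch of that round-trip, using a hand-built OrderedDict as a stand-in for xmltodict's output:

```python
import json
from collections import OrderedDict

# stand-in for the nested OrderedDict that xmltodict.parse() returns
doc = OrderedDict([('root', OrderedDict([('a', '1'),
                                         ('b', ['x', 'y'])]))])

# round-tripping through JSON converts every OrderedDict to a plain dict;
# no encode('utf-8') step is needed in Python 3
plain = json.loads(json.dumps(doc))
print(plain)
```

After the round-trip, pprint.pprint behaves as expected on the plain dict.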

Friday, July 20, 2018

analyzing the text of Wikipedia posts

In a previous post, an outline for analyzing Wikipedia content was described. In this post, I document a few initial observations about the data collected from Wikipedia.

Searching for "derivation" as a section marker means searching for "=== Derivation ===". "Derivation" has other meanings, so the results sometimes include non-mathematical content like "=== Derivation and other names ===". To filter out irrelevant content, keep only sections that contain mathematical expressions (i.e., ":<math>").

In addition to the text, there are potentially relevant images like
https://en.wikipedia.org/wiki/File:Derivation_of_acoustic_wave_equation.png
which has dimensions 813 × 570 pixels. Pictures with "derivation" in the name and dimensions greater than 300 × 300 might be relevant.

In the "derivation" section, lines that start with ":<math>" are expressions. The closing tag "</math>" may occur on a following line.
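A minimal sketch of that extraction (the function name and sample text are mine, not from an actual scraper); a non-greedy regex with re.DOTALL handles closing tags that land on a later line:

```python
import re

def extract_expressions(section_text):
    # grab everything between ":<math>" and "</math>", allowing the
    # closing tag to appear on a following line (re.DOTALL)
    return re.findall(r':<math>(.*?)</math>', section_text, flags=re.DOTALL)

sample = """=== Derivation ===
Starting from Newton's second law,
:<math>F = m a</math>
and integrating once,
:<math>v = v_0
 + a t</math>
"""
exprs = extract_expressions(sample)
print(exprs)
```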

a different approach to generating content for the Physics Derivation Graph

I've been focused on creating the interface for the Physics Derivation Graph to enable manual entry of content. An alternative method to create content would be parsing large databases like Wikipedia.

The first step would be to extract pages that contain derivations. Pages with a section title containing "derivation" and at least three mathematical expressions in that section would be a useful set. Suppose there are a thousand such pages of text+LaTeX.

Side note: Wikipedia "text is available under the Creative Commons Attribution-ShareAlike License."

Given 1000 pages of text+LaTeX, there are two nested challenges:

  1. Between any two adjacent expressions in the data set, there are likely many missing steps.
  2. Even if all the expressions were present, the inference rules connecting them are missing. Filling these in is a big challenge.

To address these challenges, text analysis would be useful. Suppose the sequence is
  • text1
  • expression1
  • text2
  • expression2
  • text3
  • expression3
  • text4
There are a few distinct categories of text to analyze:
  • s1 = the last two sentences of "text1" preceding "expression1"
  • s(i) = if text2 and text3 are short (i.e., a few sentences), they are potential inference rules
  • s(j) = if text2 and text3 are longer than a few sentences, then the two sentences following an expression and the two sentences preceding an expression are probably the relevant ones
  • sf = the first two sentences of "text4," the text after the last expression
We now have 1000 instances of "s1" sentences. In this "s1" data set, what's the most common word? What's the most common two word phrase? What's the most common three word phrase? If there are things that look like inference rules, that would be interesting. I doubt that "declare initial expression" will appear, but some consistency would be validating.

Similarly, run the same word and phrase frequency analysis for the 1000 "sf" sentences. Also apply to each of "s(i)" and "s(j)."
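The word and phrase frequency analysis can be sketched with collections.Counter (the helper name and the example sentences are hypothetical):

```python
from collections import Counter

def ngram_counts(sentences, n):
    # count every n-word phrase across a list of sentences
    counts = Counter()
    for s in sentences:
        words = s.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts

# hypothetical "s1" sentences preceding expressions
s1 = ["substituting into the previous expression",
      "substituting into the wave equation",
      "taking the derivative of both sides"]
print(ngram_counts(s1, 2).most_common(3))
```

Running the same helper with n=1 and n=3 gives the most common single words and three-word phrases.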

Thursday, July 19, 2018

relevant posts on reddit

Open:

https://old.reddit.com/r/Physics/comments/8vurwq/derivation_for_dummies_a_quick_guide_to_one_of/

Closed:

https://old.reddit.com/r/Physics/comments/725dz1/keeping_track_of_derivations/

https://old.reddit.com/r/Physics/comments/1v0wap/derivations_of_equations/

https://old.reddit.com/r/Physics/comments/1c7uas/derivation_of_the_schrodinger_equation_in_under/

https://old.reddit.com/r/Physics/comments/7y9gkz/are_derivations_worth_it/

forums to contribute to

I don't expect people to serendipitously stumble upon this blog, the GitHub page, or the project page. Therefore, part of my responsibility is to socialize the existence of this effort in forums where interested parties may already participate. Building a community, gaining a user base, and finding collaborators are potential outcomes.

I don't want to simply advertise on these channels. Instead, I intend to provide value by addressing challenges participants face. If the Physics Derivation Graph demonstrates value, its community grows; if it doesn't provide value, then I shouldn't expect a community to develop.

Brainstorming relevant channels,

Monday, July 16, 2018

Physics of Minecraft derivation - the graph is unwieldy

There's a video on the Physics of Minecraft which measures the gravitational acceleration in Minecraft. I wanted to see how well the projectile motion is described by the Physics Derivation Graph.
Screenshot from 0:35 in the video. Useful commentary is on news.ycombinator; the post on Wired.com was basic.

On paper, the derivation was four expressions and two lines of text. The Physics Derivation Graph yields a cumbersome seven expressions and a total of 25 nodes (feeds, expressions, and inference rules).

Current output from the Physics Derivation Graph:

The graph is large because the "subXforY" inference rule is used three times. Analysis of the midpoint is really three concurrent substitutions: y=y_mid, v_vertical=0, and t=t_midpoint. Concurrent substitutions are not supported, so three separate steps are required.

Also, the current implementation lacks support for comments.

Wednesday, July 11, 2018

The Physics Derivation Graph is for workflow management

There's no complicated math underlying the Physics Derivation Graph. The code base is primarily about tracking numeric indices and strings in Python dictionaries and lists read from plain text CSVs.

Similarly, there are no fancy algorithms in the software.

The lack of complicated math and fancy algorithms is because the Physics Derivation Graph is about workflow management for mathematical physics. Encoding the logic and processes is merely management of simple data (numerical indices and strings representing math).

Saturday, July 7, 2018

snapshot of milestones for the Physics Derivation Graph


Each node is a milestone/task, and each task points toward its follow-on task.

There are two components: one focused on LaTeX-based input, the other on syntactically meaningful content.

Wednesday, July 4, 2018

static analysis of function dependency in Python

With 1400 lines of Python, I wanted a way to visualize the static dependencies among the functions internal to the script
https://github.com/allofphysicsgraph/proofofconcept/blob/gh-pages/v4_file_per_expression/bin/interactive_user_prompt.py

I looked at PyCallGraph but it only supports dynamic call graphs. In addition to Pyan and Snakefood, I found a blog post that included an AST parser as a single file.
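The core of such an AST-based extractor can be sketched in a few lines. This is a rough approximation that only tracks direct calls by name (ignoring methods and aliasing), and it is not the blog post's actual script:

```python
import ast

def call_edges(source):
    # map each function definition to the sorted set of names it calls
    edges = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            calls = {sub.func.id
                     for sub in ast.walk(node)
                     if isinstance(sub, ast.Call)
                     and isinstance(sub.func, ast.Name)}
            edges[node.name] = sorted(calls)
    return edges

src = "def a():\n    b()\n    c()\n\ndef b():\n    c()\n"
print(call_edges(src))
```

Each caller/callee pair could then be emitted as a "caller -> callee;" line in a dot file for Graphviz.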

python construct_call_graph.py -i ../../proofofconcept/v4_file_per_expression/bin/interactive_user_prompt.py > graph.dot

Then add "overlap=false;" to the graph section of graph.dot, and render with

neato -Tpng graph.dot -o graph.png

which yields the function dependency graph as a PNG.