Physics Derivation Graph: histogram of expression lengths in bash

Sunday, May 10, 2020

histogram of expression lengths in bash

Reading the JSON as plain text does not work, because the key "latex" appears in multiple entries, so a text search cannot single out the expressions:

 cat data.json | grep "            \"latex\":"

So I decided to read the JSON into Python on the command line
https://www.cambus.net/parsing-json-from-command-line-using-python/

That worked, but I learned that putting a for loop in a Python one-liner on the command line requires extra work
https://stackoverflow.com/questions/2043453/executing-multi-line-statements-in-the-one-line-command-line
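A minimal sketch of why the loop needs extra work: Python does not allow a compound statement such as for to follow a semicolon, so the import and the loop cannot share one line. Handing exec a string with an embedded newline gets around that. The variable names below are made up for illustration.

```python
# A `for` statement cannot follow `;` on the same line, so this one-liner is invalid.
one_line = "import json; for i in range(3): print(i)"
try:
    compile(one_line, "<command line>", "exec")
    one_line_is_valid = True
except SyntaxError:
    one_line_is_valid = False

# Embedding a newline in the string handed to exec() puts the loop on its own line.
two_lines = "total = 0\nfor i in range(3): total += i"
namespace = {}
exec(two_lines, namespace)
loop_total = namespace["total"]  # 0 + 1 + 2
```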

Once I had the length of each value, I padded it with leading zeros so that every length has three digits
https://stackoverflow.com/questions/21620602/add-leading-zero-python
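As a small illustration of the padding step, assuming all lengths are below 1000 so that three digits are enough (the example lengths are made up):

```python
lengths = [7, 42, 123]  # made-up example lengths

padded = [str(n).zfill(3) for n in lengths]
# Every string is now three characters wide, so the first two
# characters are always the hundreds and tens digits.
tens_bins = [p[:2] for p in padded]
```

Without the padding, dropping the last character of "7" would leave an empty string rather than the bin "00".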

Then I used cut to eliminate the last digit (so the histogram bin size is 10).

cat data.json |\
   python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(str(len(d['latex'])).zfill(3))\")" |\
   sort -n |\
   cut -c1-2 |\
   uniq -c
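For comparison, here is a rough sketch of the same binning done entirely in Python. It assumes the layout used in the command above (an 'expressions' dictionary whose values each have a 'latex' string); the function name and the sample data are made up for illustration.

```python
import json
from collections import Counter

def length_histogram(expressions):
    """Count expressions per bin of width 10, keyed by the padded length without its last digit."""
    bins = Counter()
    for expr in expressions.values():
        padded = str(len(expr["latex"])).zfill(3)
        bins[padded[:2]] += 1  # same effect as `cut -c1-2` on a three-digit length
    return bins

# Made-up sample in the same shape as the 'expressions' entry of data.json.
sample = json.loads(
    '{"expressions": {"1": {"latex": "a = b"},'
    ' "2": {"latex": "x"},'
    ' "3": {"latex": "f = m a + 1234"}}}'
)
histogram = length_histogram(sample["expressions"])
for label in sorted(histogram):
    print(label, histogram[label])
```

Like the shell version, this only bins correctly while every length fits in three digits.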

The longest expressions

cat data.json |\
   python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(len(d['latex']))\")" |\
   sort -n |\
   tail -n 5

The shortest expressions

cat data.json |\
   python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(len(d['latex']))\")" |\
   sort -n |\
   head -n 5
1
1
1
1
1
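The sort-then-head/tail idea can also be sketched in Python. This assumes the same 'expressions' layout as in the commands above; the helper name and the sample expressions are hypothetical.

```python
def sorted_lengths(expressions):
    """Return the length of every 'latex' string, smallest first (like `sort -n`)."""
    return sorted(len(expr["latex"]) for expr in expressions.values())

# Made-up sample standing in for json.load(...)['expressions'].
sample_expressions = {
    "1": {"latex": "x"},
    "2": {"latex": "a = b"},
    "3": {"latex": "y"},
    "4": {"latex": "E = m c^2"},
}

lengths = sorted_lengths(sample_expressions)
shortest = lengths[:2]   # like `head -n 2`
longest = lengths[-2:]   # like `tail -n 2`
```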

Similarly, we can get the popularity of inference rules

cat data.json |\
   grep "inf rule" |\
   sed 's/"inf rule": //' |\
   tr -s " " |\
   sort |\
   uniq -c |\
   sort -n |\
   tail -n 10
  11  "substitute X for Y",
  12  "declare identity",
  13  "subtract X from both sides",
  14  "declare variable replacement",
  20  "declare final expr",
  21  "divide both sides by",
  21  "substitute LHS of expr 1 into expr 2",
  31  "simplify",
  31  "substitute RHS of expr 1 into expr 2",
  54  "declare initial expr",
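A possible Python counterpart to the grep pipeline is sketched below. Since the nesting around the "inf rule" key is not shown here, the sketch walks the whole JSON tree and collects that key wherever it occurs; the helper name and the layout of the sample (including the "derivations" and "steps" keys) are assumptions for illustration only.

```python
from collections import Counter

def collect_values(node, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                yield from collect_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from collect_values(item, key)

# Hypothetical sample; the real data.json may be organized differently.
sample = {
    "derivations": {
        "d1": {"steps": {"s1": {"inf rule": "declare initial expr"},
                         "s2": {"inf rule": "simplify"}}},
        "d2": {"steps": {"s1": {"inf rule": "declare initial expr"}}},
    }
}

rule_counts = Counter(collect_values(sample, "inf rule"))
for rule, count in rule_counts.most_common():
    print(count, rule)
```

Unlike grep, this counts only values whose key is exactly "inf rule", so a matching string elsewhere in the file would not be included.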
