Sunday, May 10, 2020

histogram of expression lengths in bash

Treating the JSON as plain text does not work, because the key "latex" appears in multiple entries, so a plain grep cannot isolate the expressions:
 cat data.json | grep "            \"latex\":"

So I decided to read the JSON into Python on the command line
https://www.cambus.net/parsing-json-from-command-line-using-python/

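That worked, but I learned that writing a for loop in a Python one-liner requires extra work: a for statement cannot follow a semicolon on the same line, so the code is wrapped in exec() with an embedded \n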
https://stackoverflow.com/questions/2043453/executing-multi-line-statements-in-the-one-line-command-line

Once I had the length of each value, I zero-padded it to three digits with zfill(3) so that every length has the same width
https://stackoverflow.com/questions/21620602/add-leading-zero-python

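Then I used cut to keep only the first two characters, which drops the last digit (so the histogram bin size is 10). In the output below, each line is a count followed by the bin label; for example, 00 covers lengths 0-9 and 27 covers lengths 270-279.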
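The padding and truncation steps can be checked in isolation. The sketch below is only an illustration: bin_label is a hypothetical helper name, not something from the original pipeline, and it assumes lengths stay below 1000 so that three digits are enough.

```python
def bin_label(length: int) -> str:
    """Mimic zfill(3) followed by `cut -c1-2`: keep the hundreds and tens digits."""
    return str(length).zfill(3)[:2]


# For lengths under 1000 the label is the same number as integer division by 10.
for n in (7, 42, 271):
    print(n, bin_label(n), n // 10)
```

For a length of 1000 or more the string would have four digits and the first two characters would no longer be the bin, so the padding width would need to grow.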


cat data.json |\
   python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(str(len(d['latex'])).zfill(3))\")" |\
   sort -n |\
   cut -c1-2 |\
   uniq -c
 127 00
  63 01
  75 02
  54 03
  34 04
  28 05
  17 06
  18 07
  14 08
  15 09
  11 10
  10 11
   6 12
   5 13
   2 14
   1 15
   1 16
   1 18
   1 20
   2 23
   1 27
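
The same histogram can also be computed entirely in Python, which avoids the exec() workaround. This is a minimal sketch, assuming the layout the one-liner relies on (a top-level "expressions" dict whose values each have a "latex" string); length_histogram and the sample data are hypothetical and stand in for data.json.

```python
import json
from collections import Counter


def length_histogram(expressions: dict, bin_size: int = 10) -> Counter:
    """Count expressions per length bin; bin k covers k*bin_size up to (k+1)*bin_size - 1."""
    return Counter(len(entry["latex"]) // bin_size for entry in expressions.values())


# Hypothetical sample with the same shape as the "expressions" section.
sample = json.loads(
    '{"expressions": {"1": {"latex": "a = b"},'
    ' "2": {"latex": "x"},'
    ' "3": {"latex": "f(x) = x^2 + 1"}}}'
)

hist = length_histogram(sample["expressions"])
for bin_index in sorted(hist):
    # Same layout as `uniq -c`: count first, then the two-digit bin label.
    print(f"{hist[bin_index]:4d} {bin_index:02d}")
```

To run it on the real file, the sample could be replaced by json.load(open("data.json")).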

The five longest expressions:
cat data.json |\
   python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(len(d['latex']))\")" |\
   sort -n |\
   tail -n 5
186
201
231
233
271

The five shortest expressions:
cat data.json |\
   python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(len(d['latex']))\")" |\
   sort -n |\
   head -n 5
1
1
1
1
1
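
The sort-based commands report only the lengths. A Python variant can also report which expression each length belongs to. The sketch below assumes the same "expressions" layout as the one-liners; extreme_lengths and the sample IDs are hypothetical.

```python
import heapq


def extreme_lengths(expressions: dict, n: int = 5):
    """Return (shortest, longest) as lists of (length, expression ID) pairs."""
    lengths = [(len(entry["latex"]), expr_id) for expr_id, entry in expressions.items()]
    # nsmallest returns ascending order, nlargest returns descending order.
    return heapq.nsmallest(n, lengths), heapq.nlargest(n, lengths)


# Hypothetical sample standing in for the "expressions" section of data.json.
sample_expressions = {
    "1001": {"latex": "a = b"},
    "1002": {"latex": "x"},
    "1003": {"latex": "f(x) = x^2 + 1"},
}

shortest, longest = extreme_lengths(sample_expressions, n=2)
print("shortest:", shortest)
print("longest:", longest)
```

Ties on length are broken by the ID string, since the pairs are compared as tuples.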


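Similarly, we can count how often each inference rule is used. The ten most common are shown; the quotes and trailing commas are leftover JSON punctuation.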
cat data.json |\
   grep "inf rule" |\
   sed 's/"inf rule": //' |\
   tr -s " " |\
   sort |\
   uniq -c |\
   sort -n |\
   tail -n 10
  11  "substitute X for Y",
  12  "declare identity",
  13  "subtract X from both sides",
  14  "declare variable replacement",
  20  "declare final expr",
  21  "divide both sides by",
  21  "substitute LHS of expr 1 into expr 2",
  31  "simplify",
  31  "substitute RHS of expr 1 into expr 2",
  54  "declare initial expr",
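
The rule counts can also be taken from the parsed JSON rather than from text lines, which removes the leftover quotes and commas. Where the "inf rule" key sits in data.json is not shown here, so this sketch walks the whole structure and collects every value under that key. The helper collect_values and the nested sample are hypothetical, and the values are assumed to be strings.

```python
from collections import Counter


def collect_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                yield from collect_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from collect_values(item, key)


# Hypothetical nesting; the real layout of data.json may differ.
sample_data = {
    "derivations": {
        "d1": {
            "steps": {
                "s1": {"inf rule": "declare initial expr"},
                "s2": {"inf rule": "simplify"},
                "s3": {"inf rule": "simplify"},
            }
        }
    }
}

rule_counts = Counter(collect_values(sample_data, "inf rule"))
for rule, count in rule_counts.most_common(10):
    print(f"{count:4d}  {rule}")
```

Unlike the grep pipeline, this lists the most common rule first.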
