cat data.json | grep " \"latex\":"
So I decided to read the JSON into Python on the command line:
https://www.cambus.net/parsing-json-from-command-line-using-python/
That worked, but I learned that writing a for loop in a command-line one-liner requires extra work:
https://stackoverflow.com/questions/2043453/executing-multi-line-statements-in-the-one-line-command-line
Once I knew the length of each value, I zero-padded it to three digits:
https://stackoverflow.com/questions/21620602/add-leading-zero-python
Then I used cut to eliminate the last digit (so the histogram bin size is 10).
cat data.json |\
python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(str(len(d['latex'])).zfill(3))\")" |\
sort -n |\
cut -c1-2 |\
uniq -c
127 00
63 01
75 02
54 03
34 04
28 05
17 06
18 07
14 08
15 09
11 10
10 11
6 12
5 13
2 14
1 15
1 16
1 18
1 20
2 23
1 27
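For comparison, the same binning can be sketched as a small Python function instead of the `zfill`/`cut` trick. This is only a minimal sketch: it assumes, as in the one-liner above, that the file has a top-level `expressions` dictionary whose values each carry a `latex` string. The function name `latex_length_histogram` and the `bin_size` parameter are hypothetical names introduced here for illustration.

```python
from collections import Counter


def latex_length_histogram(data, bin_size=10):
    """Count 'latex' string lengths per bin.

    Assumes data["expressions"] maps ids to dicts with a "latex" key.
    Keys of the returned Counter are the lower edge of each bin.
    """
    lengths = (len(d["latex"]) for d in data["expressions"].values())
    # Integer division drops the last digit, like `cut -c1-2` on a
    # three-digit zero-padded length.
    return Counter((n // bin_size) * bin_size for n in lengths)


def format_histogram(hist):
    """Return 'count bin_start' lines sorted by bin, similar to `uniq -c`."""
    return ["{} {}".format(count, start) for start, count in sorted(hist.items())]
```

To use it on the real file, load the data with `json.load` and print the lines from `format_histogram`. Unlike the `cut` approach, this does not depend on every length having at most three digits.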
The longest expressions
cat data.json |\
python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(len(d['latex']))\")" |\
sort -n |\
tail -n 5
186
201
231
233
271
The shortest expressions
cat data.json |\
python -c "exec(\"import sys, json; expr=json.load(sys.stdin)['expressions'];\nfor i,d in expr.items(): print(len(d['latex']))\")" |\
sort -n |\
head -n 5
1
1
1
1
1
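The longest and shortest lengths can also be pulled out in one pass, keeping the expression id next to each length so the extremes can be looked up afterwards. This is a hedged sketch under the same assumption about the `expressions` / `latex` layout; `extreme_lengths` is a hypothetical helper name, not something from the original pipeline.

```python
import heapq


def extreme_lengths(data, n=5):
    """Return (shortest, longest) lists of (length, expression_id) pairs.

    Assumes data["expressions"] maps ids to dicts with a "latex" key.
    `shortest` is in ascending order, `longest` in descending order.
    """
    pairs = [(len(d["latex"]), key) for key, d in data["expressions"].items()]
    return heapq.nsmallest(n, pairs), heapq.nlargest(n, pairs)
```

Note that `tail -n 5` after `sort -n` lists the longest values in ascending order, whereas `heapq.nlargest` returns them largest first.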
Similarly, we can get the popularity of inference rules
cat data.json |\
grep "inf rule" |\
sed 's/"inf rule": //' |\
tr -s " " |\
sort |\
uniq -c |\
sort -n |\
tail -n 10
11 "substitute X for Y",
12 "declare identity",
13 "subtract X from both sides",
14 "declare variable replacement",
20 "declare final expr",
21 "divide both sides by",
21 "substitute LHS of expr 1 into expr 2",
31 "simplify",
31 "substitute RHS of expr 1 into expr 2",
54 "declare initial expr",
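The `grep`/`sed` count can also be approximated in Python without relying on how the file is formatted line by line. Where the `"inf rule"` entries sit inside the JSON is not shown here, so this sketch makes no assumption about the nesting: it walks the whole structure and collects every string stored under that key. The helper names `iter_values_for_key` and `inference_rule_counts` are hypothetical.

```python
from collections import Counter


def iter_values_for_key(node, key):
    """Yield every string stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                yield v
            else:
                # Keep descending into nested containers.
                for found in iter_values_for_key(v, key):
                    yield found
    elif isinstance(node, list):
        for item in node:
            for found in iter_values_for_key(item, key):
                yield found


def inference_rule_counts(data, key="inf rule"):
    """Count how often each inference rule name appears in the loaded JSON."""
    return Counter(iter_values_for_key(data, key))
```

Calling `inference_rule_counts(data).most_common(10)` on the loaded file gives the ten most frequent rules, most popular first, which is the reverse of the order printed by `sort -n | tail -n 10`.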