Sunday, June 26, 2016

build a link graph


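The Physics Derivation Graph (PDG) site uses relative links. This complicates building a graph of the site's links with wget: the -k (--convert-links) option can be combined with -O only when the output goes to a regular file, so it cannot rewrite the links when the page is piped to stdout with -O -. The commands below instead extract the href values directly and prepend the site/ path where needed.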

$ root_page=allofphysicsgraph.github.io/proofofconcept/
$ wget http://$root_page -q -O -  | grep -i -o '<a[^>]\+href[ ]*=[ \t]*"[^"]\+"' | sed 's/<a href="//' | sed 's/"//' | grep -v https | sort | uniq > list_of_pages
$ while read -r line; do this_page="$line"; wget $root_page$this_page -q -O - | grep -i -o '<a[^>]\+href[ ]*=[ \t]*"[^"]\+"' | sed 's/<a href="//' | sed 's/"//' | grep -v https | sort | uniq >> list_of_pages2; done < list_of_pages 
$ cat list_of_pages2 | grep -v http | sed 's/^/site\//' >> list_of_pages
$ cat list_of_pages | sort | uniq > list_of_pages_master
$ rm list_of_pages list_of_pages2
$ echo 'digraph site {' > graph_level2.gv
$ while read -r line; do this_page="$line"; wget $root_page$this_page -q -O - | grep -i -o '<a[^>]\+href[ ]*=[ \t]*"[^"]\+"' | sed 's/<a href="//' | sed 's/"//' | awk -v thispage="$this_page" '{print "\"" thispage "\" -> \"" $0 "\";"}' >> graph_level2.gv; done < list_of_pages_master
$ echo '}' >> graph_level2.gv
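
The same idea can be sketched in Python, which avoids the grep/sed pattern matching and resolves relative links explicitly. This is only a rough sketch, not the method used above: the names LinkCollector, extract_links and dot_edges are made up for illustration, and the sample HTML is hypothetical rather than taken from the PDG site. Node names are quoted because URLs contain characters such as "/" and "." that Graphviz does not accept in an unquoted identifier.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the sorted, de-duplicated href values found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    return sorted(set(collector.links))


def dot_edges(page_url, html_text):
    """Return one quoted DOT edge per link, resolved against page_url."""
    return [
        '"{}" -> "{}";'.format(page_url, urljoin(page_url, link))
        for link in extract_links(html_text)
    ]


if __name__ == "__main__":
    # Hypothetical page content, used only for illustration.
    sample = '<a href="site/about.html">About</a> <a href="https://example.org/">Ext</a>'
    base = "http://allofphysicsgraph.github.io/proofofconcept/"
    print("digraph site {")
    for edge in dot_edges(base, sample):
        print("  " + edge)
    print("}")
```

Fetching each page (for example with urllib.request) and feeding its text to dot_edges would reproduce the loop above; that part is left out here so the sketch runs without network access.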


Thursday, June 9, 2016

Python code validation

https://sourcegraph.com

https://github.com/yinwang0/pysonar2
https://yinwang0.wordpress.com/2010/09/12/pysonar/

http://pychecker.sourceforge.net/
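pychecker is not that useful for code that already works. It runs the module while checking it (note the different random indices in the second run below) and reports only an unused import: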

$ python bin/generate_new_random_index.py 
expression permanent index: 3135868900
expression temporary index: 3901417
feed temporary index      : 4031120
inf rule temporary index  : 2688703

$ pychecker bin/generate_new_random_index.py 
Processing module generate_new_random_index (bin/generate_new_random_index.py)...
expression permanent index: 1210104125
expression temporary index: 7904139
feed temporary index      : 7185822
inf rule temporary index  : 2695903

Warnings...

bin/generate_new_random_index.py:15: Imported module (random) not used
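
For this particular warning, a check that does not execute the script can be sketched with Python's standard ast module. This is an assumption-laden illustration, not how pychecker works internally: the function name unused_imports is made up, and the sketch only compares imported names with names referenced anywhere in the file, so it ignores scoping, __all__, and names used only inside strings.

```python
import ast


def unused_imports(source):
    """Return sorted names that are imported in source but never referenced."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a as x" binds "x".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                # "from m import *" binds unknown names, so skip it.
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Covers plain uses and the base of attribute access (os.getcwd).
            used.add(node.id)
    return sorted(imported - used)


if __name__ == "__main__":
    # Hypothetical source text, used only for illustration.
    example = "import random\nimport os\nprint(os.getcwd())\n"
    print(unused_imports(example))
```

Because the source is only parsed, the script's side effects (such as printing new random indices) do not occur during the check.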