Monday, April 13, 2020

insertions and deletions in git versus time

I wanted to plot the changes in the code base with more detail than is shown on
https://github.com/allofphysicsgraph/proofofconcept/graphs/contributors

My first attempt was to use git log and grab the hash and date:

$ git --no-pager log --pretty=format:"%H %ad"
....
6cf2a0255e4e8ac5db4eabf086f119717e650306 Sun Jan 4 11:23:28 2015 -0500
db738d9b246a9592c9b5dc89407d7b2587df5b6f Fri Jan 2 09:06:13 2015 -0500
282a80b8b346294ef1c986d7c98f02daa3b2283d Fri Jan 2 08:58:41 2015 -0500
....

I'll save that for later:
$ git --no-pager log --pretty=format:"%H %ad" > hash_and_date.log

Those two columns (hash, date) are necessary but not sufficient -- I also need the number of lines changed.

I saw that "git show" prints the full diff for a commit, so I could combine it with grep to count the added and removed lines:

$ git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^+++ | grep ^+ | wc -l
     130
$ git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^--- | grep ^- | wc -l
      20

It would be better to put the two numbers on the same line, something like

$ removed=`git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^--- | grep ^- | wc -l`; added=`git show 46d4649074e34019b336d13838564db90790eba6 | grep -v ^+++ | grep ^+ | wc -l`; echo $removed $added
20 130
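
One caveat with the grep approach: filtering out ^+++ and ^--- also discards any changed line whose own text starts with "++" or "--" (for example an added line "++i;" appears in the diff as "+++i;"). Below is a minimal sketch that counts both numbers in one pass and only looks at lines inside hunks. It assumes the diff arrives on standard input; the name count_diff_lines is hypothetical, not a git command.

```shell
# count_diff_lines: hypothetical helper, not part of git.
# Reads a unified diff on stdin and prints "<removed> <added>".
count_diff_lines() {
    awk '
        /^@@/           { in_hunk = 1; next }   # hunk header: changed lines follow
        /^diff --git /  { in_hunk = 0; next }   # next file: skip its ---/+++ header
        in_hunk && /^[+]/ { added++ }
        in_hunk && /^-/   { removed++ }
        END { print removed + 0, added + 0 }    # "+ 0" turns an unset count into 0
    '
}
```

Usage would look like "git show 46d4649074e34019b336d13838564db90790eba6 | count_diff_lines".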

Then I stumbled onto https://stackoverflow.com/a/53127502/1164295 which has almost what I wanted.  

I ran into a problem:
$ git diff --shortstat d2d48dcde6e04306d79f2270cdefbb846b0c6a4b | sed -E 's/[^[:digit:][:space:]]//g'

warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 2154 and retry the command.
91015

I found the fix on https://stackoverflow.com/a/28064699/1164295 and ran
$ git config diff.renameLimit 2154

I made some alterations since I care about both the additions and removals:
$ git diff --shortstat d2d48dcde6e04306d79f2270cdefbb846b0c6a4b | sed -E 's/[^[:digit:][:space:]]//g' | awk '{print $2 " " $3}'
66283 19430


This can be written as a function:
$ function gcount() {
    # prints "<insertions> <deletions>" for the diff against commit $1
    git diff --shortstat "$1" | sed -E 's/[^[:digit:][:space:]]//g' | awk '{ print $2 " " $3 }'
}
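
One thing to watch for: git leaves the "insertions" or "deletions" part out of the --shortstat line when its count is zero, so picking columns by position can put a deletion count in the insertion column. A minimal sketch that finds each number by its label instead and defaults to 0 follows. The name parse_shortstat is hypothetical, and it assumes a single shortstat line on standard input.

```shell
# parse_shortstat: hypothetical helper, not part of git.
# Reads one line of `git diff --shortstat` output on stdin and
# prints "<insertions> <deletions>", using 0 for a missing part.
parse_shortstat() {
    awk '
        {
            ins = 0; del = 0
            for (i = 2; i <= NF; i++) {
                # the count is the field just before its label
                if ($i ~ /^insertion/) ins = $(i - 1)
                if ($i ~ /^deletion/)  del = $(i - 1)
            }
            print ins, del
        }
    '
}
```

It could replace the sed and awk stages in gcount, as in: git diff --shortstat "$1" | parse_shortstat. When there is no difference at all, git prints nothing and so does this function.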

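Then I ran this loop:
$ git log --pretty=tformat:"%H %ad" | while read line
do
    # tformat: ends the last line with a newline, so the oldest commit is not skipped by read
    this_hash=`echo $line | cut -d' ' -f1`
    # fields 3 onward are the date without the day of the week
    this_date=`echo $line | cut -d' ' -f3-`
    echo "$(gcount $this_hash)" $this_date
done > insertions_deletions_date.log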
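
Note that "git diff --shortstat <hash>" compares that commit with the current working tree, so each row in the log holds the cumulative difference between an old commit and the present checkout. If the change made by each individual commit is wanted instead, one possible approach is to let "git log --numstat" report per-file counts and add them up per commit. The following is a sketch under that assumption; the name sum_numstat and the "commit" prefix in the format string are my own choices for illustration.

```shell
# sum_numstat: hypothetical helper, not part of git.
# Reads the output of
#   git log --numstat --pretty=tformat:"commit %H %ad"
# on stdin and prints "<insertions> <deletions> <date>" for each commit.
sum_numstat() {
    awk '
        function emit() { if (have) print ins, del, date }
        /^commit / {
            emit()                              # finish the previous commit
            have = 1; ins = 0; del = 0
            date = $0
            sub(/^commit [^ ]+ /, "", date)     # drop "commit <hash> "
            next
        }
        # numstat lines are "<added> <removed> <path>"; binary files
        # show "-" for both counts and are skipped by the digit test
        NF >= 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { ins += $1; del += $2 }
        END { emit() }
    '
}
```

Usage would be: git log --numstat --pretty=tformat:"commit %H %ad" | sum_numstat > insertions_deletions_date.log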
