Friday, May 1, 2020

grepping nginx logs to observe user behavior

What IP addresses made page requests and how many pages did they request?

$ cat nginx_access.log | cut -d' ' -f1,7 | grep -v "\.xml\|\.js\|php\|cgi\|\.png\|\.txt\|/$\|400$" | cut -d' ' -f1 | sort | uniq -c | sort -nr
    431 71.244.214.232
    301 18.223.152.78
    131 66.249.79.109   - Googlebot
    106 96.245.195.226
     50 66.249.79.111   - Google crawler
     24 66.249.79.113   - Google crawler
     23 174.198.15.222
      9 35.197.133.35
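
The same tally can be produced in a single awk pass. This is a minimal sketch, assuming the default nginx "combined" log format (client address in field 1, request path in field 7). The function name count_page_requests is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: count page requests per client IP.
# Assumes the nginx "combined" log format: field 1 is the client address,
# field 7 is the request path (or the status code for malformed requests).
count_page_requests() {
    awk '
        # Skip the same assets, directory paths, and 400 lines as the grep -v filter.
        $7 ~ /\.xml|\.js|php|cgi|\.png|\.txt|\/$|400$/ { next }
        # Tally every remaining request under its client address.
        { count[$1]++ }
        END { for (ip in count) print count[ip], ip }
    ' "$@" | sort -nr
}
```

Run it as `count_page_requests nginx_access.log`. The output has the same "count address" columns as above, without the padding that uniq -c adds.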

The same list, limited to the top 20 addresses and stripped of the leading counts:
$ cat nginx_access.log | cut -d' ' -f1,7 | grep -v "\.xml\|\.js\|php\|cgi\|\.png\|\.txt\|/$\|400$" | cut -d' ' -f1 | sort | uniq -c | sort -nr | head -n 20 | tr -s " " | cut -d' ' -f3
The bare addresses are handy for pasting into the lookup form at https://www.maxmind.com/en/geoip-demo to see where each client is located.
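
Dropping the counts can also be done with awk, which splits on runs of whitespace and so does not need the tr -s step. A small sketch; strip_counts is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: read "uniq -c" style lines ("   431 71.244.214.232")
# from stdin and print only the second column, the address.
strip_counts() {
    awk '{ print $2 }'
}
```

It slots onto the end of the counting pipeline: `... | sort | uniq -c | sort -nr | head -n 20 | strip_counts`.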


What were the page dwell times for a given IP address?

$ ip="18.223.152.78"
$ cat nginx_access.log | grep -F "$ip" | cut -d' ' -f4,7 | grep -v "\.png\|\.js"
[30/Apr/2020:19:19:29 /navigation
[30/Apr/2020:19:19:35 /list_all_expressions?referrer=navigation
[30/Apr/2020:19:19:42 /list_all_symbols?referrer=_table_of_expressions
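
The dwell time on a page is roughly the gap between its timestamp and the next one. The subtraction can be sketched in awk as below. The function name dwell_times is hypothetical, and the sketch assumes all requests fall on the same day, so a visit that spans midnight would show a negative gap.

```shell
# Hypothetical helper: read "[dd/Mon/yyyy:HH:MM:SS /path" lines (the two
# columns printed by the cut -f4,7 pipeline) and print the number of seconds
# between each request and the one that follows it.
# Assumption: every timestamp is on the same day.
dwell_times() {
    awk '
        {
            # "[30/Apr/2020:19:19:29" split on ":" gives hour, minute, second
            # in t[2], t[3], t[4].
            split($1, t, ":")
            now = t[2] * 3600 + t[3] * 60 + t[4]
            # From the second line on, report the time spent on the previous page.
            if (NR > 1) {
                gap = now - prev
                print gap, page
            }
            prev = now
            page = $2
        }
        # The last page has no following request, so its dwell time is unknown.
        END { if (NR > 0) print "?", page }
    ' "$@"
}
```

Appending `| dwell_times` to the pipeline above gives, for the three requests shown:

6 /navigation
7 /list_all_expressions?referrer=navigation
? /list_all_symbols?referrer=_table_of_expressions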


What were the user agent strings for a given IP address?

$ cat nginx_access.log | grep -F "$ip" | cut -d' ' -f12- | sort | uniq -c
     60 "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
      3 "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.92 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
      8 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
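
Cutting on spaces from field 12 onward depends on the request line having exactly three words. A sketch of an alternative is to split each line on double quotes instead, assuming the user agent is the third quoted field, as it is in the nginx "combined" format. The function name user_agents_for_ip is made up for this example. It also compares the address exactly, so 1.1.1.1 does not match 11.1.1.10.

```shell
# Hypothetical helper: count user agent strings for one client address.
# Usage: user_agents_for_ip ADDRESS [LOGFILE...]   (reads stdin if no file)
# Assumes the user agent is the third double-quoted field on each line.
user_agents_for_ip() {
    target=$1
    shift
    awk -v ip="$target" -F'"' '
        # Text before the first quote starts with the client address.
        { split($1, head, " ") }
        # With double quote as the separator, field 6 is the user agent.
        head[1] == ip { count[$6]++ }
        END { for (ua in count) print count[ua], ua }
    ' "$@" | sort -nr
}
```

For example: `user_agents_for_ip "$ip" nginx_access.log`.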
