My first step is to review logins on the site,
https://physicsderivationgraph.blogspot.com/2020/05/inspecting-list-of-users-who-have.html
My previous post on reviewing logs
https://physicsderivationgraph.blogspot.com/2020/05/grepping-nginx-logs-to-observe-user.html
was written prior to the current nginx format I'm using.
I haven't gotten around to a deeper analysis like
https://physicsderivationgraph.blogspot.com/2020/04/analysis-of-web-logs-to-understand-how.html
First I had to install supporting software
sudo apt install python3-pip pip3 install pandas
Inline Python in bash with Pandas is possible because every line is formatted like a Python dictionary. Here I want to review what columns are present in the logs
cat nginx_access.log | python3 -c "import sys import pandas pandas.options.display.max_rows = 999 # https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html list_of_lines = [] for line in sys.stdin: list_of_lines.append(eval(line)) df = pandas.DataFrame(list_of_lines) print(df.columns) "How many of each entry for a few columns?
cat nginx_access.log | python3 -c "import sys import pandas pandas.options.display.max_rows = 999 # https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html list_of_lines = [] for line in sys.stdin: list_of_lines.append(eval(line)) df = pandas.DataFrame(list_of_lines) threshold = 20 print('user:') vc = df['user'].value_counts() print(vc[vc>threshold]) print('IP:') vc = df['ip'].value_counts() print(vc[vc>threshold]) print('req:') vc = df['req'].value_counts() print(vc[vc>threshold]) #print(df.head()) "For IPs that have made multiple (e.g., 30) requests, what pages have been accessed?
cat nginx_access.log | python3 -c "import sys import pandas pandas.options.display.max_rows = 999 # https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html list_of_lines = [] for line in sys.stdin: list_of_lines.append(eval(line)) df = pandas.DataFrame(list_of_lines) threshold = 30 vc = df['ip'].value_counts() for ip, number_of_requests in vc[vc>threshold].items(): print('\nIP = ',ip, 'made',number_of_requests,'requests') df_this_ip = df[df['ip']==ip] #for request in df_this_ip['req'].values: # print(request) print(df_this_ip['req'].value_counts()) "