Thursday, December 4, 2025

website unresponsive: diagnostic steps and blocking the AI crawlers

I received an email alert this morning with subject "Wachete error notification" which indicated my website wasn't responsive. 

In a browser I verified that https://allofphysics.com/ was hanging (no immediate error response). After a few minutes I got "504 Gateway Time-out nginx/1.17.9".

I ssh'd into the VPS (virtual private server) and ran 
docker ps
to verify the containers were running.

I used
top
to check the CPU and memory load. Two instances of gunicorn were each using 2% of the CPU and 10% of the RAM, which is expected.

Next I logged into the VPS web portal to review system usage for the past 7 days. The metrics show a noticeable change that started suddenly yesterday.



The last interaction I had with the server (more than a week ago) was to update the HTTPS certificates using Let's Encrypt. Although the website had returned a 504 error I could check the certificate expiration in the browser. The certs were valid.

The logs directory on the server contained:

-rwxrwxrwx 1 usr  usr          0 Sep  3  2024 auth.log*
-rw-r--r-- 1 usr  usr    6194913 Dec  3 14:38 flask_critical_and_error_and_warning.log
-rw-r--r-- 1 usr  usr     383596 Nov 26 06:27 flask_critical_and_error_and_warning.log.1
-rw-r--r-- 1 usr  usr    9999931 Nov 23 18:02 flask_critical_and_error_and_warning.log.2
-rw-r--r-- 1 usr  usr    9945758 Dec  3 14:42 flask_critical_and_error_and_warning_and_info.log
-rw-r--r-- 1 usr  usr    9999983 Dec  2 16:14 flask_critical_and_error_and_warning_and_info.log.1
-rw-r--r-- 1 usr  usr    9999938 Dec  1 16:40 flask_critical_and_error_and_warning_and_info.log.2
-rw-r--r-- 1 usr  usr    1206714 Dec  3 14:42 flask_critical_and_error_and_warning_and_info_and_debug.log
-rw-r--r-- 1 usr  usr    9999916 Dec  3 12:46 flask_critical_and_error_and_warning_and_info_and_debug.log.1
-rw-r--r-- 1 usr  usr    9999926 Dec  3 00:53 flask_critical_and_error_and_warning_and_info_and_debug.log.2
-rw-r--r-- 1 usr  usr  125459598 Dec  3 14:42 gunicorn_access.log
-rw-r--r-- 1 usr  usr  166722892 Dec  3 14:42 gunicorn_error.log
-rw-r--r-- 1 root root 126147128 Dec  4 11:01 nginx_access.log
-rw-r--r-- 1 root root  28785863 Dec  4 11:01 nginx_error.log
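
A quick way to see which of these logs are still being written is to filter by modification time instead of reading the dates by eye. This is a minimal sketch, assuming GNU find; the function name recent_logs and the 60-minute default are my own choices for illustration.

```shell
# List regular files in a directory that were modified in the last N minutes.
# Assumes GNU find (-maxdepth, -mmin and -printf are used).
recent_logs() {
    dir="$1"
    minutes="${2:-60}"
    find "$dir" -maxdepth 1 -type f -mmin "-$minutes" -printf '%f\n' | sort
}
```

Running recent_logs on the logs directory prints only the files touched within the last hour.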
Only the nginx logs have today's date. That's consistent with the blockage being at nginx. Using
tail -f nginx_access.log
I see that the latest entries are associated with https://webmaster.petalsearch.com/site/petalbot, which says the crawler is used to
"establish an index database which enables users to search the content of your site in Petal search engine and present content recommendations for the user in Huawei Assistant and AI Search services"

Using Gemini 2.5 Flash from https://aistudio.google.com/ I ask

I'm running a webserver that uses nginx and runs on linux. I am interested in blocking certain IP address ranges. Should I configure nginx to filter IP ranges or should I filter using the linux firewall? I want to use the software I already have rather than add yet another tool for this blocking.
and learn that the Linux firewall is recommended over nginx.

Next question for Gemini 2.5 Flash LLM is

I'm using Ubuntu for a webserver. How do I determine what firewall is being used from the command line?

I then run the following on my VPS:

$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), deny (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
22/tcp (OpenSSH)           ALLOW IN    Anywhere                  
443                        ALLOW IN    Anywhere                  
80                         ALLOW IN    Anywhere                  
22/tcp (OpenSSH (v6))      ALLOW IN    Anywhere (v6)             
443 (v6)                   ALLOW IN    Anywhere (v6)             
80 (v6)                    ALLOW IN    Anywhere (v6)             
I then also verify that nft exists using
$ sudo nft list ruleset
# Warning: table ip filter is managed by iptables-nft, do not touch!
table ip filter {
	chain ufw-before-logging-input {
	}
...long output, snipped...
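
The "which firewall do I have" question can also be narrowed down by checking which front-end commands are installed. This is a minimal sketch; the function name available_tools is hypothetical, and a command being on the PATH only shows that the tool is installed, not that it is the one managing the active rules.

```shell
# Print each named command that is available on the PATH.
available_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}
```

For example, available_tools ufw nft iptables firewall-cmd lists whichever of those four are present.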
After a few minutes of watching tail -f nginx_access.log, the major offenders in this denial-of-service (DoS) attack appear to be
104.210.140.141 = OpenAI, observed 2025-12-04
114.119.147.137 = PetalBot (for Huawei), observed 2025-12-04
156.59.198.136 = ByteDance, observed 2025-12-04
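
Rather than eyeballing tail -f, the same offenders can be ranked by counting requests. This is a sketch, assuming nginx writes its default "combined" log format (client address first, User-Agent as the third double-quoted field); the function names are made up for illustration, and a request line containing escaped quotes would throw off the User-Agent count.

```shell
# Count requests per client IP in the last N lines of an access log.
# Assumes the client address is the first whitespace-separated field.
top_client_ips() {
    log_file="$1"; lines="${2:-10000}"; limit="${3:-10}"
    tail -n "$lines" "$log_file" | awk '{ print $1 }' \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Same count, grouped by /24 network (IPv4 only), since one crawler
# can spread its requests over neighbouring addresses.
top_client_nets() {
    log_file="$1"; lines="${2:-10000}"; limit="${3:-10}"
    tail -n "$lines" "$log_file" \
        | awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
                   split($1, o, "."); print o[1] "." o[2] "." o[3] ".0/24" }' \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Count requests per User-Agent string (field 6 when splitting on '"').
top_user_agents() {
    log_file="$1"; lines="${2:-10000}"; limit="${3:-10}"
    tail -n "$lines" "$log_file" | awk -F'"' '{ print $6 }' \
        | sort | uniq -c | sort -rn | head -n "$limit"
}
```

For example, top_client_nets nginx_access.log 50000 5 prints the five busiest /24 ranges among the last 50000 requests, which are candidates for a ufw deny rule.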

LLM query:

ufw block IP address range for web server
followed by
how to pick the correct CIDR value for IP blocking?
from which I learn that /24 fixes the first three octets (24 bits) and covers every value of the last octet (0 to 255), i.e. 256 addresses
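
The CIDR arithmetic can be checked by hand in the shell. This is a minimal sketch for IPv4 only, with a hypothetical function name cidr_info: it masks off the host bits to get the network address (the same normalization that turns 156.59.198.136/24 into 156.59.198.0/24) and counts the addresses in the block.

```shell
# Print the normalized network and the number of addresses for an IPv4
# address and prefix length.
cidr_info() {
    ip="$1"
    prefix="$2"
    # Split the dotted quad into its four octets.
    IFS=. read -r o1 o2 o3 o4 <<EOF
$ip
EOF
    addr=$(( ($o1 << 24) | ($o2 << 16) | ($o3 << 8) | $o4 ))
    # A /24 mask keeps the top 24 bits and zeroes the remaining 8.
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    net=$(( $addr & $mask ))
    count=$(( 1 << (32 - $prefix) ))
    printf '%d.%d.%d.%d/%d %d\n' \
        $(( ($net >> 24) & 255 )) $(( ($net >> 16) & 255 )) \
        $(( ($net >> 8) & 255 )) $(( $net & 255 )) "$prefix" "$count"
}

cidr_info 156.59.198.136 24   # prints: 156.59.198.0/24 256
```

A smaller prefix widens the block: /16 would cover 65536 addresses, so it is worth confirming the range before denying it.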

I then run

$ sudo ufw deny from 156.59.198.136/24
WARN: Rule changed after normalization
Rule added
$ sudo ufw deny from 114.119.147.0/24
Rule added
$ sudo ufw deny from 104.210.140.0/24
Rule added
Check the results
$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere                  
443                        ALLOW       Anywhere                  
80                         ALLOW       Anywhere                  
Anywhere                   DENY        156.59.198.0/24           
Anywhere                   DENY        114.119.147.0/24          
Anywhere                   DENY        104.210.140.0/24          
OpenSSH (v6)               ALLOW       Anywhere (v6)             
443 (v6)                   ALLOW       Anywhere (v6)             
80 (v6)                    ALLOW       Anywhere (v6)             
The LLM had warned me that "If you find that the new deny rules are at the bottom of the list, you may need to use the insert function to put them at the top (e.g., position 1 and 2)." 
Gemini 2.5 says the general best practice for firewall rules is:
  1. Specific DENY rules (blocking known bad actors).
  2. Specific ALLOW rules (allowing trusted hosts/networks).
  3. General ALLOW rules (allowing public services).
  4. General DENY rules (the default policy, often implied).
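
Order matters because the rules are checked from top to bottom and the first match decides what happens to the packet. The sketch below is a deliberately simplified model of that behaviour, not how ufw is implemented: it ignores ports and protocols, handles IPv4 only, and the function names are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( ($o1 << 24) | ($o2 << 16) | ($o3 << 8) | $o4 ))
}

# Succeed if address $1 falls inside CIDR block $2 (e.g. 10.0.0.0/8).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits="${2#*/}"
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Walk an ordered rule list and print the action of the first matching rule.
# Each rule is "ACTION SOURCE", where SOURCE is a CIDR block or "any".
first_match() {
    src="$1"; shift
    for rule in "$@"; do
        action="${rule%% *}"
        source="${rule#* }"
        if [ "$source" = "any" ] || in_cidr "$src" "$source"; then
            echo "$action"
            return 0
        fi
    done
    echo "DENY"   # nothing matched: fall back to the default deny (incoming)
}

first_match 156.59.198.136 "ALLOW any" "DENY 156.59.198.0/24"   # prints: ALLOW
first_match 156.59.198.136 "DENY 156.59.198.0/24" "ALLOW any"   # prints: DENY
```

With the general ALLOW first, the crawler's address is accepted before the DENY rule is ever consulted, which is why the deny rules need to sit above the allow rules for ports 80 and 443.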
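Gemini 2.5's advice was almost correct. The LLM got the rule indices wrong. Rule numbers shift after every delete and insert, so I re-checked them with ufw status numbered between steps. Here are the commands I ran: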
$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere                  
443                        ALLOW       Anywhere                  
80                         ALLOW       Anywhere                  
Anywhere                   DENY        156.59.198.0/24           
Anywhere                   DENY        114.119.147.0/24          
Anywhere                   DENY        104.210.140.0/24          
OpenSSH (v6)               ALLOW       Anywhere (v6)             
443 (v6)                   ALLOW       Anywhere (v6)             
80 (v6)                    ALLOW       Anywhere (v6)             

$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] OpenSSH                    ALLOW IN    Anywhere                  
[ 2] 443                        ALLOW IN    Anywhere                  
[ 3] 80                         ALLOW IN    Anywhere                  
[ 4] Anywhere                   DENY IN     156.59.198.0/24           
[ 5] Anywhere                   DENY IN     114.119.147.0/24          
[ 6] Anywhere                   DENY IN     104.210.140.0/24          
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)             

$ sudo ufw delete 4
Deleting:
 deny from 156.59.198.0/24
Proceed with operation (y|n)? y
Rule deleted
$ sudo ufw insert 1 deny from 156.59.198.0/24 to any
Rule inserted
$ sudo ufw delete 4
Deleting:
 allow 80
Proceed with operation (y|n)? n
Aborted
$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     156.59.198.0/24           
[ 2] OpenSSH                    ALLOW IN    Anywhere                  
[ 3] 443                        ALLOW IN    Anywhere                  
[ 4] 80                         ALLOW IN    Anywhere                  
[ 5] Anywhere                   DENY IN     114.119.147.0/24          
[ 6] Anywhere                   DENY IN     104.210.140.0/24          
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)             

$ sudo ufw delete 5
Deleting:
 deny from 114.119.147.0/24
Proceed with operation (y|n)? y
Rule deleted
$ sudo ufw insert 2 deny from 114.119.147.0/24 to any
Rule inserted
$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     156.59.198.0/24           
[ 2] Anywhere                   DENY IN     114.119.147.0/24          
[ 3] OpenSSH                    ALLOW IN    Anywhere                  
[ 4] 443                        ALLOW IN    Anywhere                  
[ 5] 80                         ALLOW IN    Anywhere                  
[ 6] Anywhere                   DENY IN     104.210.140.0/24          
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)             

$ sudo ufw delete 6
Deleting:
 deny from 104.210.140.0/24
Proceed with operation (y|n)? y
Rule deleted
$ sudo ufw insert 3 deny from 104.210.140.0/24 to any
Rule inserted
$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     156.59.198.0/24           
[ 2] Anywhere                   DENY IN     114.119.147.0/24          
[ 3] Anywhere                   DENY IN     104.210.140.0/24          
[ 4] OpenSSH                    ALLOW IN    Anywhere                  
[ 5] 443                        ALLOW IN    Anywhere                  
[ 6] 80                         ALLOW IN    Anywhere                  
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)             
[ 8] 443 (v6)                   ALLOW IN    Anywhere (v6)             
[ 9] 80 (v6)                    ALLOW IN    Anywhere (v6)            
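
The delete-then-insert sequence above can be generated instead of typed. This is a sketch with a hypothetical function name; it only prints the commands for review and does not run them. It deletes each rule by its text (ufw accepts "ufw delete RULE" as well as a rule number), which sidesteps the shifting-index problem.

```shell
# Print (do not run) the ufw commands that move deny rules for the given
# CIDR blocks to the top of the rule list, in the order given.
print_reorder_commands() {
    position=1
    for cidr in "$@"; do
        echo "ufw delete deny from $cidr"
        echo "ufw insert $position deny from $cidr to any"
        position=$(( position + 1 ))
    done
}

print_reorder_commands 156.59.198.0/24 114.119.147.0/24 104.210.140.0/24
```

Each printed line would be run with sudo after checking it against ufw status numbered.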

 

In a web browser I visited https://allofphysics.com/ and the page loaded immediately. Yay!