By default, the LSCWP built-in crawler adds a URI to the blacklist if either of the following conditions is met:
1. The page is not cacheable, by design or by default. In other words, any page that sends the response header `x-litespeed-cache-control: no-cache` is added to the blacklist after the initial crawl.
2. The page does not respond with one of the following status lines:
   `HTTP/1.1 200 OK`
   `HTTP/1.1 201 Created`
   `HTTP/2 200`
   `HTTP/2 201`
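The two conditions above can be sketched as a small standalone check. This is only an illustration, not the plugin's actual code: the function name `blacklist_reason()` is hypothetical, and it assumes the status line is compared exactly against the four values listed above and that the `x-litespeed-cache-control` header name is matched case-insensitively.

```php
<?php
// Hypothetical helper, NOT part of LSCWP: a sketch of the two documented
// blacklist conditions, applied to a raw block of response headers.

/**
 * Return the reason a crawled URI would be blacklisted, or null if it passes.
 *
 * @param string $rawHeaders Raw response headers, one header per line.
 */
function blacklist_reason(string $rawHeaders): ?string
{
    // Status lines the crawler accepts, as listed in condition 2.
    $acceptedStatusLines = [
        'HTTP/1.1 200 OK',
        'HTTP/1.1 201 Created',
        'HTTP/2 200',
        'HTTP/2 201',
    ];

    // Split on any newline style, trim each line, and drop empty lines.
    $lines = preg_split('/\r\n|\r|\n/', trim($rawHeaders));
    $lines = array_values(array_filter(
        array_map('trim', $lines),
        fn (string $line): bool => $line !== ''
    ));

    // Condition 2: the first line must be one of the accepted status lines
    // (exact match is an assumption of this sketch).
    $statusLine = $lines[0] ?? '';
    if (!in_array($statusLine, $acceptedStatusLines, true)) {
        return 'unexpected status line: ' . $statusLine;
    }

    // Condition 1: a page that declares itself non-cacheable is blacklisted.
    foreach ($lines as $line) {
        if (stripos($line, 'x-litespeed-cache-control:') === 0
            && stripos($line, 'no-cache') !== false) {
            return 'page is not cacheable';
        }
    }

    return null;
}
```

Passing the headers captured from a crawl request to this function gives a quick indication of which of the two conditions, if any, applies to a given URI.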
A real debugging case:
Problem:
A user reports that some pages are always added to the blacklist after the first crawl. When the same pages are checked manually with curl or the Chrome browser, they always show the `x-litespeed-cache` header and a 200 OK status code, yet dozens of URIs are still added to the blacklist on every crawl.
Analysis:
As described above, we know the conditions under which a URI is blacklisted, so we only need to find out what is triggering the crawler to blacklist these pages.
Investigation:
We checked the debug log first, but it does not record the response headers, so a small modification is needed.
To log more detail, insert the following line into the file `litespeed-cache/lib/litespeed/litespeed-crawler.class.php` at line 273:
LiteSpeed_Cache_Log::debug( 'crawler logs headers', $headers ) ;
This way, the debug log records the contents of `$headers` each time the crawler processes a URI.