Custom Cache Crawler with Curl

If you use a custom integration with LSCache where there's no crawler available, curl can often be an easy and very flexible method to warm up your cache.

Simple curl commands, executed either via bash (or similar), or programming languages, allow you to make your own crawler that fits your unique needs.

We recently came by a case where a client was trying to crawl a page with a specific cookie, the rewrite roles would look something like this

RewriteCond %{HTTP_USER_AGENT} "iPhone|iPod" [NC]
RewriteCond %{HTTP_COOKIE} cookie_test [NC]
RewriteRule .* - [E=Cache-Control:vary=is_mobile]

The client wanted to check simply if the user agent contained “iPhone” or “iPod” and if a specific cookie (in this case cookie_test) was set.

The RewriteCond simply checks for whether cookie_test was present or not.

However, when warming up with curl it's important to follow the RFC6265 Section 5.2.2.

One would think that it's simple to satisfy the RewriteCond by doing --cookie cookie_test in the curl command, however this is against RFC6265, so curl will simply ignore the cookie and not send it at all.

To send an empty cookie with curl, it's important to add the equal sign (=), like so:

curl --cookie cookie_test= -A "iPod" https://example.com/

Without the =, RewriteCond %{HTTP_COOKIE} cookie_test [NC] will get the result -1 (false) and thus not set the vary to is_mobile.

  • Admin
  • Last modified: 2019/10/25 19:10
  • by Lisa Clarke