Preflight request for enhanced vary support

#1
Hello folks,

I'm one of the core developers of Contao (contao.org), an open source CMS, and I've become pretty interested in caching solutions over the last year.
I have to admit that I've always found all these special plugins for WordPress and Magento, and the lists of rules for certain pages in .htaccess etc., pretty odd, because the HTTP specification already contains everything needed to build reverse proxy functionality. If your application sends the correct headers, you can place any proxy in front of it and you don't need any special plugin or complicated rules. I know that a lot of applications unfortunately don't send these headers, but if they did, all you would need to do is enable caching and let LiteSpeed read the response headers and cache the responses accordingly:

<IfModule LiteSpeed>
CacheEnable public /
</IfModule>

Am I right that this would check all responses and honor their cache control headers (Expires, max-age, s-maxage, private etc.)?
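
As a rough illustration of what "honoring the headers" means for a shared cache, here is a minimal Python sketch. It follows generic HTTP caching rules and is not a description of what LiteSpeed actually does; the function names are made up for this example, and it deliberately ignores Expires, no-cache and heuristic freshness.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(headers: dict) -> Optional[int]:
    """Lifetime in seconds a shared cache may use, or None if it should not store.

    Assumes the header dict uses the exact key "Cache-Control".
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "private" in cc or "no-store" in cc:
        return None
    # For shared caches, s-maxage takes precedence over max-age.
    for directive in ("s-maxage", "max-age"):
        arg = cc.get(directive)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None  # no explicit lifetime given; this sketch does not guess one
```

With "public, max-age=60, s-maxage=600" a shared cache would use 600 seconds, while any response marked private is not stored at all.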

At Contao we're working on top of Symfony, and for caching we use a combination of bundles, with the FOSHttpCache library and bundle (https://github.com/FriendsOfSymfony/FOSHttpCache) as the building block. We try hard to send the correct HTTP response headers and have started to move to ESI requests for partials with different caching requirements. This all works pretty nicely with Symfony's built-in reverse proxy, but obviously that one is pretty basic and slow. The FOSHttpCache library provides support for Varnish, but I really like LiteSpeed, so I was looking into whether I can bring support for it to that library as well. I don't really want to bother you with PHP and specific Symfony bundles, but I'll link to some of their documentation because the problem is already described there a few times, so please bear with me.
When reading through the wiki I found that I can achieve almost everything, which is pretty cool :) There's one issue, though, that I seem to be unable to solve, and I guess it's a pretty common one: varying a cache entry on data stored in the session so we can cache pages for logged-in users (e.g. one entry per user group). I guess we all agree that varying on "Cookie" is not the best idea because it would basically render caching useless: every user has their own session and thus their own cache entries instead of sharing them with other users of the same user group.
I noticed you have a solution to vary on a given cookie key, which is cool, but this again would force me to update my .htaccess on a regular basis and to know exactly which cookies might possibly be relevant. For me as a developer that might even work, but it is impossible for regular CMS users, especially when they start installing lots of bundles/extensions/plugins.
So I tried to find another solution and ended up with a "preflight request", as I call it. It's similar to nginx's ngx_http_auth_request_module, in case you're familiar with that, except that it splits the "Cookie" header into multiple headers instead of checking for authorization. Whenever a request with a "Cookie" or "Authorization" header comes in, a preflight request is sent to the application before the cache is checked. This gives the application the chance to split the "Cookie" header into other headers, which are then "replayed" onto the original request, and only then is that request looked up in the cache. This allows you to "Vary" on any header you like.
See my illustration and the description on https://github.com/terminal42/header-replay-bundle.
There's even a way to ignore the cache completely in such a case which is important for features such as previewing a page (= same URL so it would load from the cache but if you're logged in you should maybe see staged changes).

Coming to my question now. In Symfony's Reverse Proxy I can code whatever I like because it's PHP so we have support for such a preflight request (that's actually what my bundle with the illustration does). In Varnish there's VCL so you can sort of "code" it yourself and it works as well (you can read more here http://foshttpcache.readthedocs.io/en/stable/user-context.html).
In LiteSpeed, however, I couldn't find any equivalent feature.
Would you be interested in working on that concept? It's very generic and absolutely not vendor specific, so it would be a solution for any developer with that problem.

Thanks in advance, Yanick
 

NiteWave

Administrator
#2
Here's the LSCache API document:
https://www.litespeedtech.com/support/wiki/doku.php/litespeed_wiki:cache:developer_guide
Not sure if Cache Vary and/or ESI can implement the "preflight request" feature, or something close to it?

I'm mainly asking from an end user's standpoint. Can you give an example that is easy to understand (for an end user) and that is nearly impossible the current LSCache way but OK or easy with the "preflight request" method?

This is just a quick response to try to make it easier to understand for me and other average readers, although maybe it's very easy for a cache developer to catch the idea.
 
#3
Hi NiteWave,

Thank you for the reply, I'm happy to see end users being interested in the idea too. I do think that a cache developer would understand my issue but I'm happy to find another example. As Albert Einstein said "if you cannot explain it simply, you don't understand it well enough" :)

So here's the situation: You have a website with a lot of information - say - about cancer treatment for different target groups. You identify yourself as being a member of a target group by logging in. So let's say we have two user groups, the patients and the doctors. They both want to know about some medication called "Super drug". So they both call www.domain.com/super-drug.html, but as we're good website developers we add target group specific information to that page. So if you're logged in as a patient, there are hints that you should call your doctor, info about health insurance etc. If you're logged in as a doctor, you get more in-depth information on how this medication works and all the information written in Latin no ordinary mortal would understand :)

Now the thing is that there are thousands of patients and thousands of doctors. So caching that information makes a lot of sense. A cache entry is identified by the URL, so we need a way to tell the cache that even though both ask for www.domain.com/super-drug.html, they should get different cache entries. That's what the "Vary" header does. It essentially extends the unique key in the cache from only the URL to the URL plus the values of all the request headers "Vary" lists. So the response of our application should look like this:

For doctors:

200 OK
Content-Type: text/html
User-Group: doctors
Vary: User-Group

For patients:

200 OK
Content-Type: text/html
User-Group: patients
Vary: User-Group

You can invent any custom header (don't be confused if you're used to X- prefixes for custom headers, that has been deprecated, you don't need that :)).
So that's our ultimate goal: to get proper caching we need a Vary header that lists User-Group, plus the User-Group header itself.
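
To make the "URL plus varied headers" idea concrete, here is a minimal Python sketch of how such a cache key could be built. The cache_key function and the in-memory dict are purely illustrative assumptions, not how LiteSpeed stores entries.

```python
def cache_key(url: str, request_headers: dict, vary: str) -> tuple:
    """Build the lookup key: the URL plus the value of every request header named in Vary."""
    # Header names are case-insensitive, so normalise them before comparing.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    varied = tuple((name, lowered.get(name, "")) for name in names)
    return (url, varied)


# Hypothetical in-memory cache holding one entry per user group.
cache = {}
url = "www.domain.com/super-drug.html"
cache[cache_key(url, {"User-Group": "doctors"}, "User-Group")] = "page for doctors"
cache[cache_key(url, {"User-Group": "patients"}, "User-Group")] = "page for patients"
```

Both entries share the same URL but live under different keys, so every doctor can be served the doctors' copy and every patient the patients' copy.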
Now the problem is this: when you as a client - no matter if doctor or patient - visit the page, your browser doesn't know what you are nor does it know that your page requires some User-Group header. How should it? So when the request to www.domain.com/super-drug.html is being processed by LiteSpeed, it would actually fail to find something in the cache because there are two entries for User-Group = patients and User-Group = doctors but none for a request with no User-Group header.
The information about which group a user belongs to is stored in our session, and the session associated with the client/user is identified by a key sent via the Cookie header. So for PHP this would look something like this:

GET /super-drug.html
Host: www.domain.com
Cookie: PHPSESSID=veryrandomkeyformysession

That key is how my session is kept across multiple requests. So how can we achieve different cache entries for users when the browser does not send our User-Group header? Well, since Vary names request headers, we could vary on "Cookie" in the first place (when sending the response). Like so:

200 OK
Content-Type: text/html
Set-Cookie: PHPSESSID=veryrandomkeyformysession
Vary: Cookie

However, as you can see, no two users would share the same Cookie header because the session id is private, so we would end up filling our LiteSpeed cache with essentially nonsense cache entries. So what do we do?
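
To put a number on that, here is a small Python sketch with made-up data: 1000 logged-in users split evenly over two groups. It only counts distinct cache keys and does not model any real cache.

```python
# Hypothetical data: 1000 logged-in users, each with a private session id,
# split evenly over two user groups.
users = [
    {"session": f"PHPSESSID=session{i}", "group": "doctors" if i % 2 else "patients"}
    for i in range(1000)
]
url = "/super-drug.html"

# Key = URL + session cookie: every user ends up with a private entry.
entries_by_cookie = {(url, user["session"]) for user in users}

# Key = URL + User-Group: all members of a group share one entry.
entries_by_group = {(url, user["group"]) for user in users}

print(len(entries_by_cookie), len(entries_by_group))  # prints "1000 2"
```

Keying on the session cookie gives one entry per user, so nobody ever gets a cache hit from someone else's visit; keying on the group needs only two entries for the same page.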
That's where the preflight concept comes into play. (Note: that's something I came up with, not something you could find in some specification. I just call it a "preflight request" because in CORS a request sent before the real request is called a preflight request :)) Someone needs to analyze the Cookie header, or in our case the PHP session, and tell the reverse proxy which User-Group this request belongs to. So instead of checking the cache directly, LiteSpeed would first send a preflight request to the app (with e.g. some special Accept header). So we have this:

Original request from client to LiteSpeed:

GET /super-drug.html
Host: www.domain.com
Cookie: PHPSESSID=veryrandomkeyformysession
Accept: text/html

LiteSpeed recognizes a Cookie header, so it sends a preflight request to the real application:

GET /super-drug.html
Host: www.domain.com
Cookie: PHPSESSID=veryrandomkeyformysession
Accept: application/vnd.litespeed.preflight-request (should be configurable)

The real application notices "Accept: application/vnd.litespeed.preflight-request" and thus does not do the whole work but extracts whatever it wants from the session and replies with a response containing the headers:

200 OK
Content-Type: application/vnd.litespeed.preflight-request
X-LiteSpeed-Preflight-Headers: User-Group (could contain more than just one)
User-Group: patients
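
On the application side, answering such a preflight request could look roughly like the following Python sketch. Everything here is an assumption for illustration: the SESSIONS dict stands in for a real session store, handle_preflight is a made-up function name, and the media type and X-LiteSpeed-Preflight-Headers header are the proposed names from this post, not an existing LiteSpeed feature.

```python
PREFLIGHT_TYPE = "application/vnd.litespeed.preflight-request"

# Hypothetical session store: session id -> session data.
SESSIONS = {"veryrandomkeyformysession": {"user_group": "patients"}}


def parse_cookies(cookie_header: str) -> dict:
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def handle_preflight(request_headers: dict):
    """Return (status, headers) for a preflight request, or None for a normal request."""
    if request_headers.get("Accept") != PREFLIGHT_TYPE:
        return None  # not a preflight: let the application render the page as usual
    session_id = parse_cookies(request_headers.get("Cookie", "")).get("PHPSESSID")
    session = SESSIONS.get(session_id, {})
    # Collect the headers the proxy should replay onto the original request.
    replay = {}
    if "user_group" in session:
        replay["User-Group"] = session["user_group"]
    headers = {
        "Content-Type": PREFLIGHT_TYPE,
        "X-LiteSpeed-Preflight-Headers": ", ".join(replay),
    }
    headers.update(replay)
    return 200, headers
```

The point of the design is that the handler only touches the session and never renders the page, so the preflight stays cheap compared to a full request.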

LiteSpeed now checks the "X-LiteSpeed-Preflight-Headers" header and copies every header listed there onto the original request. So our original request now becomes this:

GET /super-drug.html
Host: www.domain.com
Cookie: PHPSESSID=veryrandomkeyformysession
User-Group: patients
Accept: text/html

And then it continues normally, which means it will find a cache entry for the patients user group and return that to you.
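
Put together, the proxy side of the flow above could be sketched in Python like this. It's a toy model under stated assumptions: Proxy and backend are invented names, the backend is a stub that decides the group from the cookie text, header names are matched case-sensitively for brevity, and none of this reflects real LiteSpeed internals.

```python
PREFLIGHT_TYPE = "application/vnd.litespeed.preflight-request"
REPLAY_LIST = "X-LiteSpeed-Preflight-Headers"


def backend(headers: dict):
    """Stand-in for the real application (hypothetical): returns (response_headers, body)."""
    group = "patients" if "veryrandom" in headers.get("Cookie", "") else "doctors"
    if headers.get("Accept") == PREFLIGHT_TYPE:
        return {REPLAY_LIST: "User-Group", "User-Group": group}, ""
    return {"Vary": "User-Group"}, f"super drug page for {group}"


class Proxy:
    def __init__(self, backend):
        self.backend = backend
        self.cache = {}             # (url, varied header values) -> body
        self.vary = {}              # url -> header names from the last Vary seen
        self.backend_page_hits = 0  # full page renders, preflights not counted

    def handle(self, url: str, headers: dict) -> str:
        headers = dict(headers)
        # 1. Preflight only when the request carries credentials.
        if "Cookie" in headers or "Authorization" in headers:
            pre_headers, _ = self.backend({**headers, "Accept": PREFLIGHT_TYPE})
            # 2. Replay the announced headers onto the original request.
            for name in pre_headers.get(REPLAY_LIST, "").split(","):
                name = name.strip()
                if name in pre_headers:
                    headers[name] = pre_headers[name]
        # 3. Normal cache lookup, now with the replayed headers available.
        key = (url, tuple(headers.get(h, "") for h in self.vary.get(url, ())))
        if key in self.cache:
            return self.cache[key]
        # 4. Miss: render the page and store it under the key its Vary header implies.
        self.backend_page_hits += 1
        resp_headers, body = self.backend(headers)
        names = tuple(h.strip() for h in resp_headers.get("Vary", "").split(",") if h.strip())
        self.vary[url] = names
        self.cache[(url, tuple(headers.get(h, "") for h in names))] = body
        return body
```

In this model, two different patients with different session ids trigger two cheap preflights but only one full page render; the second patient is served from the cache entry of the first.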

Was that clear enough? :)
 

NiteWave

Administrator
#4
Thanks much for the detailed explanation. It's much clearer to me now.
So all clients must log in first, so that the backend PHP knows which group the client belongs to?
And then different cache entries are served to different groups (or logged-in users)?
 
#5
Yes, I think you understood the issue. Most applications ignore the cache as soon as a user is logged in but there are many use cases where caching still makes sense and in most of those cases, the application is needed to give the relevant information to LiteSpeed. That's what this preflight concept would solve in a generic way. I couldn't find anything related to that in the docs and thus I'd love to hear from the LiteSpeed devs if anyone is interested in pursuing this :)
 

Lauren

LiteSpeed Staff
Staff member
#6
Toflar,
You can achieve all of this in current LSCache. LSCache uses its own cache control header and cache vary header.
So if you can build a custom module for LSCache that your CMS users can enable when they use LSWS, that would be great.
I understand your issue; it's the same as backend detection for mobile view or geolocation, since the .htaccess detection may not match what the backend says. So the best way is to let the first request hit the backend so that the proper environment vary gets set.
If you are willing to build this module, we can help you make it work. I'd guess that for you it's probably just one or two weeks of work. PM me your email so that I can add you to our Slack team for further discussion.
Lauren
 