
Monday 6 October 2008

Invalidating Page Caches in Mason

We have a number of websites that use Mason, a (relatively) old Perl library reportedly used by Amazon and Salon among other sites. It has built-in and flexible support for caching at pretty much any level the developer needs, but often the easiest thing to do is to cache the output of an entire component, which looks like this:

<%init>
return if $m->cache_self(key => $key,
                         expires_in => '3 hours' );
[...]
</%init>


What I wanted to do was invalidate the cache when the user's browser sent a Cache-Control or Pragma directive indicating that it did not want cached content (for example, when the user holds down the Shift key and clicks "Refresh" or "Reload").

The naive implementation does not work:

<%init>
# Somewhere, $cache_ok is set to 0 if Cache-Control
# indicates no caching wanted
if ( $cache_ok ) {
    return if $m->cache_self(key => $key,
                             expires_in => '3 hours' );
}
[...]
</%init>



I mean, it does work for the one request that asked for fresh content, but the cache has only been skipped, not invalidated. The very next ordinary request for that component, and every subsequent one until the entry expires, gets the old cached content, not a cached copy of the freshly calculated content.
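The problem with skipping can be seen in a toy model that has nothing to do with Mason's internals. This is only a sketch in plain Perl: %cache, render_page and handle_request are hypothetical names standing in for Mason's cache and the component body.

```perl
use strict;
use warnings;

my %cache;          # hypothetical stand-in for Mason's page cache
my $version = 0;    # bumps every time the page is really rendered

# Pretend to do the expensive work of building the page.
sub render_page {
    $version++;
    return "content v$version";
}

# The naive approach: when $cache_ok is false, bypass the cache
# entirely (no read and, crucially, no write).
sub handle_request {
    my ($cache_ok) = @_;
    if ($cache_ok) {
        $cache{page} = render_page() unless exists $cache{page};
        return $cache{page};
    }
    return render_page();    # fresh output, but the cache keeps the old copy
}

my $first   = handle_request(1);    # fills the cache
my $refresh = handle_request(0);    # Shift-Refresh: freshly rendered
my $next    = handle_request(1);    # back to the stale cached copy

print "$first / $refresh / $next\n";
```

The third request gets the same content as the first, even though a newer version was rendered in between.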

Slightly less naive implementations didn't work either because of the way cache_self works. I was basically trying variations of "expires yesterday" but this parameter was not being consulted when Mason considered whether or not to return the cached content.

Cutting a long story short (well, not that long but kinda boring) here's how I got the page cache to be invalidated ("expired") when the user hits Shift-Refresh:

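if ( $cache_ok ) {
    # Normal case: serve the cached copy if it is less than 4 hours old
    return if $m->cache_self(key => $page_cache_key,
                             expires_in => '4 hours' );
} else {
    # Client sent no-cache: force Mason to expire its cached copy
    $m->cache_self(key => $page_cache_key, expire_if => sub { 1 } );
}
[... continue with component...]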



The variable $cache_ok defaults to "1" but is set to "0" if no-cache is found in the Cache-Control request header (for completeness, also check the Pragma header, although that might indicate the presence of a browser so old it wears flares).
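For illustration, here is one way $cache_ok could be worked out from the two header values. It's a minimal sketch, not the code from my component: cache_ok_for is a made-up helper name, and it takes the raw header strings as arguments so it can be tried outside a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 0 if either header asks for uncached
# content, 1 otherwise. A missing header is passed as undef.
sub cache_ok_for {
    my ($cache_control, $pragma) = @_;
    for my $header ($cache_control, $pragma) {
        next unless defined $header;
        # Directives are comma-separated and case-insensitive.
        for my $directive (split /,/, $header) {
            $directive =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            return 0 if lc($directive) eq 'no-cache';
        }
    }
    return 1;
}

my $forced = cache_ok_for('no-cache', 'no-cache');    # Shift-Refresh style
my $plain  = cache_ok_for(undef, undef);              # no directives at all

print "forced reload: $forced, plain request: $plain\n";
```

Inside a Mason component running under mod_perl, the two arguments would presumably come from the request object, something like $r->headers_in->{'Cache-Control'} and $r->headers_in->{'Pragma'}; that depends on your setup, so treat it as an assumption.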

Most of the time, this code returns the cached content (provided it's younger than 4 hours). But if the requesting client has indicated that it does not want cached content, the code invalidates Mason's copy of the cached output and then falls through to the bottom of the if block, where normal, non-cached processing occurs.
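The intended behaviour can be pictured with a small stand-alone model. This is a sketch of the idea only and does not mirror how cache_self is implemented: %cache, render_page and handle_request are hypothetical names, and the clock is passed in as a plain number of seconds so the example doesn't depend on real time.

```perl
use strict;
use warnings;

my %cache;                    # hypothetical stand-in for Mason's cache
my $ttl     = 4 * 60 * 60;    # 4 hours, in seconds
my $renders = 0;

# Pretend to build the page; each real render gets a new version number.
sub render_page { return 'content v' . ++$renders }

sub handle_request {
    my ($cache_ok, $now) = @_;

    # Client asked for fresh content: drop the cached copy outright.
    delete $cache{page} unless $cache_ok;

    my $entry = $cache{page};
    if ($entry && $now - $entry->{stored} < $ttl) {
        return $entry->{body};    # young enough: serve from cache
    }

    # Missing, expired, or just invalidated: rebuild and store.
    my $body = render_page();
    $cache{page} = { body => $body, stored => $now };
    return $body;
}

my $first   = handle_request(1, 0);            # renders and caches v1
my $refresh = handle_request(0, 60);           # invalidates, renders v2
my $next    = handle_request(1, 120);          # served from cache: v2
my $later   = handle_request(1, 60 + $ttl);    # entry is 4 hours old: v3

print "$first / $refresh / $next / $later\n";
```

Unlike simply skipping the cache, the request after the forced reload sees the freshly rendered version.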

Post-script: Since publishing this post I've changed and reformatted the source code slightly.
