Creating cachable content using selectors

The major difference between between static object and dynamically created object is that the static ones can be stored in caches; their content they contain does not depend on user or login data, date or other parameters. They look the same on every request. So caching them is a good idea to move the load off the origin system and to accelerate the request-response cycle.

A dynamically created object is influenced by certain parameters (usually username/login, permissions, date/time, but there are countless other) and therefor their content may be different from request to request. These parameters are usually specified as query parameters and must not be cached (see the HTTP 1.1 specification in RFC 2616).

But sometimes it would be great, if we could combine these 2 approaches. For example you want to offer images in 3 resolutions: small (as a preview image e.g in folder view), big (full screen view) and original (the full resolution delivered by the picture-taking device). If you decide to deliver it as static object, it’s cachable. But you need then 3 names (one for each resolution), one for each resolution. Choosing this will blur the fact that these 3 images are the same and differ only in the fact of the image resolution. It creates 3 images instead having only one in 3 instances. Amore practical drawback is that you always have to precompute these 3 pictures and place them on a reachable location. Lazy generation is hard also.

If you choose the dynamic approach, the image would be available as one object for which the instance can be created dynamically. The drawback is here that it cannot be cached.

Day Communique has the feature (the guys of Day ported it also to Apache Sling) to use so-called selectors. They behave like the query parameters one used since the stoneage of the HTTP/HTML era. But they are not query parameters, but merely encoded in the static part of the URL. So the query part of the ULR (as of HTTP 1.1) is no longer needed.

So you can use the URLs /etc/medialib/trafficjam.preview.jpg, /etc/medialib/trafficjam.big.jpg and /etc/medialib/trafficjam.original.jpg to adress the image in the 3 required resolutions. If your dispatcher doesn’t find them in its cache, it will forward the request to your CQ, which can then scale the requested image on demand. Then the dispatcher can store the image and deliver it then from its cache. That’s a very simple and efficient way to make dynamic objects static and offload requests from your application servers.

Caching the right way

I sometimes notice that there is some kind of confusion about how content is transferred from a CQ system to the enduser, mostly regarding caches, cache invalidation and content expiration.

We must make a difference between 2 separate mechanisms:

  1. Caching as in “Communique dispatcher cache”. As already described the dispatcher cache gets only invalidated when a replication agent triggers the invalidation. There isn’t a mechanism which invalidates content after a certain amount of time.
  2. Caching as in “make use of the browser cache”. A RFC to the HTTP standard describes several mechanism to specify the timeframe in which objects are valid. Here is a more informal introduction.

So this 2 mechanism doesn’t collide; if you want to distribute your content effectivly you should use both: The dispatcher cache to lower the load on your CQ systems, and the right HTTP headers to move traffic off your systems (and your internet connection) to downstreamd proxies and browser caches.

Some remarks to the right HTTP headers:

  • If you don’t have any HTTP headers for caching, most proxies and browsers guess how long they consider an object as “live” or “valid”. Do not rely on these, control it yourself! Add the headers.
  • CQ doesn’t add any caching header by itself.
  • An very easy way to add HTTP caching headers is to configure your webserver to add them (for Apache: mod_expires is quite easy to use). Then every time your webserver delivers a object through the dispatcher (either by fetching it from CQ or by retrieving from cache) it will add these headers.

Hints on performance (part 2)

For curiosity I often take a look into the the HTTP headers of websites I visit (I use the great Firefox plugin HTTP Live Headers for it). On some major websites I discovered that these don’t make use of HTTP pipelining, which is nowadays a major performance drawback, since today’s website include much more items (images, CSS, Javascripts) than a website of 1998.

Quoting http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html:

For all our tests, a pipelined HTTP/1.1 implementation outperformed HTTP/1.0, even when the HTTP/1.0 implementation used multiple connections in parallel, under all network environments tested. The savings were at least a factor of two, and sometimes as much as a factor of ten, in terms of packets transmitted.

This point isn’t directly related to Day Communique, but can be aplied to all webpages. Take your browser and check if your site makes use of HTTP 1.1 pipelining. How?

Well, that’s pretty easy: Take your browser (I suggest Firefox and the above mentioned plugin Live HTTP Headers), open the plugin and then goto your website. Then check the answers if they contain the line “Connection: closed”; whenenver you see this line, it means that your browser must open a new TCP connection to fetch another file from the server. In the best case you should not find this header at all. If you find such a line, you should really sit down and try to get rid of it.

2 remarks to the dispatcher:

  1. The apache webserver can deliver files from cache or from the dispatcher without breaking the HTTP pipelining. So here it doesn’t matter if a file is taken from the cache or fetched from CQ; if you configured your Apache correctly, you’ll never get a “Connection: closed”.
  2. The dispatcher itself also fetches files using HTTP pipelining by default. You can force it not to do so, but I don’t recommend it. In a version before dispatcher 4.0 this behaviour was broken, but in the most recent versions it works perfectly. And of course: The servlet engine bundled with CQ 3.5.5 and newer supports HTTP pipelining out of the box.

For further reading I recommend Aaron Hopkins’ “Optimizing Page Load Times” and for general performance hints the Best Practices for Speeding Up Your Website by Yahoo.