Hints on performance (part 2)

Out of curiosity I often take a look at the HTTP headers of the websites I visit (I use the great Firefox plugin Live HTTP Headers for that). I discovered that some major websites don’t make use of HTTP pipelining, which is nowadays a major performance drawback, since today’s websites include many more items (images, CSS, JavaScript files) than a website of 1998.

Quoting http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html:

For all our tests, a pipelined HTTP/1.1 implementation outperformed HTTP/1.0, even when the HTTP/1.0 implementation used multiple connections in parallel, under all network environments tested. The savings were at least a factor of two, and sometimes as much as a factor of ten, in terms of packets transmitted.

This point isn’t directly related to Day Communique, but can be applied to all webpages. Take your browser and check if your site makes use of HTTP 1.1 pipelining. How?

Well, that’s pretty easy: Take your browser (I suggest Firefox and the above-mentioned plugin Live HTTP Headers), open the plugin and then go to your website. Then check whether the responses contain the line “Connection: close”; whenever you see this line, the server closes the connection after that response, which means that your browser must open a new TCP connection to fetch another file from the server. In the best case you should not find this header at all. If you find such a line, you should really sit down and try to get rid of it.
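If you prefer a script over a browser plugin, the same check can be sketched in a few lines of Python. This is only a minimal sketch using the standard library; the function names closes_connection and check_url are my own, and it only looks at a single response, so it is no replacement for watching a complete page load.

```python
import http.client
import sys
from urllib.parse import urlsplit


def closes_connection(headers):
    """Return True if the response headers contain "Connection: close"."""
    for name, value in headers:
        if name.lower() == "connection":
            # The header value is a comma-separated list of tokens.
            tokens = [token.strip().lower() for token in value.split(",")]
            if "close" in tokens:
                return True
    return False


def check_url(url):
    """Fetch url with an HTTP/1.1 request and inspect the response headers."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        response.read()
        return closes_connection(response.getheaders())
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_close.py http://www.example.com/
    for url in sys.argv[1:]:
        if check_url(url):
            print(url, "-> server sends Connection: close")
        else:
            print(url, "-> connection can be reused")
```

Run it against a page, a stylesheet and an image of your site; a cached file and a file fetched from CQ should give the same result.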

Two remarks on the dispatcher:

The Apache webserver can deliver files from the cache or via the dispatcher without breaking the HTTP pipelining. So here it doesn’t matter if a file is taken from the cache or fetched from CQ; if you configured your Apache correctly, you’ll never get a “Connection: close”.
The dispatcher itself also fetches files using HTTP pipelining by default. You can force it not to do so, but I don’t recommend it. In versions before dispatcher 4.0 this behaviour was broken, but in the most recent versions it works perfectly. And of course: The servlet engine bundled with CQ 3.5.5 and newer supports HTTP pipelining out of the box.

For further reading I recommend Aaron Hopkins’ “Optimizing Page Load Times” and for general performance hints the Best Practices for Speeding Up Your Website by Yahoo.

The output-cache

Day Communique offers another level of caching for content: the so-called output-cache. It caches already rendered pages and stores them in the CQ instance itself. The big advantage over the already mentioned dispatcher-cache is that the output-cache knows about the dependencies between handles, because it’s part of CQ and can access this information.

You can combine both dispatcher-cache and the output-cache. If no invalidation happens, the dispatcher-cache can answer most requests for static content and doesn’t bother the CQ with them. An invalidation may invalidate a larger part of the dispatcher-cache, but a lot of the requests, which are then forwarded to CQ, can be answered by the output-cache. So it sounds quite good to use both dispatcher-cache and output-cache to cache all content. In some cases it is.
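To illustrate the interplay of the two caches, here is a toy model in Python. It is a sketch under heavy assumptions: the class TwoLevelCache and its methods are invented for this example, both caches are plain dictionaries, and the dependency tracking of the real output-cache is left out completely.

```python
class TwoLevelCache:
    """Toy model: dispatcher-cache in front of output-cache in front of rendering."""

    def __init__(self, render):
        self.render = render            # hypothetical function: handle -> rendered page
        self.dispatcher_cache = {}
        self.output_cache = {}
        self.render_count = 0           # how often CQ really had to render

    def get(self, handle):
        # 1. The dispatcher-cache answers without bothering CQ at all.
        if handle in self.dispatcher_cache:
            return self.dispatcher_cache[handle]
        # 2. The request reaches CQ; the output-cache may still have the page.
        if handle not in self.output_cache:
            self.render_count += 1
            self.output_cache[handle] = self.render(handle)
        page = self.output_cache[handle]
        self.dispatcher_cache[handle] = page
        return page

    def invalidate_dispatcher(self):
        """Coarse invalidation: everything in the dispatcher-cache is dropped."""
        self.dispatcher_cache.clear()

    def invalidate_output(self, handle):
        """Fine-grained invalidation: only the changed handle is dropped."""
        self.output_cache.pop(handle, None)


cache = TwoLevelCache(lambda handle: "<html>" + handle + "</html>")
cache.get("/a")
cache.get("/b")                  # two renderings so far
cache.invalidate_dispatcher()    # activation of /a flushes the dispatcher-cache ...
cache.invalidate_output("/a")    # ... but only /a is dropped from the output-cache
cache.get("/a")                  # rendered again
cache.get("/b")                  # answered by the output-cache
print(cache.render_count)
```

In this model only the changed page is rendered a second time, although the dispatcher-cache lost both pages.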

But the output-cache has some drawbacks:

All dependencies are kept in memory (in the java heap). If you have a huge amount of content, the dependency tracking can eat up your memory. There is a hotfix available for CQ 4.2, which limits the cache-size, but it has some problems.
The rendered pages are kept on disk. So besides the content itself you need disk space for the rendered output.
Probably the most important point: The output-cache makes CQ use only one CPU! Apparently there is a mutex within the output-cache code, which serializes all rendering requests when the results should be cached. I cannot say that for sure, but this is what I observed in CQ instances using the output-cache. After we disabled the output-cache (which means adding a “deny all” rule to its configuration) this phenomenon was gone and CQ used all available CPUs.

Of course you don’t need to configure the output-cache to cache all rendered pages. You may want to cache only the pages which are often requested and also often invalidated on the dispatcher-cache but weren’t changed at all.

Dispatcher caching and content structure

The dispatcher behaviour as described before has one major drawback which may have an implication on the design of your content structure.

Invalidation is performed on a whole directory and all its subtrees!

So if you invalidate a file A in a directory, the file B in the same directory is also invalidated. File C in the subdirectory D is invalidated as well. All these invalidated files need to be refetched from your CQ instance before the cache can serve them again. This happens although you invalidated only file A. Files B and C weren’t changed at all, but they are refetched! So a single invalidation can create a lot of load on your CQ instance(s).
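The effect can be sketched with a few lines of Python. The paths and the function name affected_by_invalidation are made up for this illustration; the function only models the rule stated above (same directory and everything below it), not the real dispatcher code.

```python
from pathlib import PurePosixPath


def affected_by_invalidation(cached_files, invalidated_file):
    """Cached files in the directory of the invalidated file or below it."""
    directory = PurePosixPath(invalidated_file).parent
    return sorted(
        path for path in cached_files
        if directory in PurePosixPath(path).parents
    )


cached = [
    "/content/site/A.html",
    "/content/site/B.html",
    "/content/site/D/C.html",
    "/content/other/E.html",
]
affected = affected_by_invalidation(cached, "/content/site/A.html")
print(affected)
```

Only E.html, which lives in a different branch of the tree, survives the invalidation of A.html.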

But this may be a side-effect which you want to have, because a change of handle A may also affect handles B and C (maybe you changed the title of handle A, which is required by handles B and C to render a correct navigation). This is the easiest way to invalidate all dependent handles when you change a single one.

So the point is to find a balance between flushing your whole cache every time you activate a single handle and tracking your dependencies so that only those files are marked invalid which really need an update.

And here comes the already mentioned parameter /statfileslevel. If we assume that you have a well-designed content structure, you set this parameter as high as possible to minimize the number of files which are invalidated if a single file is invalidated. On the other hand, you should set it low enough so that all files which depend on the invalidated file are invalidated as well.

Knowing this you should arrange your content as follows:

Group your content and use hierarchies. This allows you to increase your statfileslevel.
Minimize the number of dependencies between your handles so you can keep the statfileslevel high. Keep dependent handles in the same directory and try to avoid dependencies to handles in your parent directory (or even higher), so you don’t need to decrease the statfileslevel.

Note: The parameter /statfileslevel has a global scope.
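To get a feeling for the effect of the statfileslevel, here is a simplified model in Python. It rests on assumptions: stat-files exist in every directory down to the configured depth, level 0 means the docroot only, and the content paths and function names are invented for this example. It follows the behaviour described in this post, not the actual dispatcher implementation.

```python
from pathlib import PurePosixPath


def stat_directory(file_path, statfileslevel):
    """Directory (relative to the docroot) whose stat-file governs file_path."""
    parts = PurePosixPath(file_path).parent.parts[1:]    # drop the leading "/"
    return PurePosixPath("/", *parts[:statfileslevel])


def invalidated_files(cached_files, changed_file, statfileslevel):
    """Cached files which become invalid when changed_file is invalidated."""
    scope = stat_directory(changed_file, statfileslevel)
    result = []
    for path in cached_files:
        governing = stat_directory(path, statfileslevel)
        # Stat-files below the touched one are invalidated as well.
        if governing == scope or scope in governing.parents:
            result.append(path)
    return result


cached = [
    "/content/site/en/products/a.html",
    "/content/site/en/products/b.html",
    "/content/site/en/news/c.html",
    "/content/site/de/news/d.html",
]
for level in (0, 2, 3, 4):
    hit = invalidated_files(cached, "/content/site/en/products/a.html", level)
    print(level, len(hit))
```

The higher the level, the fewer files are hit by the invalidation of a.html; with level 4 only the files in the products directory have to be refetched.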

Dispatcher cache content delivery and invalidation

In contrast to a CQ instance, which can handle and resolve specific dependencies between handles, the CQ dispatcher cache works quite simply. It uses so-called “stat-files” to indicate whether cached files are still valid or need to be refetched from the configured CQ.

The basic principle is that the last modification time of a stat-file is compared to the last modification time of the content file. If the stat-file has been modified after the content file, the content file is stale and needs to be refetched. This refetching includes an update of the last-modification date to the current date. Otherwise the cached file can be delivered as up-to-date content.
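The comparison itself is trivial, as the following Python sketch shows. It assumes that the stat-file is a file named “.stat” next to the cached page; the function names and the fixed timestamps are only there to make the example reproducible.

```python
import os
import tempfile


def is_stale(content_file, stat_file):
    """True if the stat-file was modified after the cached content file."""
    return os.path.getmtime(stat_file) > os.path.getmtime(content_file)


def demo():
    with tempfile.TemporaryDirectory() as cache:
        page = os.path.join(cache, "page.html")
        stat = os.path.join(cache, ".stat")       # assumed stat-file name
        for path in (page, stat):
            open(path, "w").close()

        os.utime(stat, (1000, 1000))              # stat-file older than the page
        os.utime(page, (2000, 2000))
        before = is_stale(page, stat)             # page can be delivered

        os.utime(stat, (3000, 3000))              # invalidation touches the stat-file
        after_invalidation = is_stale(page, stat)

        os.utime(page, (4000, 4000))              # refetching updates the page
        after_refetch = is_stale(page, stat)
        return before, after_invalidation, after_refetch


print(demo())
```

The page is stale only between the invalidation and the refetch.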

In detail

The cache mirrors the structure of the content in CQ. Using the statement “/statfileslevel” in the dispatcher.any file you can specify the depth down to which these stat-files are created. If the dispatcher receives an invalidation request, it determines the directory which contains the file to be invalidated. If this directory doesn’t contain a stat-file, the dispatcher goes up in the directory structure until a stat-file is found. Then this stat-file is touched, which sets its last modification time to now. (If there are also stat-files in directories below the one of the invalidated file, they are touched as well.)

If a requested file is contained in the cache, the dispatcher looks for a stat-file. If there is no stat-file in the directory of the to-be-delivered file, the dispatcher checks recursively the higher directories for a stat-file until it finds one. Then the last-modification times are compared and the dispatcher acts accordingly (as described above).

For correctness: There isn’t any recursion in the code anymore, since with deeply nested content structures it produced stack overflows which killed the delivering webserver process. If you encounter this problem, update to the latest available dispatcher, currently version 4.0.2.
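The lookup of the stat-file can be written as a plain loop, so the nesting depth of the content doesn’t matter. The following Python sketch shows the idea; the file name “.stat”, the function names and the docroot parameter are assumptions of mine, and it only touches the first stat-file found, so it is an illustration and not the dispatcher code.

```python
import os

STAT_NAME = ".stat"    # assumed name of the stat-files


def find_stat_file(file_path, docroot):
    """Walk upwards from the file's directory until a stat-file is found."""
    docroot = os.path.abspath(docroot)
    directory = os.path.dirname(os.path.abspath(file_path))
    while True:
        candidate = os.path.join(directory, STAT_NAME)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(directory)
        if directory == docroot or parent == directory:
            return None                # reached the docroot without a stat-file
        directory = parent


def invalidate(file_path, docroot):
    """Touch the governing stat-file, setting its modification time to now."""
    stat_file = find_stat_file(file_path, docroot)
    if stat_file is not None:
        os.utime(stat_file, None)
    return stat_file
```

Calling invalidate for a cached page deep in the tree touches the closest stat-file above it; afterwards every cached file governed by this stat-file is older than the stat-file and therefore stale.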

Update (2009-01-26): I just stumbled upon the presentation Dominique Jaeggi gave at Tech Summit 2007. He talks about performance tuning, covers the proper use of the dispatcher in depth and explains the proper configuration of the invalidation.

Why use the dispatcher?

The dispatcher is a major part of every CQ installation. It allows one to cache data in front of the application in the webserver. Unlike an application server, a webserver is designed to deliver files at high speed; webservers are blazing fast when delivering static files. An application server (either the servlet engine provided by Day or any other one like Tomcat, JBoss, WebSphere or BEA WebLogic) is designed to be extensible and to run custom code. Application servers provide an infrastructure to run custom code which operates on request data and delivers a customized response. They do a lot more, but for the sake of simplicity I stop here.
Application servers don’t achieve the performance of a webserver when delivering files; it’s not their primary job and they aren’t optimized to deliver files from disk to the net as fast as possible.

A content management system often produces static content elements, which do not change very often or not at all. Such content elements (be they HTML pages, images or other media elements) should be placed on a webserver to profit from its ability to deliver files really fast. But it should also be possible to control such files from the content management system.

This is where the CQ Dispatcher comes into play. The dispatcher acts as a plugin to the webserver and allows one to have administrative control over the files which are placed on the webserver. The dispatcher fetches files from the CQ instance on demand and invalidates them on a specific signal.
So you can get the best of both worlds: the flexibility of a content management system with control over all parts, and the performance of static pages delivered by a webserver.