Visualize your requests

In the last year, customers often complained about our bad performance. We had just fixed a small memory leak (which crashed our publishing instances about every hour), so we were quite interested in getting reliable data to confirm or refute their complaints. At that time I realized that we needed a way to get a quick overview of the performance of our CQ instances. One look to see: “Ok, it must be the network, our systems perform great!”

So I dug out my Perl know-how and wrote a little script which parses a request.log and prints out data which is understood by gnuplot. gnuplot then draws some nice graphs of it: the number of requests per minute and the average request duration of these requests.
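The aggregation step is simple enough to sketch here. The following is a minimal sketch (not the full script), assuming the usual CQ request.log format, in which response lines look like “09/Jun/2009:09:31:03 +0200 [0] <- 200 text/html 59ms”:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Count requests and sum up durations per minute. Only the response
    # lines ("<-") carry the duration, so those are the ones we parse.
    my (%count, %duration, @minutes);
    while (my $line = <>) {
        if ($line =~ m{^(\d+/\w+/\d+:\d+:\d+):\d+ .*<- \d+ \S+ (\d+)ms}) {
            my ($minute, $ms) = ($1, $2);
            push @minutes, $minute unless $count{$minute}++;
            $duration{$minute} += $ms;
        }
    }

    # One line per minute, ready for plotting:
    # timestamp, number of requests, average duration in ms.
    for my $minute (@minutes) {
        printf "%s %d %.1f\n",
            $minute, $count{$minute}, $duration{$minute} / $count{$minute};
    }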

[Image: request-graph-all (click on the image for a larger version).]

These images proved to be pretty useful: you can show them to your manager (“Look, the average response time went down from 800 milliseconds to 600, although the number of requests went up by 30%.”), and they help you in daily business, because you can spot problems quite well. When the response times go up at a certain time, you'd better have a look at the system and find the reason for it.

[Image: request-graph-html-uk (click on the image to view a larger version).]

Because this script is quite fast (it parses 300 megabytes of request.log in about 15 seconds on a fast Opteron-based machine), we usually render these images online and integrate the resulting images into a small web application (no CQ, but a small hacked-up PHP script). For some more interactivity I added the possibility to display only the requests which match a certain string. So it's very easy to answer questions such as “Is the performance of my landing page as bad as customers report?”

You can download this little Perl script here. Run it with “--help” first and it will display a little help screen. Give it a number of request.log files as parameters, pipe the output directly into gnuplot (I tested with version 4.0, but it will probably also work with newer versions) and it will output a PNG file. Adjust the script to your needs and contribute back; I released it under GPL version 2.
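A typical invocation looks like this (the script name is illustrative; this assumes the generated gnuplot commands write the PNG to standard output):

    perl requestgraph.pl --help
    perl requestgraph.pl request.log request.log.0 | gnuplot > requests.png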

(For the hackers: some things can probably be done better, and I also have some new functionality already prepared in it, but not activated. Patches are welcome :-))

Everything is content (part 2)

Recently I pointed out some differences in the handling of the “everything is content” paradigm of Communique. A few days ago I found a posting by David Nüscheler over at dev.day.com, in which he explained the details of performance tuning.

(His 5 rules do not apply exclusively to Day Communique, but to every performance tuning session.)

In Rule 2 he states:

Try to implement an agile validation process in the optimization phase rather than a heavy-weight full-blown testing after each iteration. This largely means that the developer implementing the optimization has a quick way to tell if the optimization actually helped reach the goal.

In my experience this isn't viable in many cases. Of course the developer can check quickly if his new algorithm performs better (= is faster) than the old one. But in many cases the developer doesn't have all the resources and infrastructure available and doesn't have all the content in his test system; which is the central reason why I do not trust tests performed on developer systems. So the project team relies on central environments which are built for load testing, which have loadbalancers, access to directories, production-sized machines and content which is comparable to the production system. Once the code is deployed, you can do loadtesting, either using commercial software or something like JMeter. If you use continuous integration and an autodeployment system, you can run such tests every day.
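As a side note, JMeter can run an existing test plan unattended from the command line, which makes it easy to hook into a nightly CI job (the file names here are illustrative):

    jmeter -n -t loadtest.jmx -l results.jtl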

Ok, where did I start? Right, “everything is content”. So you run your loadtest: you create handles, modify them, activate and drop them, you request a page to view, you perform activities on your site, and so on. Afterwards you look at your results and hopefully they are better than before. Ok. But …

But Communique is not built to forget data (of course it does sometimes, but that's not the point here :-)), so all these activities are stored. Just take a look at the default.map, the zombie.map, the cmgr.hist file, … So all your recent actions are persisted and CQ knows of them.

Of course handling more of this information doesn't make CQ faster. In case you have long periods of time between your template updates: check the performance data directly after a template update and compare it to the data a few months later (assuming you don't have CQ instances which aren't used at all). You will see a decrease in performance; it may be small and nearly unmeasurable, but it is there. Some actions are slower.

Ok, back to our loadtest. If you run the loadtest again and again and again, a lot of actions are persisted. When you reproduce the code and the settings of the very first loadtest and run it again on a system which has already faced 100 loadtests, you will see a difference. The result of this 101st loadtest differs from the first one, although the code, the settings and the loadtest are essentially the same. All the same except CQ and its memory (default.map and friends).

So you need a mechanism which allows you to undo all changes made by such a loadtest. Only then can you perfectly reproduce every loadtest and run it 10 times without any difference in the results. I'll try to cover such methods (there are several, but not all equally suitable) in one of the next posts.

And to get back to the title: everything is content, even the history. So contrary to my older posting, where I said:

Older versions of a handle are not content.

They are indeed content, but only when it comes to slowing down the system 🙂

Caching the right way

I sometimes notice that there is some confusion about how content is transferred from a CQ system to the end user, mostly regarding caches, cache invalidation and content expiration.

We must distinguish between two separate mechanisms:

  1. Caching as in “Communique dispatcher cache”. As already described, the dispatcher cache only gets invalidated when a replication agent triggers the invalidation. There is no mechanism which invalidates content after a certain amount of time.
  2. Caching as in “make use of the browser cache”. An RFC to the HTTP standard describes several mechanisms to specify the timeframe in which objects are valid. Here is a more informal introduction.

So these two mechanisms don't collide; if you want to distribute your content effectively you should use both: the dispatcher cache to lower the load on your CQ systems, and the right HTTP headers to move traffic off your systems (and your internet connection) to downstream proxies and browser caches.
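To illustrate (the values are made up): a response which plays well with both mechanisms might carry headers like these, telling every downstream cache that it may keep the object for an hour:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Cache-Control: max-age=3600
    Expires: Thu, 03 Sep 2009 12:00:00 GMT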

Some remarks on the right HTTP headers:

  • If you don't have any HTTP caching headers, most proxies and browsers guess how long they consider an object “live” or “valid”. Do not rely on this guessing; control it yourself! Add the headers.
  • CQ doesn't add any caching headers by itself.
  • A very easy way to add HTTP caching headers is to configure your webserver to add them (for Apache, mod_expires is quite easy to use; see the snippet below). Then every time your webserver delivers an object through the dispatcher (either by fetching it from CQ or by retrieving it from the cache), it will add these headers.
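A minimal mod_expires configuration could look like this (the lifetimes are examples; pick values that match how often your content actually changes):

    ExpiresActive On
    # Default lifetime for everything the dispatcher delivers.
    ExpiresDefault "access plus 1 hour"
    # Static resources usually change less often than pages.
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType text/css "access plus 1 week"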

Hints on performance (part 2)

Out of curiosity I often take a look at the HTTP headers of websites I visit (I use the great Firefox plugin Live HTTP Headers for it). On some major websites I discovered that they don't make use of HTTP pipelining, which is nowadays a major performance drawback, since today's websites include many more items (images, CSS, Javascript) than a website of 1998.

Quoting http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html:

For all our tests, a pipelined HTTP/1.1 implementation outperformed HTTP/1.0, even when the HTTP/1.0 implementation used multiple connections in parallel, under all network environments tested. The savings were at least a factor of two, and sometimes as much as a factor of ten, in terms of packets transmitted.

This point isn't directly related to Day Communique, but applies to all webpages. Take your browser and check if your site makes use of HTTP/1.1 pipelining. How?

Well, that's pretty easy: take your browser (I suggest Firefox and the above-mentioned plugin Live HTTP Headers), open the plugin and then go to your website. Then check the responses for the line “Connection: close”; whenever you see this line, it means that your browser must open a new TCP connection to fetch the next file from the server, so there is no persistent connection and no pipelining. In the best case you should not find this header at all. If you find such a line, you should really sit down and try to get rid of it.
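If you prefer to check from the command line, a small sketch like this (using Perl's LWP; the script is illustrative) prints the Connection header of a response:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Fetch one URL and report what the server says about the connection.
    # keep_alive is set so that the client itself doesn't ask the server
    # to close the connection; "close" in the answer then reflects the
    # server's own policy (and means: no pipelining possible).
    my $url  = shift or die "usage: $0 <url>\n";
    my $res  = LWP::UserAgent->new(keep_alive => 1)->get($url);
    my $conn = $res->header('Connection') || '(no Connection header)';
    print "$url -> Connection: $conn\n";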

2 remarks to the dispatcher:

  1. The Apache webserver can deliver files from the cache or from the dispatcher without breaking HTTP pipelining. So here it doesn't matter if a file is taken from the cache or fetched from CQ; if you configured your Apache correctly, you'll never get a “Connection: close”.
  2. The dispatcher itself also fetches files using HTTP pipelining by default. You can force it not to do so, but I don't recommend it. In versions before dispatcher 4.0 this behaviour was broken, but in the most recent versions it works perfectly. And of course: the servlet engine bundled with CQ 3.5.5 and newer supports HTTP pipelining out of the box.

For further reading I recommend Aaron Hopkins’ “Optimizing Page Load Times” and for general performance hints the Best Practices for Speeding Up Your Website by Yahoo.

Hints on performance

A lot of things which affect performance can be changed within CQ without major configuration or even coding, just by adjusting the things you want to be displayed or used.

If you're looking for some performance on your authoring systems, you may consider disabling some information which is permanently shown to the authors. One such piece of information is the number of items in the inbox. Every once in a while this triggers a background query which slows down your system; it's essentially the same query that is executed when you open the “Inbox” tab.

[Image: Change user settings]

If you’re not using this feature very often, you should disable it for your authors (or at least for the ones who don’t use it).

Open the user settings of the author and uncheck “Inbox”. If you don't use notifications, uncheck them too. This will leave you only the link to the impersonation feature, which isn't expensive in terms of rendering performance, but often very useful.

[Image: usersettings11]