“We have an urgent performance issue” (part 2)

As a reaction to the last post I got a question from Oswaldo about specific recommendations on performance. Actually, there are a lot. But that’s material for another blog post 🙂 or skip to the bottom of this post.

Instead I want to give you a recommendation on how to handle situations in which you had neither the time nor the capacity to think about performance and response times. But as an experienced technical leader you know that at some point this question will arise for sure. You might get a few hours to spend on it, but how do you spend them most efficiently?

Clearly not on performance optimization! It’s not enough time to analyze and improve substantial parts of the application, and tomorrow’s changes might render these improvements useless…

Instead I would recommend you spend this time on communication and on building rapport with the people who can help you once such a performance problem arises. Get in contact with the operations people who run your system and application. Understand how they work and what tools they use. Understand how they can help you in case of performance issues and what information they can provide to you. Ask for an account on their monitoring system, if only to demonstrate interest in their work and problems. Potentially give them some tips on what they could additionally do to improve the quality of the information (for example, ask if they can also provide the raw data and not only the visualization based on aggregated data). Or show them a few ways to improve how they work with your application.

The biggest value of that activity is the fact that when the dreaded performance issue is noticed at exec level, you already know who to talk to. You know a bit about how the others are working and how you can help them. As a tech lead it’s then much easier to ask for logfiles, traffic patterns, CPU usage graphs, I/O latencies, thread dumps etc. You know upfront what information IT operations already collects by default. You might have direct access to a monitoring system to get more information. You can even get a warning from the ops people in advance that a real big escalation is imminent. For me this is the best you can get if you have just a few hours to spend.

You might ask why that is important. Because it reduces the TTAD (time to actionable data) dramatically in case of such performance issues. You know who to get on the phone and into calls to start the investigation. You know what information is already available, or you can even access it directly. You can report “We are analyzing data and can come up with first suggestions within the day” instead of “we are talking to IT to see how they can support us in getting data”.

That’s much more important than spending some hours on random performance tuning. And in case you ever run into performance issues, these hours are one of the best investments you have made in the whole project.

(And as a random recommendation to improve AEM request rendering times: disable the MobileRedirectFilter (PID: com.day.cq.wcm.mobile.core.impl.redirect.RedirectFilter) by setting the configuration parameter “redirect.enabled” to “false”. In the age of responsive websites it no longer serves a purpose, and under load its performance impact can be significant.)

“We have an urgent performance problem!”

“Customer situation is heating up because they have urgent performance problems in their production environment” … That’s something I heard quite often in my consulting career. In many cases it was an outcry for help, because the problem did not show up during tests, but only in the production environment.

I once read a nice quote: “Everyone has a performance test environment. But only a few have a production environment!” So true.

Is it really inevitable that performance issues occur? Given the number of cases I’ve seen, I am inclined to believe it. But it is not.

I think that if a performance problem is all of a sudden put on priority 1, it has a history. If you are in a project team and one morning your project lead/PO tells you that the priorities have shifted and that you need to push that little unknown item “Performance tuning” from the bottom of the backlog to the top of the list, you were already aware of performance as a topic. But other features were considered more important.

Or if your customer starts to escalate with your account team that the really bad performance of the application (the one you developed for them) is affecting their business, then you often know that this is not a new issue.

In both cases the priority of this problem has just hit a level at which executives get concerned and start to escalate the topic, because it is hurting their and/or the company’s goals. Here you go: yesterday everything was fine, today it’s all screwed.

All of these problems have a history; performance problems rarely rise out of nowhere, they evolve under the radar of ignorance. As long as no one complains, who cares about performance? Features are more important. Until the complaints get so loud that they cannot be ignored anymore.

But when you get to the point where you need to care about performance, you are often in a very bad position. Because now you need information, tools and processes to improve the situation very fast. Because you are in the spotlight, everyone is looking at your problem (it’s yours!): “Deliver a fast improvement! Within this week!”

But if you have never prepared for that situation, you lack all the necessary things for such an operation:

  • You don’t have KPIs to know what is “acceptable” performance. You just have everyone’s feeling “it’s slow!”
  • You don’t have the tools to measure the current performance. You just have logfiles (hopefully you have them …)
  • Is your system actually able to deliver that performance?
  • “Has anyone got the reports from the latest performance tests?”

That’s a hard situation and there is barely any other way than to grab a bunch of good people and start surgery in production: analyzing lots of data, making guesses, increasing logging, deploying hotfixes, and doing all the things you have accumulated on your list of “things which might improve performance” over the last months.

But let’s be honest: This is chaos, unplanned, affecting a lot of other teams and people, and hurting careers. But does it really have to be that way?

No. Application performance is not magic, it must be managed. Performance should always be an important aspect in your project, and resources should be spent on it just as you spend resources on testing. Otherwise performance is ignored 90% of the time, and in the remaining 10% you are escalating because of the lack of it.

“Concurrent users” and performance tests

A few years ago, when I was still working in application management of a large website, we often had the case that the system was reported to be slow. Whenever we looked into the system with our tooling we did not find anything useful, and we weren’t able to explain this slowness. We had logs which confirmed the slowness, but no apparent reason for it. Sadly the definition of performance metrics was just … well, forgotten. (I once saw the performance requirement section in the huge functional specification doc: 3 sentences vs 200 pages of other requirements.)

It was a pretty large system and rumor had it that it was one of the biggest installations of this application. So we approached the vendor and asked how many parallel users are supported on the system. Their standard answer “30” (by the way: that’s still the number on their website, although they have rewritten the software from scratch since then) wasn’t that satisfying, because they didn’t provide any way to actually measure this value on the production system.

The situation then improved a bit, got worse again, improved, … and so on. We had some escalations in the meantime and also ran for months in task-force mode to fight this and other performance problems. Until I finally got mad, because we weren’t able to actually measure how the system was used. So I started to define the meaning of “concurrent users” for myself: “2 users are considered concurrent users when for each of them a request is logged in the same timeframe of 15 minutes”. I wrote a small perl script which ran through the web server logs and calculated these numbers for me. As a result I had 4 numbers of concurrent users per hour. Far from exact, but reasonable enough that we had some numbers.
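
For illustration, here is a minimal sketch of how such a counter could look. It is not the original script; it assumes a common-log-format web server log with the authenticated username in the third field, and it simply counts distinct users per 15-minute window:

#!/usr/bin/perl
# Count "concurrent users" per 15-minute window of a web server access log.
# Assumption: common log format, e.g.
#   1.2.3.4 - jdoe [09/Oct/2009:09:08:03 +0200] "GET /index.html HTTP/1.1" 200 ...
use strict;
use warnings;

my %users_per_window;    # window -> { username -> 1 }

while (my $line = <>) {
    my @fields = split /\s+/, $line;
    my $user = $fields[2];
    next if !defined $user || $user eq '-';      # skip anonymous requests
    my ($ts) = $line =~ /\[([^\]\s]+)/;          # e.g. 09/Oct/2009:09:08:03
    next unless defined $ts;
    my ($day, $hour, $min) = $ts =~ m{^(\S+?):(\d\d):(\d\d)};
    next unless defined $min;
    my $window = sprintf "%s:%02d:%02d", $day, $hour, int($min / 15) * 15;
    $users_per_window{$window}{$user} = 1;
}

for my $window (sort keys %users_per_window) {
    printf "%s  %d concurrent users\n",
        $window, scalar keys %{ $users_per_window{$window} };
}

Crude, but it produces exactly those four numbers per hour.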

And at that time I also learned that managers are not interested in the definition of a term as long as they think they know what it means. So I actually counted a user as concurrent when she logged in once and then had an auto-refresh of a page every 10 minutes configured in her web browser. But hey, no one questioned my definition, and I think the scripts with that built-in definition are still used today.

But now we were able to actually compare the reported performance problems against these numbers. And we found out that sometimes they were related (my reports showed that we had 60 concurrent users while the system was reported to be slow), but often they were not (no performance problems were reported but my reports showed 80 concurrent users; and there were also performance problems with 30 reported users). So this definition was actually useless… Or maybe the performance isn’t related to the “concurrent users” at all? I already suspected that, but wasn’t able to dig deeper and improve the definition and the scripts.

(Oh, before you think “count the active sessions on the system, dude!”: that system didn’t have any server-side state, therefore no sessions. And the above-mentioned case of the auto-reloaded start page of a logged-in user would lead to the same result: she would count as active. So be careful.)

So, whenever you get a requirement like “the system should support X concurrent users with a maximum response time of 2 seconds”, question “concurrent”. Define the activities of these users, build performance tests accordingly and validate the performance. Have tools to actually measure the metrics of a system hammered with “X concurrent users” during these performance tests. Apply the same tooling to production. If the metrics deliver the same values: cool, your tests were good. If not, something’s wrong: either your tests or the reality…
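
And since “maximum response time of 2 seconds” is the other half of such a requirement, you want the same numbers from the test run and from production. A small sketch along the same lines (it assumes the CQ request.log format shown later in this post, with the duration as the last field of every response line; the threshold and percentiles are just my choice):

#!/usr/bin/perl
# Response time distribution from a CQ request.log: the share of requests
# slower than a threshold plus a few percentiles. Run it against a loadtest
# log and against a production log and compare like with like.
use strict;
use warnings;

my $threshold_ms = 2000;            # "maximum response time of 2 seconds"
my @durations;

while (my $line = <>) {
    next unless $line =~ /<-/;              # only response lines
    next unless $line =~ /(\d+)ms\s*$/;     # duration is the last field
    push @durations, $1;
}
die "no response lines found\n" unless @durations;

@durations = sort { $a <=> $b } @durations;
my $slow = grep { $_ > $threshold_ms } @durations;

printf "%d responses, %.1f%% slower than %d ms\n",
    scalar @durations, 100 * $slow / @durations, $threshold_ms;
for my $p (50, 90, 99) {
    printf "p%d: %d ms\n", $p, $durations[ int($p / 100 * $#durations) ];
}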

So as a bottom line: non-functional requirements such as performance should be defined with the same attention as functional requirements. Otherwise you will run into problems.

Basic performance tuning: Caching

Many CQ installations I’ve seen start with the default configuration of CQ. This is in fact a good decision, because the default configuration can handle small and medium installations very well. Additionally you don’t have to maintain a bunch of configuration files and settings; and finally, most CQ hotfixes (which are delivered without going through full QA) are only tested against default installations.

So when you start with your project and you have a pristine CQ installation, the performance of both publishing and authoring instances is usually very good: the UI is responsive, and page load times are in the two-digit millisecond range. Great. Excellent.

When your site grows and the content authors start their work, you need to do your first performance and stress tests using the numbers provided by the requirements (“the site must be able to handle 10000 concurrent requests per second with a maximum response time of 2 seconds”). You can either meet such requirements by throwing hardware at the problem (“we must use 6 publishers, each on a 4-core machine”) or you try to optimize your site. Okay, let’s try optimization first.

Caching is the thing which comes to mind first. You can cache on several layers of the application: on application level (caches built into the application, like the output cache of CQ 3 and 4), on the dispatcher (as described here in this blog), or on the user’s system (the browser cache). Each cache layer should decrease the number of requests that reach the remaining layers, so that in the end only those requests get through which cannot be answered from a cache but must be processed by CQ. Our goal is to move the files into the cache which is closest to the end user; loading these files is then faster than loading them from a location which is 20 000 kilometers away.

(A system engineer may also be interested in that solution, because it will offload data traffic from the internet connection. Leaves more capacity for other interesting things …)

If you start from scratch with performance tuning, going for the low-hanging fruit is the way to go. So you start an iterative process which consists of the following steps:

  1. Identify requests which can be handled by a caching layer closer to the end user.
  2. Identify actions which allow these requests to be cached closer to the user.
  3. Perform these actions.
  4. Measure the results using appropriate tools.
  5. Start over from (1).

(For a broader view of performance tuning, see David Nüscheler’s post on the Day developer site.)

As an example I will go through this cycle on the authoring system. I start with a random look at the request.log, which may look like this:

09/Oct/2009:09:08:03 +0200 [8] -> GET /libs/wcm/content/welcome.html HTTP/1.1
09/Oct/2009:09:08:06 +0200 [8] <- 200 text/html; charset=utf-8 3016ms
09/Oct/2009:09:08:12 +0200 [9] -> GET / HTTP/1.1
09/Oct/2009:09:08:12 +0200 [9] <- 302 - 29ms
09/Oct/2009:09:08:12 +0200 [10] -> GET /index.html HTTP/1.1
09/Oct/2009:09:08:12 +0200 [10] <- 302 - 2ms
09/Oct/2009:09:08:12 +0200 [11] -> GET /libs/wcm/content/welcome.html HTTP/1.1
09/Oct/2009:09:08:13 +0200 [11] <- 200 text/html; charset=utf-8 826ms
09/Oct/2009:09:08:13 +0200 [12] -> GET /libs/wcm/welcome/resources/welcome.css HTTP/1.1
09/Oct/2009:09:08:13 +0200 [12] <- 200 text/css 4ms
09/Oct/2009:09:08:13 +0200 [13] -> GET /libs/wcm/welcome/resources/ico_siteadmin.png HTTP/1.1
09/Oct/2009:09:08:13 +0200 [14] -> GET /libs/wcm/welcome/resources/ico_misc.png HTTP/1.1
09/Oct/2009:09:08:13 +0200 [15] -> GET /libs/wcm/welcome/resources/ico_useradmin.png HTTP/1.1
09/Oct/2009:09:08:13 +0200 [15] <- 200 image/png 8ms
09/Oct/2009:09:08:13 +0200 [16] -> GET /libs/wcm/welcome/resources/ico_damadmin.png HTTP/1.1
09/Oct/2009:09:08:13 +0200 [16] <- 200 image/png 5ms
09/Oct/2009:09:08:13 +0200 [13] <- 200 image/png 17ms
09/Oct/2009:09:08:13 +0200 [14] <- 200 image/png 17ms
09/Oct/2009:09:08:13 +0200 [17] -> GET /libs/wcm/welcome/resources/welcome_bground.gif HTTP/1.1
09/Oct/2009:09:08:13 +0200 [17] <- 200 image/gif 3ms

Ok, it looks like some of these requests don’t need to be handled by CQ at all: the PNG files and the CSS files. These files usually never change (or at least change very seldom, maybe on a deployment or when a hotfix is installed). For the usual daily work of a content author they can be assumed to be static; we just have to provide a way for the authors to fetch a new version when one of them is updated. Ok, that was step 1: we want to cache the PNG and CSS files which are placed below /libs.
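
To put a number on step 1, a quick tally over the request.log helps. Again just a sketch matching the log format shown above, not a polished tool:

#!/usr/bin/perl
# Tally GET requests below /libs in a CQ request.log, grouped by file
# extension -- a quick way to see how many requests are candidates for caching.
use strict;
use warnings;

my %count;
my $total = 0;

while (my $line = <>) {
    next unless $line =~ /-> GET (\S+)/;     # only request lines
    my $path = $1;
    $total++;
    next unless $path =~ m{^/libs/};
    my ($ext) = $path =~ /\.(\w+)(?:\?|$)/;
    $count{ $ext || '(none)' }++;
}

printf "%d requests in total\n", $total;
for my $ext (sort { $count{$b} <=> $count{$a} } keys %count) {
    printf "  .%-8s %6d requests below /libs\n", $ext, $count{$ext};
}

Run the same tally again after step 3 and it doubles as a poor man’s measurement for step 4.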

Step 2: How can we cache these files? We don’t want to cache them within CQ (that wouldn’t bring any improvement), so the dispatcher cache and the browser cache remain. In this case I recommend caching them in the browser cache, for 2 reasons:

  • These files are requested more than once during a typical authoring session, so it makes sense to cache them directly in the browser cache.
  • The latency of the browser cache is way lower than the latency of any load over the network.

As an additional restriction which speaks against the dispatcher:

  • There are no flushing agents for authoring mode, so in the case of tuning an authoring instance we cannot use the dispatcher cache that easily.

And to make changes to these files on the server visible to the user, we can use the expiration feature of HTTP. This allows us to specify a time-to-live, which basically tells any interested party how long we consider this file up-to-date. When this time is reached, every party which cached it should remove it from its cache and refetch it.
This isn’t the perfect solution, because a browser will drop the file from its cache and refetch it from time to time although the file is still valid and up-to-date.
But it’s still an improvement if the browser fetches these files every hour instead of twice a minute (on every page load).

Our prognosis is that the browser of an authoring user won’t perform as many requests for these files anymore; this will improve the rendering performance of the page (the files are fetched from the fast browser cache instead of from the server), and additionally the load on CQ will decrease, because it doesn’t need to handle as many requests. Good for all parties.

Step 3: We implement this feature in the Apache webserver which we have placed in front of our CQ authoring instance and add the following statements:

<LocationMatch /libs>
# mod_expires must be loaded and switched on, otherwise the ExpiresByType lines have no effect
ExpiresActive On
ExpiresByType image/png "access plus 1 hour"
ExpiresByType text/css "access plus 1 hour"
</LocationMatch>

Instead of relying on file extensions, these rules specify the expiration by MIME type. The files are considered up-to-date for an hour, so the browser will reload them at most once per hour. This value should also be acceptable in case one of these files does change at some point. And if everything else fails, the authoring users can clear their browser cache.

Step 4: We measure the effect of our changes using 2 different strategies. First we observe the request.log again and check whether these requests still show up. If the server is already heavily loaded, we can additionally check for a decreasing load and improved response times for the remaining requests. As a second option we take a simple use case of an authoring user and run it with Firefox’s Firebug extension enabled. This plugin can visualize how and when the parts of a page are loaded and displays the response times quite exactly. You should now see that the number of files requested over the network has decreased and that loading a page and all its embedded objects is faster than before.

So with a quick and easy-to-perform action you have decreased the page load times. When I added expiration headers to a number of static images, javascripts and CSS files on a publishing instance, the number of requests which went over the wire dropped to 50%, and the page load times also decreased, so that even during a stress test the site still performed well. Of course, dynamic parts must be handled by their respective systems, but whenever we can offload requests from CQ, we should do it.

So as a conclusion: some very basic changes to the system (a few adjustments to the Apache config) may increase the speed of your site (publishing and authoring) dramatically. Changes like the ones described are not invasive to the system and are highly adjustable to the specific needs and requirements of your application.

Permission sensitive caching

In the recent versions of the dispatcher (starting with the 4.0.1 release), Day added a very interesting feature which allows you to cache content on dispatcher level even if it is not public.
Honwai Wong of the Day support team explained it very well at the TechSummit 2008. I was a bit surprised, but I even found it on Slideshare (the first half of the presentation).
Honwai explains the benefits quite well. From my experience you can reduce the load on your CQ publishers (trading a request which requires the rendering of a whole page for a request which just checks the ACLs of a page).

If you want to use this feature, you have to make sure that for every group or user who needs an individual page, the dispatcher delivers the right one. Imagine you want to present the latest company news to logged-in users, while users who are not logged in shouldn’t get them. And only the managers get the link to the latest financial data on the start page. So you need a start page for 3 different groups (not-logged-in users, logged-in users, managers), and the system should deliver the appropriate one. Having a single home.html isn’t enough; you need to distinguish.

The easiest way (and the Day way ;-)) is to use a selector denoting the group the user belongs to. So home.group-logged_in.html or home.group-managers.html would be good. If no selector is given, we assume the user to be anonymous. You have to configure the linkchecker to rewrite all links so that they contain the correct selector. So if a user belongs to the logged_in group and requests the home.group-logged_in.html page, the dispatcher will ask CQ: “the user has the following HTTP header lines and is requesting home.group-logged_in.html, is that ok?”. CQ then checks whether the given HTTP header lines belong to a user of the group logged_in; because she does, it responds with “200 OK, just go on”, and the dispatcher delivers the cached file, so there is no need for CQ to render the same page again and again. If the user doesn’t belong to that group, CQ will detect that and send a “403 Permission denied”, and the dispatcher forwards this answer to the user. If a user is a member of more than one group, having multiple “group-” selectors is perfectly valid.

Please note: I speak of groups, not of (individual) users. I don’t think this feature is useful when each user requires a personalized page. The cache-hit ratio is pretty low (especially if you include often-changing content on it, e.g. daily news or the content of an RSS feed) and the disk consumption would be huge. If a single page is 20k and you have a version cached for 1000 users, you have a disk usage of 20 MB for a single page! And don’t forget the performance impact of a directory filled with thousands of files. If you want to personalize pages for individual users, caching is inappropriate. Of course the usual nasty hacks are applicable, like requesting the user-specific data via an AJAX call and then modifying the page in the browser using Javascript.

Another note: currently no documentation is available on permission-sensitive caching, only the presentation by Honwai Wong linked above.

Creating cacheable content using selectors

The major difference between a static object and a dynamically created object is that the static one can be stored in caches; the content it contains does not depend on user or login data, date or other parameters. It looks the same on every request. So caching it is a good idea to move load off the origin system and to accelerate the request-response cycle.

A dynamically created object is influenced by certain parameters (usually username/login, permissions, date/time, but there are countless others) and therefore its content may differ from request to request. These parameters are usually specified as query parameters, and such responses must not be cached (see the HTTP 1.1 specification in RFC 2616).

But sometimes it would be great if we could combine these 2 approaches. For example, you want to offer images in 3 resolutions: small (as a preview image, e.g. in a folder view), big (full-screen view) and original (the full resolution delivered by the picture-taking device). If you decide to deliver them as static objects, they are cacheable, but then you need 3 names, one for each resolution. This blurs the fact that these 3 images are the same and differ only in their resolution: it creates 3 images instead of one image in 3 instances. A more practical drawback is that you always have to precompute these 3 pictures and place them in a reachable location; lazy generation is hard as well.

If you choose the dynamic approach, the image is available as a single object whose instances can be created dynamically. The drawback here is that it cannot be cached.

Day Communique has a feature (the guys at Day also ported it to Apache Sling) called selectors. They behave like the query parameters we have used since the stone age of the HTTP/HTML era, but they are not query parameters; they are encoded in the static part of the URL. So the query part of the URL (as of HTTP 1.1) is no longer needed.

So you can use the URLs /etc/medialib/trafficjam.preview.jpg, /etc/medialib/trafficjam.big.jpg and /etc/medialib/trafficjam.original.jpg to address the image in the 3 required resolutions. If your dispatcher doesn’t find them in its cache, it will forward the request to your CQ, which can then scale the requested image on demand. The dispatcher can then store the image and deliver it from its cache from that point on. That’s a very simple and efficient way to make dynamic objects static and offload requests from your application servers.

Joerg’s rules for loadtests

In the article “Everything is content (part 2)” I discussed the problems of doing proper loadtests with CQ, given that the performance of your CQ gets (a bit) lower with every loadtest. In a comment Jan Kuźniak proposed disabling versioning and restoring your loadtest environment for every loadtest. Thinking about Jan’s contribution revealed a number of topics I consider crucial for loadtests. I collected some of them and would like to share them.

  • Provide a reasonable amount of data in the system. This amount should be roughly equal to your production system, so the numbers are comparable. Being 20% off doesn’t matter, but don’t expect good results if your loadtest runs on 1000 handles while your production system heads straight towards 50k handles. You may end up optimizing the wrong parts of your code.
    If you ever benchmarked a speedup of 20% in the loadtests but got nothing in the production system, you have already seen this effect.
  • When your loadtest environment is ready to run, create a backup of it. Drop the CQ loadtest installation(s) from time to time, restore them from the backup and re-run your loadtest on a clean installation to verify your results. That’s the point I already mentioned.
  • Always have the same configuration in the production and loadtest environments. That’s the reason why I disagree with disabling versioning on the loadtest environment. The effect of a diverging configuration may be the same as in the point above: you may optimize the wrong parts of your code.
  • No error messages during the loadtest. If an error message indicates a code problem, it’s probably reproducible by re-running the loadtest (come on, reproducible bugs are the easiest ones to fix :-)). If it’s a content problem, you should adjust your content. A loadtest is also a very basic regression test, so take the results (errors are part of them!) seriously.
  • Be aware of resource virtualization! Today’s hype is to run as many applications as possible in virtualized environments (VMware, KVM, Solaris zones, LPARs, …) to increase the efficiency of the hardware usage and lower costs. Doing so very often removes guarantees you need for comparing the results of different loadtests. For example, during one loadtest you have 4 CPUs available, while during the second one you have 6 CPUs. Are the results comparable? Maybe they are, maybe not.
    Being limited to a fixed 4 CPUs gives you comparable loadtests, but if your production system requires 8 CPUs, you cannot load your loadtest system with production-level numbers. Getting a decent loadtest environment is a hard job …
  • Have good test scenarios. Of course the most basic requirement. Don’t just grab the access.log and throw it at your load injector: re-running GET requests is easy, but that approach won’t work for POSTs. Modelling good scenarios is hard and takes a lot of time.

Of course there are a lot more things to consider, but I will limit myself to these points for the moment. Perhaps there will be a part 2.

META: Being linked from Day

Today this blog was presented as link of the day on dev.day.com. Thanks for the kudos 🙂

Welcome to all who read this blog for the first time. I’ve collected some experiences with Day CQ here and will continue to share my experience and other thoughts. Don’t hesitate to comment or drop me an email if you have a question about a post.

Please note: I am not a developer, so I cannot help you with specific questions about template programming. In that case you can contact one of the groups mentioned on the Day Developer website.

Visualize your requests

Over the last year, customers often complained about our bad performance. We had just fixed a small memory leak (which crashed our publishing instances about every hour or so), so we were quite interested in getting reliable data to confirm or refute their complaints. At that time I thought we needed a way to get a quick overview of the performance of our CQ instances. One look to see “Ok, it must be the network, our systems perform great!”

So I dug out my perl know-how and wrote a little script which parses a request.log and prints out data which is understood by gnuplot. gnuplot then draws some nice graphs of it, displaying the number of requests per minute and also the average request duration for these requests.
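
The script itself is linked below; the core idea fits into a few lines. A stripped-down sketch (it only emits the per-minute data; the real script also wraps the gnuplot commands around it):

#!/usr/bin/perl
# Aggregate a CQ request.log per minute: number of requests and average
# response time, printed as columns that gnuplot can plot directly.
use strict;
use warnings;

my (%count, %total_ms);

while (my $line = <>) {
    # response lines look like:
    # 09/Oct/2009:09:08:06 +0200 [8] <- 200 text/html; charset=utf-8 3016ms
    next unless $line =~ /^(\S+)\s+\S+\s+\[\d+\]\s+<-.*?(\d+)ms\s*$/;
    my ($ts, $ms) = ($1, $2);
    $ts =~ s/:\d\d$//;                       # strip the seconds -> one bucket per minute
    $count{$ts}++;
    $total_ms{$ts} += $ms;
}

# one line per minute: timestamp, requests per minute, average duration in ms
for my $minute (sort keys %count) {
    printf "%s %d %.0f\n",
        $minute, $count{$minute}, $total_ms{$minute} / $count{$minute};
}

Feed these columns to gnuplot and you get the two graphs described above.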

[Image: request-graph-all. Click on the image for a larger version.]

These images proved to be pretty useful, because you can show them to your manager (“Look, the average response time went down from 800 milliseconds to 600, although the number of requests went up by 30%.”) and they help you in the daily business, because you can spot problems quite well. When the response times go up at a certain point in time, you had better take a look at the system and find the reason for it.

[Image: request-graph-html-uk. Click on the image to view a larger version.]
Because this script is quite fast (it parses 300 megabytes of request.log in about 15 seconds on a fast Opteron-based machine), we usually render these images online and integrate the resulting images into a small web application (no CQ, but a small hacked-up PHP script). For some more interactivity I added the possibility to display only the requests which match a certain string. So it’s very easy to answer questions such as “Is the performance of my landing page as bad as the customer reports?”

You can download this little perl script here. Run it with “--help” first and it will display a little help screen. Give it a number of request.log files as parameters, pipe the output directly into gnuplot (I tested with version 4.0, but it will probably also work with newer versions) and it will output a PNG file. Adjust the script to your needs and contribute back; I released it under GPL version 2.

(For the hackers: Some things can probably be performed better and I also have some new functionality already prepared in it, but not active. Patches are welcome :-))

Everything is content (part 2)

Recently I pointed out some differences in the handling of the “everything is content” paradigm of Communique. A few days ago I found a posting by David Nüscheler over at dev.day.com, in which he explained the details of performance tuning.

(His 5 rules do not apply exclusively to Day Communique, but to every performance tuning session).

In Rule 2 he states:

Try to implement an agile validation process in the optimization phase rather than a heavy-weight full blow testing after each iteration. This largely means that the developer implementing the optimization has a quick way to tell if the optimization actually helped reach the goal.

In my experience this isn’t viable in many cases. Of course the developer can quickly check whether his new algorithm performs better (= is faster) than the old one. But in many cases the developer doesn’t have all the resources and infrastructure available and doesn’t have all the content in his test system; which is the main reason why I do not trust tests performed on developer systems. So the project team relies on central environments which are built for load testing, which have load balancers, access to directories, production-sized machines and content which is comparable to the production system. Once the code is deployed, you can do loadtesting, either using commercial software or just something like JMeter. If you use continuous integration and an auto-deployment system, you can run such tests every day.

Ok, where did I start? Right, “everything is content”. So you run your loadtest. You create handles, modify them, activate and drop them, you request pages to view, you perform activities on your site, and so on. Afterwards you look at your results and hopefully they are better than before. Ok. But …

But Communique is not built to forget data (of course it does sometimes, but that’s not the point here :-)), so all these activities are stored. Just take a look at the default.map, the zombie.map, the cmgr.hist file, … All your recent actions are persisted and CQ knows about them.

Of course handling more of this information doesn’t make CQ faster. In case you have long periods of time between your template updates: check the performance data directly after a template update and compare it to the data a few months later (assuming you don’t have CQ instances which aren’t used at all). You will see a decrease in performance; it can be small and nearly unmeasurable, but it is there. Some actions are slower.

Ok, back to our loadtest. If you run the loadtest again and again and again, a lot of actions are persisted. When you reproduce the code and the settings of the very first loadtest and run that very first loadtest again, and you do that on a system which has already faced 100 loadtests, you will see a difference. The result of this 101st loadtest is different from the first one, although the code, the settings and the loadtest are essentially the same. All the same except CQ and its memory (default.map and friends).

So, you need a mechanism which allows you to undo all changes made by such a loadtest. Only then you can perfectly reproduce every loadtest and run them 10 times without any difference in the results. I’ll try to cover such methods (there are some of them, but not all equally suitable) in some of the next posts.

And to get back to the title: everything is content, even the history. So in contrast to my older posting, where I said:

Older versions of a handle are not content.

They are indeed content, but only when it comes to slowing down the system 🙂