Why you should do performance tests

With this posting I want to start a series of small articles which deal with performance tests in projects with CQ5. Much of it can be applied to many other web projects as well.

This topic is one of my all-time favorites in a project plan. According to my experience with CQ projects in the last years, performance tests are usually planned to be executed 3 weeks before the golive, with some person days of preparation. No one knows what should be tested and to what extent, and no one knows what the results should be. And performance tests are also one of the first activities removed from the project plan if time is getting short.

But of course most members of the project team (mostly the technical ones) know that dropping the performance tests will cause heavy problems later in production. But the “told you so” afterwards isn’t really helpful …

Some reasons, why you should do a performance test:

  • During development the focus is on coding and functionality. Functional tests are executed by developers or small test teams. In a production use case the number of users is usually much higher. Just working on the assumption that the system works properly is highly risky.
  • When you have done a performance test, you know some of the conditions under which the system performs well or fails. In both cases you learn about your system, and you can either address deficits early or back your performance assumptions with data.
  • Operations will have an easier job if you can tell them about the capabilities and limits of the system. Then they can act accordingly in case of problems and don’t need to waste effort determining the root causes of issues in your shiny new application. (In the end they cannot do much about it anyway, especially on a production system. They need to bring it back online as soon as possible; they won’t debug it.)
  • Performance deficits sometimes go hand in hand with stability issues. Consider a request which requires some memory to execute. Five of these requests in parallel are not a problem, but if you have 100 of them running concurrently (because each of them is processed slowly), the stability of your JVM might be impacted.
  • The business aspect: bad performance can slow down your business and have a huge impact on KPIs like successful conversions. If you know that certain parts of your site are slow, you can address these deficits early in the project, long before they impact the KPIs.
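The stability point above is essentially Little’s Law: the number of requests in flight equals the arrival rate times the average response time, so slow processing alone multiplies concurrency. A minimal sketch (the traffic numbers and the 50 MB per request are invented for illustration):

```python
# Little's Law: requests in flight = arrival rate x average response time.
def concurrent_requests(arrival_rate_per_s, avg_response_time_s):
    return arrival_rate_per_s * avg_response_time_s

def heap_needed_mb(arrival_rate_per_s, avg_response_time_s, mb_per_request):
    # Combined memory footprint of all in-flight requests.
    return concurrent_requests(arrival_rate_per_s, avg_response_time_s) * mb_per_request

# 100 requests/s answered in 50 ms each: only 5 requests in flight.
print(concurrent_requests(100, 0.05))  # -> 5.0
# Same traffic, but each request now takes 1 s: 100 requests in flight.
print(concurrent_requests(100, 1.0))   # -> 100.0
# At an assumed 50 MB working set per request, that is 5000 MB of heap.
print(heap_needed_mb(100, 1.0, 50))    # -> 5000.0
```

The same throughput can thus mean a harmless or a fatal memory footprint, depending only on how long each request takes.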

So from a risk perspective it is very helpful to conduct performance tests. If the discussion is about cutting either a feature or a performance test, strongly recommend cutting the feature (unless it is one of the features of the minimum viable product). In terms of overall stress it is usually much easier to add a feature after golive than to ship a release which has performance and/or stability problems.

Joerg’s rules for loadtests

In the article “Everything is content (part 2)” I discussed the problem that every loadtest makes the performance of your CQ instance (a bit) worse. In a comment Jan Kuźniak proposed to disable versioning and to restore your loadtest environment before every loadtest. Thinking about Jan’s contribution revealed a number of topics I consider crucial for loadtests. I collected some of them and would like to share them.

  • Provide a reasonable amount of data in the system. This amount should be roughly equal to your production system, so the numbers are comparable. Being 20% off doesn’t matter, but don’t expect good results if your loadtest runs on 1000 handles while your production system heads towards 50k handles. You may optimize the wrong parts of your code then.
    If you have ever benchmarked a speedup of 20% in a loadtest but got nothing on the production system, you have already seen this effect.
  • When your loadtest environment is ready to run, create a backup of it. Drop the CQ loadtest installation(s) from time to time, restore them from the backup and re-run your loadtest on a clean installation to verify your results. This is the point I already mentioned.
  • Always have the same configuration in the production and loadtest environments. That’s the reason why I disagree with disabling versioning on the loadtest environment. The effect of a diverging configuration can be the same as in the point above: you may optimize the wrong parts of your code.
  • No error messages during the loadtest. If an error message indicates a code problem, it’s probably reproducible by re-running the loadtest (come on, reproducible bugs are the easiest ones to fix :-)). If it’s a content problem, you should adjust your content. A loadtest is also a very basic regression test, so take the results seriously (errors are results too!).
  • Be aware of resource virtualization! Today’s hype is to run as many applications as possible on virtualized environments (VMware, KVM, Solaris zones, LPARs, …) to increase the efficiency of hardware usage and lower costs. Doing so very often removes guarantees you need for comparing the results of different loadtests. For example, in one loadtest you have 4 CPUs available, while in the second one you have 6. Are the results comparable? Maybe they are, maybe not.
    Being pinned to exactly 4 CPUs gives you comparable loadtests, but if your production system requires 8 CPUs, you cannot drive your loadtest system with production-level numbers. Getting a decent loadtest environment is a hard job …
  • Have good test scenarios. Of course the most basic requirement. Don’t just grab the access.log and throw it at your load injector: re-running GET requests is easy, but that approach breaks down for POSTs. Modelling good scenarios is hard and takes a lot of time.
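One simple way to model scenarios instead of replaying an access.log is a weighted scenario mix. A minimal sketch (the scenario names and weights are invented; the actual HTTP steps of each scenario would live in the load injector and are out of scope here):

```python
import random
from collections import Counter

# Hypothetical scenario mix: model what users actually do,
# not just the GET lines of an access.log.
SCENARIOS = [
    ("browse_pages",   70),  # anonymous page views (plain GETs)
    ("search",         20),  # search form POST plus result page
    ("author_publish", 10),  # author login, edit and activate a handle (POSTs)
]

def pick_scenario(rng=random):
    """Weighted random choice over the scenario mix."""
    total = sum(weight for _, weight in SCENARIOS)
    r = rng.uniform(0, total)
    for name, weight in SCENARIOS:
        r -= weight
        if r <= 0:
            return name
    return SCENARIOS[-1][0]  # guard against floating-point edge cases

# Sanity-check the mix over many draws.
counts = Counter(pick_scenario() for _ in range(10_000))
print(counts)
```

Each virtual user then executes the chosen scenario end to end, including its POSTs, which is exactly what a replayed access.log cannot give you.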

Of course there are a lot more things to consider, but I will limit myself to these points for the moment. Maybe there will be a part 2.

Everything is content (part 2)

Recently I pointed out some differences in the handling of the “everything is content” paradigm of Communique. A few days ago I found a posting by David Nüscheler over at dev.day.com, in which he explained the details of performance tuning.

(His 5 rules do not apply exclusively to Day Communique, but to every performance tuning session).

In Rule 2 he states:

Try to implement an agile validation process in the optimization phase rather than a heavy-weight full-blown testing after each iteration. This largely means that the developer implementing the optimization has a quick way to tell if the optimization actually helped reach the goal.

In my experience this isn’t viable in many cases. Of course the developer can check quickly if his new algorithm performs better ( = is faster) than the old one. But in many cases the developer doesn’t have all the resources and infrastructure available, and doesn’t have all the content in his test system; this is the central reason why I do not trust tests performed on developer systems. So the project team relies on central environments which are built for load testing: environments with loadbalancers, access to directories, production-sized machines and content which is comparable to the production system. Once the code is deployed there, you can do loadtesting, either using commercial software or just something like JMeter. If you can use continuous integration and an auto-deployment system, you can run such tests every day.
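To evaluate such a daily JMeter run automatically, the CI job can parse the JTL result file and fail the build on regressions. A minimal sketch; the column names are a subset of JMeter’s default CSV output, and the sample data is invented:

```python
import csv
import io
import statistics

# Invented sample of a JMeter JTL result file in CSV format.
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success
1000,120,home,200,true
1001,250,home,200,true
1002,90,search,200,true
1003,1500,search,500,false
1004,300,home,200,true
"""

def summarize(jtl_text):
    """Compute basic pass/fail numbers from a JTL CSV."""
    rows = list(csv.DictReader(io.StringIO(jtl_text)))
    elapsed = sorted(int(r["elapsed"]) for r in rows)
    errors = sum(1 for r in rows if r["success"] != "true")
    return {
        "samples": len(rows),
        "error_rate": errors / len(rows),
        "median_ms": statistics.median(elapsed),
    }

result = summarize(SAMPLE_JTL)
print(result)
```

In a real CI job you would read the file from disk and compare the error rate and response-time percentiles against thresholds or against the previous run’s numbers.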

Ok, where did I start? Right, “everything is content”. So you run your loadtest. You create handles, modify them, activate and drop them, you request pages to view, you perform activities on your site, and so on. Afterwards you look at your results and hopefully they are better than before. Ok. But …

But Communique is not built to forget data (of course it does sometimes, but that’s not the point here :-)), so all these activities are stored. Just take a look at the default.map, the zombie.map, the cmgr.hist file, … All your recent actions are persisted and CQ knows about them.

Of course handling more of this information doesn’t make CQ faster. In case you have long periods of time between your template updates: check the performance data directly after a template update and compare them to the data a few months later (assuming your CQ instance is actually used in between). You will see a decrease in performance; it may be small and nearly unmeasurable, but it is there. Some actions are slower.

Ok, back to our loadtest. If you run the loadtest again and again and again, a lot of actions are persisted. When you reproduce the code and the settings of the very first loadtest and run that first loadtest again on a system which has already faced 100 loadtests, you will see a difference. The result of this 101st loadtest differs from the first one, although the code, the settings and the loadtest are essentially the same. All the same except CQ and its memory (default.map and friends).

So you need a mechanism which allows you to undo all changes made by such a loadtest. Only then can you perfectly reproduce every loadtest and run it 10 times without any difference in the results. I’ll try to cover such methods (there are several, but not all are equally suitable) in some of the next posts.

And to get back to the title: everything is content, even the history. So contrary to my older posting, where I said:

Older versions of a handle are not content.

They are indeed content, but only when it comes to slowing down the system 🙂