AEM micro-optimization (part 1)

As a follow-up to the previous article I want to show you what such a micro-optimization can look like. My colleague Miroslav Smiljanic found that there is a significant difference in the time it takes to execute statements (1) and (2) below.

Node node = …
Session session = node.getSession();
String parentPath = node.getParent().getPath();

Node p1 = node.getParent(); // (1)
Node p2 = session.getNode(parentPath); // (2)

assertEquals(p1, p2);

He did the whole writeup in the context of a suggested improvement in Sling, and proved it with impressive numbers.

Is this change important? By itself it is not, because traversing the resource/node tree upwards is not that common compared to going down the tree. So replacing a single call might only yield an improvement of a fraction of a millisecond, even though case (2) is up to 200 times faster than case (1)!

But if we replace getParent() with the more performant call everywhere it applies, especially in the low-level areas of AEM and Sling, all areas can benefit from it. And then we don’t execute it only once per page rendering, but maybe a hundred times. And then we might already end up with tens of milliseconds of improvement, for every request!

And in special use cases the effect can be even larger (for example if your code is constantly traversing the tree upwards).

Another example of such a micro-optimization, which is normally quite insignificant but can yield huge benefits in special cases, can be found in SLING-10269: there I found that built-in caching of the isResourceType() results reduces the rendering times of some special requests by 50%, because the check is performed thousands of times.
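To illustrate the idea (this is a sketch of the concept only, not the actual SLING-10269 patch, which caches inside the resource resolver):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.sling.api.resource.Resource;

// Simplified illustration of memoizing resource type checks: the result of the
// (potentially expensive) type hierarchy check is cached per pair of actual
// resource type and candidate type, so repeated checks become a map lookup.
public class ResourceTypeCheckCache {

    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    public boolean isResourceType(Resource resource, String candidateType) {
        String key = resource.getResourceType() + "::" + candidateType;
        return cache.computeIfAbsent(key, k ->
                resource.getResourceResolver().isResourceType(resource, candidateType));
    }
}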

Typically micro-optimizations have these properties:

  • In the general case the improvement is barely visible (< 1% performance improvement)
  • In edge cases they can be a life saver, because they reduce execution time by a much larger percentage.

These improvements accumulate over time, and that’s where it gets interesting. When you have implemented 10 of these in low-level routines, the chances are high that your use case benefits from them as well. Maybe by 10 times 0.5% performance improvement, but maybe also by 20%, because you hit the sweet spot of one of them.

So it is definitely worth paying attention to these improvements.

My recommendation for you: Read the entry in the Oak “Do’s and Don’ts” page and try to apply this learning to your codebase. And if you find more such cases in the Sling codebase, the community appreciates a ticket.


The effect of micro-optimizations

Optimizing software for speed is a delicate topic. You often hear the saying “Make it work, make it right, make it fast”, implying that performance optimization should be the last step of your coding. Which is true to a very large extent.

But in many cases you are happy if your budget allows you to get to the “make it right” phase, and you rarely get the chance to kick off a decent performance optimization phase. That situation is common in many areas of the software industry, and performance optimization is often only done when absolutely necessary. Which is unfortunate, because it leaves us with a lot of software that has performance problems. And in many cases a large part of the problem could be avoided if only a few optimizations were done (at the right spot, of course).

But this notion of a “performance improvement phase” assumes that it requires huge efforts to make software more performant. In general that is true, but there are typically a number of actions which can be implemented quite easily and which can be beneficial. Of course these rarely boost your overall application performance by 50%; most often they just speed up certain operations. But depending on how frequently these operations are called, they can add up to a substantial improvement.

I once did a performance tuning session on an AEM publish instance to improve the raw page rendering performance of an application. The goal was to squeeze more page responses out of the given hardware. Using a performance test and a profiler I found that the creation of JCR sessions and Sling ResourceResolvers took 1-2 milliseconds, which was worth investigating. Armed with this knowledge I combed through the codebase, reviewed all cases where a new session is created, and removed all cases where it was not necessary. This was really a micro-optimization, because I focused on tiny pieces of the code (not even the areas which are called many times), and the regular page rendering (on a developer machine) did not improve at all. But in production this optimization turned out to help a lot, because it allowed us to deliver 20% more pages per second out of the publish instances at peak.
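To make the pattern concrete, here is a hedged sketch (the names and the service-resolver variant are illustrative, not the actual code of that engagement):

import java.util.Map;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;

public class TitleLookup {

    // Anti-pattern: opens a dedicated ResourceResolver (and with it a new
    // JCR session), although the request already carries an authenticated one.
    String getTitle(ResourceResolverFactory factory, Map<String, Object> authInfo,
            String path) throws LoginException {
        try (ResourceResolver resolver = factory.getServiceResourceResolver(authInfo)) {
            Resource resource = resolver.getResource(path);
            return resource == null ? null : resource.getValueMap().get("jcr:title", String.class);
        }
    }

    // Better in request-scoped code: reuse the resolver of the request,
    // so no additional session is created at all.
    String getTitle(SlingHttpServletRequest request, String path) {
        Resource resource = request.getResourceResolver().getResource(path);
        return resource == null ? null : resource.getValueMap().get("jcr:title", String.class);
    }
}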

In this case I spent quite some time to come to the conclusion that opening sessions can be expensive under load. But now I know, and I spread that knowledge via code reviews and blog posts.

Most often you don’t see the negative effect of these anti-patterns (unless you overdo it and every Sling Model opens a new ResourceResolver), and therefore the positive effects of applying these micro-optimizations are not immediately visible either. But in the end, applying 10 micro-optimizations with a ~1% speedup each adds up to a pretty nice number.

And of course: If you can apply such a micro-optimization in a codepath which is heavily used, the effects can be even larger!

So my recommendation to you: If you come across such a piece of code, optimize it. Even if you cannot quantify and measure the immediate performance benefit, do it.

Same as:

for (int i = 0; i <= 100; i++) {
  othernumber += i;
}

I cannot quantify the improvement, but I know that

othernumber += 5050;

is faster than the loop, no questions asked. (Although that’s a bad example, because hopefully the compiler would do it for me.)

In the upcoming blog posts I want to show you a few cases of such micro-optimizations in AEM, which I personally used with good success. Stay tuned.


Writing integration tests for AEM (part 5)

This is part of my ongoing series about writing integration tests with AEM.

(Integration tests help you to keep control; photo by Chris Leipelt on Unsplash)

Writing tests seems to be a recurring topic 🙂 This week I wrote some integration tests which cover one of the most important workflows in AEM: the activation of pages. Until now I haven’t blogged about handling both author and publish in an integration test, so I will show you how to do it.

So let’s assume that you want to do some product testing and validate that replication is working and also writes correct audit log entries. This should be covered with an integration test. You can find the complete source code in the ActivatePageIT in the integrationtests github project.

Before we dig into the code itself, a small hint for the development phase of tests: if you want to execute only a single integration test, you can instruct Maven to do so with the parameter “-Dit.test=<name of the test class>”. So in our case the complete Maven command line looks like this:

mvn clean install -Peaas-local -Dit.test=ActivatePageIT -Dit.author.url=http://localhost:4502

(assuming that you don’t run your AEM author on the same port as I do … if you want to change that, modify the parameters in the pom.xml).

On the coding side, the approach follows that of every integration test: we need to get the correct clients first.

As we want to use replication, we use a ReplicationClient, which is provided by the testing client library.
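Condensed, that setup looks roughly like this (a sketch; see the ActivatePageIT in the github project for the real wiring):

@ClassRule
public static final CQAuthorPublishClassRule cqBaseClassRule = new CQAuthorPublishClassRule();

@Rule
public CQRule cqBaseRule = new CQRule(cqBaseClassRule.authorRule, cqBaseClassRule.publishRule);

static ReplicationClient replicationClient;
static CQClient publishClient;

@BeforeClass
public static void beforeClass() throws ClientException {
    // the ReplicationClient wraps the replication-related HTTP calls on the author
    replicationClient = cqBaseClassRule.authorRule.getAdminClient(ReplicationClient.class);
    // a plain CQClient to verify the result on the publish
    publishClient = cqBaseClassRule.publishRule.getAdminClient(CQClient.class);
}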

Next we define a custom Page class, which allows us to define the parentPath:
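A sketch of such a class; note that the name of the overridden method is illustrative, so check the Page rule of your version of the testing clients for the exact hook:

// Sketch: a Page rule whose parent path is configurable.
private static class PageWithParent extends Page {

    private final String parentPath;

    PageWithParent(Supplier<SlingClient> clientSupplier, String parentPath) {
        super(clientSupplier);
        this.parentPath = parentPath;
    }

    @Override
    protected String initialParentPath() {
        return parentPath;
    }
}

An instance of this class is then registered as a @Rule (called pageRule below) in the test class.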

Then the actual test case is straightforward.

I used some more features of the testing clients to test for the existence or absence of the page, plus the doGetJson() method to get the JSON representation of the pages (in the getAuditEntries() method).
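Condensed into a sketch (pageRule is an instance of the custom Page class above, TIMEOUT an assumed timeout constant in milliseconds; method signatures may differ slightly between versions of the testing clients):

@Test
public void activatePage() throws Exception {
    // trigger the activation on the author ...
    replicationClient.activate(pageRule.getPath());

    // ... the page must eventually show up on the publish ...
    publishClient.pageExistsWithRetry(pageRule.getPath(), TIMEOUT);

    // ... and the activation must have left an audit entry on the author;
    // audit entries for replication live below /var/audit/com.day.cq.replication
    JsonNode auditEntries = replicationClient.doGetJson(
            "/var/audit/com.day.cq.replication" + pageRule.getPath(), 2, 200);
    assertTrue(auditEntries.size() > 0);
}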

So, writing integration tests with this tooling at hand is easy and actually fun. Especially if the test code is as straightforward to implement as here.

AEM as a Cloud Service and the handling of binaries

When you are a long-time user of AEM 6.x (and even CQ5), you are probably familiar with the Asset Update workflow. Its primary task is the extraction of metadata from the binary asset and the creation of (smaller) renditions of it. This workflow is normally executed on the AEM authoring instance.

(“Never underestimate the bandwidth …!”; symbolic photo by Massimo Botturi on Unsplash)

But since its beginnings this approach has been plagued with problems:

  • The question of supported file types. Given the almost unlimited number of file formats and their often proprietary implementations, it’s not always possible to perform these operations. In many cases the support for these file types within Java is poor.
  • Additionally, depending on the size and type of the asset and on the quality of the library which supports this file type, the processing can be very time consuming and can consume a lot of heap. Imagine that you want to create renditions of a TIFF file with dimensions of 10k × 10k pixels: at a 24-bit color depth (3 bytes per pixel) this requires 10,000 × 10,000 × 3 bytes ≈ 300 megabytes of contiguous heap to store an uncompressed version of it. You have to size the heap accordingly, otherwise you will run out of memory (OOM).
  • To avoid these issues, external tools like ImageMagick were used for many file types. These come with support for a wide range of image types (in many cases much better than the Java image libraries), plus the ability not to blow up the AEM process when a conversion fails (because ImageMagick runs in a dedicated process). But the capabilities of ImageMagick are limited as well, and the support for more exotic (non-image) file types could be better.
  • In all cases you need to size your hardware for a worst-case scenario. For example, you need to provision a lot of heap if your authors might start to ingest large images, and you need to provision enough CPU to mitigate negative impacts on all other operations.
  • Another big problem is latency. Assuming that your asset is very large (it’s not uncommon to have assets larger than 1 gigabyte), it takes time to copy the binary from the (remote) datastore to the location where the processing takes place. Even if you can transfer 100 MiB per second, it takes 10 seconds to get a 1 GB file to the local disk; normally this transfer runs through the AEM JVM, which is problematic in terms of heap usage and can also cause performance problems. Not to mention code which is not aware of the possible sizes and tries to load the complete stream into memory.

In AEM as a Cloud Service this is offloaded, and that’s what Asset Compute is for. It performs all these steps on its own, not using ImageMagick for image handling, but high-quality, optimized routines which also power other Adobe products.

But what does that mean for you as a developer for AEM as a Cloud Service? In the first place it does not have any impact. But you should learn a few things from it:

  • Do not create any renditions on your own; use Asset Compute instead. This service is extensible (check out Project Firefly), so you can do all kinds of asset operations there. There is no need anymore to use Java image library code.
  • Avoid streaming binary data through AEM. AEM as a Cloud Service itself (the JVM) should not be bothered with streaming binary data into and out of the JVM. If you want to upload files into AEM, you should use the aem-upload library.

In general, think twice before you open an InputStream in AEM (either via Rendition.getStream() or via the JCR API). You normally never know how much data is behind it, and for almost all transformation cases it makes sense to let Asset Compute perform them.
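A defensive sketch of that “think twice” advice (the size limit is arbitrary and the class is illustrative):

import java.io.InputStream;

import com.day.cq.dam.api.Rendition;

public class RenditionProcessor {

    // arbitrary limit for this sketch: refuse to process huge binaries in-process
    private static final long MAX_SIZE_BYTES = 10L * 1024 * 1024;

    public void process(Rendition rendition) throws Exception {
        // check the size before touching the stream at all
        if (rendition.getSize() > MAX_SIZE_BYTES) {
            throw new IllegalStateException(
                    "Rendition " + rendition.getPath() + " is too large to process in the JVM");
        }
        try (InputStream in = rendition.getStream()) {
            // process the stream chunk-wise; never buffer it completely into memory
        }
    }
}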

META: domain switch

After some 12 years I finally switched over the domain name of this blog to something which is more closely attached to me. Don’t be surprised if you end up on “cqdump.joerghoh.de”. But of course the old domain name will continue to work, and I don’t plan to remove it.

CRX DE driven development

A recurring problem I see in AEM project implementations is missing abstraction. A lot of code passes around resources, ValueMaps and even Strings (paths). And because we are supposed to build software the proper way, the called method checks (or more often: does not check) that the provided resource parameter is not null and that the resource has the correct type.

But the class names and comments suggest that the code is actually dealing with products. Or website structures. Or assets. Yet instead of using a “product” class (or a website class, or the provided asset class), resources are used everywhere. The abstraction is missing!

For me the root cause of this problem is CRXDE Lite, exactly that tool which you can open on your local AEM instance at /crx/de/. It shows you a very nice hierarchical view of the repository, it shows you paths and properties. And when a developer starts to build a mental model of something, this tool comes in quite handy, because you can reach everything via a path, which is a String! So instead of expressing relations between concepts I often see this:

String path = …
Resource pathResource = resourceResolver.getResource(path);

And because we know it’s an existing resource and we want to determine its parent, I see

String path = …
int lastSlash = path.lastIndexOf("/");
String parentPath = path.substring(0,lastSlash);
Resource parentResource = resourceResolver.getResource(parentPath);

Which is hilarious, because

pathResource.getParent();

is much easier to use (and did you spot the off-by-one bug in the String operation example? And what happens if the path already ends with a slash?). But that still leaves the question why you need to get the parent resource at all. Maybe a

ProductCategory category = myProduct.getCategory();

is a much more expressive way to describe the same thing. I would definitely prefer it.
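A minimal sketch of such an abstraction (all names are illustrative, and in a real project this would probably be a Sling Model):

import org.apache.sling.api.resource.Resource;

// Illustrative sketch: hide the repository layout behind small domain classes,
// so callers talk about products and categories instead of paths and resources.
public class Product {

    private final Resource resource;

    public Product(Resource resource) {
        this.resource = resource;
    }

    public ProductCategory getCategory() {
        // the relation "category = parent node" is encoded in exactly one place
        return new ProductCategory(resource.getParent());
    }
}

class ProductCategory {

    private final Resource resource;

    ProductCategory(Resource resource) {
        this.resource = resource;
    }

    public String getTitle() {
        return resource.getValueMap().get("jcr:title", String.class);
    }
}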

So CRXDE is your biggest enemy when designing your application. If you are a seasoned AEM developer, my recommendation to you: don’t explain your application in terms of CRXDE. Use proper abstractions instead. Don’t do CRXDE driven development!

If that topic sounds familiar to you: I did a talk at the adaptTo() conference 2020 on this topic; you can find the recording here. There I explain the problem in more detail, including some better examples 🙂

Writing integration tests for AEM (part 4)

This is part of my ongoing series about writing integration tests with AEM.

In the last post I mentioned that the URL provided to our integration tests allows us to test our dispatcher rules as well, a kind of “unit testing” of the dispatcher setup. That’s what we do now.

(This is the German way of saying “Stop here if you don’t have the right user-agent^Wvehicle”; photo by Julian Hochgesang on Unsplash)

As a first step we need to create a new RequestValidationClient, because we need to customize the underlying HTTP client so that it does not automatically follow HTTP redirects; otherwise it would be impossible for us to test redirects. And while we are at it, we want to customize the user-agent header as well, so it’s easier to spot the requests we make during the integration tests. The way to customize the underlying HTTP client is documented, but a bit clumsy. Besides that, this RequestValidationClient is no different from the SlingClient it’s derived from. Maybe we change that later.
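The essence of that customization boils down to two calls on the underlying Apache HttpClient builder (a sketch; the user-agent string is made up, and the plumbing which hands the client over to the testing client classes is omitted):

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;

CloseableHttpClient httpClient = HttpClientBuilder.create()
        .disableRedirectHandling()              // we want to assert on the 30x responses ourselves
        .setUserAgent("request-validation-it")  // made-up user-agent, easy to spot in access logs
        .build();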

The actual integration tests are in PublishRedirectsIT. Here I use this RequestValidationClient to perform unauthenticated requests (as end users typically do) against the publish instance. To illustrate the approach, there are 3 tests:

  • In the testInitialRedirectAndHomepage method it is validated that a request to “/” results in a permanent redirect to /us/en.html (sketched below this list). Additionally it is made sure that /us/en.html is actually present and returns a 200.
  • A second test is hitting /system/console, which must never be exposed to the internet.
  • A third test ensures that the default GET servlet is properly secured, so that the infamous “infinity” selector for the JSON extension returns a 404.
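A sketch of the first test (the status-code workaround mentioned in the remarks below is elided, and the Location header may be an absolute URL in your setup):

@Test
public void testInitialRedirectAndHomepage() throws Exception {
    // request "/" without following redirects; the workaround to accept
    // any status code is elided here (see the remarks below)
    SlingHttpResponse response = anonymousPublish.doGet("/");
    assertEquals(301, response.getStatusLine().getStatusCode());
    assertEquals("/us/en.html", response.getFirstHeader("Location").getValue());

    // the redirect target itself must be delivered with a 200
    anonymousPublish.doGet("/us/en.html", 200);
}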

With this approach it is possible to validate that the complete security checklist of the dispatcher is actually implemented and that all “invalid” URLs are properly blocked.

Some remarks on the PublishRedirectsIT implementation itself:

  • Also here the tests are a bit clumsier than they could be. First, because the recommended ways to perform an HTTP request always have an “expectedReturnCode” parameter, which is unfortunate because we want to perform this check ourselves. For that reason I built a small workaround to accept all status codes. The testing clients should offer that natively, though.
  • And secondly, I encountered problems with the authentication on the publish. That’s the reason why the creation of the anonymousPublish client is the way it is.

But anyway, that’s a neat approach to validate that your dispatcher setup is properly done. And of course you could also use the JsoupClient to test a page on the publish as well.

Some remarks in case you want to execute these tests on your system: I adjusted the configuration of the “dispatcher” module of the repository as well, so you can easily use it together with the dispatcher docker image (check out this fantastic documentation).

That’s it for today, happy testing!

Writing integration tests for AEM (part 3)

This is part of my ongoing series about writing integration tests with AEM.

In the last post on writing integration tests with AEM I quickly walked you through a simple test case for authoring instances, but I didn’t provide much context on what exactly is going on and how it is executed in Cloud Manager. That’s what I want to talk about today.

As we have seen, some relevant parameters for integration tests are provided externally, most notably the URLs of the environment plus credentials.

In the pom.xml it looks like this:
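The relevant failsafe configuration looks roughly like this (a sketch; the property names follow the AEM archetype, check your own pom.xml for the authoritative version):

<systemPropertyVariables>
  <sling.it.instances>2</sling.it.instances>
  <sling.it.instance.url.1>${it.author.url}</sling.it.instance.url.1>
  <sling.it.instance.runmode.1>author</sling.it.instance.runmode.1>
  <sling.it.instance.url.2>${it.publish.url}</sling.it.instance.url.2>
  <sling.it.instance.runmode.2>publish</sling.it.instance.runmode.2>
</systemPropertyVariables>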

Here you can see the defaults, but you can simply override them by providing the exact values on the command line, as you already did in the previous post when overriding the URL of the authoring instance. The POM just introduces another indirection via properties, which is technically not really necessary.

Cloud Manager works the same way: it invokes the maven-failsafe-plugin to execute the integration tests and overrides these default values with the correct data for that specific environment (including the admin credentials).

In detail, the URLs are configured in such a way that your tests access the load-balanced author cluster and the load-balanced publish farm (including the dispatcher!).

This has 2 implications:

  • On your local installation you should also have a dispatcher configured in front of the publish instance, so you have an identical setup
  • You can use integration tests also to validate your publish dispatcher rules!

And armed with this knowledge I will show you in the next post how you can validate with integration tests, that your domain setup is configured correctly.

Writing integration tests for AEM (part 2)

This is part of my ongoing series about writing integration tests with AEM.

In the last blog post I gave you a quick overview of the integration test framework you have at hand and the opportunities it offers you.

Now let’s get our hands dirty and create our first integration test. We will write a simple test which connects to the local author instance and verifies that the wknd homepage loads completely and that all referenced files (images, JavaScript, CSS, …) are present.

(This is where we start: just us and a lot of space to fill with good tests; photo by Neven Krcmarek on Unsplash)

Prerequisite is that you have the wknd package fully installed (cloning the wknd github repo, building it and installing the package of the “all” module should do the trick). There is no specific requirement on AEM itself, so AEM 6.4 or newer should suffice.

Basic structure

When you have started with the Maven archetype for AEM, you should have an it.tests Maven module, which contains all integration tests. Although they are tests, they are stored in src/main/java. That means that the whole test suite is created as a build artifact and can therefore easily be executed outside of the Maven build process as well.

Another special thing to remember: All test class names must end with “IT” (like “IntegrationTest”), otherwise they are ignored.

A custom client

(I have all that code ready on github, so you can just clone it and start playing.)

As a first step we will create a custom test client which is able to parse a rendered page. As a basis I started with HtmlUnit, but that turned out to be a bit inflexible regarding multiple calls, so I switched over to jsoup.
That means our first piece of code is a JsoupClient. It extends the standard CQClient, so we are able to use the “doRequest()” method to fetch the page content.
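A sketch of that client (the real version is in the github repo; doGet() used here is the convenience wrapper around doRequest()):

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.sling.testing.clients.ClientException;
import org.apache.sling.testing.clients.SlingClientConfig;
import org.apache.sling.testing.clients.SlingHttpResponse;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.adobe.cq.testing.client.CQClient;

// Fetch the rendered markup via the inherited HTTP plumbing and hand it to jsoup.
public class JsoupClient extends CQClient {

    public JsoupClient(CloseableHttpClient http, SlingClientConfig config)
            throws ClientException {
        super(http, config);
    }

    public Document getPage(String path) throws ClientException {
        SlingHttpResponse response = doGet(path, 200);
        return Jsoup.parse(response.getContent());
    }
}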

That’s the basis; from now on we just deal with jsoup-specific structures (Document, Node). Then we add the actual test class (AuthorHomepageValidationIT), which starts with some boilerplate code:
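That boilerplate looks roughly like this (a sketch; see the repo for the full version):

@ClassRule
public static final CQAuthorClassRule cqBaseClassRule = new CQAuthorClassRule();

@Rule
public CQRule cqBaseRule = new CQRule(cqBaseClassRule.authorRule);

static JsoupClient jsoupClient;

@BeforeClass
public static void beforeClass() throws ClientException {
    // an "AdminClient": the tests run with the admin user of that instance
    jsoupClient = cqBaseClassRule.authorRule.getAdminClient(JsoupClient.class);
}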

The basis for everything is the CQAuthorClassRule, and based on that we create a jsoupClient object, which is itself an “AdminClient” (that means it uses the admin user for the tests). And now we can easily start creating simple tests with this jsoupClient instance.

(Please check the files in the github repo to get the complete picture, I omitted here quite a bit for brevity.)

We are using the standard tooling for unit tests here to create an integration test, that means the @Test annotation plus the usual set of asserts. But we are doing integration tests, which means we are validating the operations actually executed by AEM. If you start to use a mocking framework here, you are doing it wrong!

OK, how do I run this integration test?

Now that we have written our integration test, we need to execute it. To do that, use your command line and execute this command in the it.tests module:

mvn clean install -Peaas-local -Dit.author.url=http://localhost:4502

(You need to specify the author URL as a parameter because my personal default of port 6602 for my local authoring instance might not work on your instance. Check the pom.xml for all details, it is not that complicated.)

The output will look like this:

[INFO] --- maven-failsafe-plugin:2.21.0:integration-test (default-integration-test) @ de.joerghoh.aem.it.tests ---
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running integrationtests.it.tests.AuthorHomepageValidationIT
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using Basic Auth as default. Index lane detection: false
[main] INFO org.apache.sling.testing.junit.rules.instance.util.ConfigurationPool - Reading initial configurations from the system properties
[main] INFO org.apache.sling.testing.junit.rules.instance.util.ConfigurationPool - Found 1 instance configuration(s) from the system properties
[main] INFO org.apache.sling.testing.junit.rules.instance.ExistingInstanceStatement - InstanceConfiguration (URL: http://localhost:6602, runmode: author) found for test integrationtests.it.tests.AuthorHomepageValidationIT
[main] WARN com.adobe.cq.testing.client.CQClient - Cannot resolve path //fonts.googleapis.com/css?family=Source+Sans+Pro:400,600|Asar&display=swap: Illegal character in query at index 57: //fonts.googleapis.com/css?family=Source+Sans+Pro:400,600|Asar&display=swap
[main] INFO integrationtests.it.tests.AuthorHomepageValidationIT - skipping linked resource from another domain: https://wknd.site/content/wknd/language-masters/en.html
[main] INFO integrationtests.it.tests.AuthorHomepageValidationIT - validated 148 linked resources
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.787 s - in integrationtests.it.tests.AuthorHomepageValidationIT
[INFO] Running integrationtests.it.tests.GetPageIT
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using LoginToken Auth as default. Index lane detection: false
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using LoginToken Auth as default. Index lane detection: false
[main] INFO org.apache.sling.testing.junit.rules.instance.ExistingInstanceStatement - InstanceConfiguration (URL: http://localhost:6602, runmode: author) found for test integrationtests.it.tests.GetPageIT
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.002 s - in integrationtests.it.tests.GetPageIT
[INFO] Running integrationtests.it.tests.CreatePageIT
[main] INFO com.adobe.cq.testing.junit.rules.ConfigurableInstance - Using LoginToken Auth as default. Index lane detection: false
[main] INFO org.apache.sling.testing.junit.rules.instance.ExistingInstanceStatement - InstanceConfiguration (URL: http://localhost:6602, runmode: author) found for test integrationtests.it.tests.CreatePageIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.3 s - in integrationtests.it.tests.CreatePageIT
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 1

The relevant lines are the ones mentioning the instance URL and the number of validated resources: they show that my test reached out to my local AEM instance at port 6602 and validated 148 linked resources in total. If you want to see in more detail what exactly was validated, add an info log message there.

Congratulations, you have just run your first integration test!

I leave it to you to provoke a failure of this integration test; all you have to do is reference an image or a client library on the wknd homepage (specified here) which does not return an HTTP status code 200. And of course this test is quite generic, as it does not mandate that a specific client library is present or that even the page footer works. But as you have the power of jsoup at hand, it should not be too hard to write additional assertions to check these requirements.

In the next blog post I will elaborate a bit more on running integration tests and configuring them properly, before we start to explore the possibilities offered to us by the AEM testing clients.

(Update 2020-12-18: Changed the profile name to match CloudManager behavior)

Writing integration tests for AEM (part 1)

This is part of my ongoing series about writing integration tests with AEM.

Building tests is an integral part of software development, and that does not only include unit tests but also integration and frontend tests. With AEM as a Cloud Service integration tests are getting more and more important, as they allow you to run automated tests on “real” cloud service instances as part of the Cloud Manager pipeline. See the documentation of Cloud Manager.

If you check the details, you will find that the overall structure for integration tests has been part of all projects created from the AEM Project Archetype since at least version 11 (April 2017). So technically everyone has been able to implement integration tests based on that structure for a long time, but I haven’t seen these tests receive proper attention. I ignored them for most of the time as well…

(A vintage implementation of an HTTP client with 3 threads; symbolic photo by Pavan Trikutam on Unsplash)

Recently I worked with my colleague Valentin Olteanu on creating a small integration test suite, and I was honestly surprised how easy it can be. And integration tests are now an official part of the Cloud Manager pipeline, the first place where your code can be tested on a real Cloud Manager instance.

So I want to give you a short overview of the capabilities of the integration test framework for AEM. In the next blog post I will show a real-life use case where such integration tests can really help.

Ok, what are these integration tests, and what can we do with them?

Integration tests run outside of AEM, as part of the deployment/test pipeline. They test the interaction of your custom application (which you have validated with your unit tests) with everything else, most prominently AEM itself. You can test the complete page rendering, custom integrations, background processes and everything else where you need the full AEM stack and where mocks are not sufficient.

The test framework itself provides you with proper abstractions to perform a lot of operations in a very convenient way. For example:

  • There is an AssetClient which allows you to upload assets into AEM
  • Functionality to create/delete/modify pages (as part of the CQClient)
  • Functionality to replicate content
  • and much more (see the whole list of clients)

And everything is wrapped in Java, so you don’t have to deal with the underlying HTTP requests. This is an effective way to remote-control AEM from Java code. But of course there’s also a raw, preconfigured HTTP client (with hostname, authentication etc. already set) which you can use to perform custom actions. And the testing framework around it is still the JUnit framework we are all used to.

But be aware: This integration test suite cannot directly access the JCR and Sling API, because it is running externally. If you want to create nodes or read their status, you have to rely on other means.

It is also not a Selenium test! If you want to do proper UI testing, please check the documentation on UI testing (still in beta, expect general availability soon). I plan to create a blog post about it.

A very simple integration test (basically just a validation of a page which has been created with a Page rule) can look like this (the full code):

    @Test
    public void testCreatePageAsAuthor() throws InterruptedException {
        // This shows that it exists for the author user
        userRule.getClient().pageExistsWithRetry(pageRule.getPath(), TIMEOUT);
    }

The integration test class itself comes with a bit of boilerplate code, mostly JUnit rules to set up the connection and prepare the environment (for example to create the page whose existence we test), as sketched below.
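Roughly, these rules look like this (a sketch following the sample tests of the AEM archetype; rule constructors and the group name may differ in your version of the testing clients):

@ClassRule
public static final CQAuthorPublishClassRule cqBaseClassRule = new CQAuthorPublishClassRule();

@Rule
public CQRule cqBaseRule = new CQRule(cqBaseClassRule.authorRule);

// a temporary test user, member of a content-author group
@Rule
public TemporaryUser userRule = new TemporaryUser(
        () -> cqBaseClassRule.authorRule.getAdminClient(), "content-authors");

// a test page, created before the test and cleaned up afterwards
@Rule
public Page pageRule = new Page(() -> userRule.getClient());

private static final long TIMEOUT = TimeUnit.SECONDS.toMillis(30);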

And the best thing: you don’t need to take care of URLs and authentication, because these parameters are specified outside of your code and are normally provided via Maven properties. This keeps the code very portable and gives you the chance to execute it both locally and as part of the Cloud Manager pipeline.

In the next blog post I want to demonstrate how easy it can be to validate that a page on the AEM author renders correctly.