How does Sling resolve an AEM Page to the correct resource type?

On the AEM forum there’s an interesting question:

Does this mean that all requests to the page node are internally redirected to _jcr_content for HTML and XML calls but not for other extensions ?

Without answering the question right here, it raises the question: How is a cq:Page node actually resolved? Or how does Sling know, that it should resolve a call to a cq:Page node to the resourcetype specified in its jcr:content node? Especially since the “cq:Page” does not have a “sling:resourceType” property at all?

A minimal page can look like this (JSON dump):

{
   "jcr:primaryType": "cq:Page",
   "jcr:createdBy": "admin",
   "jcr:created": "Mon Dec 03 2018 19:09:44 GMT+0100",
   "jcr:content": {
     "jcr:primaryType": "cq:PageContent",
     "jcr:createdBy": "admin",
     "cq:redirectTarget": "/content/we-retail/us/en",
     "jcr:created": "Mon Dec 03 2018 19:09:44 GMT+0100",
     "cq:lastModified": "Tue Feb 09 2016 00:05:48 GMT+0100",
     "sling:resourceType": "weretail/components/structure/page",
     "cq:allowedTemplates": ["/conf/we-retail/settings/wcm/templates/.*"]
     }
}

A good tool to understand how a page is rendered is the “Recent Requests” tool available in the OSGI webconsole (/system/console/requests).

When I request /content/we-retail.html and check this request in the recent requests tool, I get this result (reduced to the relevant lines):

    1967 TIMER_END{65,ResourceResolution} URI=/content/we-retail.html resolves to Resource=JcrNodeResource, type=cq:Page, superType=null, path=/content/we-retail
    1974 LOG Resource Path Info: SlingRequestPathInfo: path='/content/we-retail', selectorString='null', extension='html', suffix='null'
    1974 TIMER_START{ServletResolution}
    1976 TIMER_START{resolveServlet(/content/we-retail)}
    1990 TIMER_END{13,resolveServlet(/content/we-retail)} Using servlet /libs/cq/Page/Page.jsp
    1991 TIMER_END{17,ServletResolution} URI=/content/we-retail.html handled by Servlet=/libs/cq/Page/Page.jsp

The interesting part is here, that /content/we-retail resource has a type (= resource type) of “cq:Page”, and when we look down in the snippet, we see that it is resolved to /libs/cq/Page/Page.jsp; that means the resource type is actually “cq/Page”.

And yes, in case no resource type property is available, Sling’s first fallback strategy is to try the JCR nodetype as a resource type. And in case this fails as well, it goes back to the default servlets.

OK, now we have a point to investigate. The /libs/cq/Page/Page.jsp is very simple:

<%@include
file="proxy.jsp" %>

And in the proxy.jsp there’s a call to the RequestDispatcher to include the jcr:content resource; and from there one the standard resolution starts which we all are used to.

And to get back to the forum question: If you look around in /libs/cq/Page, you can find more scripts to deal with extensions and also some selectors. And they all include only the call to the proxy.jsp. If your selector & extension combination do not work as expected, you might want to add an overlay there.

OSGi DS & Metatype & SCR properties

When I wrote the last blog post on migrating to OSGi annotations, I already mentioned that the property annotations you used to use with SCR annotations cannot be migrated 1:1, but instead you have to decide if you add it the OCD or if you add them as properties to the @Component annotation.

I was remembered of that when I worked on the fixing the process label properties for the workflow steps contained in ACS AEM Commons (PR 1645). And the more I think about it, the more I get the impression that this might be causing some confusion in the adoption of the OSGI annotations.

Excursion to the OSGi specification

First of all, in OSGi there 2 specifications, which are important in this case: Declarative Services (DS, chapter 112 in the OSGI r6 enterprise specification, sometimes also referenced as Service Component Model) and the Metatype Services (chapter 105 in the OSGI r6 enterprise specification). See the OSGI website for download (unfortunately there is no HTML version available for r6, only PDFs).

Declarative Services deals with the services and components, their relations and the required things around it. Quoting from the spec (112.1):

The service component model uses a declarative model for publishing, finding and binding to OSGi services. This model simplifies the task of authoring OSGi services by performing the work of registering the service and handling service dependencies. This minimizes the amount of code a programmer has to write; it also allows service components to be loaded only when they are needed.

Metatype Services care about configuration of services. Quoting from chapter 105.1

The Metatype specification defines interfaces that allow bundle developers to describe attribute types in a computer readable form using so-called metadata. The purpose of this specification is to allow services to specify the type information of data that they can use as arguments. The data is based on attributes, which are key/value pairs like properties.

Ok, how does this relate to the @Property annotation in the Felix SCR annotations? I would say, that this annotation cannot be clearly attributed to either DS or Metatype, but it served both.

You could add @Properties as annotation to class, or you could add it as with an @Property to a field. You could add a property “metatype=true” to the annotation, and then it appeared in the OSGI console (then it was a “real” metatype property in the sense of the Metatype specification).

But in either way, all the properties were provided through the ServiceContext.getProperties() method; that means, in reality it never really made a difference how you defined the property, if a the class level or to properties, if you added the metatype=true property or not. That was nice and in most times also very convenient.

This changed with the OSGI annotations. Because now the properties are described in a class annotated with the @ObjectClassDefinition annotation. Type-safe and named. But here it’s clearly a Metatype thing (it’s configuration), and it cannot be used with the Declarative Service (the services and components thing) in parallel. Now you have to make the decision: Is it a configuration item (something I use in the code) or is it a property which influences the component itself?

As an example, with SCR annotations you could write

@Component @Service
@Properties({
@Property(name="sling.servlets.resourcetypes",value="project/components/header"),
@Property(name="sling.servlets.selector",value="foo")
})
public class HeaderServlet implements SlingSafeMethodsServet { ...

Now, as these properties were visible via metatype as well, you could also overwrite them using some OSGI configuration and register the servlet on a different selector just by configuration. Or you could read these properties from the ServiceContext. That was not a problem (and hopefully noone ever really used it …).

With OSGI annotations this is no longer possible. You have configuration and component properties. You can change the configuration, but not the component properties any more.

What does that mean for you?

Mostly, don’t change all properties blindly to properties of the @ObjectClassDefinition object. For example the label of a workflow step is not configuration, but rather a property. That means there you should use something like this:

@Component(properties= {
"process.label=My process label"
}
public class MyWorkflowProcess implements WorkflowProcess {...

Disclaimer: I am not an OSGI expert, this is just my impression from dealing with that stuff a lot. Carsten, David, feel free to to correct me 🙂

From SCR annotations to OSGI annotations

Since the beginning of AEM development we used annotations to declare OSGI services; @Component, @Service, @Property and @Reference should be known to everyone how has ever developed backend stuff for AEM. The implementation behind these annotations came from the Apache Felix project, and they were called the SCR annotations (SCR = Service Component Runtime). But unlike the Service Component Runtime, which is part of the OSGI standard for quite some, these annotations were not standardized. This changed with OSGI Release 6.

With this release annotations were also standardized, but they are 100% compatible to the SCR annotations. And there are a lot of resources out there, which can help to explain the differences:

Carsten Ziegeler wrote some blog posts about: Overview, part 1, part 2, part 3
Feike wrote about them in the context of AEM with a focus on OSGI configuration
Also Nate Yolles wrote about it
(And there are definitly much more resources out there).

I recently worked on migrating a lot of the code from ACS AEM Commons from SCR annotations to OSGI annotations, and I want to share some learning I gained on the way. Because in some subtle areas the conversion isn’t that easy.

Mixed use of SCR annotations and OSGI annotations

You can mix SCR annotations and OSGI annotations in a project, you don’t need to migrate them all at once. But you can to be consistent on a class level, you cannot mix SCR and OSGI annotations in a single class. This is achieved by an extension to the maven-bundle-plugin (see below).

Migrating properties

SCR property annotations give you a lot of freedom. You can annotate them on top of the class (using the @Properties annotation as container with nested @Property annotations), you can annotate individual constant values to be properties. You can make them visible in the OSGI webconsole (technically you are creating a metatype for them), or you can mark them as private (no metatype is created).

With OSGI annotations this is different.

Metatype properties are handled in the dedicated configuration class marked with @ObjectClassDefinition. They cannot be private.
Properties which are considered to be private are attached to the @Component annotation. They cannot be changed anymore.

A limitation from a backward compatibility point of view: With SCR annotations you are not limited in the naming of properties, next to characters often the “.” (dot) and the “-” (dash, minus) was used. With OSGI r6 annotations you can easily create a property with a “.” in it

String before_after() default "something";

will result in the property with the name “before.after”; but with OSGI r6 annotations you cannot create properties with a “-” in it. Only OSGI r7 (which is supported in AEM 6.4 onwards) supports it with a construct like this:

String before$_$after() default "something";

If you want to keep compatibility with AEM 6.3, expect the breakage of property names or you need to investigate in workarounds (see #1631 of ACS AEM Commons). But my recommendation is to avoid the use of the “-” in property names alltogether and harmonize this in your project.

Update: I posted an additional blog post specifically on migrating SCR properties, mostly in the context of OSGI DS and OSGI Metatypes.

Labels & description

All the metatype stuff (that means, how OSGI configurations appear in the /system/console/configMgr view) is handled on the level of the @ObjectClassDefinition annotation and the method annotated with it. With the SCR annotations this was all mixed up between the @Component annotation and the @Property fields.

Update the tooling to make it work

If you want to work with OSGI annotations, you should update some elements in your POM as well:

Update the maven-bundle-plugin to 4.1.0
Remove the dependency to the maven-scr-plugin
Add a dependency to org.osgi:org.osgi.annotations:6.0.0 to your POM.
Then you need to add an additional execution to your maven-bundle-plugin (it’s called “generate-scr-metadata-for-unittests“) and update its configuration (see it on ACS AEM Commons POM).

The interesting part is here is the plugin to the maven-bundle-plugin, which can also handle SCR annotations; this statement allows you to mix both types of annotations.

This blog post should have given you some hints how you migrate the SCR annotations of an existing codebase to OSGI annotations. It’s definitly not a hard task, but some details can be tricky. Therefor it’s cool if you have the chance to mix both types of annotations, so you don’t need a big-bang migration for this.

Do I need a dedicated instance for page preview?

Every now and then there is this question about how to integrate a dedicated preview instance into the typical “author – publish” setup. Some seem to be confused why there is no such instance in the default setups, which allows you to preview content exactly as ob publish, but just not visible yet to the public.

The simple answer to this is: There should be no need to have such a preview instance.

When creating content in AEM, you work in an full WYSIWYG environment, which means that you always should have perfect view of the context your content lives in.Everything should be usable, and even more complex UI interfaces (like single page applications) should allow you to have a proper preview. Even most integrations should work flawlessly. So getting the full picture should alwaysbe possible on authoring itself, and this must not be the reason to introduce a preview publish.

Another reason often brought up in these discussions are approvals. When authors finish their work, they need to get an approval by someone who is not familiar with AEM. The typical workflow is then outlined like “I drop herthe link, she clicks the link, checks the page and then responds with an OK or not. And then I either implement her remarks or activate the page directly”.

The problem here is that this is an informal workflow, which happens on a differnet medium(chat, phone, email) and which is not tracked within AEM. You don’t use the ways which are offered by the product (approval workflows), which leaves you without any audit trail. One could ask the question if you have a valid approval process at all then…

Then there’s the aspect of “Our approvers are not familar and not trained with AEM!”.Well, you don’t have to train them much of AEM. If you have SSO configured and the approvers get email notifications, approving itself is very easy: Click to the link on the inbox, select the item you want to preview, open it, review it and then click approve or reject in the inbox for it. You can definitely explain that workflow in a 5 minute video.

Is there no reason to justify a dedicated preview instance? I won’t argue that there will be never the need for such a preview instance, but in most cases you don’t need it. I am not aware of any right now.

If you think you need a preview instance: Please create a post over at the AEM forum, describe your scenario, ping me and I will try to show you that you can do it easier without it 🙂

Try-with-resource or “I will never forget to close a resource resolver”

In Java 7 the idiom “try-with-resource” has been introduced in the java world. It helps to never forget to close a resource. And since Sling9 (roughly 2016) the ResourceResolver interface implements the AutoCloseable marker interface, so the try-with-resource idiom can be used.

That means, that you can and should use this approach:

try (ResourceResolver resolver = resourceResolverFactory.getServiceResourceResolver(…)) {
  // do something with the resolver
  // no need to close it explicitly, it's closed automatically
}

With this approach you omit the otherwise obligatory finally block to close the resource resolver (which can be forgotten …).

This approach helps to reduce boilerplate code and eliminates some potential for errors. If you are developing for AEM 6.2 and newer, you should be able to use it.

ResourceResolvers and Sessions — “you open it, you close it”

Update February 2023: Updated the list of API calls, which create a new Sling ResourceResolver (see this Forum thread). Thanks NitroHazeDev!

I have already written about how to use resource resolvers and JCR sessions; the basic pattern to remember is always “you open it; you close it” (2nd rule).

While this stanza seems to be quite commons sense, the question is always: When is a session or a resource resolver opened/created? What API calls are responsible for it? Let me outline this today.

API calls which open a JCR resource:

SlingRepository.loginAdminstrative() , DEPRECATED!
SlingRepository.loginService()

API calls, which create a Sling ResourceResolver

ResourceResolverFactory.getAdministrativeResourceResolver(), DEPRECATED!
ResourceResolverFactory.getResourceResolver()
ResourceResolverFactory.getServcieResourceResolver()
WorkflowSession.adaptTo(ResourceResolver.class)

These are the only API calls which open a JCR Session or a Sling ResourceResolver. And whenever you use one of these, you are responsible to close them as well.

And as corollary to this rule: if you have other methods or APIs which return a ResourceResolver or Session: Do not close these.

Some examples:

Session jcrSession = resourceResolver.adaptTo(Session.clase);

This just exposes the internal JCR Session of the ResourceResolver and because it’s not using one of the above APIs: Do not close this session! It’s closed automatically when you close the resource resolver.

Session adminSession = slingRepository.loginAdministrative(null);
Map authInfo = new HashMap();
authInfo.put(org.apache.sling.jcr.resource.api.JcrResourceConstants.AUTHENTICATION_INFO_SESSION, session);
ResourceResolver adminResourceResolver = resolverFactory.getResourceResolver(authInfo);

This code creates a resource resolver which wraps an already existing JCR Session. You have to close both adminSession and adminResourceResolver, because you created them both using the above mentioned API calls.

Validating AEM content-packages

A typical task when you run AEM as a platform is deployment. As platform team you own the platform, and you own the admin passwords. It’s your job to deploy the packages delivered by the various teams to you. And it’s also your job to keep the platform reliable and stable.

With every deployment you have the chance to break something. And not only the part of the platform which belongs to the team which code you deploy. That’s not a problem, if their (incorrect) code breaks their parts. But you break the system of other tenants, which are not involved at all in the deployment.

This is one of the most important tasks for you as platform owner. A single tenant must not break other tenants! Never! The problem is just, that it’s nearly impossible to guarantee. You typically rely on trust towards the development teams and that they earn that trust.

To help you a little bit with this, I created a simple maven plugin, which can validate content-packages against a ruleset. In this ruleset you can define, that a content-package delivered by tenant A will only contain content paths which are valid for tenant A. But the validation should fail, if the content-package would override clientlibraries of tenant-B. Or which will introduce new overlays in /apps/cq. Or which introduces a new OSGI setting with a non-project PID. Or anything else which can be part of a content-package.

Check out the the github repo and the README for its usage.

As already noted above, it can help you as a platform owner to ensure a certain quality of the packages you are supposed to install. On the other hand it can help you as project team to establish a set of rules which you want to follow. For examples you can verify a “we don’t use overlays” policy with this plugin as part of the build.

Of course the plugin is not perfect and you still can easily bypass the checks, because it does not parse the .content.xml files in there, but just checks the file system structure. And of course I cannot check bundles and the content which comes with them. But we all should assume that no team wants to break the complete system when deployment packges are being created (there are much easier ways to do so), but we just want to avoid the usual errors, which just happens when being under stress. If we catch a few of them upfront for the cost of configuring a rulset once, it’s worth the effort 🙂

Detecting JCR session leaks

A problem I encounter every now and then are leaking JCR sessions; that means that JCR sessions are opened, but never closed, but just abandoned. Like Files, JCR sessions need to be closed, otherwise their memory is not freed and they cannot be garbage collected by the JVM. Depending on the number of sessions you leave in that state this can lead to serious memory problems, ultimately leading to a crash of the JVM because of an OutOfMemory situation.

(And just to be on the safe side: In AEM ootb all ResourceResolvers use a JCR session internally; that means whatever I just said about JCR sessions applies the same way to Sling ResourceResolvers.)

I dealt with this topic already a few times (and always recommended to close the JCR sessions), but today I want to focus how you can easily find out if you are affected by this problem.

We use the fact that for every open session an mbean is registered. Whenever you see such a statement in your log:

14.08.2018 00:00:05.107 *INFO* [oak-repository-executor-1] com.adobe.granite.repository Service [80622, [org.apache.jackrabbit.oak.api.jmx.SessionMBean]] ServiceEvent REGISTERED

That’s says that an mbean service is registered for a JCR session; thus a JCR session has been opened. And of course there’s a corresponding message for unregistering:

14.08.2018 12:02:54.379 *INFO* [Apache Sling Resource Resolver Finalizer Thread] com.adobe.granite.repository Service [239851, [org.apache.jackrabbit.oak.api.jmx.SessionMBean]] ServiceEvent UNREGISTERING

So it’s very easy to find out if you don’t have a memory leak because of leaking JCR sessions: The number of log statements for registration of these mbeans must match the number of log statements for unregistration.

In many cases you probably don’t have exact matches. But that’s not a big problem if you consider:

On AEM startup a lot of sessions are opened and JCR observation listeners are registered to them. That means that a logfile with AEM starts and stops (and the number of starts do not match the number of stops) it’s very likely that these numbers do not match. Not a problem.
The registration (and also the unregistration) of these mbeans often happens in batches; if this happen during logfile rotation, you might have an imbalance, too. Again, not per se a problem.

It’s getting a problem, if the number of sessions opened is always bigger than the number of sessions closed over the course of a few days.

$ grep 'org.apache.jackrabbit.oak.api.jmx.SessionMBean' error.log | grep "ServiceEvent REGISTERED" | wc -l 212123 $ grep 'org.apache.jackrabbit.oak.api.jmx.SessionMBean' error.log | grep "ServiceEvent UNREGISTERING" | wc -l 1610 $

Here I just have the log data of a single day, and it’s very obvious, that there is a problem, as around 220k sessions are opened but never closed. On a single day!

To estimate the effect of this, we need to consider that for every of these log statements these objects are retained:

A JCR session (plus objects it reaches, and depending on the activities happening in this session it might also include any pending change, which will never going to be persisted)
A Mbean (referencing this session)

So if we assume that 1kb of memory is associated with every leaking session (and that’s probably an very optimistic assumption), this would mean that the system above would loose around 220M of heap memory every day. This system probably requires a restart every few days.

How can we find out what is causing this memory leak? Here it helps, that Oak stores the stack trace when opening sesions as part of the session object. Since around Oak 1.4 it’s only done if the number of open sessions exceeds 1000; you can tune this value with the system property “oak.sessionStats.initStackTraceThreshold”; set it to the appropriate value. This is a great help to find out where the session is opened.

And then go to /system/console/jmx, check for the “SessionStatistics” mbeans (typically quite at the bottom of the list) and select on the most recent ones (they have the openening date already in the name)

And then you can find in the “initStackTrace” the trace where this session has been opened:

With the information at hand where the session has been opened it should be obvious for you to find the right spot where to close the session.
If you spot a place where a session is opened in AEM product code but never closed, please check that with Adobe support. But be aware, that during system startup sessions are opened and will stay open while the system is running. That’s not a problem at all, and please do not report them!

It’s only a problem if you have a at least a few hundreds session open with the very same stack trace, that’s a good indication of such a “leaking session” problem.

A good followup reading on AEM HelpX pages with some details how you can fix it.

Referencing runmodes in Java

There was a question at this year’s AdaptTo, why there is no Java annotation to actually limit the scope of a component (imagine a servlet) to a specific runmode. This would allow you to specify in Java code, that a servlet is only supposed on run on author.

Technically it is easily possible to implement such an annotation. But it’s not done for a reason. Because runmodes have been developed as deployment vehicle to ship configuration. That means your deployment artefact can contain multiple configurations for the same component, and the decision which one to use is based on the runmode.
Runmodes are also not meant to be used as differentiator so code can operate differently based on this runmode. I would go so far to say, that the use of slingSettings.getRunModes() should be considered bad practice in AEM project code.

But of course the question remains, how one would implement the requirement that something must only be active on authoring (or any other environment, which can be expressed by runmodes). For that I would like to reference an earlier posting of mine. You still leverage runmodes, but this time via an indirection of OSGI configuration. This avoids hardcoding the runmode information in the java code.

Content architecture: dealing with relations

In the AEM forums I recently came across a question about slow queries. After some back and forth I understood that the poster wanted to do thousands of such queries to render a page. When rendering a product page he wanted to references the assets associated to it.

For me the approach used by the poster was straight forward, based on the assumption that the assets can reside anywhere within the repository. But that’s rarely the case. The JCR repository is not a relational database, where all you have are queries. With JCR you can also iterate through the structure. It’s a question about your content architecture and how you map it to AEM.

That means, that for such requirements like described you can easily design your application in a way, that all assets to a product are stored below the product itself.

Or for each product page there is a matching folder in the DAM where all the assets reside. So instead of a JCR query you just do a lookup of a node at a fixed location (in the first example below the subnode “assets”) or you can compute the path for the assets (/content/dam/products/prodcut_A/assets). That single lookup will always be more performant than a query, plus it’s also easier for an author to spot and work with all assets belonging to a product.

Of course this is a very simplified case. Typically requirements are more complex, also asset reuse is often required. This approach does not work that easy anymore.
And there is no real recipe for it, but ways how to deal with it.

In case of creating such relations between content we often use tags. Content having the same tag are related, and can be added automatically in the list of related content or assets. Using tags as a level of indirection is ok and in the context of the forum post also quite performant (albeit the resolution itself is powered by a single query).

Another approach to come up with modelling the content structure is to look at the workflows the authoring users are supposed to use. Because they also need to understand the relationship between content, which normally leads to something intuitive. Looking at these details might also give you a hint how it can be modeled; maybe just having the referenced assets as paths as part of the product is already enough.

So, as already said in an earlier post, there are many ways to come up with a decent content architecture, but rarely recipies. In most cases it pays of to invest time into it and consider the effects it has on the authoring workflow, performance and other operational aspects.