How does Sling resolve an AEM Page to the correct resource type?

On the AEM forum there’s an interesting question:

Does this mean that all requests to the page node are internally redirected to _jcr_content for HTML and XML calls but not for other extensions ?

Without answering the question right here, it raises the question: How is a cq:Page node actually resolved? Or how does Sling know, that it should resolve a call to a cq:Page node to the resourcetype specified in its jcr:content node? Especially since the “cq:Page” does not have a “sling:resourceType” property at all?

A minimal page can look like this (JSON dump):

{
  "jcr:primaryType": "cq:Page",
  "jcr:createdBy": "admin",
  "jcr:created": "Mon Dec 03 2018 19:09:44 GMT+0100",
  "jcr:content": {
    "jcr:primaryType": "cq:PageContent",
    "jcr:createdBy": "admin",
    "cq:redirectTarget": "/content/we-retail/us/en",
    "jcr:created": "Mon Dec 03 2018 19:09:44 GMT+0100",
    "cq:lastModified": "Tue Feb 09 2016 00:05:48 GMT+0100",
    "sling:resourceType": "weretail/components/structure/page",
    "cq:allowedTemplates": ["/conf/we-retail/settings/wcm/templates/.*"]
    }
}

A good tool to understand how a page is rendered is the “Recent Requests” tool available in the OSGI webconsole (/system/console/requests).

When I request /content/we-retail.html and check this request in the recent requests tool, I get this result (reduced to the relevant lines):

    1967 TIMER_END{65,ResourceResolution} URI=/content/we-retail.html resolves to Resource=JcrNodeResource, type=cq:Page, superType=null, path=/content/we-retail
1974 LOG Resource Path Info: SlingRequestPathInfo: path='/content/we-retail', selectorString='null', extension='html', suffix='null'
   1974 TIMER_START{ServletResolution}
   1976 TIMER_START{resolveServlet(/content/we-retail)}
   1990 TIMER_END{13,resolveServlet(/content/we-retail)} Using servlet /libs/cq/Page/Page.jsp
   1991 TIMER_END{17,ServletResolution} URI=/content/we-retail.html handled by Servlet=/libs/cq/Page/Page.jsp

The interesting part is here, that /content/we-retail resource has a type (= resource type) of “cq:Page”, and when we look down in the snippet, we see that it is resolved to /libs/cq/Page/Page.jsp; that means the resource type is actually “cq/Page”.

And yes, in case no resource type property is available, Sling’s first fallback strategy is to try the JCR nodetype as a resource type. And in case this fails as well, it goes back to the default servlets.

OK, now we have a point to investigate. The /libs/cq/Page/Page.jsp is very simple:

<%@include
file="proxy.jsp" %>

And in the proxy.jsp there’s a call to the RequestDispatcher to include the jcr:content resource; and from there one the standard resolution starts which we all are used to.

And to get back to the forum question: If you look around in /libs/cq/Page, you can find more scripts to deal with extensions and also some selectors. And they all include only the call to the proxy.jsp. If your selector & extension combination do not work as expected, you might want to add an overlay there.

Take care of your selectors!

Recently I have shown two scenarios, where selectors can be used as a way to cache several different views of a single page. This allows one to avoid HTTP parameters quite often, reducing the load on your machines and speeding up your website.

Let’s assume that you have the already mentioned handle /etc/medialibrary/trafficjam.html and your templates support to display the image in 3 different sizes “preview”,”big” and “original”. So what does happen, if somebody chooses to request the URL “/etc/medialibrary/trafficjam.tiny.html”?

I checked some CQ-based websites and tested this behaviour. Just adding a dummy-selector. In most cases you get a proper page rendered, looking the same way as without the selector. So most templates (and also template developer) ignore the selector, if the that specific template isn’t expected to handle them. So it is good, isn’t it?

Well, in combination with the dispatcher cache it isn’t good. Because the dispatcher caches everything which is returned with an HTTP statuscode of 200 from CQ. So just adding a “foo”-selector will place another copy of the page to the dispatcher cache. This happens also with a “foo1” selector and so on. In the end the disk is full and the dispatcher cannot write any more files to the disk, but will forward every request to your CQ.

So, how can you bypass this problem? As said, the dispatcher caches only, when it receives an HTTP statuscode 200. So you need to add some code to your templates which always check the selectors. If this specific template doesn’t support any selector, fine. If called with a selector, don’t return a statuscode 200, but a 302 (permanent redirect) to the same page without any selectors or just a plain 404 (“file not found”); because calling this page with selectors isn’t a valid action and should never happen, such a statuscode is ok. The same applies when the templates supports a limited set of selectors (“preview”, “big” and “original” in the example above); just add them as a whitelist and if the given selector doesn’t match, return a 302 or 404 code.

So you don’t pollute your cache and still have the flexibility to use selectors. I think that this will outweigh the cost of adjusting your templates.

Permission sensitive caching

In the last versions of the dispatcher (starting with the 4.0.1 release) Day added a very interesting feature to the dispatcher, which allows one to cache also content on dispatcher level which are not public.
Honwai Wong of the Day support team explained it very well on the TechSummit 2008. I was a bit suprised, but I even found it on slideshare (the first half of the presentation)
Honwai explains the benefits quite well. From my experience you can reduce the load on your CQ publishers (trading a request which requires the rendering of a whole page to to a request, which just checks the ACLs of a page).

If you want to use this feature, you have to make sure that for every group or user, who has to have a individual page, the dispatcher delivers the right one. Imagine you want the present the the logged-in users the latest company news, but not logged-in users shouldn’t get them. And only the managers get the link to the the latest financial data on the startpage. So you need a startpage for 3 different groups (not-logged-in users, logged-in users, managers), and the system should deliver it appropriatly. So having a single home.html isn’t enough, you need to distinguish.

The easiest way (and the Day-way ;-)) is to use a selector denoting the group the user belongs to. So home.group-logged_in.html or home.managers.html would be good. If no selector is given, we assume the user to be an anonymous user. You have to configure the linkchecker to rewrite all links to contain the correct selector. So if a user belongs to the logged_in group and requests the home.logged_in.html page, the dispatcher will ask the CQ ” the user has the following http header lines and is requesting the home.logged_in.html, is it ok?”. CQ then checks if the given http header lines do belong to a user of the group logged_in; because he is, it responses with “200 OK, just go on”. And then the dispatcher will deliver the cached file and there’s no need for the CQ to render the same page again and again. If the users doesn’t belong to that group, CQ will detect that and send a “403 Permission denied”, and the dispatcher forwards this answer then to the user. If a user is member of more than one group, having multiple “group-“selectors is perfectly valid.

Please note: I speak of groups, not of (individual) users. I don’t think that this feature is useful when each user requires a personalized page. The cache-hit ratio is pretty low (especially if you include often-changing content on it, e.g daily news or the content of an RSS feed) and the disk consumption would be huge. If a single page is 20k and you have a version cached for 1000 users, you have a disk usage of 20 MB for a single page! And don’t forget the performance impact of a directory filled up with thousands of files. If you want to personalize pages for users, caching is inappropriate. Of course the usual nasty hacks are applicable, like requesting the user-specific data via an AJAX-call and then modifying the page in the browser using Javascript.

Another note: Currently no documentation is available on the permission sensitive caching. Only the above linked presentation of Honwai Wong.

Creating cachable content using selectors

The major difference between between static object and dynamically created object is that the static ones can be stored in caches; their content they contain does not depend on user or login data, date or other parameters. They look the same on every request. So caching them is a good idea to move the load off the origin system and to accelerate the request-response cycle.

A dynamically created object is influenced by certain parameters (usually username/login, permissions, date/time, but there are countless other) and therefor their content may be different from request to request. These parameters are usually specified as query parameters and must not be cached (see the HTTP 1.1 specification in RFC 2616).

But sometimes it would be great, if we could combine these 2 approaches. For example you want to offer images in 3 resolutions: small (as a preview image e.g in folder view), big (full screen view) and original (the full resolution delivered by the picture-taking device). If you decide to deliver it as static object, it’s cachable. But you need then 3 names (one for each resolution), one for each resolution. Choosing this will blur the fact that these 3 images are the same and differ only in the fact of the image resolution. It creates 3 images instead having only one in 3 instances. Amore practical drawback is that you always have to precompute these 3 pictures and place them on a reachable location. Lazy generation is hard also.

If you choose the dynamic approach, the image would be available as one object for which the instance can be created dynamically. The drawback is here that it cannot be cached.

Day Communique has the feature (the guys of Day ported it also to Apache Sling) to use so-called selectors. They behave like the query parameters one used since the stoneage of the HTTP/HTML era. But they are not query parameters, but merely encoded in the static part of the URL. So the query part of the ULR (as of HTTP 1.1) is no longer needed.

So you can use the URLs /etc/medialib/trafficjam.preview.jpg, /etc/medialib/trafficjam.big.jpg and /etc/medialib/trafficjam.original.jpg to adress the image in the 3 required resolutions. If your dispatcher doesn’t find them in its cache, it will forward the request to your CQ, which can then scale the requested image on demand. Then the dispatcher can store the image and deliver it then from its cache. That’s a very simple and efficient way to make dynamic objects static and offload requests from your application servers.