Today I want to discuss some aspects of an AEM application, which is rarely considered during application development, but which normally gets very important right before a golive: the path element of a URL, and how it is constructed (either in full version or in a shortened one).
Newcomers to the AEM world sometimes ask how the public URLs are determined and maintained; from their experience with older or other CMS systems pages have an ID and this ID has to be mapped somehow to a URL.
Within AEM this situation is different, because the author creates a page directly in the tree structure of a site. And the name of the page can be directly mapped to a URL. So if an author creates a page /content/mysite/en/news/happy-new-year-2016, this page can be reached via https://HOST/content/mysite/en/news/happy-new-year-2016.html (in the simplest form).
From a technical point of view, the resource path is mapped to the path-element of a URL. In many cases this is a 1:1 mapping (that means, that the full resource path is taken as path of the URL). Often the „many“ means „in development environments“, because in production environments these kinds of URLs are long and contain redundant informations, which is something you should avoid. A URL also contains a domain part, and this domain part often carries information, so it isn’t needed in the path anymore.
So instead of „https://mysite.com/content/mysite/en/news.html“ we rather prefer „https://mysite.com/en/news.html“ and map only a subset of the resource path.
When mapping the resource path to a URL you must be careful, because the other way (the mapping of URL to resource path) has to work as well, and there must be exactly 1 mapping.
Such kind of mappings (I often call the mapping „resource path to URL path“ a forward mapping and the „URL path to resource path“ a reverse mapping) can be created using the /etc/map mechanisms . In a web application you need to use both mappings:
- when the request is received the URL path has to get mapped to a resource, so the sling resource processing can start.
- When the rendered page contains links to other pages, the resource path of these pages has to be provided as URL path.
(1) is done automatically by the sling if the correct ruleset is provided. (2) is much more problematic, because all references to resources provided by AEM have to be rewritten. All references? Generally spoken yes, I will discuss this later on.
This mapping can be done through the 2 API methods of the resource resolver:
- forward mapping: resourceResolver.map()
- reverse mapping: resourceResolver.resolve()
You might wonder, why you never use these 2 methods in your own code,even if I wrote above, that all the links to other pages need to rewritten. Basically you don’t have to do this, because the HTML created by the rendering pipeline (including all filters) is streamed through the Sling Output Rewriting Pipeline. This chain contains a rewriter rule, which scans through all the HTML and tries to apply a forward mapping to all links.
The Apache mod_rewrite modules also offers very flexible ways to do reverse mapping, but it lacks the a way to apply a forward mapping to the HTML pages (as the Sling Output Rewriter does). So mod_rewrite is a cool tool, but it is not sufficient to completely cover all aspects of resource mapping.