Resource path vs URL and rewriting links

Today I want to discuss some aspects of an AEM application, which is rarely considered during application development, but which normally gets very important right before a golive: the path element of a URL, and how it is constructed (either in full version or in a shortened one).

Newcomers to the AEM world sometimes ask how the public URLs are determined and maintained; from their experience with older or other CMS systems pages have an ID and this ID has to be mapped somehow to a URL.
Within AEM this situation is different, because the author creates a page directly in the tree structure of a site. And the name of the page can be directly mapped to a URL. So if an author creates a page /content/mysite/en/news/happy-new-year-2016, this page can be reached via https://HOST/content/mysite/en/news/happy-new-year-2016.html (in the simplest form).

From a technical point of view, the resource path is mapped to the path-element of a URL. In many cases this is a 1:1 mapping (that means, that the full resource path is taken as path of the URL). Often the „many“ means „in development environments“, because in production environments these kinds of URLs are long and contain redundant informations, which is something you should avoid. A URL also contains a domain part, and this domain part often carries information, so it isn’t needed in the path anymore.
So instead of „https://mysite.com/content/mysite/en/news.html“ we rather prefer „https://mysite.com/en/news.html“ and map only a subset of the resource path.

When mapping the resource path to a URL you must be careful, because the other way (the mapping of URL to resource path) has to work as well, and there must be exactly 1 mapping.

Such kind of mappings (I often call the mapping „resource path to URL path“ a forward mapping and the „URL path to resource path“ a reverse mapping) can be created using the /etc/map mechanisms . In a web application you need to use both mappings:

  1. when the request is received the URL path has to get mapped to a resource, so the sling resource processing can start.
  2. When the rendered page contains links to other pages, the resource path of these pages has to be provided as URL path.

(1) is done automatically by the sling if the correct ruleset is provided. (2) is much more problematic, because all references to resources provided by AEM have to be rewritten. All references? Generally spoken yes, I will discuss this later on.

This mapping can be done through the 2 API methods of the resource resolver:

You might wonder, why you never use these 2 methods in your own code,even if I wrote above, that all the links to other pages need to rewritten. Basically you don’t have to do this, because the HTML created by the rendering pipeline (including all filters) is streamed through the Sling Output Rewriting Pipeline. This chain contains a rewriter rule, which scans through all the HTML and tries to apply a forward mapping to all links.

But it does only run on HTML output, but there are other elements of a site, which contain references to content stored in AEM as well, for example Javascript or CSS files. References contained in these files are not rewritten, but delivered as they are stored in the repository. In many cases the setup is designed in a way, that a 1:1 mapping still works; but that’s not always possible (or wanted).

So please take this as an advice: Do not hardcode a path in CSS or Javascript files if there’s a chance that these paths need to be mapped.
Rewriting other formats than HTML is not part of AEM itself; of course you can extend the defaults and provide a rewriting capability for Javascript and CSS as well, but that’s not an easy task.)

The question is, if you really have to rewrite all resource paths at all. In many cases it is ok just to have the URLs of the HTML pages looking nice (because these are the only URLs which are displayed prominently) . But all the other resources (e.g assets, CSS and Javascript files) don’t need to get mapped at all, but there the default 1:1 mapping can be used. Then you’re fine, because you only have to do the mapping once in /etc/map and that’s it.

The Apache mod_rewrite modules also offers very flexible ways to do reverse mapping, but it lacks the a way to apply a forward mapping to the HTML pages (as the Sling Output Rewriter does). So mod_rewrite is a cool tool, but it is not sufficient to completely cover all aspects of resource mapping.