What’s the maximum size of a node in JCR/AEM?

An interesting question which comes up every now and then is: “Is there a limit how large a JCR node can get?”. And as always in IT, the answer is not that simple.

In this post I will answer that question and also outline why this limit is hardly a constraint in AEM development. Also I will show ways how you can design your application so that this limit is not a problem at all.

Continue reading →

Ways to achieve content reuse in AEM

Whenever an AEM project starts, you have a few important decisions to make. I already wrote about content architeture (here and here) and its importance to a succesful project and an efficient content development and maintenance process. A part of this content architecture discussion is the aspect of content reuse.

Content reuse happens on every AEM project, often it plays a central role. And because requirements are so different, there are many ways to achieve content reuse. In this blog post I want to outline some prominent ways you can use to reuse content in AEM. Each one comes with some unique properties, thus pay attention to them.

I identified 2 main concepts of content reuse: Reuse by copy and Reuse by reference.

Reuse by copy

The AEM multisite manager (MSM) is probably the most prominent approach for content reuse in AEM. It’s been part of the product for a long time and therefor a lot of people know it (even when you just have started with AEM, you might came across its idioms). It’s an approach which creates independent copies of the source, and helps you to keep these copies (“livecopies”) in sync with the original version (“blueprint”) . On the other hand side you still can work with the copies as you like, that means modify them, create and delete parts of the pages or even complete pages. With the help of the MSM you can always get back to the original state, or a change on the blueprint can be propagated to all livecopies (including conflict handling). So you can could call this approach “managed copy”.

The MSM is a powerful tool, but comes with its own set of complexity and error cases; you and your users should understand how it works and what situations can arise out of it. It also has performance implications, as copies are created; also rolling out changes on the blueprint to livecopies can be complex and consume quite some server resources. If you don’t need have the requirement to modify the copies, the MSM is the wrong approach for you!

Unlike the MSM the language copy approach just creates simple copies; and when these copies have been created there is no relationship anymore between the source and the target of the language copy. It’s an “un-managed copy”. Personally I don’t see much use in it in a standalone way (if used as part of a translation workflow, the situation is different).

Reuse by reference

Reuse by reference is a different approach. It does not duplicate content, but just adds references and then the reference target is injected or displayed. Thus a reference will always display the same content as the reference target, deviations and modifications are not possible. Referencing larger amount of content (beyond the scope of parts of a single page) can be problematic and hard to manage, especially if these references are not marked explicitly as such.

The main benefit of reuse by reference is that any change to the reference target is picked up immediately and reflected in the references; and that the performance impact is negligible. Also the consistency of the display of the reference with the reference target is guaranteed (when caching effects are ignored).

This approach is often used for page elements, which have to be consistent all over a site, for example for page headers of footers. But also the DAM is used in this way, even if you don’t embed the asset itself into the page, but rather just add a reference to it into the page).

If you implement reuse by reference, you always have to think about dispatcher cache invalidation, as in many cases a change to a reference target it not propagated to all references, thus the dispatcher will not know about it. You often have to take care of that by yourself.

Having said that, what are the approaches in AEM to implement reuse by reference?

Do it on your own: In standard page rendering scripts you already do includes, typically of child nodes inside the page itself. But you can also include nodes from different parts of the repository, no problem. That’s probably the simplest approach and widely used.

Another approach are Content Fragments and Experience Fragments. They are more sophisticated approaches, and also come with proper support in the authoring interface, plus components to embed them. That makes it much easier to use and start with, and it also offers some nice features on top like variants. But from a conceptual point if view it’s still a reference.

A special form of reuse by reference is “reuse by inheritance“. Typically it is implement by components like the iparsys or (when you code your own components) by using the InheritanceValueMap of Sling. In this case the reference target is always the parent (page/node). This approach is helpful when you want to inherit content down the tree (e.g from the homepage of the site to all individual pages); with the iparsys it’s hte content of a parsys, with the InheritanceValueMap it’s properties.

What approach should I choose?

The big differentiator of the reuse by copy and reuse by reference is the question if reused content should be adapted or changed at the location where it should be reused. As soon as you need to have the requirement “I would like to change the content provided to me”, the you need to have reuse by copy. And in AEM this normally mans “MSM”. Because content is not created once, but needs to be maintained and updated. At scale MSM is the best way to do it. But if you don’t have that requirement, use reuse by reference.
You might even use both approaches, “reuse by copy” to manage the reuse of content over different sites, and “reuse by reference” for content within a site.

Your friendly business consultant is a can help you find out which reuse strategy makes sense for your requirements.

AEM anti-pattern: The hardcoded content structure

One the first things I usually do when we start an AEM project is to get a clear vision of the content and its structure. We normally draw a number of graphs, discuss a number of use cases, and in the end we come up with a content structure, which satisfies the requirements. Then we implement this structure as a hierarchy of nodes and that’s it.

In many cases developers start to use this structure without too much thinking. They assume, that the node structure is always like this. They even start to hardcode paths and language names or mimic this structure. Sometimes that’s not a problem. But it is getting hard, when you are building a multi-language or multi-tenant site and you start simple with only 1 language and 1 tenant; then you might end up with these languages or tenants being hardcoded, as “there was no time to make it right”. Imagine when you start with the second language or the second site and someone hardcoded a language or a site name/path.

So, what can you do to avoid hardcoded paths? Some information is always stored at certain areas. For example you can store basic contact information on the root node, which you can reuse on the whole site. So how do you identify the correct root node if you have multiple sites? Or how do you identify the language of the site?

The easiest way is to mark these site root pages (I prefer pages here over nodes, as they can be created using the authoring UI and are much more easier authorable) with a certain property and value. The easiest way is then if you have a special template with its dedicated resource type. Then you can identify these root pages using 2 approaches:

When you need to find them all, use a JCR query and look for all pages with this specific resource type.
When you need to find the siteroot page for a given page (or resource), just iterate up the hierarchy until you find a page with this resource type.

This mechanism allows you to be very flexible in terms of the content hierarchy. You no longer depend on pages being on a certain level or having special names. It’s all dynamic and you don’t have any dependency on the content structure. This page doesn’t even have to be the root-page of the public facing site, but is just a configuration page used for administration and configuration purposes. The real root-page can be a child or grand-child of it. You have lot’s of choices then.

But wait, there is a single limitation: Every site must have a sitters page using this special template/resourcetype. But that isn’t a hard restriction, isn’t it?

And remember: Never do string operations on a content path to determine something, neither the language nor the site name. Never.