A typical task when you run AEM as a platform is deployment. As platform team you own the platform, and you own the admin passwords. It’s your job to deploy the packages delivered by the various teams to you. And it’s also your job to keep the platform reliable and stable.
With every deployment you have the chance to break something. And not only the part of the platform which belongs to the team which code you deploy. That’s not a problem, if their (incorrect) code breaks their parts. But you break the system of other tenants, which are not involved at all in the deployment.
This is one of the most important tasks for you as platform owner. A single tenant must not break other tenants! Never! The problem is just, that it’s nearly impossible to guarantee. You typically rely on trust towards the development teams and that they earn that trust.
To help you a little bit with this, I created a simple maven plugin, which can validate content-packages against a ruleset. In this ruleset you can define, that a content-package delivered by tenant A will only contain content paths which are valid for tenant A. But the validation should fail, if the content-package would override clientlibraries of tenant-B. Or which will introduce new overlays in /apps/cq. Or which introduces a new OSGI setting with a non-project PID. Or anything else which can be part of a content-package.
Check out the the github repo and the README for its usage.
As already noted above, it can help you as a platform owner to ensure a certain quality of the packages you are supposed to install. On the other hand it can help you as project team to establish a set of rules which you want to follow. For examples you can verify a “we don’t use overlays” policy with this plugin as part of the build.
Of course the plugin is not perfect and you still can easily bypass the checks, because it does not parse the .content.xml files in there, but just checks the file system structure. And of course I cannot check bundles and the content which comes with them. But we all should assume that no team wants to break the complete system when deployment packges are being created (there are much easier ways to do so), but we just want to avoid the usual errors, which just happens when being under stress. If we catch a few of them upfront for the cost of configuring a rulset once, it’s worth the effort 🙂
Developing an application for a multi-client-installation isn’t only a technical or engineering quest, but also reveals some question, which affect administration and organisationial processes.
To ease administration, the user accounts in CQ are often organized in a hiearchy, so that users which are placed higher in the hierarchy, can administrate user which are lower in the hierarchy tree below them. Using this mechanism a administrator can easily delegate the administration of certain users to other users, which can also do adminstrative works for “their” users.
The problem arises when a user has to have rights in 2 applications within the same CQ instance and every application should have its own “application administrator” (a child node to the superuser user). Then this kind of administration is no longer possible, because it is impossible to model a hierarchy where neither application administrator user A has a parent or child relation to application administration user B nor A and B are placed in the hierarch higher than any user C.
I assume that creating accounts for different application but the same person isn’t feasible. That would be the solution which the easiest one from an engineering point of view, but this does contradict the ongoing move not to create for each application and each user a new user/password pair (single sign on).
This problem imposes the burden of user administration (e.g assigning users to groups, resetting passwords) to the superuser, because the superuser is the user, which is always (either by transition or directly) parent to any user. (A non-CQ-based solution would be to handle user related changes like password set/reset and group assignment outside of CQ and synchronize these data then into CQ, e.g. by using a directory system based on LDAP.)
ACLs, access to templates and workflows should be assigned only using groups and roles, because these can be created per application. So if an application currently is based on a user hierarchy and individual user rights it’s hard to add a new application using the same user.
So one must make sure, that all assignments are only based on groups and roles, which are created per application. Assigning individual rights to a single user isn’t the way to go.
Working in a group requires a kind of discipline from people, which some are not used to. I remember to a colleague, who always complained about his office mate who used to shout at the phone, even in normal talks. If people work together and share ressources, everyone expects to be cooperative and not to trash the work and morale of their team members.
The same applies to applications; if they share ressources (because they may run on the same machine), they should release the ressources if they are no longer needed, and should only claim ressources, if they’re needed at all. Consuming all CPU because “the developer was to lazy to develop a decent algorithm and just choose the brute-force solution” isn’t considered a good behaviour. It’s even harder if these applications are contained within one process, so a process crash not only affects application A, but also B and C. And then it doesn’t matter, that these are well-thought and perfectly developed.
So if you plan to deploy more than one appliction to a single CQ instance, you should take care, that the developers were aware of this condition and they had it in their mind. Because the application does no longer control the heap usage on its own (on top of the heap consumption of CQ itself), but must share it with other applications. It must be programmed with stability and robustness in mind, because unknown structures and services may change assumptions about timing and sizes. And yes, a restart also affects the others. So restarting because of application problems isn’t a good thing.
In general an application should never require a restart of the whole JVM; only when it comes to necessary changes to JVM parameters, it should be allowed. But all other application specific settings should be changable through special configuration templates which are evaluated during runtime, so changes are picked up immediately. This even reduces the amount of work for the system administrator, because changing such values can be delegated to special users using the ACL mechanism.
In large companies there’s a need for a WCMS like Day CQ in many areas; of course for the company webpage, but also for some divisions offering services to internal or external customers, which either implement their application on top of Day CQ, use CQ as a proxy for other backend systems or just need a system for providing the online help.
Some of these systems are rather small, but nevertheless the business needs a system, where authors can create and modify content according to the business’ needs. If you already have CQ and knowledge in creating and operating applications, it’s quite natural to use it to satisfy these needs. The problem is, that building up and operating a CQ environment isn’t cheap for only one application. Very often one asks if CQ is capable of hosting several application within one CQ installation to leverage effets of scale. CQ is able to handle hundreds of thousands of content handles per instance, so it’s able to host 5 applications with 20 thousands handles each, isn’t it?
Well, that’s a good question. By design CQ is able to host multiple applications, enfore content separation by the ACL concept and limit the access to templates for users. Sounds good. But — as always — there are problematic details.
I will cover these problems and ways to their resolution in the next posts.