Micro optimizations are important, and their importance is described by a LWN posting about the linux kernel:
Most users are unlikely to notice any amazing speed improvements resulting from these changes. But they are an important part of the ongoing effort to optimize the kernel’s behavior wherever possible; a long list of changes like this is the reason why Linux performs as well as it does.
And is not specific for the Linux kernel, but you can apply the same strategy to every piece of software. AEM as a complex (and admittedly, it can sometimes be really slow) beast applies the very same.
There are a number of cases in AEM, where do you operate not only single objcets (pages, assets, resources, nodes), but apply the same operation on multiple of these objects.
The naive approach of just iterating the list and execute the operation on a single element of that list can be quite ineffective, especially if this operation comes with a static overhead.
- For replication there are some pre-checks, then the creation of the package, the creation of the sling jobs (or sending the package to the pipeline when running on AEM as a Cloud Service), the update of the replication status, writing the audit log entries.
- When determining the replication status of a page, the replication queues need to checked if this page is still subject to a pending replication, which can get slow when the queues are full.
- Committing changes to the JCR repository; there is a certain overhead in it (validating all changes, comitting them to permanent storage, invoking the synchronous listeners, locking etc).
And in many cases these bottlenecks are known for a while, and there is API which allows to perform this action in a batch mode for a multitude of elements:
- Replication: Batch replication (you can provide a number of path strings)
- Getting status for a large amount of resources: ReplicationStatusProvider.getBatchReplicationStatus
- The Audit Log
- and many more
(The ReplicationStatusProvider has been introduced some years back when we had to deal with large workflow packages being replicated, which resulted in a lot of traversales of the replication queue entries. Adding this optimized version improved the performance by at least a factor of 10; so even in less intense operations I expect an improvement.
So if you have a hand-crafted loop to execute a certain activity on many elements, check if a more efficient batch API is available. There’s a good chance that it is already there.
If you have more cases where batch mode should be available, you it isn’t, leave a comment here. I am happy to support to either find the right API or potentially kickstart a product improvement.