What’s the maximum size of a node in JCR/AEM?

An interesting question which comes up every now and then is: “Is there a limit to how large a JCR node can get?” And, as always in IT, the answer is not that simple.

In this post I will answer that question and outline why this limit is hardly a constraint in AEM development. I will also show how you can design your application so that this limit is not a problem at all.

(Allow me a personal note here: for me the most interesting part of that question is the motivation behind it. When this question is asked, I typically have the impression that the folks know they are a bit off the beaten track, because this is a topic which is discussed very rarely (if at all). That means they know that they (plan to) do something which violates some good practices, and for that reason they are looking for reassurance. Which always leaves me with the question: why do they do it then? Because if you follow the recommended content architecture patterns, you will never hit such a limit.)

We first have to distinguish between binaries and non-binaries. For binaries there is no real limit, as they are stored in the blobstore. You can put files of 50 GB in size there, no problem. Such binaries are represented either using the node type “nt:file” (used most often) or using binary properties (rarely used).
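
To make this concrete, here is a minimal sketch (plain JCR API plus Jackrabbit’s JcrUtils, which ships with AEM) of how a large file ends up in the blobstore as an “nt:file” node instead of bloating the nodestore. The file name, path and MIME type are made up for illustration:

import java.io.FileInputStream;
import java.io.InputStream;
import javax.jcr.Node;
import javax.jcr.Session;
import org.apache.jackrabbit.commons.JcrUtils;

public class StoreBinaryExample {

    // Store a (potentially huge) file below "parent" as an nt:file node.
    public void storeLargeFile(Session session, Node parent) throws Exception {
        try (InputStream data = new FileInputStream("/tmp/huge-export.zip")) {
            // putFile() creates the nt:file/nt:resource structure; the stream
            // is spooled into the blobstore, so the 16 MB document limit
            // discussed below does not apply to its content.
            JcrUtils.putFile(parent, "huge-export.zip", "application/zip", data);
            session.save();
        }
    }
}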

And then there is the non-binary data. This data comprises all other node and property types, where the information is stored within the nodestore (often also as multi-value properties). Here, limits apply.

In AEM CS, MongoDB is used as the data store, and the maximum size of a MongoDB document is 16 Megabytes. As an approximation (it does not hold in every case), you can assume that a single JCR node with all its properties is stored in a single MongoDB document, which directly results in a maximum size per node: 16 Megabytes.

In reality a node cannot even get that large, because other data is also stored inside that document. I recommend never storing more than 1 Megabyte of non-binary properties inside a single node. Technically you don’t have that limit in a TarMK/SegmentTar-only setup, but I would not exceed it there either: you will run into all kinds of interesting problems, and there is barely any experience with such large nodes in the AEM world.
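
If you want a rough idea of how much non-binary data a given node carries, a small helper like the following sketch can do it (plain JCR API; sizes are approximated via Property.getLength(), and the 1 Megabyte threshold is just my recommendation from above, not an enforced limit):

import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import javax.jcr.PropertyType;
import javax.jcr.RepositoryException;

public class NodeSizeCheck {

    // Recommended upper bound for non-binary data per node (see above).
    private static final long ONE_MEGABYTE = 1024 * 1024;

    // Sum up the approximate size of all non-binary properties of a node.
    public static long approximateNonBinarySize(Node node) throws RepositoryException {
        long total = 0;
        for (PropertyIterator it = node.getProperties(); it.hasNext();) {
            Property p = it.nextProperty();
            if (p.getType() == PropertyType.BINARY) {
                continue; // binaries live in the blobstore, not in the document
            }
            if (p.isMultiple()) {
                for (long length : p.getLengths()) {
                    total += length;
                }
            } else {
                total += p.getLength();
            }
        }
        return total;
    }

    public static boolean exceedsRecommendation(Node node) throws RepositoryException {
        return approximateNonBinarySize(node) > ONE_MEGABYTE;
    }
}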

If you actually exceed this limit on the size of a document, you get this very nasty exception and your content will not be stored:

javax.jcr.RepositoryException: OakOak0001: Command failed with error 10334 (BSONObjectTooLarge): 'BSONObj size: 17907734 (0x1114016) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: "7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history"' on server cmgbr9sharedcluster51rt-shard-00-01.xxxxx:27017. The full response is {"operationTime": {"$timestamp": {"t": 1656435709, "i": 87}}, "ok": 0.0, "errmsg": "BSONObj size: 17907734 (0x1114016) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: \"7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history\"", "code": 10334, "codeName": "BSONObjectTooLarge", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1656435709, "i": 87}}, "signature": {"hash": {"$binary": "MXahc2R2arLq+rc41fRzIFKzRAw=", "$type": "00"}, "keyId": {"$numberLong": "7059363699751911425"}}}} [7:/var/workflow/instances/server840/2022-06-01/xxx_reviewassetsworkflow1_114/history]
at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:250) [org.apache.jackrabbit.oak-api:1.42.0.T20220608154910-4c59b36]

But is this really a limit which hurts AEM developers and customers? Actually, I don’t think so, and there are at least two good reasons why I believe this:

  • Pages rarely have that much content stored in a single component (be it in the jcr:content node or any component beneath it), and the same holds for assets. The few instances where I have seen this exception happened because a lot of “data” was stored inside properties (e.g. complete files), which would have been better stored as binaries in “nt:file” nodes.
  • Since version 1.0, Oak logs a warning if it needs to index properties larger than 100 Kilobytes, and I have rarely seen this warning in the wild. (There are prominent examples in AEM itself where this warning is written for nodes in /libs.)

So the best way to find out if you are close to running into this problem with the total size of the documents is to check the logs for this warning:

05.07.2022 09:31:57.326 WARN [async-index-update-fulltext-async] org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker String length: 116946 for property: imageData at Node: /libs/wcm/core/content/editors/template/tour/content/items/third is greater than configured value 102400

If you see these warnings in the logs, you should pay attention to them. In the example above it is not a problem, because this property is unlikely to get any larger over time; but you should watch those properties which can grow over time.
(Note, however, that there is no warning if you have many smaller properties which in sum hit the limit of the MongoDB document.)

How to mitigate?

As mentioned above, it is hard to come up with cases where this is actually a problem, especially if you develop in line with the AEM guidelines. The only situation where I can imagine this limit becoming a problem is when a lot of data is stored within a node to be consumed by custom logic. But in this case you own both the data and the logic, and therefore you have the chance to change the implementation so that this situation no longer occurs.

When you design your content and data structure, you should be aware of this limit and not store more than 1 Megabyte within a single node, because there is no workaround once you get that exception. The only way to make it work again is to fix the data structure and the code for it. There are two approaches:

  • Split the data across more nodes, ideally in a tree-ish way, where you can also use application knowledge to store it in an intuitive (and often faster) way; see the first sketch below.
  • If you just have a single property which is that large, you can also try to convert it into a binary property. This is much simpler, as in the majority of cases you just need to change the type of the property from String to Binary. The type conversion is done implicitly, but if you store actual string data, you should take care of the encoding; see the second sketch below.
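
For the first approach, a hypothetical sketch of such a tree-ish structure: instead of piling thousands of entries onto one node, they are distributed over bucket nodes derived from the key (this assumes the keys are valid JCR property names; the bucketing scheme is just one possible choice):

import javax.jcr.Node;
import javax.jcr.RepositoryException;

public class BucketedStorage {

    // Store one entry below a bucket node instead of putting all entries
    // into a single (ever-growing) node.
    public void storeEntry(Node root, String key, String value) throws RepositoryException {
        // Derive a stable bucket name from the key, e.g. two hex digits
        // of its hash; this spreads the data over up to 256 nodes.
        String bucket = String.format("%02x", key.hashCode() & 0xff);
        Node bucketNode = root.hasNode(bucket)
                ? root.getNode(bucket)
                : root.addNode(bucket, "nt:unstructured");
        bucketNode.setProperty(key, value);
    }
}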
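
And for the second approach, a sketch (plain JCR API) which converts an existing String property into a Binary one. It pins the encoding explicitly to UTF-8, so that reading the binary back later does not depend on any platform default:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.ValueFactory;

public class PropertyConversion {

    // Replace a large String property with a Binary property of the same name.
    public void convertToBinary(Session session, Node node, String propertyName) throws Exception {
        String value = node.getProperty(propertyName).getString();
        ValueFactory vf = session.getValueFactory();
        // Encode explicitly as UTF-8 to keep the round-trip deterministic.
        try (InputStream in = new ByteArrayInputStream(value.getBytes(StandardCharsets.UTF_8))) {
            Binary binary = vf.createBinary(in);
            try {
                // Setting a Binary value changes the property type to Binary.
                node.setProperty(propertyName, binary);
            } finally {
                binary.dispose();
            }
        }
        session.save();
    }
}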

Now you know the answer to the question “What’s the maximum size of a node in JCR/AEM?” and why it should never be a problem for you. I also outlined ways you can avoid hitting this problem at all, by choosing an appropriate node structure or by storing large data in binaries instead of properties.

Happy developing and I hope you never encounter this situation!