Ways to access your content with JCR (part 2): Performance aspects

In the previous post I described the ways you can access your data in JCR. I also showed that these approaches differ in performance.

For the direct lookup of a node, the complexity depends on the number of path elements that need to be traversed from the root node to that node. The number of child nodes on each of these levels also has an impact. But in general this lookup is pretty fast.
If you just iterate through child nodes (using node.getChildren()), it’s even faster; the lookup complexity is constant.
For the JCR search, the third approach, no general estimate can be given; it depends too much on the query.
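
To make the cost of the first two approaches more tangible, here is a minimal sketch in plain Java. It is a toy model, not the JCR API: ToyNode, resolve() and lastVisited are hypothetical names invented for this illustration. In the real JCR API the corresponding calls would be Session.getNode(path) for the direct lookup and Node.getNodes() for iterating the children. The sketch only shows that a path lookup walks one level per path element, while iterating the children of a node you already hold needs no further path resolution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy content tree; a stand-in for a repository, not the JCR API. */
final class PathLookupSketch {

    /** Hypothetical node type: a name plus its named children. */
    static final class ToyNode {
        final String name;
        final Map<String, ToyNode> children = new LinkedHashMap<>();

        ToyNode(String name) {
            this.name = name;
        }

        /** Returns the child with that name, creating it if necessary. */
        ToyNode child(String childName) {
            return children.computeIfAbsent(childName, ToyNode::new);
        }
    }

    /** Number of levels walked by the most recent resolve() call. */
    static int lastVisited;

    /** Resolves an absolute path by walking one level per path element. */
    static ToyNode resolve(ToyNode root, String absPath) {
        lastVisited = 0;
        ToyNode current = root;
        for (String segment : absPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty element before the leading slash
            }
            current = current.children.get(segment);
            if (current == null) {
                return null; // the path does not exist
            }
            lastVisited++;
        }
        return current;
    }

    /** Builds /content/geometrixx/en with three pages below it. */
    static ToyNode sampleTree() {
        ToyNode root = new ToyNode("");
        ToyNode en = root.child("content").child("geometrixx").child("en");
        en.child("products");
        en.child("services");
        en.child("support");
        return root;
    }

    public static void main(String[] args) {
        ToyNode root = sampleTree();
        ToyNode en = resolve(root, "/content/geometrixx/en");
        System.out.println("levels walked: " + lastVisited);
        // The children are reachable directly from the node we already hold.
        for (ToyNode page : en.children.values()) {
            System.out.println(page.name);
        }
    }
}
```

The counter grows with the depth of the path, not with the size of the whole tree, which is the reason a direct lookup stays cheap even in a large repository.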

A JCR query consists of 2 parts: an index lookup and operations on the JCR bundles.

Note: Of course you can build queries where an index lookup is not required and might be optimized away by the query engine; for example “/jcr:root/content/geometrixx/*” would return all nodes directly below /content/geometrixx. But building such queries isn’t useful at all, and I consider them a misuse of JCR queries.

The two parts are usually combined in such a way that the index lookup produces a set of possible results, which are then filtered by means of JCR, e.g. by applying path constraints or node type restrictions. In every case the ACLs are taken into account.

Let’s consider this simple example:

/jcr:root/content/geometrixx/en//*[jcr:contains(., 'support')]

First, it looks up all properties containing the search term “support”. As the backing system for JCR search is Apache Lucene, and Lucene is implemented as an inverted index, direct lookups like this are extremely efficient.
Then the path is calculated for every result. This means that for each result item the parent is looked up recursively until the root node is reached. In that process the ACL checks are performed.
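
The two steps just described can be sketched in plain Java. This is a toy model under loud assumptions, not how Jackrabbit is implemented: ToyNode, indexTerm(), pathIfReadable() and the loaded set are hypothetical names, a HashMap stands in for the Lucene index, and the "readable" flag is a drastically simplified stand-in for an ACL check (here a hit is only visible if every node on the way to the root is readable). The loaded set counts the nodes that had to be touched, as a rough analogy to bundles that must be loaded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of "index lookup first, then path and ACL evaluation". */
final class QueryCostSketch {

    /** Hypothetical node with a parent pointer and a simplified ACL flag. */
    static final class ToyNode {
        final String name;
        final ToyNode parent;
        final boolean readable;

        ToyNode(String name, ToyNode parent, boolean readable) {
            this.name = name;
            this.parent = parent;
            this.readable = readable;
        }
    }

    /** Toy inverted index: search term -> nodes containing that term. */
    private final Map<String, List<ToyNode>> index = new HashMap<>();

    /** Distinct nodes touched so far; a stand-in for loaded bundles. */
    final Set<ToyNode> loaded = new HashSet<>();

    void indexTerm(String term, ToyNode node) {
        index.computeIfAbsent(term, t -> new ArrayList<>()).add(node);
    }

    /** Walks from the node up to the root; returns null if access is denied. */
    String pathIfReadable(ToyNode node) {
        StringBuilder path = new StringBuilder();
        for (ToyNode n = node; n != null; n = n.parent) {
            loaded.add(n);
            if (!n.readable) {
                return null;
            }
            if (n.parent != null) {
                path.insert(0, "/" + n.name);
            }
        }
        return path.length() == 0 ? "/" : path.toString();
    }

    /** Step 1: index lookup. Step 2: path calculation, ACL and path filter. */
    List<String> search(String term, String pathPrefix) {
        List<String> result = new ArrayList<>();
        List<ToyNode> hits = index.getOrDefault(term, Collections.emptyList());
        for (ToyNode hit : hits) {
            String path = pathIfReadable(hit);
            if (path != null && path.startsWith(pathPrefix)) {
                result.add(path);
            }
        }
        return result;
    }

    /** Three nodes contain "support"; only one is readable and below /en. */
    static QueryCostSketch sample() {
        QueryCostSketch repo = new QueryCostSketch();
        ToyNode root = new ToyNode("", null, true);
        ToyNode content = new ToyNode("content", root, true);
        ToyNode site = new ToyNode("geometrixx", content, true);
        ToyNode en = new ToyNode("en", site, true);
        ToyNode fr = new ToyNode("fr", site, true);
        repo.indexTerm("support", new ToyNode("support", en, true));
        repo.indexTerm("support", new ToyNode("private", en, false));
        repo.indexTerm("support", new ToyNode("support", fr, true));
        return repo;
    }

    public static void main(String[] args) {
        QueryCostSketch repo = sample();
        System.out.println(repo.search("support", "/content/geometrixx/en/"));
        System.out.println("nodes loaded: " + repo.loaded.size());
    }
}
```

In this sample the index lookup itself is a single map access, but every hit, including the two that are filtered out afterwards, forces a walk towards the root. That walk is the part that grows with the number of index hits.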

As soon as the query gets complicated and Lucene delivers many results (for example because you are searching with wildcards) or you do complex JCR-based operations in the query, this isn’t that easy and fast any more. The more nodes you need to load to execute a query (and for all path and ACL evaluations you need to load the bundle from disk into your BundleCache), the more time it takes.

But if you traverse a subtree with node.getChildren(), only the bundles of that subtree are loaded into the BundleCache for evaluation.

So in many cases, especially when you need to search a small subtree for a specific node, it’s more efficient to manually traverse the tree and search for the node(s) than to use JCR search. This means you use the other 2 approaches listed above. You might not be used to it if you have worked with relational databases for years, but it is a very feasible way with the possibility of huge performance benefits.
So, give it a try. But don’t expect differences on your developer machine with a blazing fast SSD and a 1 gigabyte repository. Test it on your production-size repository!
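
A manual search in a small subtree could look roughly like the following sketch. Again this is plain Java with hypothetical names (ToyNode, find(), visited), not the JCR API; with real JCR nodes the recursion would run over the iterator returned by Node.getNodes(). The visited counter is only an analogy for the bundles that get loaded, it does not measure time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of searching a subtree by traversal instead of a query. */
final class SubtreeSearchSketch {

    /** Hypothetical node type: a name plus an ordered list of children. */
    static final class ToyNode {
        final String name;
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String name) {
            this.name = name;
        }

        /** Creates a child with the given name and returns it. */
        ToyNode add(String childName) {
            ToyNode child = new ToyNode(childName);
            children.add(child);
            return child;
        }
    }

    /** Number of nodes touched by all find() calls on this instance. */
    int visited;

    /** Returns all nodes at or below start that satisfy the predicate. */
    List<ToyNode> find(ToyNode start, Predicate<ToyNode> matches) {
        List<ToyNode> result = new ArrayList<>();
        collect(start, matches, result);
        return result;
    }

    private void collect(ToyNode node, Predicate<ToyNode> matches, List<ToyNode> result) {
        visited++;
        if (matches.test(node)) {
            result.add(node);
        }
        for (ToyNode child : node.children) {
            collect(child, matches, result);
        }
    }

    public static void main(String[] args) {
        ToyNode site = new ToyNode("geometrixx");
        ToyNode en = site.add("en");
        en.add("products");
        en.add("services");
        en.add("support");
        // A large sibling tree that the traversal below never touches.
        ToyNode fr = site.add("fr");
        for (int i = 0; i < 1000; i++) {
            fr.add("page" + i);
        }

        SubtreeSearchSketch search = new SubtreeSearchSketch();
        List<ToyNode> hits = search.find(en, n -> n.name.equals("support"));
        System.out.println("hits: " + hits.size() + ", nodes visited: " + search.visited);
    }
}
```

The number of visited nodes depends only on the size of the subtree below the start node; the thousand pages in the sibling tree play no role. Whether that beats a query in your setup is something only a test on a production-size repository can tell.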

2 thoughts on “Ways to access your content with JCR (part 2): Performance aspects”

alexkli says:

December 8, 2012 at 03:45

The sample queries are a bit broken: the first one should be /jcr:root/content/geometrixx/* and the second one /jcr:root/content/geometrixx/en//*[jcr:contains(., ‘support’)] (/ missing after jcr:root).
1. Jörg says:
  
  December 8, 2012 at 19:40
  
  Thanks Alex for spotting. Fixed it in the postings.
  
  Jörg



Published by Jörg
