Ways to access your content with JCR (part 2): Performance aspects

In the previous post I described ways how you can access your data in JCR. I also showed, that the performance of these ways is different.

  • For the direct lookup of a node the complexity depends on the number of path elements, which need to be traversed from the root node to that node. Also the number of child nodes on each of these levels has an impact. But in general this lookup is pretty fast.
  • If you just iterate through child nodes (using node.getChildren()), it’s even faster, the lookup complexity is constant.
  • The JCR search as third approach no general estimation can be given, it depends too much on the query.

First, the JCR query consists of 2 parts: An index lookup and operations on the JCR bundles.

Note: Of course you can build queries, where an index lookup is not required and might be optimized by the query engine; for example “//jcr:root/content/geometrixx/*” would return all nodes below /content/geometrixx, but building such queries isn’t useful at all, and I consider them as a mis-use of JCR queries.

This combination is usually in such a way, that the index lookup produces a set of possible results, which are then filtered by the means of JCR, e.g. by applying path constraints or node type restrictions. In every case the ACLs taken into account.

Let’s consider this simple example:


/jcr:root/content/geometrixx/en//*[jcr:contains(., 'support')]

First, it looks up all properties for the search term “support”. As the backing system for JCR search is Apache Lucene, and Lucene is implemented as inverted index, direct lookups like this are extremely efficient.
Then for all results the path is calculated. This means, that for each result item the parent is lookup recursively until the root node. In that process the ACL checks are performed.

As soon as the query gets complicated and Lucene delivers many results (for example because you are looking for wildcards) or you do complex JCR-based operations in the query, this isn’t that easy and performant any more. The more nodes you need to load to execute a query (and for all path and ACLs evaluations you need to load the bundle from disk to your BundleCache) the more time it takes.

But if you traverse a subtree with node.getChildren() only these bundles are loaded to the BundleCache for evaluation.

So in many cases, especially when you need to search a small subtree for a specific node, it’s more efficient to manually traverse the tree and search for the node(s) than to use JCR search. This means, you use the other 2 approaches listed above. You might not be used to it when you worked with a relational database for years, but it is a very feasible way with possibilties of huge performance benefits.
So, give it a try. But don’t expect differences on your developer machine with a blazing fast SSD and 1 gigabyte repository size. Test it on your production-size repository!

CQ coding patterns: Sling vs JCR (part 3)

In the last 2 postings (part 1, part 2) of the “Sling vs JCR” shootout I only handled read cases, but as Ryan D Lunka pointed out in the comments of on part 1, writing to the repository is the harder part. I agree on that. While on reading you often can streamline the process of reading nodes and properties and ignore some bits and pieces (e.g. mixins, nodeTypes), on the write part that’s a very important information. Because on the write part you define the layout and types, so you need to have full control of it. And the JCR API is very good in giving you control over this, so there’s not much room for simplification for a Sling solution.

Indeed, in the past there was no “Resource.save()” method or something like this. Essentially the resourceResolver (as a somehow equivalent of the JCR session) was only designed for retrieving and reading information, but not for writing. There are some front ends like the Sling POST servlet, which access directly the JCR repository to persist changes. But these are only workarounds for the fact, that there is no real write support on a Sling Resource level.

In the last months there was some activity in that area. So there is a wiki page for a design discussion about supporting CRUD operations within Sling, and code has been already submitted. According to Carsten Ziegler, the submitter of this Sling issue, this feature should be available for the upcoming version of CQ. So let’s see how this will really look like.

But till then you have 2 options, to get your data into the CRX repository:

  • Use the SlingPostServlet: Especially if you want to create or change data in the repository from remote, this servlet is your guy. It is flexible enough to cater most of normal use cases. For example most (if not all) of the CQ dialogs and actions use this SlingPostServlet to persist their data in the repository.
  • If that’s not sufficient for you, or you need to write structures directly from within code deployed within CQ, use JCR.

So here Sling currently does not bring any benefit, if your only job is to save data from within OSGI to CQ. If many cases the Default POST Servlet might be suffcient when you need to update data in CRX from external. And let’s see what CQ 5.6 brings with it.

CQ coding patterns: Sling vs JCR (part 2)

In the last posting I showed the benefits of Sling regarding resource access over plain JCR. But not only in resource access both frameworks offer similar functionality, but also in the important area of listening to changes in the repository. So today I want to compare JCR observation to Sling eventing.

JCR observation is a part of the JCR specification and is a very easy way to listen for changes in the repository.

@component (immediate=true, metatype=false)
@service
class Listener implements ObservationListener {

  @Reference
  SlingRepository repo;

  Session session;
  Logger log = LoggerFactory.getLogger (Listener.class);

  @activate
  protected void activate () {
    try {
      Session adminSession = repo.loginAdministrative(null);
      session = adminSession.impersonate (new SimpleCredentials("author",new char[0]));
      adminSession.logout();
      adminSession = null;
      session.getObservationManager.addEventListener( this, // listener
        NODE_CREATED|NODE_DELETED|NODE_MOVED, // eventTypes
        "/", // absPath
        true, // isDeep
        null, // uuid
        null, //nodeTypeNames
        true // noLocal
      );
    } catch (RepositoryException e) {
      log.error ("Error while registering observation", e);
    }
  }

  @deactivate
  protected void deactivate() {
    session.getObservationManager.removeListener(this);
    session.logout();
    session = null:
  }

  private handleEvents (Events events) {
    while (events.hasNext()) {
      Event e = events.next();
      … // do here your event handling
     }
  }
}

In JCR the creation of an observation listener is straight forward, also the event listening. The observation process is tied to the lifetime of the session, which is started and stopped at activation/deactivation of this sling service. This kind of implementation is a common pattern.
Note: Be aware of the implications of using an admin session!

  • You can read and write everywhere in the repository
  • You work with elevated rights you probably never need.

So try to avoid an adminSession. In th example above is use a session owned by the user “author” for this via impersonation.

A sling implementation of the same could look like this:

@Component (immediate = true)
@Service()
@Property (name = "event.topics", value = "/org/apache/sling/api/resource/Resource/*")
class Listener implements EventHandler {

  public void handleEvent (Event event) {
    // handle
  }
}

You see, that we don’t have to care about registering a listener and managing a session. Instead we just subscribe to some events emitted by the Sling Eventing framework. This framework is essentially implementing the OSGI event specification, and therefor you can also subscribing the very same way to various other event topics. You can check the “Event” tab in the Felix console, where you can see a list of events, which just happened in your CQ.

Some example for topics:

One disadvantage of Sling eventing is, that you cannot restrict the resource events to a certain path or user. Instead Sling registers with an admin session and publishes all observation events starting at the root node of the repository (/). You should filter quite early in your handleEvent routine for the events you are really interested in.

But: With these events you don’t geld hold of a resource. You always have you create them by this pattern:

if (event.getProperty(SlingConstants.PROPERTY_PATH) != null) {
  Resource resource = resourceResolver.getResource(event.getPath()); 
}

Also there you need to take care, that you’re using an appropriate ResourceResolver (try to avoid an admin resource resolver as well).

Also JCR observation has some benefits, if you want or need to limit the events you receive more granularly, for example by listening to changes of specific nodeTypes only or mixins.

So, as a conclusion: If you need to restrict the amount of repository events you want to process directly on API level, JCR observation is the right thing for you. Otherwise go for Sling Eventing, as it offers you also the possibility to receive many more types of events. For non-repository events it’s the only way to listen for them.