The magic of OSGi: tracking services as they come and go

Have you ever asked yourself how it can be that you just need to implement an interface, mark the implementation as a service, and, oh magic, your code is called at the right spot? Without any explicit registration.
For example, when you wrote your first Sling servlet (oh sure, you did!), it probably looked like this:

@Service
@Component
@Property(name = "sling.servlet.paths", value = "/bin/myservlet")
public class MyServlet extends SlingSafeMethodsServlet {
  ...
}

And that's it. How does Sling (or OSGi, or whoever) know that there is a new servlet in the container and call it when you visit $Host/bin/myservlet? And how can you build such a mechanism yourself?

Basically you just use the power of OSGi, which can notify you about any change in the status of a bundle or service.

If you want to keep track of all Sling servlets registered in the system, you just need to write a few more annotations:

import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

import javax.servlet.Servlet;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Reference;
import org.apache.felix.scr.annotations.ReferenceCardinality;
import org.apache.felix.scr.annotations.ReferencePolicy;
import org.apache.felix.scr.annotations.Service;

@Component
@Service(ServletTracker.class)
@Reference(name = "servlets", policy = ReferencePolicy.DYNAMIC,
    cardinality = ReferenceCardinality.OPTIONAL_MULTIPLE,
    referenceInterface = Servlet.class)
public class ServletTracker {

  // thread-safe, as bind/unbind can be called concurrently
  private final List<Servlet> allServlets = new CopyOnWriteArrayList<Servlet>();

  protected void bindServlets(Servlet servlet, final Map<String, Object> properties) {
    allServlets.add(servlet);
  }

  protected void unbindServlets(Servlet servlet, final Map<String, Object> properties) {
    allServlets.remove(servlet);
  }

  public void doSomethingUseful() {
    for (Servlet servlet : allServlets) {
      // do something useful with them ...
    }
  }
}

(Of course you can track any other interface through which services are offered. But be aware that in many cases only a single instance of a service exists.)

Most of the magic is in the @Reference annotation, which declares an optional reference accepting anywhere from zero to an unlimited number of services implementing the class/interface Servlet. By default, methods are called whose names are derived from the "name" attribute, resulting in "bindServlets" being called when a new servlet is registered and "unbindServlets" when a servlet is unregistered. You can use these methods for whatever you want, for example storing the references locally and calling them whenever appropriate. And that's it.

If you use this approach, your code is called whenever a service implementing a certain interface is activated or deactivated. With the SCR annotations all of this is possible without much trouble, and best of all: nearly everything works by declaration alone.
If you'd like to have more control (or just want to write the code yourself), you can use a ServiceTracker to keep track of services manually (a nice small example of this is the Apache Aries JMX Whiteboard).
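
A minimal sketch of what such a manual tracker could look like (the class name ManualServletTracker and the bind/unbind behavior are made up for illustration):

import javax.servlet.Servlet;

import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import org.osgi.util.tracker.ServiceTracker;
import org.osgi.util.tracker.ServiceTrackerCustomizer;

public class ManualServletTracker implements ServiceTrackerCustomizer {

  private final BundleContext context;
  private ServiceTracker tracker;

  public ManualServletTracker(BundleContext context) {
    this.context = context;
  }

  public void open() {
    // track all services registered under the Servlet interface
    tracker = new ServiceTracker(context, Servlet.class.getName(), this);
    tracker.open();
  }

  public Object addingService(ServiceReference reference) {
    // a new servlet showed up; resolve it and store/use it as needed
    return context.getService(reference);
  }

  public void modifiedService(ServiceReference reference, Object service) {
    // the service properties changed
  }

  public void removedService(ServiceReference reference, Object service) {
    // the servlet went away; clean up and release the reference
    context.ungetService(reference);
  }

  public void close() {
    tracker.close();
  }
}

Note that with a raw ServiceTracker you are responsible for opening and closing it yourself, e.g. in your bundle activator.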

And as recommended reading right now: learn about all the other cool stuff on the Felix SCR page.

And if you want to see more code which uses this approach, you might want to have a look at the SlingPostServlet, which is an excellent example of this pattern. Oh, and by the way: this pattern is called the OSGi whiteboard pattern.

AEM 6.0 and Apache Oak: What has changed?

One of the key features of AEM 6.0 on the technical side is the use of Apache Oak as a much more scalable repository. It supports the full semantics of JCR 2.0, so all CQ 5.x applications should continue to work. And as an extension of this feature there is of course MongoDB, which you can use together with Oak.

But, as with every major reimplementation, some things have changed. Things which worked well on Jackrabbit 2.x and CRX 2.x might behave differently. Or to put it in other words: Jackrabbit 2.x allowed you to do some things which are not mandated by the JCR 2.0 specification.

One of the most prominent examples of this is the visibility of changed nodes. In CRX 2.x, when you have an open JCR session A and some nodes are changed in a different session B, you will see these changes immediately in session A. That's not mandated by the specification, but Jackrabbit supports it.

Oak introduces the concept of MVCC (multi-version concurrency control), which means that each session only sees the view of the repository which was the most recent one when the session was created; it is not updated on the fly with changes performed by other sessions. So it is a static view. If you want to get the most recent view of the repository, you need to call session.refresh() explicitly.
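
A small sketch of what this means in practice (the path /content/new-node is made up; sessionA is a javax.jcr.Session obtained earlier):

// session A was opened before another session saved its changes;
// under Oak's MVCC, session A still sees the old state here
if (!sessionA.nodeExists("/content/new-node")) {
    // pull in changes committed by other sessions;
    // refresh(true) keeps our own pending changes, refresh(false) discards them
    sessionA.refresh(true);
}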

So, what’s the effect of this?
You may run into subtle inconsistencies, because you don't see changes performed by others in your session. In most cases only long-running sessions are really affected, because they are often intended to react to changes from the outside, e.g. to check whether a certain node has already been created by another session. So if you have already followed the best practices established over the last 1-2 years, you should be fine, as long-running sessions have been discouraged anyway. I have also already shown how such a long-running session can affect performance when used in a service context.

Oak supports you with some more "features" to spot such problems more easily. First, it prints a warning to the log when a session has been open for more than 1 minute. You can check the log and review the use of these sessions. A session being open for more than a minute is normally a clear sign that something's wrong and that you should think about creating sessions with a smaller lifespan. On the other hand, there are also cases where a session open for a longer time is the right solution, so you need to evaluate each warning carefully.
And as a second "feature", Oak is able to propagate changes between sessions, if these changes are performed by a single thread (and only by a single thread).
But consider these features (especially the change propagation) as transitional ones, which won't be supported forever.

This is one of the interesting changes in Apache Oak compared to Jackrabbit 2.x; you can find some more in the Jackrabbit/Oak wiki. It's really worth a look when you start your first AEM 6.0 project.

AEM 6.0: Admin sessions

AEM 6.0 brings a small feature which should make you reconsider your usage of sessions, especially the use of admin sessions in your OSGi services.

The feature is: ResourceResolverFactory.getAdministrativeResourceResolver is going to be deprecated!

Oh wait, how is that a feature, you might ask. Yes, it is, because it is being replaced by a mechanism which allows you to easily replace the sessions previously owned by the admin user (for the sake of developer laziness ...) with sessions owned by regular users. Users which don't have the superpowers of admin, but regular users which have to follow the ACLs like any other user.

A nice description how it works can be found on the Apache Sling website.

But how do you use it?

First, define which user should be used by your service. You specify this in the form "symbolic-bundle-name:subservice-name=user" in the configuration of the ServiceUserMapper service.
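
For illustration, such a mapping entry (the bundle, subservice and user names here are made up) could look like this:

com.mycompany.mybundle:my-subservice=my-service-user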
Then there are 2 extensions to existing services which leverage this setting:

ResourceResolverFactory.getServiceResourceResolver(authenticationInfo) returns a ResourceResolver created for the user defined in the ServiceUserMapper for the containing bundle (you can specify the subservice in the authenticationInfo if required).

And the SlingRepository service has an additional method loginService(subserviceName, workspace), which returns a session using this user.
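
A minimal sketch of how using it from a service could look (the subservice name and the content path are made up, and resourceResolverFactory is assumed to be an injected ResourceResolverFactory reference):

import java.util.HashMap;
import java.util.Map;

import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;

public class ServiceUserExample {

    public void doWorkAsServiceUser(ResourceResolverFactory resourceResolverFactory) {
        Map<String, Object> authInfo = new HashMap<String, Object>();
        // optional: select a specific subservice defined in the ServiceUserMapper config
        authInfo.put(ResourceResolverFactory.SUBSERVICE, "my-subservice");

        ResourceResolver resolver = null;
        try {
            resolver = resourceResolverFactory.getServiceResourceResolver(authInfo);
            Resource resource = resolver.getResource("/content/myapp");
            // ... work with the resource, limited by the ACLs of the service user ...
        } catch (LoginException e) {
            // no service user mapping configured for this bundle/subservice
        } finally {
            if (resolver != null) {
                resolver.close();
            }
        }
    }
}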

But then this leaves you with the really big question: which permissions should my service user have? read/create/modify/delete on the root node? Well, that's something you can delegate to the people doing the user management ...

Update 1: Sune asked whether you need to specify a password. Of course not 🙂 Such a requirement would render the whole approach pointless.

Meta: AEM 6.0

I am a bit behind on the official announcement of AEM 6.0 (Adobe TV, docs), so some of my colleagues have taken the lead and already started posting about the major new technical features, Apache Oak (including MongoDB) and Sightly. My colleague Jayna Kandathil offers a nice overview of the technical news.

I will focus on the smaller changes in the stack, and there's a vast number of them. So stay tuned; I hope to find some quiet moments to blog in the next 2 weeks.

Is “fresh content” a myth?

In the past I often had the requirement that content must be absolutely up to date when it's delivered to the end user. It always has to reflect the approved state on the servers, and it must not be outdated. When new content is pushed to the publish instances, it has to be picked up immediately, and every response should return this updated content from then on.
That's sometimes (and only sometimes) a valid requirement, but its implementation is something you can only partially control.

For example: when your editor presses the "publish" (or "activate") button in AEM, an internal process starts which takes care of packaging the content and delivering it to the publish instances, where it is unpacked. And then you need to invalidate any frontend cache (dispatcher cache) sitting in front of the publish instances.
That's infrastructure you can control, and with a proper setup and monitoring you can achieve a reliable delay of a few seconds between pressing the button and the new content being delivered by your web servers.

And then comes the internet. Huge caching layers (content delivery networks) absorb lots of requests to reduce internet traffic. Company proxies cache the favorite sites of their users to save money. You will find caches all over the place, there to reduce costs and improve speed, sometimes at the price of delivering outdated content. And let's not forget the browser, which also tries to reduce page load time by eliminating requests and leveraging a cache on the local disk.

They all do it for a single reason: you are normally more frustrated by a slow page than by reading a version of a news page which doesn't show the very latest update published 1 second ago (and of course you don't know that the page was just updated ...).

Caching is a tool you should always use to provide a good experience. And many times the experience is better if you deliver a fast, nearly up-to-date site (with news being 30 seconds old) than a slow site which also shows the breaking news that happened 1 second ago.
If this delay is reasonably small (in many cases 5 minutes is perfectly fine), no one will ever notice it (besides the editor who pressed the button), but you can improve your caching and with it the experience of the end user.

JCR Observation throttle

With JCR observation, any JCR repository offers a really nice feature to react to changes in the repository. Because the mechanism is quite powerful and easy to use, it has found many adopters.

If you have worked with that feature and also imported large amounts of content, you might have encountered the problem that there is a delay between persisting a change in the repository and the moment the observation event for this change is fired. This delay depends mainly on the number of registered observation handlers and the time these handlers need to process an event (all JCR observation events are handled by a single thread). If it takes 100ms to run through all handlers and you persist 10k changes in 2 minutes, the events pile up: 10,000 events times 100ms is roughly 1,000 seconds of pure processing, so it takes about 20 minutes until the queue is empty again.

This delay may harm your application and user experience, so it’s advisable to

  1. improve the processing speed of any JCR event handler
  2. keep the queue small.

Especially if you have jobs which are not time-critical but might cause a storm of events (e.g. data importers), you should be aware of this and add artificial pauses to your job, so the observation event queue does not grow too large (see the sketch below).
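
A minimal sketch of such an importer; the target path, batch size and pause length are made-up values you would need to tune for your system:

import java.util.List;

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class PausingImporter {

    // save in batches and pause in between, so the observation
    // queue gets time to drain before the next batch of events
    public void importNodes(Session session, List<String> names)
            throws RepositoryException, InterruptedException {
        Node parent = session.getNode("/content/import"); // made-up target path
        int count = 0;
        for (String name : names) {
            parent.addNode(name, "nt:unstructured");
            if (++count % 500 == 0) {
                session.save();      // persist the batch, firing its observation events
                Thread.sleep(5000);  // artificial pause so the event queue can drain
            }
        }
        session.save();
    }
}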

A simple tool to help with this has been added to CQ/AEM 5.6.1: the JcrObservationThrottle (see http://dev.day.com/docs/en/cq/current/javadoc/com/day/cq/commons/jcr/JcrObservationThrottle.html).

It allows you to wait until all pending events in the observation queue have been delivered. This helps you to wait for a quiet moment and only then start your data import. But be careful: the wait might be very long if other processes constantly interfere.

(N)one click deployment

Last week I attended the AEMHub conference in London and I really loved it. Lots of nice people, interesting talks (and chats) and inspiring presentations. And Cognifide did a great job organizing it. Thanks folks, especially to Juliette and Kimberly!

I also held a presentation called "(N)one click deployment". Its point was that IT operations staff should not be held responsible for the automation of operations processes (for many reasons, such as insufficient time, insufficient skills and sometimes even a lack of motivation). Instead, developers are by nature creators of automation, because programming is just automating the steps needed to perform a task.

Additionally, the features you might consider natural tools for automating CQ maintenance or deployment procedures are just building blocks, not tools. When you use curl to automate such processes, you have to take care of error handling and reporting yourself. Which can get pretty complicated when you have to parse server responses to determine the right status, and your only tool is the Unix shell.

So in the end you're better off using a programming language which offers more features than the shell and makes things easier to build, test and debug. And if you are an operations person focusing on automating AEM deployments and maintenance tasks, don't try to handle everything externally; instead motivate the developers (and probably also the vendor :-)) to include more sophisticated building blocks in the application or the product itself, so your job gets easier.

You can find my slide deck on the official AEMHub slideshare page.

Meta: New blog layout

When I started this blog back in December 2008, I really didn't care that much about its design, so I simply took the Kubrick theme, a very simple and straight style with a deep blue header. Now we are in 2014, and the times have changed, and so have I. It's time for some cleanup and adjustments. So today I changed the style of this blog to something more modern and also added a Twitter image to my sidebar. And the comment function is now at the top of a posting and no longer at the bottom. But that should be all.

If you are reading this blog through a feed reader, you probably don't see any change at all. But that's fine then 🙂

Using curl to install CQ packages — and why it isn’t a good idea

CQ5 has a very good tradition of exposing APIs which are accessible via plain HTTP (in some cases even RESTful APIs). This allows you to access data and processes from the outside and is also a good starting point for automation.

Especially the Package Manager API is well documented, so most attempts to automate deployment steps start here. A typical shell script to automate package installation might look like this:

ZIP=directory/to/the/package-1.0.zip
FILENAME=`basename $ZIP`
CURL="curl -u admin:admin"
HOST=http://localhost:4502/crx/packmgr/service/.json

$CURL -s -F package=@$ZIP -F force=true $HOST?cmd=upload
if test $? -ne 0; then
  echo "failed on upload"
fi
$CURL -X POST -s $HOST/etc/packages/mypackages/$FILENAME?cmd=install
if test $? -ne 0; then
  echo "failed on install"
fi

As you see, it lacks any kind of sophisticated error handling. Introducing good error handling is not convenient, as curl doesn't return the HTTP status as its exit code (the exit code only tells you whether the request itself could be performed), so you have to parse the complete output to decide whether the server returned an HTTP 200 or something else. Any non-seasoned shell script developer will probably just omit this part and hope for the best ...

And even then: when your package installation throws an error during deserialization of a node (maybe you have a typo in one of your hand-crafted .content.xml files), the system still returns an HTTP 200 code. Which of course it shouldn't.
(The reason for the 200 is that the code streams the installation progress for each node, so the decision on the status code has to be made before all nodes are imported into the repository. Hence the need for an internal status, which appears in one of the last lines of the result. Thanks, Toby, for this missing piece!)
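
To illustrate why a real programming language makes this easier, here is a rough sketch in plain Java: it checks the HTTP status explicitly and scans the streamed response for the internal status. The check for "success":false is an assumption about the packmgr JSON output, and URL and credentials are the ones from the script above:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import javax.xml.bind.DatatypeConverter;

public class PackageInstaller {

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://localhost:4502/crx/packmgr/service/.json"
                + "/etc/packages/mypackages/package-1.0.zip?cmd=install");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        String auth = DatatypeConverter.printBase64Binary("admin:admin".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + auth);

        // first check: the HTTP status code, which curl's exit code hides
        int status = conn.getResponseCode();
        if (status != 200) {
            throw new IOException("package install failed with HTTP status " + status);
        }

        // second check: scan the streamed progress output for the internal status
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("\"success\":false")) {
                    throw new IOException("package install reported a failure: " + line);
                }
            }
        } finally {
            reader.close();
        }
    }
}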

And of course we still lack any checks whether embedded bundles start up correctly ...

So whenever you build such lightweight deployment automation, be aware of its limits. Good error handling, especially when the errors are inlined in some output, was never a primary focus of shell scripting, and most of the automation scripts I've seen in the wild (and written myself, to be honest) never really cared about it.
But if you want to have it automated, it must be reliable, so you can focus on your other work and not on checking deployment logs for obvious and non-obvious errors.

At AEMHub I will talk about the importance of such tools and why developers should care about such operations topics. And I hope that I can present the foundations of a small project aimed at proper CQ deployment automation.

Rewrapping CQ quickstart to include your own packages

The CQ quickstart is a cool technology to ease the setup of CQ installations; although it's not a perfect tool for server installations, it's perfect for developers to re-install a local CQ development environment, or for any kind of demo installation. But an out-of-the-box installation is still an out-of-the-box installation: it doesn't contain hotfixes or any application bundles, it's just a raw CQ. In a posting from 2010 I described how you can leverage the install directory of CRX to deploy packages and how you can package it for distribution. It's a bit clumsy, as it requires manual work or extra steps to automate it.

In this post I want to show you how you can rebuild a CQ quickstart installation to include extra packages. And on top of that, you can do it as part of your Maven build process, which anyone can execute!

The basic idea is to put all artifacts into a Maven repository (e.g. Nexus), so we can address them via Maven, and then use the maven-assembly-plugin to repackage the quickstart file.

Required Steps:

  • Put your CQ quickstart into your Maven repository, so you can reference it. You can freely choose the coordinates; for our example let's use groupId=com.adobe.aem.quickstart, artifactId=aem-quickstart, version=5.6.1, packaging=zip. For this example you can also just put the file into your local m2 archive: ~/.m2/repository/com/adobe/aem/quickstart/aem-quickstart/5.6.1/aem-quickstart-5.6.1.zip
  • Update your pom.xml file and add dependencies on the aem-quickstart artifact and on the additional content package you create during your build (e.g. com.yourcompany.cq5.contentpackage:your-contentpackage).
  • Extend your pom.xml with this plugin definition:
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <id>quickstart-repackage</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
            <configuration>
              <finalName>quickstart-repackaged</finalName>
              <descriptors>
                <descriptor>src/assembly/quickstart-repackaged.xml</descriptor>
              </descriptors>
              <appendAssemblyId>false</appendAssemblyId>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>

    The magic lies in the descriptor file, which I placed in src/assembly (which is just as good as any other location …).

  • This file src/assembly/quickstart-repackaged.xml can look like this:
    <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">

      <id>bin</id>
      <formats>
        <format>jar</format>
      </formats>

      <includeBaseDirectory>false</includeBaseDirectory>
      <dependencySets>
        <dependencySet>
          <outputDirectory>/</outputDirectory>
          <unpack>true</unpack>
          <includes>
            <include>com.adobe.aem.quickstart:aem-quickstart</include>
          </includes>
        </dependencySet>
        <dependencySet>
          <outputDirectory>/static/install</outputDirectory>
          <includes>
            <include>com.yourcompany.cq5.contentpackage:your-contentpackage</include>
          </includes>
        </dependencySet>
      </dependencySets>
    </assembly>

    This descriptor tells the plugin to unpack the quickstart file and then add your content package (your-contentpackage) to the static/install folder; from this folder CQ bootstraps packages during startup. After this file has been added, everything is repackaged as a jar file with the name "quickstart-repackaged" (taken from the pom.xml).

  • Invoke Maven with the package goal (e.g. mvn clean package).

If you take this route, you'll have a fantastic way to automatically build your own flavour of quickstart files. Just download the latest version from your Jenkins server, double-click it and, voila, you have a full-fledged, up-to-date demonstration or testing instance up and running. And as soon as you have all the required ingredients in your Nexus, everyone can build such quickstart variants, as it only requires a working Maven setup.