(Note: This post is not about getting content from environment A to B or from your AEM 6.5 to AEM CS.)
The requirements towards content and component structure evolve over time; the components you initially started with might not be sufficient anymore. For that reason the components will evolve: they need new properties, or components need to be added, removed or merged, and that must be reflected in the content as well. This is possible to do manually, but it takes too much work and is too error-prone. Automation to the rescue.
I have already come across a few of those “automated content migrations”, and I have found a few patterns which don’t work. But before I start with them, let me briefly cover the one pattern that works very well.
The working approach
The only working approach is a workflow, which is invoked on small-ish subtrees of your content. It silently skips content which does not need to be migrated, and reports every item it migrates. It might even have a dry-run mode, which just reports everything it would change. This approach has a few advantages:
- It will be invoked intentionally on author only, and only operates on a single, well-defined subtree of content. It logs all changes it makes.
- It does not automatically activate every change it has made, but requires activation as a dedicated second step. This allows you to validate the changes and activate them only then.
- If it fails, it can be invoked repeatedly on the same content, and continue from where it left off.
- It’s a workflow, with the guarantees of a workflow. It cannot time out as a request can, but will complete eventually. You can either log the migration output or store it as dedicated content/node/binary data somewhere. You know when a subtree is migrated and you can prove that it’s completed.
Of course this is not something you can simply do; it requires some planning in the design, the coding and the execution of the content migration.
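To make the core idea a bit more concrete, here is a minimal sketch of the traversal such a workflow step might run. It is plain Java and deliberately does not use any AEM, Sling or JCR API: the ContentNode class is a hypothetical stand-in for a repository node, and the rule itself (renaming a property "headline" to "title") is made up for illustration. What it shows is the behaviour described above: content that needs no change is skipped silently, every change is reported, a dry-run only reports, and a second run over the same subtree finds nothing left to do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubtreeMigration {

    /** Hypothetical stand-in for a repository node: a path, properties, children. */
    static class ContentNode {
        final String path;
        final Map<String, String> props = new LinkedHashMap<>();
        final List<ContentNode> children = new ArrayList<>();

        ContentNode(String path) { this.path = path; }

        ContentNode child(String name) {
            ContentNode c = new ContentNode(path + "/" + name);
            children.add(c);
            return c;
        }
    }

    // Assumed migration rule, for illustration only.
    static final String OLD_PROP = "headline";
    static final String NEW_PROP = "title";

    /** Walks the subtree below root; returns one report line per affected node. */
    static List<String> migrate(ContentNode root, boolean dryRun) {
        List<String> report = new ArrayList<>();
        visit(root, dryRun, report);
        return report;
    }

    private static void visit(ContentNode node, boolean dryRun, List<String> report) {
        if (node.props.containsKey(OLD_PROP)) {
            report.add((dryRun ? "WOULD MIGRATE " : "MIGRATED ") + node.path);
            if (!dryRun) {
                String oldValue = node.props.remove(OLD_PROP);
                // Do not overwrite a value an editor has already set by hand.
                node.props.putIfAbsent(NEW_PROP, oldValue);
            }
        }
        // Nodes without the old property are skipped silently.
        for (ContentNode child : node.children) {
            visit(child, dryRun, report);
        }
    }

    public static void main(String[] args) {
        ContentNode root = new ContentNode("/content/site/en");
        root.child("teaser").props.put("headline", "Hello");
        root.child("text").props.put("title", "Already migrated");

        System.out.println(migrate(root, true));   // dry-run: report only
        System.out.println(migrate(root, false));  // real run: changes content
        System.out.println(migrate(root, false));  // re-run: nothing left to do
    }
}
```

In a real setup the report would be written to a log or stored in the repository, and the changes would be saved but not activated, so that activation stays a separate, deliberate step.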
Now, let’s face the few things which don’t work.
Non-working approach 1: Changing content on the fly
I have seen page rendering code which tries to modify the content it is operating on: removing old properties and adding new properties, either with default values or with other values.
This approach can work, but only if the user has write permissions on the content. The migration happens the first time the rendering runs with write permissions (normally triggered by a regular editor on the authoring system), and it fails in every other situation (e.g. on publish, if the conditions that trigger the migration exist there as well). And you will have a non-cool mix of page rendering and content-fixup code in your components.
This is a very optimistic approach, over which you don’t have any control, and for that reason you probably can never remove that fixup code, because you never know if all content has already been changed.
Non-working approach 2: Let’s do it on startup
Admittedly, I have seen this only once. But it was a weird thing: a migration OSGi service was created, which executed the content migration in its activate() method. We came across it because this activate() delayed the entire startup so much that our automation ran into a timeout, because we don’t expect the startup of an AEM instance to take 30+ minutes.
That is also its biggest problem, and the one which makes it unusable: you don’t have any control over this process, it can be problematic in the case of clustered repositories (in AEM CS authoring), and even if the migration has already been completed, the check whether there is something to do can take quite long.
But hey, when you already have it implemented as a service, it’s quite easy to migrate it to a workflow and then use the approach recommended above.
Let me know if you have found other cases of working or non-working approaches for content migration; but in my experience it’s always the best way to make this an explicit task, which can be planned, managed and properly executed. Everything else can work sometimes, but definitely with a less predictable outcome.
Hi Jorg,
My main tool for content migrations is the Groovy Console. We’re 8 or so years in on AEM, so there’s now a constant stream of updates to accommodate changes in components, business requirements, and content. I develop reporting and update scripts every couple of weeks these days. In most cases, the changes are one-offs, so I don’t need or want to deploy a service, and the code needs to be just good enough to get the job done. Same for reports — I rarely get asked the same question twice, and the answers often involve some tricky traversal of the JCR to tie things together, so the built-in reporting tools are a bad fit.
Some examples:
– Report, then find & replace all instances of a product name in various properties of various components on specific page types
– Inject a new component into existing pages (we don’t have editable templates in our existing websites yet)
– Update properties of existing components
In general, scripts run on Author, and replicate to Publishers if required — we don’t run updates directly on Publishers.
The Groovy console lets me interact with AEM in the same ways as deployed code — but interactively. It’s also a great way to learn the system, makes playing with the APIs a breeze. Its Web UI is a bit clunky, so I wrote an extension for VS Code that posts to the console’s back-end — now I get code completion, Git integration, AI assist, all the benefits of an IDE, but none of the build and deployment overhead.
I’d have a very hard time without this or a similar tool.
Best,
– Val
Hi Val,
thanks for the feedback, really appreciate it. May I ask what kind of one-off reports you are requested to create, can you give a few samples?
And regarding the changes: Are these changes planned, and you are using the tools you have at hand? Or is it rather in situations where changing content is easier than fixing a code change which has unforeseen consequences?
Thanks,
Jörg
Maybe 2 or 3 out of 10 script requests are for reports, and many of those are precursors to an update request: how many things match these conditions? If it’s a small number, we’ll make the changes manually, but if it’s a lot, it may be a candidate for an update script.
A few report examples:
The reports usually show paths to the items, usually supporting property values from the item or other related objects, sometimes hyperlinks to the parent page, sometimes URLs of the images so they display in the spreadsheet generated from the report….
I can crank these out pretty quickly, and have a great deal of freedom in how I format the results.
Re updates, some are planned (name changes, bulk imports, new or changed components) and some are last-minute “oh crap, save me!” jobs — about an even split.
We try to use the tools at hand, but they’re not always available. E.g. we still use static templates for most of both our sites, and have not found the time and will to convert them to editable templates. There are many places we’d benefit from experience or content fragments, but we built custom tools before those features rolled out. And often they wouldn’t help anyway; e.g. product names are sprinkled in titles, headlines, RTE properties, custom properties on components — none of which allow inserting a replaceable object even if we think in advance we may want to change it globally in the future.
I’m not (very) afraid of code changes, but not much of what I script can be fixed in component code, front-end or back-end. So it’s less about unforeseen effects and mostly about automating what would be a tedious manual job for authors.