If you are a long-time user of AEM 6.x (or even CQ5), you are probably familiar with the Asset Update workflow. Its primary task is to extract metadata from the binary asset and to create (smaller) renditions of it. This workflow is normally executed on the AEM authoring instance.
But since its beginning this approach has been plagued with problems:
- The question of supported filetypes. Given the almost unlimited amount of file formats and their often proprietary implementation, it’s not always possible to perform these operations. In many cases, the support of these file types within Java is poor.
- Additionally, depending on the size and type of the asset and on the quality of the library which provides support for this filetype, the processing can be very time-consuming and can consume a lot of heap. Imagine that you want to create renditions of a TIFF file with dimensions of 10k * 10k pixels: assuming a 24-bit color depth, this requires 300 megabytes of contiguous heap to store an uncompressed version of it. You have to size the heap accordingly, otherwise you will run out of memory (OOM).
- To avoid these issues, external tools like ImageMagick were used for many filetypes. They come with support for various image types (in many cases much better than the Java image library), and a failure does not blow up the AEM process (because ImageMagick runs in a dedicated process). But the capabilities of ImageMagick are limited as well, and the support for more exotic (non-image) file types could be better.
- In all cases you need to size your hardware for the worst-case scenario. For example, you need to provision a lot of heap if your authors might start to ingest large images. And you need to provision enough CPU to mitigate negative impacts on all other operations.
- Another big problem is latency. Assuming that your asset is very large (it’s not uncommon to have assets larger than 1 gigabyte), it takes time to copy the binary from the (remote) datastore to the location where the processing takes place. Even if you can transfer 100 MiB per second, it takes about 10 seconds to transfer such a file to the local disk. Normally this transfer runs through the AEM JVM, which is problematic in terms of heap usage and can also cause performance problems. Not to mention code which is not aware of the possible sizes and tries to load the complete stream into memory.
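To make the numbers from the list above a bit more tangible, here is a small back-of-the-envelope sketch in plain Java. The class and method names (`AssetSizing`, `uncompressedBytes`, `transferSeconds`) are made up for this illustration and are not part of any AEM API; the calculation also ignores any overhead the image library adds on top of the raw pixel data.

```java
// Hypothetical helper for rough sizing estimates; not an AEM API.
final class AssetSizing {

    // Raw size of an uncompressed bitmap: width * height * bytes per pixel.
    static long uncompressedBytes(long widthPx, long heightPx, int bitsPerPixel) {
        return widthPx * heightPx * bitsPerPixel / 8;
    }

    // Pure transfer time, ignoring connection setup and any processing.
    static double transferSeconds(long sizeBytes, long bytesPerSecond) {
        return (double) sizeBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 10k * 10k pixels at 24 bit: 300,000,000 bytes, i.e. about 300 megabytes
        long heap = uncompressedBytes(10_000, 10_000, 24);
        System.out.println("Uncompressed image: " + heap + " bytes");

        // 1 GiB over a link that delivers 100 MiB per second: roughly 10 seconds
        long oneGiB = 1024L * 1024 * 1024;
        long hundredMiBPerSecond = 100L * 1024 * 1024;
        System.out.println("Transfer time: "
                + transferSeconds(oneGiB, hundredMiBPerSecond) + " seconds");
    }
}
```

The point is not the exact figures, but that both values grow with the asset, and you rarely control what your authors upload.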
In AEM as a Cloud Service this work is offloaded, and that’s what AssetCompute is for. It performs all these steps on its own, and it does not use ImageMagick for image handling, but high-quality, optimized routines which also power other Adobe products.
But what does that mean for you as a developer for AEM as a Cloud Service? At first glance, it does not have any impact. But you should learn a few things from it:
- Do not create any renditions on your own; use AssetCompute instead. This service is extensible (check out Project Firefly), so you can do all kinds of asset operations there. There is no need anymore to use the Java image library code.
- Avoid streaming binary data through AEM. AEM as a Cloud Service itself (the JVM) should not be bothered with streaming binary data into and out of the JVM. If you want to upload files into AEM, you should use the aem-upload library.
In general, think twice before you open an InputStream in AEM (either via Rendition.getStream() or via the JCR API). Normally you never know how much data is behind it, and for almost all transformation cases it makes sense to let AssetCompute perform them.
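If you really cannot avoid reading a stream yourself, at least do not load it completely into memory. The following is a minimal sketch in plain Java (no AEM API involved) of a copy routine that uses a fixed-size buffer and gives up once a limit is exceeded. The class name `BoundedCopy`, the 8 KiB buffer and the `maxBytes` parameter are assumptions made for this example; in AEM the input would be whatever stream you obtained from the rendition or the JCR binary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper; illustrates constant-memory copying with a size limit.
final class BoundedCopy {

    // Copies in to out using a fixed buffer, so heap usage does not depend
    // on the size of the binary. Fails as soon as more than maxBytes were read.
    static long copy(InputStream in, OutputStream out, long maxBytes) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
            if (total > maxBytes) {
                throw new IOException("Binary exceeds the limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, read);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a binary of unknown size
        InputStream in = new ByteArrayInputStream(new byte[20_000]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(in, out, 1_000_000);
        System.out.println("Copied " + copied + " bytes");
    }
}
```

This only limits the damage. The data still flows through the JVM, so offloading the work to AssetCompute remains the better option.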
2 thoughts on “AEM as a Cloud Service and the handling of binaries”
Jörg, I guess it should not be a problem to use https://github.com/adobe/aem-upload in the on-prem version of AEM 6.5?
I don’t think that it’s possible to use aem-upload for the on-prem version. There I would just use the “usual methods”, e.g. POSTing the binaries.