Today the question was raised whether running TarMK on NAS is a good idea. The short answer is: "No, it's not a good idea."
The long answer: TarMK relies on the ability of the operating system to map files into memory (memory-mapped file I/O, short: mmap; see the Wikipedia page on it). Oak does this for the heavily used parts of the TarMK to increase performance, because these parts then don't need to be read from the filesystem again, but are always available in memory (which is at least an order of magnitude faster). This works well with a local filesystem, where the operating system knows about every change happening on the filesystem, because all access to the filesystem goes through it, and it can make sure that the content of the file on disk and in memory stays in sync. I should also mention that this memory is not part of the JVM heap; instead the free RAM of the system is used for this purpose.
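To make the mechanism tangible, here is a minimal Java sketch of a memory-mapped read (the segment file name is made up for the example); Oak's actual code is more involved, but the principle is the same:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical segment file name, for illustration only.
        Path tarFile = Paths.get("segmentstore/data00000a.tar");
        try (FileChannel channel = FileChannel.open(tarFile, StandardOpenOption.READ)) {
            // Map the file into the process address space. The mapping is
            // backed by the OS page cache, not by the JVM heap.
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Reads are plain memory accesses; the OS pages data in on demand
            // and keeps hot pages in free RAM across reads.
            System.out.println("first byte: " + buf.get(0));
        }
    }
}
```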
With a NAS the situation is different. A NAS is designed to be accessed by multiple systems in parallel, without the systems having to synchronize with each other. The two most common network filesystem protocols for this are NFS and SMB/CIFS. On NFS, one system can open a file and is not aware that a second system modifies it at the same time. This design decision prevents a system from keeping the content of a file on NFS and its in-memory mapping in sync. Thus mmap is not usable when you use a NAS to store your TarMK files.
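If you are unsure what kind of filesystem your segment store actually sits on, the standard JDK can tell you. A small sketch (the path is the typical AEM location; adjust it to your setup):

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FsTypeCheck {
    public static void main(String[] args) throws IOException {
        // Path to the repository's segment store; adjust to your installation.
        FileStore fs = Files.getFileStore(
            Paths.get("crx-quickstart/repository/segmentstore"));
        // On Linux this prints values like "ext4" or "xfs" for local disks,
        // and "nfs", "nfs4" or "cifs" for network mounts where mmap
        // should not be relied on.
        System.out.println("filesystem type: " + fs.type());
    }
}
```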
And because mmap is not usable, you get a huge performance penalty compared to a local filesystem where mmap can be used. And that's before even mentioning the limited bandwidth and higher latency of remote storage compared to local storage.
If you migrated from CRX 2.x (used up to AEM 5.6.1), this problem was not as visible as it is now with Oak, because there was the BundleCache, which cached data already read from disk; this bundle cache was an in-memory, in-heap structure, and you had to adjust the heap size for it. CRX 2.x did not use mmap.
But Oak no longer has this in-memory cache; it relies on the mmap() feature of the operating system to keep the often-accessed parts of the filesystem (the TarMK) in memory. And that's the reason why you should leverage mmap as much as possible and therefore avoid a NAS for the TarMK.
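For completeness: Oak exposes this behavior as a switch. A minimal sketch, assuming the newer oak-segment-tar FileStoreBuilder API (older oak-segment versions have a similar builder); the directory name is illustrative:

```java
import java.io.File;
import org.apache.jackrabbit.oak.segment.file.FileStore;
import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;

public class TarmkSetup {
    public static void main(String[] args) throws Exception {
        // Memory mapping is usually the default on 64-bit JVMs; on a
        // network filesystem it has to be switched off, with the
        // performance penalty described above.
        FileStore store = FileStoreBuilder
                .fileStoreBuilder(new File("segmentstore"))
                .withMemoryMapping(false) // no mmap, e.g. when the store sits on NFS
                .build();
        try {
            // ... wrap the store in a SegmentNodeStore and use the repository ...
        } finally {
            store.close();
        }
    }
}
```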
Thanks for writing this, Jörg!! Very helpful
-Lokesh
http://adobeaemclub.com
Hi Jörg,
thanks for bringing this topic up… I share your opinion for NFS and CIFS/SMB…
I did not try this myself… but mmap() should work pretty well if your NAS is mounted as an iSCSI device.
Of course if you use a shared NAS, performance may become unpredictable if multiple users put your NAS under heavy fire…
-achim
Hi Achim,
thanks for your comment.
When you talk about iSCSI, you talk about a SAN. A typical storage box (like the ones Netapp, EMC, … sell) offers both, but only a SAN uses iSCSI as the underlying protocol. A NAS uses NFS or SMB/CIFS as its protocol.
I just posted about this: https://cqdump.wordpress.com/2016/02/24/tarmk-and-san/
Jörg
iSCSI does not necessarily have to be provided by a SAN – though this is the most common case. Modern NAS systems also provide iSCSI (have a look here: http://files.qnap.com/news/pressresource/product/How_to_set_up_the_QNAP_Turbo_NAS_as_an_iSCSI_storage_for_Microsoft_Hyper-V_and_as_an_ISOs_repository.pdf) – even consumer-level devices. In the end it's just a protocol anyone is free to implement, right? I have seen appliances from Qnap and Synology offering options to serve as an iSCSI device. Though I believe that a local disk will probably provide better performance. AEM – in my experience – is I/O bound. Disk performance is one crucial performance factor, so I won't make compromises here. So this should be seen as a hypothetical option – though I think it might have its uses.
Hi Jörg,
did some research – your definition is – according to Wikipedia – right:
“In a NAS solution the storage devices are directly connected to a “NAS-Server” that makes the storage available at a file-level to the other computers across the LAN. In a SAN solution the storage is made available via a server or other dedicated piece of hardware at a lower “block-level” (…) . One way to loosely conceptualize the difference between a NAS and a SAN is that NAS appears to the client OS (operating system) as a file server (the client can map network drives to shares on that server) whereas a disk available through a SAN still appears to the client OS as a disk, visible in disk and volume management utilities (along with client’s local disks), and available to be formatted with a file system and mounted..”
I always saw the difference as single device versus network device – which, according to the definition above, is not accurate. My bad.
Keep posting 🙂
-achim
Thanks Jörg!
We were planning on moving our data stores from S3 to Amazon EFS when it became available, but will now be moving them to EBS thanks to this post.
I suspect you have saved me much pain.