Sandeep asked in a comment to the previous posting:
Even if your sessions are short, and you have made a call to session.refresh(true), it is possible that some one made a change before you did a session.save(), right? So, what is the best practice in dealing with such a scenario?
Keep refreshing (session.save(true)) in a loop, until your session.save() is successful or until you hit an assumed maximum number of attempts limit?
Or is there any other recommended best practice?
That’s a good question, but also a question with no satisfying answer. Whenever you want to modify nodes in the repository, there’s a change that the same nodes are changed in parallel, even you change only a single node or property. In reality, this rarely happens. Most features in AEM are built in way, that each step (or each workflow instance, each Sling job, each replication event, etc) has its own nodes to operate upon. So concurrency must not provoke such a situation, that multiple concurrent operations compete for writes on a single node.
So from a coding perspective it should possible to avoid such situations. Not only because of this kind of problems, but also because of performance and debugging.
Something you cannot deal with in this way are author changes. If 2 authors decide to change a page at the same time, it’s likely that they screw up the page. You can hardly avoid that just using code. But if you cannot guarantee from a work organization point of view, that no 2 persons work at the same page at the same time, teach your authors to use the „lock“ feature. I basically prevents other authors from making changes temporarily. But according to the Oak documentation it isn’t suited to be used as short-living locks (in a database sense), but rather longer-living locks (author locks a page to prevent other authors from editing it).
So, to come a conclusion to Sandeeps question: It depends. If you designed your application carefully, you should rarely come into such situations, that you compete with multiple threads for a node. But whenever it occurs it should be considered as a bug, analyzed and then get fixed.
But there can be other cases, where this approach could make sense. In any case I would retry a few times (e.g. 10) and then break the operation with a meaningful log message. But I don’t think that it’s good to retry indefinitely.