Reporting application problems

(Some background on why I am writing this article: in the Daycare ticketing system, support often has to ask for additional information before it can even start an initial analysis of an issue report. This is time-consuming and increases the time-to-fix. So this article is meant as a help for my colleagues at Day support, but it should also apply to dealing with most enterprise support lines.)

Writing a bug report is hard work. Many issue reporters think that the people responsible for the application are just trying to avoid fixing a bug, and that this is why they ask questions and demand information that is hard to deliver or that seems absolutely obvious to the reporter. They suspect that these people don’t want to admit that their product has issues.

But these questions can often be answered easily if you are well prepared.

Usually developers (and the support people who work as the first line for them) ask the following questions:

What software are you using, which versions, which additional fixes?
People often assume that this information is already known to support (especially if you’re dealing with an enterprise support line), but these support lines often don’t track the software versions of their customers. And who knows, maybe you are reporting an issue with a new version that is currently installed only on your development systems.

Providing this information by default when you open an issue report helps support respond more quickly. There is no need for follow-up questions about versions and installed hotfixes. That is at least one round of questions and answers saved.

A point for all software developers: provide a facility to get all this information without having to dig through the package databases or the registry of your systems. Keep this information up to date automatically when additional packages, fixes or enhancements are installed.
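
As a rough illustration of such a facility, the following Java sketch gathers a few environment facts that a JVM exposes as standard system properties and formats them for pasting into an issue report. The class name `EnvironmentReport` and the selection of property keys are assumptions made for this example; a real application would also add its own version and the list of installed fixes, which the JVM cannot know about.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects basic environment facts for an issue report. */
final class EnvironmentReport {

    /** Reads standard JVM system properties; the choice of keys is an assumption. */
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        for (String key : new String[] {
                "java.version", "java.vendor", "java.vm.name",
                "os.name", "os.version", "os.arch"}) {
            // The two-argument getProperty falls back to "unknown" instead of null.
            info.put(key, System.getProperty(key, "unknown"));
        }
        return info;
    }

    /** Formats the collected facts as "key = value" lines, one per property. */
    static String format(Map<String, String> info) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : info.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(collect()));
    }
}
```

Running `main` prints one line per property. This covers only the JVM and operating system part of the question; application and hotfix versions have to come from the application itself.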

What’s the impact of the reported issue?
Describe the impact of the problem, so that support can estimate the importance of the issue. A report about a wrongly documented feature gets a different priority on the developers’ to-do list than an ERP system that crashes every hour.
Also state which audience is affected by the issue. A non-working feature that is offered as a vital part of your website is clearly more important than the same feature being broken for a small group of people, because for the latter it is probably easier to provide a workaround.
(It will probably be super-important if this small group is the management, but that’s another topic …)

When was the issue spotted first?
This information may help to correlate the issue with other events. Problems often become visible only under certain circumstances that were not present at the start. This may be a system update (operating system, JVM, database, …), changed settings in the application itself, or just heavier use of the system (more data, more users, higher peak usage). All these factors increase the chance that previously unknown and unnoticed problems become visible and harm your application.
(That’s the background of the famous quote “never change a running system”.)

It’s your task as an issue reporter to provide this information to support. It helps support to focus on the impact of such changes, which very often reduces the amount of investigation dramatically.

So, for example, if you have recently updated your Sun JVM from 1.5.0.8 to 1.5.0.11 for security reasons and suddenly encounter sporadic crashes, the developers can focus on the changes introduced by this JVM update and analyse their impact on the application. Without this information you will probably have to go through a long and painful analysis phase, in which developers ask for all kinds of dumps, JVM instrumentation and so on.

Is the problem reproducible?
This is the question a developer is most interested in. If an issue can be reproduced, it can be fixed, because the developer can analyze the issue, understand the problem and then solve it, all without a lot of trial and error just to see the problem.
If an issue cannot be reproduced, a lot of information is missing. Maybe the problem occurs only under conditions that exist on your particular system or with your particular data. Trying to reproduce the issue on any other system is then hard or impossible.
So this is one of the most important tasks of an issue reporter: trying to provide a reproducible test case. If you are able to reproduce the issue, describe all the prerequisites and the steps needed to actually reproduce it. Be it step-by-step documentation or a short screencast, any appropriate format is welcome.

In the case of Day CQ, the playground/geometrixx application of a plain CQ installation can be used as the basis for test cases. So just install a plain CQ and make as few changes as possible to reproduce the problem.

If you can reproduce your problem on a plain CQ installation, you make the task of fixing your issue much easier for Day. Time-consuming analysis and assumptions about a lot of parameters can then be avoided, and the developers can head directly to the issue itself.

Often you cannot reproduce the issue you want to report, either because you don’t know the issue exactly (“my system just crashes”) or because it is specific to a certain environment (“the crash only happens under heavy load; we couldn’t reproduce this crash using stress tests yet”). Then you need to provide as much information as possible.

Additional information
Attach all available information to your issue report (ok, not really _all_ information; only what looks useful, e.g. log files containing application-specific logs, system dumps, thread dumps for Java applications, …).

If some specific information is missing, support will ask for it. But if you provide a certain standard set of information (depending on your application), this will be sufficient in 90% of cases.

For a Day CQ installation, this standard set is the following:

  • error.log of CQ
  • error.log of CRX
  • in case of performance problems: request.log, garbage collection log
  • in case of performance problems and system lockups: thread dumps
  • in case of performance problems and out-of-memory exceptions: thread dumps, heap dumps
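
To give an idea of what a thread dump contains, here is a minimal Java sketch that produces one from inside a running JVM using the standard `java.lang.management` API. The class name `ThreadDumpSketch` and the output layout are assumptions made for this example, not a description of how CQ or any support tooling does it; in practice a dump is usually taken from outside the process, for instance with the JDK's `jstack` tool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: renders the stack traces of all live threads as text. */
final class ThreadDumpSketch {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: leave out locked monitors and synchronizers,
        // which keeps the call usable on JVMs that do not support them.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

For lockups, several dumps taken a few seconds apart are generally more telling than a single one, because they show which threads stay stuck in the same place.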

Conclusion

There are good reasons why all these questions are asked. I hope I have shown you some of the background needed to understand these reasons.

Providing the right information right from the start reduces the time until you get support that actually helps you resolve your issue, or that can at least give you useful tips for establishing workarounds. So in the long term it helps both you as an issue reporter and the support team.