As a bit of a numbers junkie, I like to look at the stats and see what they tell me. As the old saying goes, if it sounds too good to be true, it probably is. So what do we do when those numbers begin to look suspect? The answer is that we may need a way to verify them against another set of information.
Case Closed and Reopened
Here is an example:
Each month the leadership team within IT receives the metrics for the previous month. One item of interest is the success rate of the previous month’s changes. The report indicates that 250 changes were attempted with a 100% success rate – fantastic, right?
Immediately the skeptic within me begins to ask questions. Let’s assume for this example’s sake that last month we actually had 3 failed changes which resulted in a total of 10 reported incidents. This is where we tie in the other service management processes.
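To see why the skepticism is warranted, here is a minimal sketch of the arithmetic, using the figures from this example (250 attempted changes, 3 failures, 10 incidents):

```python
# Recompute the success rate once the failed changes surface.
# The numbers below come from the example in the text.
attempted = 250   # changes attempted last month (per the report)
failed = 3        # failed changes uncovered after the fact
incidents = 10    # incidents those failures generated

actual_success_rate = (attempted - failed) / attempted * 100
print(f"Reported success rate: 100.0%")
print(f"Actual success rate:   {actual_success_rate:.1f}%")   # 98.8%
print(f"Incidents per failed change: {incidents / failed:.1f}")
```

Even three missed failures move the needle from 100% to 98.8% – a small numerical gap, but a large credibility gap once stakeholders discover it.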
From an incident management standpoint, are we relating all of these issues to a change record? We need to be able to correlate the failed change and the incident. That may be easy in obvious situations, or when the issues present themselves right after the change, but what about cases where the issue surfaces well after the change is implemented? When staff attempt to tie the incident back to a change, it can be challenging, as change descriptions may be cryptic or may not reflect the issues seen by the business.
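One common correlation approach is to suggest candidate changes that touched the same service within a recent time window. The sketch below illustrates the idea with hypothetical records; the field names and the 14-day window are illustrative assumptions, not from any particular ITSM tool:

```python
from datetime import datetime, timedelta

# Hypothetical change and incident records; field names are illustrative.
changes = [
    {"id": "CHG-1001", "service": "email",   "implemented": datetime(2024, 5, 3)},
    {"id": "CHG-1002", "service": "billing", "implemented": datetime(2024, 5, 10)},
]
incidents = [
    {"id": "INC-2001", "service": "billing", "opened": datetime(2024, 5, 12)},
    {"id": "INC-2002", "service": "email",   "opened": datetime(2024, 5, 25)},
]

def candidate_changes(incident, changes, window_days=14):
    """Suggest changes on the same service implemented shortly before the incident."""
    window = timedelta(days=window_days)
    return [
        c["id"] for c in changes
        if c["service"] == incident["service"]
        and timedelta(0) <= incident["opened"] - c["implemented"] <= window
    ]

for inc in incidents:
    print(inc["id"], "->", candidate_changes(inc, changes))
```

Note that the second incident, opened 22 days after its related change, falls outside the window and is matched to nothing – exactly the "issue happens well after the change" case that makes manual correlation hard too.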
It is with these seemingly unrelated issues that problem management should step in, investigating the incidents that have no “apparent” root cause. It is entirely possible that the person who implemented the change completed the implementation and testing believing everything went smoothly. Because of that, they likely closed the change as successful, since nothing had indicated otherwise. There is nothing wrong with that, but once this new information surfaces, the closure code should be updated to reflect the change in status – in this case to ‘caused issues,’ or something to that effect.
That covers what to do when the numbers look too good to be true; in reality, however, we should have a checkpoint to ensure this doesn’t happen in the first place. The reason we want to avoid numbers which are too good to be true is that your stakeholders, whether they are customers or IT, will start to question the validity of all your statistics from that point forward.
In my experience, holding a regular (weekly) meeting between your service management function and IT stakeholders reduces inconsistencies in the numbers and builds alignment on the activities happening in your environments for a particular time frame.
This activity will allow the people responsible for incidents to see what changes are scheduled for the next week and ask appropriate questions about what may prompt customers to call in with issues. It will also allow your staff who manage problems to determine whether there are any new problems they should be focusing on. In some cases we may even need to re-evaluate the current priority of problems under investigation. This type of dialog would also allow the change manager in this example to identify whether any changes from the previous week had inadvertently caused issues which could be avoided next time.
Performing these checkpoints will allow your teams to be more confident that, when the numbers are this good, they actually are.
For more brilliant insights, check out Ryan’s blog: Service Management Journey