Recently, as part of a professional networking organization, I was tasked with scheduling a session for like-minded colleagues. The challenge, I soon found, was that while most people were fine with Thursday, one service desk manager was adamant that he couldn’t make it. When I asked why, he said his team was almost always overloaded with incidents on Thursdays, so the likelihood of his making it was next to nil. Intrigued by his ability to see into the future, I asked how he could foresee it with such certainty. He said the IT department had changed the day on which the change advisory board (CAB) convened a few months back, and since then Thursdays had been a flurry of activity.
The background story was that CAB attendance was better on Tuesdays than on Wednesdays, the day the board had previously met. While he was telling me this, I found it interesting that a story about incidents had become a story about changes. And that got me wondering: what does the cycle look like, from the implementation of a change through to a resulting incident?
As I was developing a better picture of what was happening, he also explained that changes could be implemented as early as 24 hours after CAB, which would be on Wednesday. When issues arise from these changes, the service desk starts seeing them on Thursday, which correlates with the influx of incidents. As a result, depending on the circumstances, some changes are reverted to a pre-change state while others require an emergency change to correct the issues.
This explanation raised some important questions:
- Was this not something that IT had considered?
- Was this issue not discussed at CAB?
- Is no one looking at correlating the incidents and changes?
- If the changes and incidents were looked at graphically, what would that look like?
My colleague said that they had no formal change manager and that changes were reviewed strictly by committee in CAB. Another layer to this was that, being a busy organization, there was high demand for getting changes implemented early in the week. In other words, everyone wanted to put in changes on Wednesday, and no one was governing how changes were scheduled.
To me the issue seemed obvious; however, sometimes we cannot see the issues in front of us when we are in this type of situation, and in other cases we are not equipped to correct them.
My colleague outlined that one of the reasons the incidents resulting from changes may not have been flagged as a major issue by IT was that, prior to the CAB date move, almost all the incidents had seemed to appear on Mondays. Magically, after the date was changed, the issues that used to occur on Monday all but stopped, only to move to Thursday. So in some strange way it looked as if service delivery had improved. He also pointed out that the number of incidents is now simply more balanced across the week, and this is why the issue isn’t pressed as hard from an IT perspective.
The trick here is that, from a user-experience perspective, the issues are still abundant, as outlined in the diagram below. If you simply shuffle the days along the bottom axis, the humps that represent changes and incidents remain.
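The weekday shift described above is easy to check with a simple tally of incident open dates. Here is a minimal sketch using hypothetical data (in practice the dates would come from an incident export out of the ITSM tool; the field and sample values are assumptions for illustration):

```python
from collections import Counter
from datetime import date

# Hypothetical incident open dates; real data would come from the
# ITSM tool's incident module.
incident_dates = [
    date(2017, 3, 2), date(2017, 3, 2), date(2017, 3, 9),
    date(2017, 3, 9), date(2017, 3, 9), date(2017, 3, 16),
    date(2017, 3, 7),
]

# Tally incidents per weekday (date.weekday(): Mon=0 .. Sun=6).
weekday_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
counts = Counter(weekday_names[d.weekday()] for d in incident_dates)

# A crude text histogram makes the "hump" visible at a glance.
for day in weekday_names:
    print(f"{day}: {'#' * counts[day]}")
```

If the CAB day moves, the hump in this histogram moves with it; the total number of incidents does not drop, which is exactly the point the service desk manager was making.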
“Why does no one see that this is an issue?” I asked.
The response was that IT reports on service management metrics (incident and change) in silos, and because of this no correlation is made, apart from what the service desk manager experiences from day to day.
“That has to change,” I offered. As a start, I would look at the following:
- The number of changes that cause incidents and/or are rolled back. Where changes are not updated to reflect issues and we therefore see an artificially high implementation success rate (likely, if the two teams are not working collaboratively), we need to look at how many emergency changes we are creating and why. Even when change is measured in a silo, we should still be able to see that a change caused an emergency change.
- The timeframe between a change being reviewed in CAB and its implementation in production. It may be that the unsuccessful changes are also the ones being implemented only 24 hours after CAB. Knowing which changes are negatively impacting the business will allow us, as an IT organization, to better assess what is not working as well as it could and build a strategy to improve it.
- Visibility and governance on the activities that are driving this service delivery. Having people in these functions collaborating and reviewing what needs improvement and what is working well will go a long way to continually improve service.
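The first two measurements above can be sketched in a few lines once change and incident records are joined. This is a minimal example with hypothetical records and field names (a real ITSM export would differ); it computes the share of changes that caused incidents and the rollback rate:

```python
from datetime import date

# Hypothetical records; real data would come from the change and
# incident modules of the ITSM tool (field names are assumptions).
changes = [
    {"id": "CHG001", "implemented": date(2017, 3, 1), "rolled_back": True},
    {"id": "CHG002", "implemented": date(2017, 3, 1), "rolled_back": False},
    {"id": "CHG003", "implemented": date(2017, 3, 8), "rolled_back": False},
]
incidents = [
    {"id": "INC100", "opened": date(2017, 3, 2), "caused_by": "CHG001"},
    {"id": "INC101", "opened": date(2017, 3, 2), "caused_by": "CHG001"},
    {"id": "INC102", "opened": date(2017, 3, 9), "caused_by": "CHG003"},
    {"id": "INC103", "opened": date(2017, 3, 9), "caused_by": None},
]

# Share of changes that caused at least one incident.
causing = {i["caused_by"] for i in incidents if i["caused_by"]}
change_caused_rate = len(causing) / len(changes)

# Share of changes that were rolled back.
rollback_rate = sum(c["rolled_back"] for c in changes) / len(changes)

print(f"changes causing incidents: {change_caused_rate:.0%}")
print(f"rollback rate: {rollback_rate:.0%}")
```

Even this crude join surfaces what siloed reporting hides: the same change records that look "successful" in the change report show up as incident causes in the incident report, and the gap between the CAB date and the incident dates makes the 24-hour implementation window visible.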
The end result is to ensure that the business is getting a good experience. To do that we need to manage and measure processes effectively, collaborate regularly with key stakeholders, and, most importantly, communicate.
For more brilliant insights, check out Ryan’s blog: Service Management Journey