The IT Skeptic is at it again, calling out one of the big head-scratchers of ITIL use. As it turns out, while “major incidents” are referenced in the most recent version of ITIL, they are never actually defined anywhere. So either you can use your imagination to decide what horror constitutes a major incident, or you can read Rob England’s thoughts on the matter.
Analyzing the Nightmare
England thinks that a major incident is an occasion where normal incident and problem management will not fix the problem. It is somewhere in severity between a “normal” incident and a “disaster.” But England prefers a more intuitive approach, saying that if a problem makes you go, “Oh sh*t!” then that is probably a major incident.
Handling these incidents ultimately boils down to case management, and if you want to set up the case properly, you need to define the following:
- Policy: if people are making decisions on the fly, give them principles, guidelines, rules, bounds, goals, inputs, and outputs.
- Roles and responsibilities: especially a Comms Manager and a Technical Manager, who work back to back: one faces outwards and one inwards. One of these could be the overall Major Incident Manager, or that could be a separate person.
- Procedures: comms plan, war-rooms, supplier mobilisation, RCA…
The immediate goal should be to restore service, not to resolve the underlying problem, though the two are of course closely related. If you can afford to have both an incident resolution team and a problem resolution team, definitely make it happen. And once practices for dealing with a major incident are in place, rehearse them. Rehearsal is the critical last step, and it is often forgotten.
For a more in-depth discussion, you can read England’s full blog post here: http://www.itskeptic.org/content/what-itsm-major-incident-itil-doesnt-say