At the end of the cult hit movie Galaxy Quest, Tim Allen’s character activates the Omega-13 Device to travel back in time thirteen seconds and undo one critical mistake that would have cost him and all of his allies their lives. IT has no such device, not even one that could grant us a measly thirteen seconds, so it is up to the organization to have damage control methods in place for when everything goes wrong. Writing for TechRepublic, Tom Olzak outlines the necessary steps that go into business continuity planning (BCP), which must include support for:
- System dependency mapping
- Maximum tolerable periods of disruption (MTPOD)
- Mean time to repair (MTTR)
- Recovery time objectives (RTO)
System dependency mapping is the act of charting all of the ways the various systems in your business interact, especially what each system requires from other systems in order to function. The final map is likely to be large and comprehensive, which is exactly what BCP requires. Leaving MTPOD out of a BCP is an even more serious gap, since MTPOD is the longest a business process can be inoperative before irreparable harm is done to the business. Included within MTPOD is cycle time: the time it takes to complete one iteration of a previously failed process, measured from the moment the process first failed. Olzak believes that moving systems to the cloud offers the benefit of quarantining the effects of a catastrophic event, and he adds:
Problems can arise when the failure is at the provider site. SLAs, sanctions, customer audits, and contractual obligations control and monitor the MTTR for S1 in our example. The reputation of the provider, supported by discussions with existing customers, is a good measure of the provider’s willingness and ability to recover within the expected MTTR. In any case, a provider that cannot recover within RTOs for affected business processes is likely not the right solution for your business.
MTTR, by comparison, applies to individual components: it is the average time required to return a failed step in a process to normal operation. MTTR can be affected by the type of failure, the availability of replacement parts, internal monitoring capabilities, and the availability of key internal personnel, among other factors. RTO is the point by which failed devices must be operational again, so MTTR cannot exceed RTO if further disaster is to be prevented.
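To make the relationship between these measures concrete, here is a minimal Python sketch. It walks a small dependency map and flags any component whose MTTR would not fit within the RTO of the business process that relies on it. Every system name and every number below is hypothetical and chosen only for illustration. The sketch also assumes that each component's repair time should fit within the process RTO, and that the RTO itself should not exceed MTPOD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """A business process with its continuity targets (hypothetical values)."""
    name: str
    mtpod_hours: float  # longest tolerable outage before irreparable harm
    rto_hours: float    # time by which the process must be working again


# Hypothetical dependency map: each key needs every system in its list.
DEPENDS_ON = {
    "order_entry": ["web_app"],
    "web_app": ["database", "auth_service"],
    "database": ["storage_array"],
    "auth_service": [],
    "storage_array": [],
}

# Hypothetical mean time to repair for each component, in hours.
MTTR_HOURS = {
    "web_app": 1.0,
    "database": 3.0,
    "auth_service": 0.5,
    "storage_array": 6.0,
}


def transitive_dependencies(system, depends_on):
    """Return every system that `system` needs, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(system, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return seen


def continuity_gaps(process, depends_on, mttr_hours):
    """List the places where the recovery targets cannot be met."""
    gaps = []
    if process.rto_hours > process.mtpod_hours:
        gaps.append(
            f"RTO {process.rto_hours}h exceeds MTPOD {process.mtpod_hours}h"
        )
    for component in sorted(transitive_dependencies(process.name, depends_on)):
        mttr = mttr_hours.get(component)
        if mttr is None:
            gaps.append(f"{component}: no MTTR recorded")
        elif mttr > process.rto_hours:
            gaps.append(
                f"{component}: MTTR {mttr}h exceeds RTO {process.rto_hours}h"
            )
    return gaps


order_entry = Process("order_entry", mtpod_hours=8.0, rto_hours=4.0)
gaps = continuity_gaps(order_entry, DEPENDS_ON, MTTR_HOURS)
print(gaps)  # ['storage_array: MTTR 6.0h exceeds RTO 4.0h']
```

In this made-up example the storage array is the weak link: it sits three levels down the dependency chain, yet its repair time alone would miss the recovery target for order entry. Gaps like this are easy to overlook without a dependency map.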
Adequately addressing all four areas that BCP must support will not be easy, but it is necessary if the business is to survive jarring hiccups in its processes. BCP is no Omega-13 Device, but it will have to suffice for now.