Designing for Failure – Important Lessons

It may sound ridiculous, but designing for failure is one of the most effective ways to limit the impact of disruptions and catastrophic failures in your services and systems. It is essentially the way forward for cloud development. This article by Krishnan Subramanian explains the concept of designing for failure, key points to consider when developing in a cloud environment, and what the recent AWS outage means for cloud development as a whole. Subramanian explains, for instance, how distributing across multiple availability zones can help prevent a total shutdown of your cloud applications:

"If you are using AWS, distributing across multiple (3 or 4) Availability Zones (AZs) must work but some developers have complained that it didn’t help them. Ideally, spreading it across multiple AWS regions or, even better, multiple cloud providers will ensure the availability of your application. But this approach has issues related to costs going out of control and network latency. It is not possible to tap into multiple regions/cloud providers for all applications because of issues ..."

Global users and the expectation of high availability are making the idea of acceptable downtime a distant memory. System designers and specifiers need to start assuming that systems will fail, and design accordingly. The article provides insight into how organisations can build with that expectation, a need that became strongly apparent with the 2011 Amazon Web Services outage.
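As a rough illustration of what "assume systems will fail" can look like in code, here is a minimal Python sketch of client-side failover across several replicas of a service. It is not taken from Subramanian's article. The zone names, the `fake_request` function, and the `call_with_failover` helper are all hypothetical, and a real deployment would also need health checks, timeouts, and retry limits.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every replica in the list has failed."""


def call_with_failover(replicas: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each replica in order and return the first successful result."""
    errors = []
    for replica in replicas:
        try:
            return request(replica)
        except Exception as exc:
            # Assumption: any exception means this replica is unavailable,
            # so record it and move on to the next one.
            errors.append((replica, exc))
    raise AllReplicasFailed(errors)


# Hypothetical zone names and a simulated outage, for illustration only.
ZONES = ["zone-a", "zone-b", "zone-c"]
DOWN = {"zone-a"}


def fake_request(zone: str) -> str:
    """Stand-in for a real network call; fails for zones marked as down."""
    if zone in DOWN:
        raise ConnectionError(f"{zone} is unavailable")
    return f"served from {zone}"


print(call_with_failover(ZONES, fake_request))
```

The caller never needs to know which zone answered. The trade-off noted in the quote still applies: each extra zone, region, or provider in the list adds cost and, for distant replicas, latency.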
