One of the subjects that I deal with frequently is resiliency; specifically, the resiliency of technology solutions. But what does it mean to be resilient? Fundamentally, it means that a system or solution needs to be engineered with these goals in mind:
- The entire solution is designed to continue to function as normally as possible in the face of failure.
- When failures occur, they are invisible to the customer.
- If a failure must be visible to the customer, the solution provides the highest level of service possible (in other words, compartmentalize failures).
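To make these goals a little more concrete, here is a minimal sketch of compartmentalizing failures, assuming a hypothetical mobile banking page built from independent components. Every name in it (`call_with_fallback`, `render_page`, `live_balance`, and so on) is invented for illustration and does not come from any real system. Each component gets a primary source and a degraded fallback, so a failure in one is contained and the rest of the page still renders.

```python
from typing import Callable, Dict, Tuple

Source = Callable[[], str]


def call_with_fallback(primary: Source, fallback: Source) -> Tuple[str, bool]:
    """Try the primary source; on any failure, serve the fallback instead.

    Returns (result, degraded), where degraded is True if the fallback was used.
    """
    try:
        return primary(), False
    except Exception:
        # The failure is contained here rather than propagating to the caller.
        return fallback(), True


def render_page(components: Dict[str, Tuple[Source, Source]]) -> Dict[str, str]:
    """Render each component independently so one failure cannot take down the page."""
    page = {}
    for name, (primary, fallback) in components.items():
        result, _degraded = call_with_fallback(primary, fallback)
        page[name] = result
    return page


# Hypothetical components, for illustration only.
def live_balance() -> str:
    raise ConnectionError("balance service unavailable")  # simulated outage


def cached_balance() -> str:
    return "balance as of last sync"


def recent_transactions() -> str:
    return "latest transactions"


def no_transactions() -> str:
    return "transactions temporarily unavailable"


page = render_page({
    "balance": (live_balance, cached_balance),
    "transactions": (recent_transactions, no_transactions),
})
```

In this sketch the simulated balance outage is visible to the customer only as slightly stale data, while the transaction list is unaffected. A real system would also need to log and alert on the degraded path, which is omitted here.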
These goals sound straightforward in theory but are rarely so in practice. Why? There are many contributing factors, and I’ll be dealing with these in detail in subsequent posts. Some are obvious: resiliency adds cost, implementation costs must be balanced against business value and time-to-market pressures, and future failures are much more abstract than current business needs. Despite these challenges, many organizations try to do the right thing by investing in the construction of resilient solutions that ultimately fail.
These scenarios are particularly frustrating, leaving very knowledgeable technologists wondering why such a robust system failed. In such cases, the answers are usually much more subtle: the complexity of systems makes failure modes difficult to identify, specific resiliency needs are rarely quantified systematically, and control plans are inadequate or absent, which allows new and unpredictable types of failures to develop. If you’ve ended up in such a scenario, it’s difficult or impossible even to quantify the operational, reputational and financial risks posed to the business: you just don’t know what you don’t know.
On this site, I’ll discuss these and other quandaries that threaten the stability of critical enterprise infrastructure. Businesses no longer have the luxury of tolerating unreliable technology. Five to ten years ago, the internet and related technologies were seen as new and unique: the virtual “wild west”. Because these “enabling technologies” were viewed as somehow separate from the services and products that businesses provided, failure of the technology was not a direct reflection on the quality of the product or the capability of the provider. Now, those enabling technologies have faded into the background; they are no longer new and exotic. Customers expect mobile banking solutions on their cell phones to “just work”, just as landline telephone customers expect a dial tone or homeowners expect power from their electrical outlets. Failure of technology now equates to failure of the business.
Resiliency is the mechanism to ensure that our solutions meet these demands. Resiliency may not be easy, but it is necessary.