The increasing size and complexity of software systems has led to an amplified number of potential failures and as such makes it harder to ensure software reliability. Since it is usually hard to prevent all the failures, fault tolerance techniques have become more important. An essential element of fault tolerance is the recovery from failures. Local recovery is an effective approach whereby only the erroneous parts of the system are recovered while the other parts remain available. For achieving local recovery, the architecture needs to be decomposed into separate units that can be recovered in isolation. Usually, there are many different alternative ways to decompose the system into recoverable units. It appears that each of these decomposition alternatives performs differently with respect to availability and performance metrics. We propose a systematic approach dedicated to optimizing the decomposition of software architecture for local recovery. The approach provides systematic guidelines to depict the design space of the possible decomposition alternatives, to reduce the design space with respect to domain and stakeholder constraints and to balance the feasible alternatives with respect to availability and performance. The approach is supported by an integrated set of tools and illustrated for the open-source MPlayer software.
- Software Architecture Design
- Software architecture designFault toleranceLocal recovery Availability Performance
- Local recovery
- Fault Tolerance