Increasing System Availability with Local Recovery based on Fault Localization

Hasan Sözer, Rui Abreu, Mehmet Aksit, Arjan J.C. van Gemund

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    2 Citations (Scopus)
    68 Downloads (Pure)

    Abstract

    Because software systems cannot be tested exhaustively, they must cope with residual defects at run-time. Local recovery is an approach for recovering from errors in which only the defective parts of the system are recovered while the other parts are kept operational. To be efficient, local recovery must be aware of which component is at fault. In this paper, we combine a fault localization technique (spectrum-based fault localization, SFL) with local recovery techniques to achieve fully autonomous fault detection, isolation, and recovery. A framework is used for decomposing the system into separate units that can be recovered in isolation, while SFL is used for monitoring the activities of these units and diagnosing the faulty one whenever an error is detected. We have applied our approach to MPlayer, a large open-source software system. We have observed that SFL can increase system availability by 23.4% on average.
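
The diagnosis step described in the abstract can be illustrated with a minimal sketch of spectrum-based fault localization. The abstract does not say which similarity coefficient is used, so the Ochiai coefficient here is an assumption, chosen only because it is a common choice in the SFL literature. The unit names, the activity matrix, and the function names are hypothetical and are not taken from the paper.

```python
from math import sqrt


def ochiai_scores(spectra, errors):
    """Score each unit by how strongly its activity correlates with errors.

    spectra[i][j] is True if unit j was active during observation i.
    errors[i] is True if an error was detected during observation i.
    """
    total_failed = sum(1 for e in errors if e)
    scores = []
    for j in range(len(spectra[0])):
        # n11: unit active and error detected; n10: unit active, no error.
        n11 = sum(1 for row, e in zip(spectra, errors) if row[j] and e)
        n10 = sum(1 for row, e in zip(spectra, errors) if row[j] and not e)
        denom = sqrt(total_failed * (n11 + n10))
        scores.append(n11 / denom if denom else 0.0)
    return scores


def most_suspicious(scores, names):
    """Return the name of the unit with the highest suspiciousness score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return names[best]


# Hypothetical recoverable units and four observed runs (illustrative data only).
units = ["gui", "demuxer", "decoder"]
spectra = [
    [True, True, False],   # no error
    [True, True, True],    # error detected
    [True, False, True],   # error detected
    [True, True, False],   # no error
]
errors = [False, True, True, False]

scores = ochiai_scores(spectra, errors)
suspect = most_suspicious(scores, units)
print(suspect)
```

In a local-recovery setting, only the unit ranked most suspicious would be restarted while the others keep running; how the paper's framework performs that restart is not shown here.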
    Original language: Undefined
    Title of host publication: Proceedings of the 10th International Conference on Quality Software, QSIC 2010
    Place of Publication: USA
    Publisher: IEEE Computer Society
    Pages: 276-281
    Number of pages: 6
    ISBN (Print): 978-1-4244-8078-4
    DOIs
    Publication status: Published - Jul 2010
    Event: 10th International Conference on Quality Software, QSIC 2010 - Zhangjiajie, China
    Duration: 14 Jul 2010 to 15 Jul 2010

    Publication series

    Name
    Publisher: IEEE Computer Society
    ISSN (Print): 1550-6002

    Conference

    Conference: 10th International Conference on Quality Software, QSIC 2010
    Country: China
    City: Zhangjiajie
    Period: 14/07/10 to 15/07/10

    Keywords

    • IR-75776
    • METIS-276761
    • fault localization
    • EWI-19408
    • Fault Tolerance
    • Availability
    • Recovery
