Towards a dependable homogeneous many-processor system-on-chip

X. Zhang

    Research output: Thesis › PhD Thesis - Research UT, graduation UT



    Dependable computing systems are now widely required in mission-critical and human-life-critical applications. While advances in CMOS technology enable smaller and faster circuits, the dependability of modern ICs has worsened as a result of the shrinking dimensions of MOS transistors and the increasing complexity of semiconductor devices. For very complex SoCs with many processor cores, dependability enhancement approaches are especially important. In this thesis we first examine the important attributes of a dependable MPSoC. We then explore possible approaches to enhance these attributes. The cost of the chosen dependability approach, in terms of performance and resource (silicon area and energy) overhead, is evaluated. The proposed dependability approach is implemented in silicon, and its effectiveness is assessed using experiments and actual measurement results. In the scope of this thesis, the dependability of an MPSoC is defined as its ability to deliver expected services under given conditions. Three important dependability attributes are identified: reliability, availability and maintainability. Reliability denotes the probability that the MPSoC will operate without failure for a certain period of time. For an MPSoC, maintainability refers to the isolation or bypass of faulty components and the reconfiguration of fault-free spare parts to maintain its functionality. Availability denotes the readiness of the MPSoC to provide correct service. The reliability of an MPSoC can be improved by using processor cores as spares. Theoretically, system reliability increases greatly as more cores are used as spares. At the same time, the area overhead for reliability enhancement also increases. Maintainability can be realized by incorporating fault detection and self-repair features into an MPSoC. By dynamically detecting faults and reconfiguring the system to circumvent them, the system can be regarded as functionally correct, with a possible drop in performance. 
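As an illustration of the spare-core trade-off described above, the following minimal sketch evaluates a k-out-of-n reliability model. The model is an assumption made here for illustration and is not necessarily the one used in the thesis: cores are taken to fail independently, each with the same survival probability at the time of interest, and the system is taken to work as long as a required number of cores remain fault-free. The function name `system_reliability` and all numeric values are hypothetical.

```python
from math import comb


def system_reliability(n_cores: int, k_required: int, core_reliability: float) -> float:
    """Probability that at least k_required of n_cores cores are still fault-free.

    Assumes (for illustration only) identical, independently failing cores,
    each surviving with probability core_reliability.
    """
    if not 0 <= k_required <= n_cores:
        raise ValueError("k_required must lie between 0 and n_cores")
    if not 0.0 <= core_reliability <= 1.0:
        raise ValueError("core_reliability must be a probability")
    r = core_reliability
    # Sum the binomial probabilities of exactly i surviving cores, for i = k..n.
    return sum(
        comb(n_cores, i) * r**i * (1.0 - r) ** (n_cores - i)
        for i in range(k_required, n_cores + 1)
    )


if __name__ == "__main__":
    # Hypothetical example: 4 cores are needed, each survives with probability 0.9.
    for spares in range(3):
        total = 4 + spares
        print(spares, "spare(s):", system_reliability(total, 4, 0.9))
```

Under these assumed numbers, one spare core raises the modelled reliability from about 0.66 to about 0.92, while each spare also adds one core's worth of silicon area, which is the overhead noted above.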
The time spent on fault detection and system repair together makes up the system down time. Faster fault detection and repair operations decrease system down time and enable a highly available MPSoC. The dependability approach proposed in this thesis involves tests targeting stuck-at faults, performed at the processor-core level at application run-time. Once detected, faulty resources can be isolated by the so-called resource management software, and core-level system repair can be performed by means of resource reconfiguration. To validate the feasibility of our dependability approach, a homogeneous MPSoC platform with multiple Xentium processing cores was adopted as the vehicle for our experiments. A stand-alone infrastructural IP block, the Dependability Manager (DM), has been designed and integrated into the MPSoC platform. The DM can generate the test vectors for the Xentium cores, broadcast them via a Network-on-Chip (NoC) and then collect the test responses from the cores under test. Since the cores under test have identical architectures, a faulty core can be detected by majority voting on the test responses. Dedicated test wrappers and the NoC (reused as a test access mechanism, TAM) were included in the MPSoC platform as well. A modified scan-based test scheme was used for back-pressure-style test data flow control, pausing and resuming the test data in the NoC. The MPSoC platform was fabricated as a Reconfigurable Fabric Device (RFD) using UMC 90 nm CMOS technology. The dependability overhead in terms of silicon area is about 1%. Experimental results show that the dependability test can be carried out at application run-time without interrupting the function of other applications. The inclusion of the DM in the RFD makes it a maintainable MPSoC with a very short stuck-at and memory fault detection time (21 ms) and a reasonable mean down time (MDT) of hundreds of milliseconds. In conclusion, our proposed dependability approach and dependability test methods have proven to be feasible and efficient. 
The successful integration of the DM into the RFD and its correct operation indicate that our dependability approach can be applied to other homogeneous MPSoC platforms for dependability improvement.
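The majority-voting step described in the abstract can be sketched as follows. This is a software illustration under stated assumptions, not the DM hardware design: the test responses are modelled as byte strings keyed by hypothetical core names, and a strict majority of identical responses is assumed to represent the fault-free behaviour. The function name `find_faulty_cores` is introduced here for illustration only.

```python
from collections import Counter
from typing import Mapping, Set


def find_faulty_cores(responses: Mapping[str, bytes]) -> Set[str]:
    """Return the cores whose test response differs from the majority response.

    Assumes identical cores receive identical test vectors, so fault-free
    cores produce identical responses. Raises ValueError when no response is
    shared by a strict majority, since no reference can then be trusted.
    """
    if not responses:
        return set()
    counts = Counter(responses.values())
    reference, votes = counts.most_common(1)[0]
    if votes * 2 <= len(responses):
        raise ValueError("no strict majority among the test responses")
    # Every core that disagrees with the majority response is flagged.
    return {core for core, response in responses.items() if response != reference}


if __name__ == "__main__":
    # Hypothetical responses from three identical cores; core2 disagrees.
    observed = {"core0": b"\x5a\xa5", "core1": b"\x5a\xa5", "core2": b"\x5a\xa4"}
    print(find_faulty_cores(observed))
```

Requiring a strict majority is a design choice of this sketch: with only two cores that disagree, the vote cannot tell which one is faulty, so the function reports an error instead of guessing.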
    Awarding Institution
    • University of Twente
    • Smit, Gerardus Johannes Maria, Supervisor
    • Kerkhoff, Hans Gerard, Advisor
    Award date: 30 Oct 2014
    Place of Publication: Enschede
    Print ISBNs: 978-90-365-3772-8
    Publication status: Published - 30 Oct 2014


    • METIS-306813
    • EC Grant Agreement nr.: FP7/ICT-215881
    • IR-92887
    • EWI-25145
    • Homogeneous MPSoC
    • CAES-TDT: Testable Design and Test
    • Many-Processor System-on-Chip
    • Dependable computing system
