Low-Cost Heterogeneous Embedded Multiprocessor Architecture for Real-Time Stream Processing Applications

B.H.J. Dekens

    Research output: Thesis › PhD Thesis - Research UT, graduation UT



    Software-defined radio (SDR) applications are often computationally intensive stream processing applications. They achieve low throughput on homogeneous multi-core architectures and could therefore benefit significantly from stream processing accelerators. The integration of such accelerators in an architecture is often facilitated by a network-on-chip (NoC). Crossbars and mesh-based NoCs provide guaranteed throughput but tend to have unacceptably high hardware costs. We propose a low-cost heterogeneous multiprocessor architecture for real-time stream processing applications, together with dataflow models for real-time analysis. This architecture allows compositional temporal dataflow analysis based on independently characterized components. The proposed architecture contains a low-cost ring-shaped interconnect that provides all-to-all guaranteed-throughput communication while being work-conserving. Furthermore, cost-effective integration of stream processing accelerators is enabled by combining two low-cost rings and using a small shell in each network interface (NI), thereby realizing credit-based hardware flow control for accelerators. To improve the utilization of stream processing accelerators, we propose a sharing approach that multiplexes multiple real-time data streams over accelerators. Data streams between tasks are transferred over our dual-ring interconnect. Software tasks communicate directly using our distributed software FIFO implementation, while communication involving stream processing accelerators is handled by our hardware credit-based flow control. To reason about the worst-case behavior of our architecture, temporal dataflow models are constructed to obtain bounds on throughput and latency. Three case studies have been carried out to evaluate the hardware costs and performance of the proposed architecture. For these case studies, several instances of the proposed architecture have been implemented on a Xilinx Virtex-6 FPGA.
    We show that, in our architecture, the use of accelerators improves maximum throughput by 366% and that sharing accelerators can reduce hardware costs by more than 63%. The results of our case studies show that our ring interconnect has a very small hardware cost and performs within the bounds derived by our dataflow analysis models. We conclude that a considerable reduction in hardware costs can be attained by replacing traditional interconnects with our dual-ring communication interconnect. We also conclude that cost-effective integration of shared accelerators can improve application performance, which demonstrates the merit of our approach.
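    The abstract mentions credit-based flow control without detail. As a rough illustration of the general idea only, the sketch below models a producer that may send an item only while it holds a credit, and a consumer that returns one credit per item it removes. It is not the hardware mechanism of the thesis: the class `CreditLink`, the function `run`, and the parameters `buffer_size` and `consume_every` are hypothetical names chosen for this example.

```python
from collections import deque


class CreditLink:
    """Toy producer/consumer link with credit-based flow control (illustrative only)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.credits = buffer_size      # producer-side credit counter
        self.buffer = deque()           # consumer-side buffer

    def try_send(self, item):
        """Send only when a credit is available; report whether the item was sent."""
        if self.credits == 0:
            return False                # producer stalls, nothing is dropped
        self.credits -= 1
        self.buffer.append(item)
        return True

    def consume(self):
        """Remove one item (if any) and return its credit to the producer."""
        if not self.buffer:
            return None
        item = self.buffer.popleft()
        self.credits += 1
        return item


def run(num_items, buffer_size, consume_every):
    """Stream num_items over the link; the consumer drains one item every
    `consume_every` steps. Returns (received, max_occupancy, stalls)."""
    link = CreditLink(buffer_size)
    received, next_item, stalls, max_occupancy, step = [], 0, 0, 0, 0
    while len(received) < num_items:
        step += 1
        if next_item < num_items:
            if link.try_send(next_item):
                next_item += 1
            else:
                stalls += 1             # no credit: wait instead of overflowing
        max_occupancy = max(max_occupancy, len(link.buffer))
        if step % consume_every == 0:
            item = link.consume()
            if item is not None:
                received.append(item)
    return received, max_occupancy, stalls


if __name__ == "__main__":
    received, max_occupancy, stalls = run(num_items=10, buffer_size=2, consume_every=3)
    print(received, max_occupancy, stalls)
```

    In this toy model the buffer occupancy can never exceed the initial number of credits, so a slow consumer causes the producer to stall rather than lose data. This is the property that makes such schemes attractive when bounds on throughput and latency have to be derived.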
    Original language: English
    Qualification: Doctor of Philosophy
    Awarding Institution
    • University of Twente
    • Bekooij, Marco J.G., Supervisor
    Award date: 16 Oct 2015
    Place of Publication: Enschede
    Print ISBNs: 978-90-365-3915-9
    Publication status: Published - 16 Oct 2015


    • IR-97333
    • Stream Processing
    • real-time processing
    • EWI-26354
    • Data flow
    • Data streaming
    • accelerator sharing
    • METIS-311923

