Low-Cost Heterogeneous Embedded Multiprocessor Architecture for Real-Time Stream Processing Applications

B.H.J. Dekens

Research output: Thesis › PhD Thesis - Research UT, graduation UT › Academic


Abstract

Software-defined radio (SDR) applications are often computationally intensive stream processing applications. They achieve low throughput on homogeneous multi-core architectures and could therefore benefit significantly from stream processing accelerators. Such accelerators are often integrated into an architecture by means of a network-on-chip (NoC). Crossbars and mesh-based NoCs provide guaranteed throughput but tend to have unacceptably high hardware costs. We propose a low-cost heterogeneous multiprocessor architecture for real-time stream processing applications, together with dataflow models for real-time analysis. The architecture allows compositional temporal dataflow analysis based on independently characterized components. It contains a low-cost ring-shaped interconnect that provides all-to-all guaranteed-throughput communication while being work-conserving. Cost-effective integration of stream processing accelerators is enabled by combining two low-cost rings and using a small shell in each network interface (NI), which realizes credit-based hardware flow control for accelerators. To improve accelerator utilization, we propose a sharing approach that multiplexes multiple real-time data streams over accelerators. Data streams between tasks are transferred over our dual-ring interconnect. Software tasks communicate directly through our distributed software FIFO implementation, while communication involving stream processing accelerators is handled by our hardware credit-based flow control. To reason about the worst-case behavior of the architecture, we construct temporal dataflow models that yield bounds on throughput and latency. Three case studies have been carried out to evaluate the hardware costs and performance of the proposed architecture. For these case studies, several instances of the architecture have been implemented on a Xilinx Virtex-6 FPGA.
We show that in our architecture the use of accelerators improves maximum throughput by 366% and that sharing accelerators can reduce hardware costs by over 63%. The case study results show that our ring interconnect has a very small hardware cost and performs within the bounds derived by our dataflow analysis models. We conclude that a considerable reduction in hardware costs can be attained by replacing traditional interconnects with our dual communication ring interconnect. We also conclude that cost-effective shared accelerator integration can improve application performance, which demonstrates the merit of our approach.
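
As a rough illustration of the credit-based flow control mentioned in the abstract, the following Python sketch models the general idea in software: a producer holds one credit per free buffer slot at the consumer, spends a credit for each word it sends, and gets the credit back when the consumer removes a word. This is not the thesis's hardware implementation. The class name `CreditLink`, its methods, and the buffer size of 2 are hypothetical and chosen only for illustration.

```python
from collections import deque


class CreditLink:
    """Toy model of one producer-to-consumer link with credit-based flow control."""

    def __init__(self, buffer_slots):
        # Assumption: one credit corresponds to one free slot at the consumer.
        self.credits = buffer_slots
        # Stands in for the consumer's bounded input buffer.
        self.consumer_buffer = deque()

    def try_send(self, word):
        """Send one word if a credit is available; return True on success."""
        if self.credits == 0:
            return False  # no free slot downstream, so the producer must stall
        self.credits -= 1
        self.consumer_buffer.append(word)
        return True

    def consume(self):
        """Remove one word at the consumer and return a credit to the producer."""
        word = self.consumer_buffer.popleft()
        self.credits += 1
        return word


link = CreditLink(buffer_slots=2)
sent = [link.try_send(w) for w in ("a", "b", "c")]  # third send finds no credit
first = link.consume()                              # frees one slot
retry = link.try_send("c")                          # succeeds after the credit returns
```

Because a word is only sent when a credit is held, the consumer's buffer can never overflow, which is the property that makes this style of flow control attractive for bounded hardware buffers.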
Original language: Undefined
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Bekooij, Marco Jan Gerrit, Supervisor
Award date: 16 Oct 2015
Place of Publication: Enschede
Publisher: University of Twente
Print ISBNs: 978-90-365-3915-9
DOIs: 10.3990/1.9789036539159
Publication status: Published - 16 Oct 2015

Keywords

  • IR-97333
  • Stream Processing
  • real-time processing
  • EWI-26354
  • Data flow
  • Data streaming
  • accelerator sharing
  • METIS-311923

Cite this

@phdthesis{019d9376b0c642e1baafd20130662d2d,
title = "Low-Cost Heterogeneous Embedded Multiprocessor Architecture for Real-Time Stream Processing Applications",
keywords = "IR-97333, Stream Processing, real-time processing, EWI-26354, Data flow, Data streaming, accelerator sharing, METIS-311923",
author = "B.H.J. Dekens",
year = "2015",
month = "10",
day = "16",
doi = "10.3990/1.9789036539159",
language = "Undefined",
isbn = "978-90-365-3915-9",
publisher = "University of Twente",
address = "Netherlands",
school = "University of Twente",

}

Low-Cost Heterogeneous Embedded Multiprocessor Architecture for Real-Time Stream Processing Applications. / Dekens, B.H.J.

Enschede : University of Twente, 2015. 152 p.


TY - THES

T1 - Low-Cost Heterogeneous Embedded Multiprocessor Architecture for Real-Time Stream Processing Applications

AU - Dekens, B.H.J.

PY - 2015/10/16

Y1 - 2015/10/16

KW - IR-97333

KW - Stream Processing

KW - real-time processing

KW - EWI-26354

KW - Data flow

KW - Data streaming

KW - accelerator sharing

KW - METIS-311923

U2 - 10.3990/1.9789036539159

DO - 10.3990/1.9789036539159

M3 - PhD Thesis - Research UT, graduation UT

SN - 978-90-365-3915-9

PB - University of Twente

CY - Enschede

ER -