Abstract
In this paper, we present an architecture for embedded FPGA-based deep neural network inference that can handle pruned weight matrices. Pruning weights, and even entire neurons, significantly reduces the amount of data and the number of calculations, which greatly improves the efficiency and performance of neural network inference on embedded devices. Because the architecture is built with a high-level synthesis (HLS) approach, it is easily extendable and highly configurable, with a free choice of parameters such as the number of MAC units or the activation function. For large neural networks, our approach achieves performance at least comparable to state-of-the-art x86-based software implementations while using only 10% of the energy.
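As a rough illustration of why pruning reduces both data and calculations, the following Python sketch stores only the non-zero weights of a layer and counts the multiply-accumulate (MAC) operations actually performed. It is not the architecture from the paper: the magnitude-threshold pruning rule, the (index, value) row format, and the names `prune`, `to_sparse_rows`, and `sparse_layer` are assumptions made for this example only.

```python
def prune(weights, threshold):
    """Zero out weights with magnitude below threshold (assumed pruning rule)."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_sparse_rows(weights):
    """Keep only non-zero weights, stored per row as (column index, value) pairs."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in weights]


def relu(v):
    """Example activation function; any scalar function could be passed instead."""
    return v if v > 0.0 else 0.0


def sparse_layer(sparse_rows, x, activation):
    """Compute activation(W @ x) over the stored weights and count the MACs used."""
    macs = 0
    out = []
    for row in sparse_rows:
        acc = 0.0
        for j, w in row:
            acc += w * x[j]  # one multiply-accumulate per surviving weight
            macs += 1
        out.append(activation(acc))
    return out, macs


# Toy 3x4 weight matrix; the third neuron has only small weights and is pruned entirely.
weights = [
    [0.9, 0.01, -0.02, 0.5],
    [0.03, -0.7, 0.0, 0.02],
    [0.01, 0.02, -0.01, 0.03],
]
x = [1.0, 2.0, 3.0, 4.0]

sparse = to_sparse_rows(prune(weights, 0.1))
y, macs = sparse_layer(sparse, x, relu)
dense_macs = len(weights) * len(x)
print(f"MACs performed: {macs} of {dense_macs} in the dense layer")
```

In this toy case only the surviving weights are stored and multiplied, and a neuron whose weights are all pruned costs no MACs at all. A hardware design would additionally have to schedule the remaining operations across its MAC units, which this sketch does not model.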
| Original language | English |
|---|---|
| Title of host publication | Architecture of Computing Systems – ARCS 2018 |
| Subtitle of host publication | 31st International Conference, Braunschweig, Germany, April 9–12, 2018, Proceedings |
| Editors | Mladen Berekovic, Rainer Buchty, Heiko Hamann, Dirk Koch, Thilo Pionteck |
| Place of Publication | Braunschweig, Germany |
| Publisher | Springer |
| Pages | 311-323 |
| Number of pages | 13 |
| ISBN (Electronic) | 978-3-319-77610-1 |
| ISBN (Print) | 978-3-319-77609-5 |
| Publication status | Published - 1 May 2018 |
| Externally published | Yes |
| Event | 31st International Conference on Architecture of Computing Systems 2018 - Technical University of Braunschweig, Braunschweig, Germany. Duration: 9 Apr 2018 → 12 Apr 2018. Conference number: 31. http://arcs2018.itec.kit.edu/ |
Publication series
| Name | Lecture notes in computer science |
|---|---|
| Volume | 10793 |
Conference
| Conference | 31st International Conference on Architecture of Computing Systems 2018 |
|---|---|
| Abbreviated title | ARCS 2018 |
| Country/Territory | Germany |
| City | Braunschweig |
| Period | 9/04/18 → 12/04/18 |