TY - JOUR
T1 - Multidie 3-D Stacking of Memory Dominated Neuromorphic Architectures
AU - Giacomini Rocha, Leandro M.
AU - Bilgic, Refik
AU - Naeim, Mohamed
AU - Das, Sudipta
AU - Oprins, Herman
AU - Yousefzadeh, Amirreza
AU - Konijnenburg, Mario
AU - Milojevic, Dragomir
AU - Myers, James
AU - Ryckaert, Julien
AU - Biswas, Dwaipayan
N1 - Publisher Copyright: © 1993-2012 IEEE.
PY - 2024/11
Y1 - 2024/11
N2 - Event-driven neuromorphic processors for artificial intelligence (AI) inference on edge/IoT devices require large on-chip memory capacity for efficient execution of spiking neural networks (NNs). In this work, we evaluate 3-D stacking benefits on SENECA, a digital neuromorphic accelerator core, sweeping its on-chip memory capacity from 2 up to 32 Mb in both legacy planar and advanced nanosheet CMOS logic nodes. In a planar CMOS node (GF-22 nm), two-die memory-on-logic (MoL) partitioning enables 8× more on-chip memory, and it boosts operating frequency by 7% with 26% less power than the 2-D implementation. Moving to an advanced nanosheet technology (imec A10), multidie (up to 7 dies) MoL stacking enables a performance increase of up to 29% and power savings up to 31%. Furthermore, a core folding (CF) partitioning in A10 shows up to 16% performance improvement with 12% total power savings with respect to the 2-D implementation on the same technology. We also demonstrate no thermal overhead for multidie stacking at advanced nodes for designs exhibiting low power density. These physical design explorations lay the foundation for system technology co-optimization studies for edge devices.
AB - Event-driven neuromorphic processors for artificial intelligence (AI) inference on edge/IoT devices require large on-chip memory capacity for efficient execution of spiking neural networks (NNs). In this work, we evaluate 3-D stacking benefits on SENECA, a digital neuromorphic accelerator core, sweeping its on-chip memory capacity from 2 up to 32 Mb in both legacy planar and advanced nanosheet CMOS logic nodes. In a planar CMOS node (GF-22 nm), two-die memory-on-logic (MoL) partitioning enables 8× more on-chip memory, and it boosts operating frequency by 7% with 26% less power than the 2-D implementation. Moving to an advanced nanosheet technology (imec A10), multidie (up to 7 dies) MoL stacking enables a performance increase of up to 29% and power savings up to 31%. Furthermore, a core folding (CF) partitioning in A10 shows up to 16% performance improvement with 12% total power savings with respect to the 2-D implementation on the same technology. We also demonstrate no thermal overhead for multidie stacking at advanced nodes for designs exhibiting low power density. These physical design explorations lay the foundation for system technology co-optimization studies for edge devices.
KW - Core Folding (CF)
KW - Memory-on-Logic (MoL)
KW - Neuromorphic
KW - Performance and area
KW - Power
KW - 3-D partitioning
KW - Power, Performance, Area (PPA)
UR - http://www.scopus.com/inward/record.url?scp=85208044237&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2024.3421625
DO - 10.1109/TVLSI.2024.3421625
M3 - Article
AN - SCOPUS:85208044237
SN - 1063-8210
VL - 32
SP - 2144
EP - 2148
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 11
ER -