TY - GEN
T1 - Programming a multicore architecture without coherency and atomic operations
AU - Rutgers, J.H.
AU - Bekooij, Marco Jan Gerrit
AU - Smit, Gerardus Johannes Maria
N1 - 10.1145/2560683.2560697
PY - 2014/2/15
Y1 - 2014/2/15
AB - It is hard to reason about the state of a multicore system-on-chip: cores communicate via an interconnect such as a network-on-chip, so operations on memory need multiple cycles to complete. To simplify programming, atomicity is required, by means of atomic read-modify-write (RMW) operations, a strong memory model, and hardware cache coherency. As a result, multicore architectures are very complex, but this complexity stems from the fact that they are designed with an imperative programming paradigm in mind, i.e., one based on threads that communicate via shared memory.
In this paper, we show the impact on a multicore architecture when the programming paradigm is changed and a lambda-calculus-based (functional) language is used instead. Ordering requirements of memory operations are more relaxed and synchronization is simplified, because lambda calculus has no notion of state or memory and therefore does not impose ordering requirements on the platform. We implemented a functional language for multicores with a weak memory model, without the need for hardware cache coherency, any atomic RMW operation, or mutex; the execution is atomic-free. Experiments show that even on a system with (transparently applied) software cache coherency, execution scales properly up to 32 cores. This shows that concurrent hardware complexity can be reduced by making different choices in the software layers on top.
KW - EWI-24377
KW - Embedded system
KW - functional language
KW - METIS-303999
KW - memory model
KW - distributed shared memory
KW - IR-89490
KW - cache coherency
U2 - 10.1145/2560683.2560697
DO - 10.1145/2560683.2560697
M3 - Conference contribution
SN - 978-1-4503-2655-1
SP - 29
EP - 38
BT - Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2014)
PB - Association for Computing Machinery
CY - New York
T2 - International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2014), Orlando, FL, USA
Y2 - 15 February 2014
ER -