Programming a multicore architecture without coherency and atomic operations

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

It is hard to reason about the state of a multicore system-on-chip, because cores communicate via an interconnect such as a network-on-chip and memory operations therefore need multiple cycles to complete. To simplify programming, atomicity is required, by means of atomic read-modify-write (RMW) operations, a strong memory model, and hardware cache coherency. As a result, multicore architectures are very complex, but this complexity stems from the fact that they are designed with an imperative programming paradigm in mind, i.e. one based on threads that communicate via shared memory. In this paper, we show the impact on a multicore architecture when the programming paradigm is changed and a lambda-calculus-based (functional) language is used instead. Ordering requirements on memory operations are more relaxed and synchronization is simplified, because lambda calculus has no notion of state or memory and therefore does not impose ordering requirements on the platform. We implemented a functional language for multicores with a weak memory model, without the need for hardware cache coherency, any atomic RMW operation, or mutex; the execution is atomic-free. Experiments show that even on a system with (transparently applied) software cache coherency, execution scales properly up to 32 cores. This shows that the complexity of concurrent hardware can be reduced by making different choices in the software layers on top.
Original language: Undefined
Title of host publication: Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2014)
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 29-38
Number of pages: 10
ISBN (Print): 978-1-4503-2655-1
DOI: https://doi.org/10.1145/2560683.2560697
Publication status: Published - 15 Feb 2014

Publication series

Publisher: ACM

Keywords

  • EWI-24377
  • Embedded system
  • functional language
  • METIS-303999
  • memory model
  • distributed shared memory
  • IR-89490
  • cache coherency

Cite this

Rutgers, J. H., Bekooij, M. J. G., & Smit, G. J. M. (2014). Programming a multicore architecture without coherency and atomic operations. In Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2014) (pp. 29-38). New York: Association for Computing Machinery (ACM). https://doi.org/10.1145/2560683.2560697
@inproceedings{15e3fc1023f646899e477c7af913adfe,
title = "Programming a multicore architecture without coherency and atomic operations",
keywords = "EWI-24377, Embedded system, functional language, METIS-303999, memory model, distributed shared memory, IR-89490, cache coherency",
author = "Rutgers, J.H. and Bekooij, {Marco Jan Gerrit} and Smit, {Gerardus Johannes Maria}",
year = "2014",
month = "2",
day = "15",
doi = "10.1145/2560683.2560697",
isbn = "978-1-4503-2655-1",
publisher = "Association for Computing Machinery (ACM)",
pages = "29--38",
booktitle = "Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2014)",
address = "New York",
}
