Portable memory consistency for software managed distributed memory in many-core SoC

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Porting software to a different platform can require modifications to the application. One issue is that the target hardware may implement a different memory consistency model. As a consequence, the completion order of reads and writes in a multi-threaded application can change, which may result in improper synchronization. For example, a processor with out-of-order execution could break synchronization if the proper fence instructions are missing. Such a bug can cause sporadic errors, which are hard to debug. This paper presents an approach that makes applications independent of the memory model of the hardware, so that they can be compiled for hardware with any memory architecture. The key is a memory model that guarantees only the most fundamental orderings of reads and writes, combined with annotations that specify additional ordering constraints. As a result, tooling can transparently and correctly insert fences, cache flushes, etc. where appropriate, without reducing the flexibility of the hardware design. In a case study, several SPLASH-2 applications are run on a 32-core, software cache-coherent MicroBlaze system on an FPGA. Moreover, the approach also allows mapping to scratch-pad memories and to a distributed shared memory architecture.
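The synchronization hazard described in the abstract can be illustrated with a minimal C11 sketch. This is not code from the paper and does not use the paper's annotations; it relies on the standard `<stdatomic.h>` release/acquire orderings instead, and the names `payload`, `ready`, `publish`, and `try_consume` are hypothetical. It shows the kind of ordering constraint (data write before flag write) that has to be stated explicitly when the hardware is free to reorder memory operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: a payload guarded by a "ready" flag. */
static int payload;        /* plain data, written before the flag is set */
static atomic_bool ready;  /* zero-initialized (false) static flag */

/* Producer side: write the data, then publish it.
 * The release store forbids reordering the payload write after the
 * flag write, which a weaker memory model would otherwise permit. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: returns true and fills *out once the data is published.
 * The acquire load pairs with the release store above, so a consumer
 * that observes ready == true also observes the payload write. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In a real program `publish` and `try_consume` would run on different threads or cores. If both orderings were replaced with `memory_order_relaxed`, the consumer could see the flag set while still reading a stale payload on hardware that reorders memory operations.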
Original language: Undefined
Title of host publication: Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2013) - 20th Reconfigurable Architectures Workshop (RAW 2013)
Place of publication: USA
Publisher: IEEE Computer Society
Pages: 212-221
Number of pages: 10
ISBN (Print): 978-0-7695-4979-8
DOIs: https://doi.org/10.1109/IPDPSW.2013.14
Publication status: Published - 21 May 2013
Event: 20th Reconfigurable Architectures Workshop, RAW 2013 - Boston, United States
Duration: 20 May 2013 to 24 May 2013
Conference number: 20
http://www.ece.lsu.edu/vaidy/raw13/

Publication series

Name
Publisher: IEEE Computer Society

Workshop

Workshop: 20th Reconfigurable Architectures Workshop, RAW 2013
Abbreviated title: RAW
Country: United States
City: Boston
Period: 20/05/13 to 24/05/13

Keywords

  • EWI-23363
  • IR-86199
  • METIS-297648
  • CAES-EEA: Efficient Embedded Architectures

Cite this

Rutgers, J. H., Bekooij, M. J. G., & Smit, G. J. M. (2013). Portable memory consistency for software managed distributed memory in many-core SoC. In Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2013) - 20th Reconfigurable Architectures Workshop (RAW 2013) (pp. 212-221). USA: IEEE Computer Society. https://doi.org/10.1109/IPDPSW.2013.14
@inproceedings{4bc85d19034f4e37b88ea11c99800f79,
title = "Portable memory consistency for software managed distributed memory in many-core SoC",
keywords = "EWI-23363, IR-86199, METIS-297648, CAES-EEA: Efficient Embedded Architectures",
author = "J.H. Rutgers and Bekooij, {Marco Jan Gerrit} and Smit, {Gerardus Johannes Maria}",
note = "10.1109/IPDPSW.2013.14",
year = "2013",
month = "5",
day = "21",
doi = "10.1109/IPDPSW.2013.14",
language = "Undefined",
isbn = "978-0-7695-4979-8",
publisher = "IEEE Computer Society",
pages = "212--221",
booktitle = "Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2013) - 20th Reconfigurable Architectures Workshop (RAW 2013)",
address = "United States",

}

