An Efficient Asymmetric Distributed Lock for Embedded Multiprocessor Systems

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

3 Citations (Scopus)
64 Downloads (Pure)

Abstract

Efficient synchronization is a key concern in embedded many-core systems-on-chip (SoCs). Atomic read-modify-write instructions combined with cache coherency are not always available as a synchronization primitive in shared-memory SoCs, because suitable IP is lacking. Furthermore, the scalability of hardware cache coherency protocols is in doubt. Existing distributed locks for NUMA multiprocessor systems do not rely on cache coherency and scale better, but they exchange many messages per lock. This paper introduces an asymmetric distributed lock algorithm for shared-memory embedded multiprocessor systems without hardware cache coherency. Messages are exchanged via a low-cost inter-processor communication ring combined with a small local memory per processor. The algorithm exploits the observation that a mutex is typically acquired over and over again by the same process, which significantly reduces the number of messages exchanged per lock. Experiments on our 32-core system show that when locks are stored in SDRAM, 35% of the memory traffic is lock related. In comparison, our solution eliminates all of this traffic and reduces the execution time by up to 89%.
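The intuition behind the asymmetry can be illustrated with a toy message-count model. This is only a sketch under stated assumptions and not the algorithm from the paper: the class `AsymmetricLock`, the cost of 2 messages for an ownership transfer, and the baseline cost of 3 messages per lock/unlock pair are all hypothetical values chosen for illustration. It shows how keeping a lock with its last owner makes repeated acquisition by the same processor free of messages.

```python
from dataclasses import dataclass


@dataclass
class AsymmetricLock:
    """Toy model: the lock stays with its last owner until another CPU asks."""
    owner: int          # CPU that currently keeps the lock in its local memory
    held: bool = False
    messages: int = 0   # messages sent over a (hypothetical) ring

    def acquire(self, cpu: int) -> None:
        if self.held:
            raise RuntimeError("lock is held; a real lock would queue or spin")
        if cpu != self.owner:
            # Slow path (assumed cost): one request plus one ownership transfer.
            self.messages += 2
            self.owner = cpu
        # Fast path: the owner re-acquires locally and sends nothing.
        self.held = True

    def release(self, cpu: int) -> None:
        if not self.held or cpu != self.owner:
            raise RuntimeError("release by a CPU that does not hold the lock")
        self.held = False  # ownership is kept, so release is local as well


def symmetric_messages(trace):
    """Assumed baseline: request, grant, and release message for every use."""
    return 3 * len(trace)


def asymmetric_messages(trace, initial_owner=0):
    """Replay a trace of CPU ids, each doing one acquire/release pair."""
    lock = AsymmetricLock(owner=initial_owner)
    for cpu in trace:
        lock.acquire(cpu)
        lock.release(cpu)
    return lock.messages


# CPU 0 uses the mutex repeatedly; CPU 1 interrupts once.
trace = [0] * 8 + [1] + [0] * 8
print(symmetric_messages(trace), asymmetric_messages(trace))
```

In this model only the two ownership changes (to CPU 1 and back to CPU 0) cost messages, so the benefit grows with how often the same processor reuses the mutex.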
Original language: Undefined
Title of host publication: Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (IC-SAMOS 2012)
Place of Publication: USA
Publisher: IEEE Circuits & Systems Society
Pages: 176-182
Number of pages: 7
ISBN (Print): 978-1-4673-2296-6
DOIs: https://doi.org/10.1109/SAMOS.2012.6404172
Publication status: Published - 16 Jul 2012
Event: 2012 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, IC-SAMOS XII - Institute of East Aegean, Samos, Greece
Duration: 16 Jul 2012 to 19 Jul 2012
Conference number: 12

Publication series

Name
Publisher: IEEE Circuits & Systems Society
Number: CFP1252A-U

Conference

Conference: 2012 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, IC-SAMOS XII
Abbreviated title: IC-SAMOS
Country: Greece
City: Samos
Period: 16/07/12 to 19/07/12

Keywords

  • IR-80925
  • EWI-22073
  • METIS-287939

Cite this

Rutgers, J. H., Bekooij, M. J. G., & Smit, G. J. M. (2012). An Efficient Asymmetric Distributed Lock for Embedded Multiprocessor Systems. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (IC-SAMOS 2012) (pp. 176-182). USA: IEEE Circuits & Systems Society. https://doi.org/10.1109/SAMOS.2012.6404172
@inproceedings{16f100447e4944f2a2fbaa4d1b77c9a6,
title = "An Efficient Asymmetric Distributed Lock for Embedded Multiprocessor Systems",
keywords = "IR-80925, EWI-22073, METIS-287939",
author = "J.H. Rutgers and Bekooij, {Marco Jan Gerrit} and Smit, {Gerardus Johannes Maria}",
year = "2012",
month = "7",
day = "16",
doi = "10.1109/SAMOS.2012.6404172",
language = "Undefined",
isbn = "978-1-4673-2296-6",
publisher = "IEEE Circuits \& Systems Society",
number = "CFP1252A-U",
pages = "176--182",
booktitle = "Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (IC-SAMOS 2012)",

}


