TY - JOUR

T1 - Formal verification of parallel prefix sum and stream compaction algorithms in CUDA

AU - Safari, Mohsen

AU - Huisman, Marieke

N1 - Funding Information:
This work is supported by NWO grant 639.023.710 for the Mercedes project.
Publisher Copyright:
© 2022 The Author(s)

PY - 2022/4/12

Y1 - 2022/4/12

N2 - GPUs are an important part of any High Performance Computing (HPC) architecture. To make optimal use of the specifics of a GPU architecture, we need programming models that naturally support the parallel execution model of a GPU. CUDA and OpenCL are two widely used examples of such programming models. Furthermore, we also need to redesign algorithms such that they adhere to this parallel programming model, and we need to be able to prove the correctness of these redesigned algorithms. In this paper we study two examples of such parallelized algorithms, and we discuss how to prove their correctness (data race freedom and (partial) functional correctness) using the VerCors program verifier. First of all, we prove the correctness of two parallel algorithms solving the prefix sum problem. Second, we show how such a prefix sum algorithm is used as a basic block in a stream compaction algorithm, and we prove correctness of this stream compaction algorithm, taking advantage of the earlier correctness proof for the prefix sum algorithm. The proofs as described in this paper are developed over the CUDA implementations of these algorithms. In earlier work, we had already shown correctness of a more high-level version of the algorithm. This paper discusses how we add support to reason about CUDA programs in VerCors, and it then shows how we can redo the verification at the level of the CUDA code. We also discuss some practical challenges that we had to address to prove correctness of the actual CUDA-level verifications.

AB - GPUs are an important part of any High Performance Computing (HPC) architecture. To make optimal use of the specifics of a GPU architecture, we need programming models that naturally support the parallel execution model of a GPU. CUDA and OpenCL are two widely used examples of such programming models. Furthermore, we also need to redesign algorithms such that they adhere to this parallel programming model, and we need to be able to prove the correctness of these redesigned algorithms. In this paper we study two examples of such parallelized algorithms, and we discuss how to prove their correctness (data race freedom and (partial) functional correctness) using the VerCors program verifier. First of all, we prove the correctness of two parallel algorithms solving the prefix sum problem. Second, we show how such a prefix sum algorithm is used as a basic block in a stream compaction algorithm, and we prove correctness of this stream compaction algorithm, taking advantage of the earlier correctness proof for the prefix sum algorithm. The proofs as described in this paper are developed over the CUDA implementations of these algorithms. In earlier work, we had already shown correctness of a more high-level version of the algorithm. This paper discusses how we add support to reason about CUDA programs in VerCors, and it then shows how we can redo the verification at the level of the CUDA code. We also discuss some practical challenges that we had to address to prove correctness of the actual CUDA-level verifications.

KW - CUDA

KW - Deductive verification

KW - GPU verification

KW - Prefix sum

KW - Separation logic

KW - Stream compaction

KW - UT-Hybrid-D

UR - http://www.scopus.com/inward/record.url?scp=85126118610&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2022.02.027

DO - 10.1016/j.tcs.2022.02.027

M3 - Article

AN - SCOPUS:85126118610

SN - 0304-3975

VL - 912

SP - 81

EP - 98

JO - Theoretical computer science

JF - Theoretical computer science

ER -