The Bend Project: Creating a Parallel Computing Language for GPUs

The first releases of the Bend project have been published. Bend is a high-level programming language for parallel computing, positioned as an alternative to low-level languages such as CUDA and Metal while offering the expressive syntax and development capabilities of languages such as Python and Haskell. The project's code is written in Rust and distributed under the Apache 2.0 license.

Among Bend's notable features are fast object allocation, higher-order functions, closures, continuations, unbounded recursion, pattern matching, recursive matching (fold) and loops (bend), and integer, string, and list types. Two syntax variants are supported: Python style and Haskell style. Programs need no annotations to control parallelization, no explicit thread creation, and no locks. Parallelization is automatic: for example, when evaluating the expression “((1 + 2) + (3 + 4))”, the operations “1 + 2” and “3 + 4” are performed simultaneously on different computing cores, because neither depends on the other's result.
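To make the dependency argument concrete, the following sketch emulates by hand, in plain Python, what Bend is described as doing automatically. It is not Bend code and says nothing about how the Bend runtime actually schedules work; the tuple encoding of expressions and the names `evaluate`, `evaluate_parallel`, and `par_depth` are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical encoding: an expression is either an int or a pair
# (left, right) that stands for "left + right".
EXPR = ((1, 2), (3, 4))  # ((1 + 2) + (3 + 4))


def evaluate(expr, pool, par_depth):
    """Sum an expression tree, forking the left branch while par_depth > 0."""
    if isinstance(expr, int):
        return expr
    left, right = expr
    if par_depth <= 0:
        # Below the fork limit: plain sequential recursion.
        return evaluate(left, pool, 0) + evaluate(right, pool, 0)
    # The two branches share no data, so the left one can run on another
    # worker while the current thread evaluates the right one.
    left_future = pool.submit(evaluate, left, pool, par_depth - 1)
    right_value = evaluate(right, pool, par_depth - 1)
    return left_future.result() + right_value


def evaluate_parallel(expr, par_depth=2):
    # At most 2**par_depth - 1 tasks are ever submitted, so a pool of this
    # size cannot be exhausted by tasks waiting on their children.
    with ThreadPoolExecutor(max_workers=2 ** par_depth) as pool:
        return evaluate(expr, pool, par_depth)


if __name__ == "__main__":
    print(evaluate_parallel(EXPR))  # 10
```

In CPython the threads will not make integer addition any faster; the sketch only shows which sub-expressions are independent and could therefore be handed to separate cores.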


Bend programs can run on massively parallel hardware such as GPUs, with performance scaling nearly linearly with the number of cores. Bend code is compiled into a low-level intermediate representation, HVM2 (Higher-order Virtual Machine 2), which is then compiled to C and CUDA. Currently, GPU execution is supported only on NVIDIA GPUs.

As for performance, a test application implementing bitonic sort completed in 12.15 seconds on a single CPU thread of an Apple M3 Max, in 0.96 seconds with 16 threads, and in 0.21 seconds on an NVIDIA RTX 4090 GPU with 16k threads.
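The speedups implied by these timings can be worked out directly. The short Python sketch below uses only the three figures quoted above; the function and constant names are chosen here for illustration.

```python
# Timings quoted in the article, in seconds.
SINGLE_THREAD_S = 12.15   # Apple M3 Max, 1 CPU thread
CPU_16_THREADS_S = 0.96   # Apple M3 Max, 16 CPU threads
GPU_RTX_4090_S = 0.21     # NVIDIA RTX 4090, 16k GPU threads


def speedup(baseline_s, parallel_s):
    """How many times faster the parallel run is than the baseline."""
    return baseline_s / parallel_s


def efficiency(baseline_s, parallel_s, workers):
    """Fraction of ideal linear scaling achieved with `workers` workers."""
    return speedup(baseline_s, parallel_s) / workers


if __name__ == "__main__":
    print(f"{speedup(SINGLE_THREAD_S, CPU_16_THREADS_S):.1f}x")        # 12.7x
    print(f"{speedup(SINGLE_THREAD_S, GPU_RTX_4090_S):.1f}x")          # 57.9x
    print(f"{efficiency(SINGLE_THREAD_S, CPU_16_THREADS_S, 16):.0%}")  # 79%
```

Efficiency is computed only for the CPU run, since 16k GPU threads are not directly comparable to CPU threads.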

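    # Sort a tree of depth d; s selects the direction (0 or 1).
    def sort(d, s, tree):
      switch d:
        case 0:
          return tree
        case _:
          (x,y) = tree
          # The two halves are independent, so they can be sorted in parallel.
          lft = sort(d-1, 0, x)
          rgt = sort(d-1, 1, y)
          # rots takes a single tree, so the sorted halves are paired again.
          return rots(d, s, (lft, rgt))

    def rots(d, s, tree):
      switch d:
        case 0:
          return tree
        case _:
          (x,y) = tree
          return down(d, s, warp(d-1, s, x, y))
    ...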
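The listing is truncated in the source, so the helpers `down` and `warp` are not shown. For readers unfamiliar with the algorithm, here is a minimal sketch of the textbook list-based bitonic sort in plain Python. It is not a translation of the Bend program and does not reconstruct the elided helpers; the names `bitonic_sort` and `bitonic_merge` are chosen here, and the input length is assumed to be a power of two.

```python
def bitonic_merge(xs, ascending):
    """Sort a bitonic sequence (rising then falling, or the reverse)."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    half = n // 2
    lo, hi = list(xs[:half]), list(xs[half:])
    # Compare-and-swap element i with element i + half. Every pair is
    # independent of the others, which is what makes the step parallel.
    for i in range(half):
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    # Each half is again bitonic and can be merged independently.
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)


def bitonic_sort(xs, ascending=True):
    """Sort a list whose length is a power of two."""
    n = len(xs)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    if n <= 1:
        return list(xs)
    half = n // 2
    # Sort the halves in opposite directions to form a bitonic sequence.
    first = bitonic_sort(xs[:half], True)
    second = bitonic_sort(xs[half:], False)
    return bitonic_merge(first + second, ascending)


if __name__ == "__main__":
    print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

As in the Bend listing, the two recursive calls work on disjoint halves in opposite directions, which is the property a parallel runtime can exploit.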
