Introducing the ZLUDA toolkit: Run CUDA applications on AMD GPUs

The ZLUDA project has released an open implementation of CUDA for AMD GPUs that lets unmodified CUDA applications run with performance close to that of applications running without a translation layer. The project provides binary compatibility with existing CUDA applications built with the CUDA compiler for NVIDIA GPUs. The implementation runs on top of AMD's ROCm stack and the HIP (Heterogeneous-computing Interface for Portability) runtime. The project code is written in Rust and distributed under the MIT and Apache 2.0 licenses. Linux and Windows are supported.

The layer for running CUDA applications on systems with AMD GPUs has been developed over the past two years, but the project has a longer history: it was originally created to run CUDA on Intel GPUs. The change of target is explained by the fact that the ZLUDA developer was initially an Intel employee, but in 2021 Intel decided that enabling CUDA applications on its GPUs was of no business interest and chose not to push the initiative forward.


At the beginning of 2022, the developer left Intel and was contracted by AMD to build a CUDA compatibility layer. During development, AMD asked that its interest in ZLUDA not be advertised and that no commits be made to the public ZLUDA repository. Two years later, AMD decided that running CUDA applications on AMD GPUs was of no business interest, which, under the terms of the contract, allowed the developer to open-source the work. Since GPU vendors have stopped funding the project, its future now depends on community interest and on offers of cooperation from other companies. Without external support, the project will only develop in directions the author is personally interested in, such as DLSS (Deep Learning Super Sampling).

In its current form, the implementation is considered alpha quality. Even so, ZLUDA can already run many CUDA applications, including Geekbench, 3DF Zephyr, Blender, Reality Capture, LAMMPS, NAMD, waifu2x, OpenFOAM and Arnold. Minimal support is provided for the cuDNN, cuBLAS, cuSPARSE, cuFFT, NCCL and NVML libraries.

The first launch of a CUDA application under ZLUDA is noticeably slow because ZLUDA has to compile the GPU code. Subsequent runs do not incur this delay, since the compiled code is stored in a cache. Once compiled, the code runs at near-native performance. When running Geekbench on an AMD Radeon RX 6800 XT GPU, the CUDA benchmark suite under ZLUDA performed noticeably better than the OpenCL version.
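The compile-once behaviour can be illustrated with a minimal Rust sketch. It assumes the cache is keyed by a hash of the PTX text; `KernelCache` and `compile_for_amd` are hypothetical names and the compile step is a stub. ZLUDA's actual cache format and keying are not described in the article, and unlike this in-memory version, its cache survives between runs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the slow PTX -> AMD GPU compilation step.
/// A real implementation would invoke a compiler; this stub just tags the input.
fn compile_for_amd(ptx: &str) -> Vec<u8> {
    format!("amdgpu-binary-for:{}", ptx).into_bytes()
}

/// Hypothetical in-memory cache keyed by a hash of the PTX source.
struct KernelCache {
    entries: HashMap<u64, Vec<u8>>,
    compilations: usize, // how many times the slow path actually ran
}

impl KernelCache {
    fn new() -> Self {
        KernelCache { entries: HashMap::new(), compilations: 0 }
    }

    /// Hash the PTX text to get a cache key (collisions are ignored in this sketch).
    fn key(ptx: &str) -> u64 {
        let mut h = DefaultHasher::new();
        ptx.hash(&mut h);
        h.finish()
    }

    /// Return the cached binary, compiling only on a cache miss.
    fn get_or_compile(&mut self, ptx: &str) -> &[u8] {
        let key = Self::key(ptx);
        if !self.entries.contains_key(&key) {
            self.compilations += 1;
            self.entries.insert(key, compile_for_amd(ptx));
        }
        &self.entries[&key]
    }
}

fn main() {
    let mut cache = KernelCache::new();
    let ptx = ".version 7.0\n.target sm_80";
    let first = cache.get_or_compile(ptx).to_vec(); // first launch: compiles
    let second = cache.get_or_compile(ptx).to_vec(); // later launch: cache hit
    assert_eq!(first, second);
    println!("compilations: {}", cache.compilations); // prints "compilations: 1"
}
```

Because the key is derived from the kernel text, any change to the kernel produces a new key and forces recompilation; a production cache would presumably also account for the target GPU and compiler version, which this sketch omits.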

ZLUDA implements the official CUDA Driver API, along with a reverse-engineered portion of the undocumented CUDA API, by replacing function calls with their counterparts in the HIP runtime, which closely resembles CUDA. For example, cuDeviceGetAttribute() is replaced by hipDeviceGetAttribute(). Compatibility with NVIDIA libraries such as NVML, cuBLAS and cuSPARSE is achieved the same way: for each such library, ZLUDA provides a translation library with the same name and the same set of functions, built as a wrapper over the corresponding AMD library.
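The call replacement described above can be modelled in a few lines of Rust. This is a simplified, hypothetical sketch: the enums, the mocked `hip_device_get_attribute` and its placeholder return values are invented for illustration, and no real HIP library is called. It shows only the pattern of translating the CUDA attribute to its HIP counterpart, forwarding the call, and mapping the HIP status back to a CUDA one. A real shim would instead export a C-ABI function named `cuDeviceGetAttribute` (for example with `#[no_mangle] pub extern "C"`) and call the HIP runtime through FFI.

```rust
// Hypothetical Rust models of the CUDA and HIP enums (the real ones are C enums).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CuDeviceAttribute { MaxThreadsPerBlock, WarpSize, MultiprocessorCount }

#[derive(Debug, Clone, Copy, PartialEq)]
enum HipDeviceAttribute { MaxThreadsPerBlock, WarpSize, MultiprocessorCount }

#[derive(Debug, PartialEq)]
enum CuResult { Success, ErrorInvalidDevice }

#[derive(Debug, PartialEq)]
enum HipError { Success, InvalidDevice }

/// Mock of hipDeviceGetAttribute: one fake device (id 0) with placeholder values.
fn hip_device_get_attribute(pi: &mut i32, attr: HipDeviceAttribute, device_id: i32) -> HipError {
    if device_id != 0 {
        return HipError::InvalidDevice;
    }
    *pi = match attr {
        HipDeviceAttribute::MaxThreadsPerBlock => 1024,
        HipDeviceAttribute::WarpSize => 32,
        HipDeviceAttribute::MultiprocessorCount => 72,
    };
    HipError::Success
}

/// Translate a CUDA attribute into its HIP counterpart.
fn to_hip_attr(a: CuDeviceAttribute) -> HipDeviceAttribute {
    match a {
        CuDeviceAttribute::MaxThreadsPerBlock => HipDeviceAttribute::MaxThreadsPerBlock,
        CuDeviceAttribute::WarpSize => HipDeviceAttribute::WarpSize,
        CuDeviceAttribute::MultiprocessorCount => HipDeviceAttribute::MultiprocessorCount,
    }
}

/// Translate a HIP status back into a CUDA result code.
fn to_cu_result(e: HipError) -> CuResult {
    match e {
        HipError::Success => CuResult::Success,
        HipError::InvalidDevice => CuResult::ErrorInvalidDevice,
    }
}

/// Stand-in for the shimmed cuDeviceGetAttribute(): same parameter order, body forwards to HIP.
fn cu_device_get_attribute(pi: &mut i32, attrib: CuDeviceAttribute, dev: i32) -> CuResult {
    to_cu_result(hip_device_get_attribute(pi, to_hip_attr(attrib), dev))
}

fn main() {
    let mut value = 0;
    for attr in [
        CuDeviceAttribute::MaxThreadsPerBlock,
        CuDeviceAttribute::WarpSize,
        CuDeviceAttribute::MultiprocessorCount,
    ] {
        let status = cu_device_get_attribute(&mut value, attr, 0);
        println!("{:?}: {:?} -> {}", attr, status, value);
    }
    // A device id the mock does not know about yields a CUDA-style error.
    let status = cu_device_get_attribute(&mut value, CuDeviceAttribute::WarpSize, 7);
    println!("device 7: {:?}", status);
}
```

The explicit mapping functions are there because CUDA and HIP define their attribute and error enums independently, so their numeric values cannot be assumed to line up.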


GPU code compiled to PTX (Parallel Thread Execution) is first translated by a dedicated compiler into LLVM IR, an intermediate representation from which binary code for AMD GPUs is then generated.
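To make the PTX-to-LLVM IR step concrete, here is a toy Rust translator for a handful of PTX arithmetic instructions. It is a hypothetical illustration, not ZLUDA's compiler: it handles only three-operand add/sub/mul on 32-bit integers and floats, ignores register declarations, rounding modifiers and control flow, and assumes each register is written only once so the output is valid SSA.

```rust
/// Translate one PTX arithmetic instruction into one LLVM IR instruction.
/// Only a tiny, hypothetical subset of PTX is supported.
fn translate_instr(line: &str) -> Result<String, String> {
    let line = line.trim().trim_end_matches(';');
    let (opcode, operands) = line
        .split_once(' ')
        .ok_or_else(|| format!("malformed instruction: {line}"))?;
    let ops: Vec<&str> = operands.split(',').map(|s| s.trim()).collect();
    if ops.len() != 3 {
        return Err(format!("expected 3 operands: {line}"));
    }
    // Map the PTX opcode (with its type suffix) to an LLVM opcode and type.
    let (llvm_op, ty) = match opcode {
        "add.s32" | "add.u32" => ("add", "i32"),
        "sub.s32" | "sub.u32" => ("sub", "i32"),
        "mul.lo.s32" | "mul.lo.u32" => ("mul", "i32"),
        "add.f32" => ("fadd", "float"),
        "sub.f32" => ("fsub", "float"),
        "mul.f32" => ("fmul", "float"),
        other => return Err(format!("unsupported opcode: {other}")),
    };
    // PTX register names such as %r3 are also valid LLVM local names.
    Ok(format!("{} = {} {} {}, {}", ops[0], llvm_op, ty, ops[1], ops[2]))
}

/// Translate a block of PTX instructions, skipping blank lines.
fn translate(ptx: &str) -> Result<Vec<String>, String> {
    ptx.lines()
        .filter(|l| !l.trim().is_empty())
        .map(translate_instr)
        .collect()
}

fn main() {
    let ptx = "add.s32 %r3, %r1, %r2;\nmul.lo.s32 %r4, %r3, %r3;";
    for ir in translate(ptx).expect("translation failed") {
        println!("{ir}");
    }
    // Prints:
    // %r3 = add i32 %r1, %r2
    // %r4 = mul i32 %r3, %r3
}
```

A real translator must also deal with control flow, memory spaces, special registers such as thread indices, and registers that PTX reassigns, before an LLVM backend for AMD GPUs turns the IR into machine code.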
