PANews reported on April 23 that DeepSeek has open-sourced TileKernels, a high-performance GPU operator library built on TileLang, on its GitHub page. The library is heavily optimized for training and inference of large language models (LLMs), and its operator performance approaches the hardware limits of compute throughput and memory bandwidth.
TileKernels covers MoE routing, FP8/FP4 quantization, and a range of fused operators, and is already in use internally at DeepSeek. The library currently supports NVIDIA's SM90 (Hopper) and the latest SM100 (Blackwell) architectures, and requires CUDA 13.1 or later.

