DeepSeek releases DeepGEMM: an efficient FP8 GEMM library that optimizes V3/R1 training and inference

PANews reported on February 26 that DeepSeek launched DeepGEMM on the third day of its OpenSourceWeek: a CUDA library for FP8 general matrix multiplication (GEMM) that supports both dense layouts and mixture-of-experts (MoE) architectures, optimizing training and inference for the V3/R1 models.

DeepGEMM key features:

• Ultra-high performance: over 1350 FP8 TFLOPS on Hopper GPUs

• Minimal dependencies: no heavy dependencies; the code is as simple as a tutorial

• JIT compilation: no pre-compilation needed; kernels are optimized automatically at runtime

• Compact: the core code is only about 300 lines, yet it outperforms expert-tuned kernels for most matrix sizes

• Layout support: dense layout and two MoE layouts
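The basic idea behind an FP8 GEMM is to store the input matrices in a low-precision 8-bit float format (such as E4M3, whose largest representable magnitude is 448) together with scaling factors, multiply in low precision, accumulate in higher precision, and rescale the result. The sketch below simulates that recipe in NumPy with per-tensor scales and 3-bit mantissa rounding; it is an illustrative approximation of the general technique, not DeepGEMM's actual CUDA API (the function names `quantize_fp8_e4m3` and `fp8_gemm` are invented for this example, and real kernels use finer-grained block scaling).

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 format


def round_to_e4m3(x):
    """Coarsely simulate E4M3 mantissa precision: keep 3 explicit
    mantissa bits (subnormals and saturation edge cases are ignored)."""
    m, e = np.frexp(x)                  # x = m * 2**e, |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0       # quantize mantissa to 1+3 bits
    return np.ldexp(m, e)


def quantize_fp8_e4m3(x):
    """Scale a float32 tensor into the E4M3 range with one per-tensor
    scale, then round to simulated FP8 precision."""
    scale = np.abs(x).max() / FP8_E4M3_MAX  # assumes x is not all zeros
    q = round_to_e4m3(np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX))
    return q.astype(np.float32), scale


def fp8_gemm(a, b):
    """Multiply via the FP8-quantized operands, accumulating in float32
    and rescaling the product by both scales."""
    qa, sa = quantize_fp8_e4m3(a)
    qb, sb = quantize_fp8_e4m3(b)
    return (qa @ qb) * (sa * sb)


rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)

out = fp8_gemm(a, b)
ref = a @ b
# The low-precision result tracks the float32 reference closely.
print("relative error:", np.linalg.norm(out - ref) / np.linalg.norm(ref))
```

With only 8 bits per element, the quantization error per product term is a few percent, but the errors largely cancel in the accumulation, which is why FP8 GEMM remains accurate enough for training when paired with higher-precision accumulation.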


Author: PA一线

This content is for market information only and is not investment advice.
