DeepSeek releases DeepGEMM: an efficient FP8 GEMM library that optimizes V3/R1 training and inference

PANews reported on February 26 that DeepSeek released DeepGEMM on the third day of its OpenSourceWeek. DeepGEMM is a CUDA library for FP8 general matrix multiplication (GEMM). It supports both dense matrix computation and mixture-of-experts (MoE) architectures, and can be used to optimize training and inference for the V3/R1 models.

DeepGEMM key features:

• Ultra-high performance: 1350+ FP8 TFLOPS on Hopper GPUs

• Minimal dependencies: no heavy dependencies, with code as simple as a tutorial

• JIT compilation: no pre-compilation required; optimization happens automatically at runtime

• Compact core: only about 300 lines of core code, yet it outperforms expert-optimized kernels for most matrix sizes

• Layout support: a dense layout and two MoE layouts
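
To put the quoted throughput figure in perspective, the following is a minimal back-of-the-envelope sketch, not DeepGEMM code. It counts the floating-point operations in a single GEMM (2 x M x N x K, one multiply and one add per inner-product term) and estimates the best-case runtime at the 1350 TFLOPS figure cited above. The function names and the matrix shape are hypothetical, chosen only for illustration, and real kernels will not sustain peak throughput on every shape.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C[m, n] = A[m, k] @ B[k, n].

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def ideal_time_us(m: int, n: int, k: int, tflops: float) -> float:
    """Best-case runtime in microseconds at a sustained throughput in TFLOPS."""
    return gemm_flops(m, n, k) / (tflops * 1e12) * 1e6


# Hypothetical shape, for illustration only (not taken from the article).
M, N, K = 4096, 7168, 2048
QUOTED_TFLOPS = 1350  # the "1350+ FP8 TFLOPS" figure quoted in the article

print(f"{gemm_flops(M, N, K) / 1e9:.1f} GFLOP per GEMM")
print(f"{ideal_time_us(M, N, K, QUOTED_TFLOPS):.1f} us at {QUOTED_TFLOPS} TFLOPS")
```

Under these assumptions, a GEMM of roughly 120 GFLOP would take on the order of 90 microseconds, which is a lower bound rather than a measured result.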

Author: PA一线
