DeepSeek releases DeepGEMM: an efficient FP8 GEMM library that optimizes V3/R1 training and inference

PANews reported on February 26 that DeepSeek launched DeepGEMM on the third day of its OpenSourceWeek. DeepGEMM is a CUDA library for FP8 general matrix multiplications (GEMMs) that supports both dense computation and mixture-of-experts (MoE) architectures, and it is used to optimize the training and inference of the V3/R1 models.

DeepGEMM's key features:

• Ultra-high performance: 1350+ FP8 TFLOPS on Hopper GPUs

• Minimal dependencies: no heavy third-party dependencies; the code is kept as simple as a tutorial

• JIT compilation: no pre-compilation step needed; kernels are compiled and optimized automatically at runtime

• Compact: the core code is only about 300 lines, yet it outperforms expert-optimized kernels for most matrix sizes

• Supports the dense layout and two MoE layouts (see the usage sketch after this list)
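
To make the dense-layout call concrete, below is a minimal usage sketch in PyTorch. The function name `gemm_fp8_fp8_bf16_nt` and its tuple-of-(tensor, scales) arguments follow the DeepGEMM repository's published examples, but the exact scale-tensor shapes and alignment requirements shown here are assumptions based on its per-128-channel scaling scheme; consult the repository before relying on them.

```python
# Minimal sketch of DeepGEMM's dense FP8 GEMM call from PyTorch.
# The call signature follows the repository's examples; the scale-tensor
# shapes are assumptions based on its fine-grained (per-128) scaling.
import torch
import deep_gemm

m, k, n = 128, 7168, 4096  # illustrative shapes; k and n are multiples of 128

# FP8 (e4m3) operands with float32 scaling factors.
lhs = torch.randn(m, k, device='cuda').to(torch.float8_e4m3fn)
lhs_scales = torch.ones(m, k // 128, device='cuda', dtype=torch.float32)
rhs = torch.randn(n, k, device='cuda').to(torch.float8_e4m3fn)  # 'nt': RHS stored as [n, k]
rhs_scales = torch.ones(n // 128, k // 128, device='cuda', dtype=torch.float32)

# Output is produced in BF16.
out = torch.empty(m, n, device='cuda', dtype=torch.bfloat16)

# The kernel for these shapes is JIT-compiled on first use.
deep_gemm.gemm_fp8_fp8_bf16_nt((lhs, lhs_scales), (rhs, rhs_scales), out)
```

For the two MoE layouts, the repository exposes grouped variants of this call (one for a contiguous layout, one for a masked layout) with analogous arguments; their exact signatures should likewise be checked against the repository.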

Author: PA一线
