DeepSeek DeepGEMM releases major updates including Mega MoE and FP4 Indexer.

PANews reported on April 16 that DeepGEMM, DeepSeek's open-source matrix operation library, has opened a merge request titled "Public release 26/04" that introduces new features including Mega MoE and FP4 Indexer. Mega MoE fuses the MoE dispatch, linear1/SwiGLU/linear2, and combine stages into a single mega-kernel and optimizes the overlap between NVLink communication and tensor core computation. It currently supports only FP8 x FP4 MoE with EP≤8 and requires PyTorch≥2.9. The release also adds FP4 Indexer (for MQA logits, supporting larger MTP), FP8 x FP4 GEMM, PDL, and the DeepEPv2 MoE GEMM layout; optimizes GEMM heuristics and kernels; speeds up JIT compilation; and fixes issues such as JIT crashes and hangs in some kernels on distributed file systems. The release concerns DeepGEMM development only and is unrelated to any internal model release.
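For readers unfamiliar with the stages named in the report, the unfused data flow of an MoE layer can be sketched in plain Python. This is not DeepGEMM code and does not use its API. The function names, the routing format, and the layout of the linear1 output (gate half followed by up half) are assumptions made for illustration only. The sketch shows only what each stage computes; it says nothing about how the mega-kernel schedules work on the GPU.

```python
import math

def matvec(w, x):
    # w is a list of rows; returns w @ x as a list.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def silu(v):
    return v / (1.0 + math.exp(-v))

def expert_ffn(x, w1, w2):
    """linear1 -> SwiGLU -> linear2 for one token on one expert."""
    h = matvec(w1, x)                    # linear1
    half = len(h) // 2
    gate, up = h[:half], h[half:]        # assumed layout: [gate | up]
    act = [silu(g) * u for g, u in zip(gate, up)]  # SwiGLU
    return matvec(w2, act)               # linear2

def moe_forward(tokens, routes, experts):
    """tokens: list of vectors.
    routes: per token, a list of (expert_id, weight) pairs (hypothetical format).
    experts: list of (w1, w2) weight pairs."""
    # Stage 1, dispatch: group token indices by the expert they are routed to.
    buckets = {e: [] for e in range(len(experts))}
    for t, route in enumerate(routes):
        for e, weight in route:
            buckets[e].append((t, weight))

    # Stage 2, expert compute, and stage 3, combine: weighted sum per token.
    out = [[0.0] * len(tokens[0]) for _ in tokens]
    for e, items in buckets.items():
        w1, w2 = experts[e]
        for t, weight in items:
            y = expert_ffn(tokens[t], w1, w2)
            out[t] = [o + weight * yi for o, yi in zip(out[t], y)]
    return out
```

In a multi-GPU expert-parallel setup, dispatch and combine involve moving tokens between devices, which is where overlapping communication with computation becomes relevant.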

Author: PA一线
