DeepSeek DeepGEMM releases major updates including Mega MoE and FP4 Indexer.

PANews reported on April 16th that DeepGEMM, DeepSeek's open-source matrix operation library, has opened a pull request titled "Public release 26/04," introducing new features including Mega MoE and FP4 Indexer. Mega MoE fuses the dispatch, linear1/SwiGLU/linear2, and combine stages of a Mixture-of-Experts (MoE) layer into a single mega-kernel and overlaps NVLink communication with tensor core computation. For now it supports only FP8 x FP4 MoE with expert parallelism (EP) of at most 8, and it requires PyTorch 2.9 or later. The update also adds an FP4 Indexer (for MQA logits, with support for larger MTP), FP8 x FP4 GEMM, PDL (Programmatic Dependent Launch), and the DeepEPv2 MoE GEMM layout. In addition, it tunes GEMM heuristics and kernels, speeds up JIT compilation, and fixes issues such as JIT crashes and some kernel hangs on distributed file systems. The release concerns DeepGEMM development only and is unrelated to any internal model release.
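
For readers unfamiliar with the stages named above, the following is a minimal, unfused reference sketch in plain Python of what a single MoE layer computes: dispatch, then linear1, SwiGLU, and linear2 per expert, then combine. It is not DeepGEMM code and uses none of its API; the function names (`moe_forward`, `expert_ffn`, `matvec`) and the routing format are assumptions made for illustration. In a real expert-parallel deployment, dispatch and combine are cross-GPU transfers, which is the communication a fused kernel would overlap with computation.

```python
import math


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    # w is a list of rows; returns the matrix-vector product w @ x
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def expert_ffn(x, w_gate, w_up, w_down):
    # linear1: gate and up projections
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    # SwiGLU: silu(gate) multiplied elementwise by up
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    # linear2: project back to the model dimension
    return matvec(w_down, hidden)


def moe_forward(tokens, routes, experts):
    # routes[i] is a list of (expert_id, weight) pairs for token i
    # (hypothetical format; a real router would produce top-k scores).
    # Dispatch: group token indices by the expert they are routed to.
    buckets = {e: [] for e in range(len(experts))}
    for i, route in enumerate(routes):
        for e, w in route:
            buckets[e].append((i, w))

    out = [[0.0] * len(tokens[0]) for _ in tokens]
    for e, items in buckets.items():
        for i, w in items:
            # Expert computation on each dispatched token.
            y = expert_ffn(tokens[i], *experts[e])
            # Combine: weighted sum of expert outputs per token.
            for d, v in enumerate(y):
                out[i][d] += w * v
    return out
```

In this sketch each stage runs as a separate step over Python lists. The update described above instead performs all of these stages inside one GPU kernel, so intermediate results do not need to be handed off between separate kernel launches.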

Author: PA一线

This content is for market information only and is not investment advice.