PANews reported on February 18 that DeepSeek has announced NSA (Native Sparse Attention), a hardware-aligned, natively trainable sparse attention mechanism designed for ultra-fast long-context training and inference. Through a design optimized for modern hardware, NSA significantly reduces pre-training costs and accelerates inference without degrading model performance.
According to the official announcement, NSA performs well on general benchmarks, long-context tasks, and instruction-based reasoning, matching or exceeding full-attention models.
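To illustrate the general idea behind sparse attention (not DeepSeek's specific NSA algorithm, whose selection strategy is more sophisticated), here is a minimal NumPy sketch in which each query attends only to its `top_k` highest-scoring keys rather than to every key, which is what reduces the cost of long-context attention:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, top_k=4):
    """Each query attends only to its top_k highest-scoring keys,
    instead of all keys as in full attention. A generic illustration,
    not DeepSeek's NSA selection mechanism."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_k)
    # Keep only the top_k scores per query; mask the rest to -inf
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    weights = softmax(scores + mask, axis=-1)          # zero outside top_k
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out = sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (16, 8)
```

In a real implementation the savings come from never computing the masked scores at all (and from hardware-friendly blockwise selection, as NSA does), whereas this sketch computes the full score matrix and masks it purely for clarity.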
