PANews reported on February 18 that DeepSeek has announced NSA (Native Sparse Attention), a hardware-aligned, natively trainable sparse attention mechanism designed for ultra-fast long-context training and inference. Through a design optimized for modern hardware, NSA significantly reduces pre-training costs and accelerates inference without degrading model performance.
According to the official announcement, NSA performs well on general benchmarks, long-context tasks, and instruction-based reasoning, matching or exceeding full-attention models.
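To illustrate the general idea behind sparse attention (not DeepSeek's specific NSA algorithm, whose selection strategy is more sophisticated), here is a minimal NumPy sketch in which each query attends only to its `top_k` highest-scoring keys rather than to every key, which is what reduces the cost of long-context attention:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, top_k=4):
    """Each query attends only to its top_k highest-scoring keys,
    instead of all keys as in full attention. A generic illustration,
    not DeepSeek's NSA selection mechanism."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_k)
    # Keep only the top_k scores per query; mask the rest to -inf
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    weights = softmax(scores + mask, axis=-1)          # zero outside top_k
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out = sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (16, 8)
```

In a real implementation the savings come from never computing the masked scores at all (and from hardware-friendly blockwise selection, as NSA does), whereas this sketch computes the full score matrix and masks it purely for clarity.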
