R1 One-Year Anniversary Reveals MODEL1: Technical Clues Left by DeepSeek on GitHub
DeepSeek-R1 has been out for a full year. At this milestone, the silhouette of a new model has emerged in a GitHub codebase. According to recent reports, an update to the FlashMLA code repository contained 28 mentions of “MODEL1” across 114 files, referring to an architecture distinct from the known V32 (DeepSeek-V3.2). These scattered code clues outline DeepSeek’s ongoing iteration toward a new architecture.
Signals of Innovation in the Code
Differences in Technical Details
MODEL1 and V32 show clear differences in code implementation, mainly in three key aspects: KV cache management, sparsity handling, and FP8 decoding.
All these changes point in the same direction: memory optimization. In practical large-model inference, KV cache management directly affects inference speed and VRAM usage; sparsity handling relates to model efficiency; and FP8 decoding involves balancing computational precision against speed. These are all areas the industry is actively working to break through.
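To make the memory stakes concrete, here is a minimal back-of-the-envelope sketch of KV-cache sizing. This is not DeepSeek's code, and every model dimension below is a hypothetical placeholder; it only illustrates why halving bytes per element (e.g. FP16 → FP8-style storage) halves KV-cache VRAM.

```python
# Illustrative KV-cache sizing. All model dimensions below are
# hypothetical placeholders, NOT DeepSeek's actual configuration.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache keys and values for one decoding context."""
    # Factor of 2 covers both the key tensor and the value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Same hypothetical model, stored at 2 bytes/elem (FP16-like)
# versus 1 byte/elem (FP8-like).
fp16 = kv_cache_bytes(num_layers=60, num_kv_heads=8, head_dim=128,
                      seq_len=32_768, batch=1, bytes_per_elem=2)
fp8 = kv_cache_bytes(num_layers=60, num_kv_heads=8, head_dim=128,
                     seq_len=32_768, batch=1, bytes_per_elem=1)

print(f"FP16 KV cache: {fp16 / 2**30:.2f} GiB")  # 7.50 GiB
print(f"FP8  KV cache: {fp8 / 2**30:.2f} GiB")   # 3.75 GiB
```

At long context lengths the cache, not the weights, often dominates per-request VRAM, which is why all three optimization targets named above converge on memory.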
Why a New Architecture
V32 is an iterative version of V3, representing an optimization within the same generation series. Meanwhile, MODEL1 appears as an independent model identifier in the code, indicating that this is not just a simple parameter adjustment but likely an architectural innovation. This distinction is relatively rare in DeepSeek’s code management, hinting at the importance of MODEL1.
The R&D Capabilities Behind It
The appearance of MODEL1 reflects DeepSeek’s sustained technical investment. According to publicly available information, R1’s training cost was approximately $294,000, while V3’s total training budget was about $5.57 million. These figures are modest compared with top Silicon Valley labs, but continuously launching new architectures and models still requires stable funding.
This support comes from High-Flyer Quant (幻方量化), the quantitative fund behind DeepSeek. As of 2025, High-Flyer’s average return reached 56.55%, with assets under management exceeding 70 billion yuan and estimated annual revenue above 5 billion RMB. Such cash flow lets DeepSeek focus on long-term R&D without pressure to seek external financing.
Possible Future Directions
Based on the optimization directions visible in the code, MODEL1 may make breakthroughs in inference memory efficiency, sparse computation, and low-precision (FP8) decoding.
These directions align with the current mainstream trend in large model development: seeking the best balance of efficiency, cost, and performance rather than blindly scaling up parameters.
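The precision-versus-accuracy trade-off behind FP8 decoding can be illustrated with a toy round-trip. The sketch below is a generic symmetric 8-bit quantizer in pure Python, not any real FP8 format and not DeepSeek's implementation; it just shows that an 8-bit representation halves storage versus 16-bit while bounding the reconstruction error by half a quantization step.

```python
# Toy symmetric 8-bit quantization round-trip. Generic illustration of
# the precision/memory trade-off; NOT a real FP8 format or DeepSeek code.

def quantize(values, num_bits=8):
    """Map floats onto signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from quantized integers."""
    return [x * scale for x in q]

vals = [0.12, -1.5, 0.87, 3.2]
q, s = quantize(vals)
approx = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(vals, approx))
print(f"quantized: {q}, max abs error: {max_err:.4f}")
```

Real FP8 formats (e.g. E4M3/E5M2) use a floating-point layout rather than a fixed integer grid, but the engineering question is the same: how much precision can be traded away before model quality degrades.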
Summary
The exposure of MODEL1 at the one-year mark of R1 is both a natural continuation of technological innovation and a reflection of DeepSeek’s R&D rhythm. Judging from the code details, the company remains focused on engineering optimization rather than hype. Compared with its peers, DeepSeek has two advantages: ample R&D resources and continuous technical accumulation. The appearance of MODEL1 is just one milestone in that process. The next questions are when this new architecture will be officially released, and how much it will improve performance. The answers may be revealed very soon.