R1 One-Year Anniversary Reveals MODEL1: Technical Clues Left by DeepSeek on GitHub
DeepSeek-R1 has been out for a full year. At this milestone, the silhouette of a new model has emerged in a GitHub codebase. According to recent reports, an update to the FlashMLA code repository contained 28 mentions of “MODEL1” across 114 files, referring to an architecture distinct from the known V32 (DeepSeek-V3.2). These scattered code clues outline DeepSeek’s ongoing iteration toward a new architecture.
Signals of Innovation in the Code
Differences in Technical Details
MODEL1 and V32 show clear differences in code implementation, mainly in three key aspects: KV cache management, sparsity handling, and FP8 decoding.
All these changes point in the same direction: memory optimization. In practical large-model inference, KV cache management directly affects inference speed and VRAM usage; sparsity handling relates to model efficiency; and FP8 decoding involves balancing computational precision against speed. These are all areas the industry is actively working to break through.
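To make the memory stakes concrete, here is a minimal back-of-the-envelope sketch of KV-cache sizing. This is not DeepSeek's code, and every model dimension below is a hypothetical placeholder; it only illustrates why halving bytes per element (e.g. FP16 → FP8-style storage) halves KV-cache VRAM.

```python
# Illustrative KV-cache sizing. All model dimensions below are
# hypothetical placeholders, NOT DeepSeek's actual configuration.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache keys and values for one decoding context."""
    # Factor of 2 covers both the key tensor and the value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Same hypothetical model, stored at 2 bytes/elem (FP16-like)
# versus 1 byte/elem (FP8-like).
fp16 = kv_cache_bytes(num_layers=60, num_kv_heads=8, head_dim=128,
                      seq_len=32_768, batch=1, bytes_per_elem=2)
fp8 = kv_cache_bytes(num_layers=60, num_kv_heads=8, head_dim=128,
                     seq_len=32_768, batch=1, bytes_per_elem=1)

print(f"FP16 KV cache: {fp16 / 2**30:.2f} GiB")  # 7.50 GiB
print(f"FP8  KV cache: {fp8 / 2**30:.2f} GiB")   # 3.75 GiB
```

At long context lengths the cache, not the weights, often dominates per-request VRAM, which is why all three optimization targets named above converge on memory.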
Why a New Architecture
V32 is an iterative version of V3, representing an optimization within the same generation series. Meanwhile, MODEL1 appears as an independent model identifier in the code, indicating that this is not just a simple parameter adjustment but likely an architectural innovation. This distinction is relatively rare in DeepSeek’s code management, hinting at the importance of MODEL1.
The R&D Capabilities Behind It
The appearance of MODEL1 reflects DeepSeek’s sustained technical investment. According to publicly available information, R1’s training cost was approximately $294,000, while V3’s total training budget was about $5.57 million. These figures are modest compared with top Silicon Valley labs, but continuously launching new architectures and models still requires stable funding.
This support comes from High-Flyer Quant (幻方量化), the quantitative fund behind DeepSeek. As of 2025, High-Flyer’s average return reached 56.55%, with assets under management exceeding 70 billion yuan and estimated annual revenue above 5 billion RMB. Such cash flow lets DeepSeek focus on long-term R&D without pressure to seek external financing.
Possible Future Directions
Based on the optimization directions visible in the code, MODEL1 may make breakthroughs in inference memory efficiency, sparse computation, and low-precision (FP8) decoding.
These directions align with the current mainstream trend in large model development: seeking the best balance of efficiency, cost, and performance rather than blindly scaling up parameters.
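The precision-versus-accuracy trade-off behind FP8 decoding can be illustrated with a toy round-trip. The sketch below is a generic symmetric 8-bit quantizer in pure Python, not any real FP8 format and not DeepSeek's implementation; it just shows that an 8-bit representation halves storage versus 16-bit while bounding the reconstruction error by half a quantization step.

```python
# Toy symmetric 8-bit quantization round-trip. Generic illustration of
# the precision/memory trade-off; NOT a real FP8 format or DeepSeek code.

def quantize(values, num_bits=8):
    """Map floats onto signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from quantized integers."""
    return [x * scale for x in q]

vals = [0.12, -1.5, 0.87, 3.2]
q, s = quantize(vals)
approx = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(vals, approx))
print(f"quantized: {q}, max abs error: {max_err:.4f}")
```

Real FP8 formats (e.g. E4M3/E5M2) use a floating-point layout rather than a fixed integer grid, but the engineering question is the same: how much precision can be traded away before model quality degrades.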
Summary
The exposure of MODEL1 at the one-year mark of R1 is both a natural continuation of technological innovation and a reflection of DeepSeek’s R&D rhythm. Judging from the code details, the company remains focused on engineering optimization rather than hype. Compared with its peers, DeepSeek has two advantages: ample R&D resources and continuous technical accumulation. The appearance of MODEL1 is just one milestone in that process. The next questions are when this new architecture will be officially released, and how much it will improve performance. The answers may be revealed very soon.