Cognition AI and Applied Compute jointly developed SWE-Check, a model trained with reinforcement learning to detect code bugs. It runs much faster and at much lower cost than state-of-the-art models. In evaluations, its gap with Claude Opus 4.6 has narrowed, closing entirely on in-distribution data, though it still trails on out-of-distribution data. Training relies on reward linearization and a two-stage approach that first improves detection accuracy and then adds latency penalties. A preview version is now available in Windsurf Next.

MeNews

2026-05-08 09:06:33


ME News report, April 15 (UTC+8). According to Beating Monitoring, Cognition AI, the parent company of the AI programming tool Windsurf, has partnered with AI training company Applied Compute to train SWE-Check, a model built specifically for code bug detection, using reinforcement learning. The model analyzes the user’s current code changes (diff), automatically flags potential bugs, and provides repair suggestions.

In evaluations where the test data follows the same distribution as the training data, SWE-Check’s F1 score has matched Claude Opus 4.6 (the gap narrowed from 0.09 to 0). In out-of-distribution evaluations, the gap has shrunk from 0.49 to 0.29, which still leaves it behind leading models but shows clear progress.

Its key advantages are speed and cost: SWE-Check runs an order of magnitude faster than state-of-the-art models, and its inference costs are also significantly lower. As a result, it enables instant, free bug detection directly within the IDE, something that direct calls to large models such as Opus 4.6 cannot deliver.

Two training design choices are especially worth noting:

Reward linearization: The team aims to optimize the global F-beta metric, but this metric is computed over the whole dataset and cannot be decomposed directly into per-sample terms. Using a first-order approximation, they convert the global metric into a reward function that can be computed per sample, so that training effectively climbs the global metric. In early versions the false positive rate was too high, so the team lowered beta from 1 to 0.5 to weight precision more heavily.
Two-stage post-training: In the first stage, the model purely maximizes bug-detection capability, with no latency penalty. In the second stage, latency penalties are introduced, based on the real statistical distribution of how long users wait before switching away after triggering detection. This staged approach outperforms optimizing both objectives simultaneously, because joint optimization can easily fall into local optima, for example learning to respond very quickly but with shallow analysis.
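
The report does not give the exact formulas, so the following Python sketch is only one plausible reading of the two ideas, not the team's implementation. It assumes the per-sample reward is the gradient of the global F-beta with respect to the true-positive, false-positive, and false-negative counts, evaluated at running totals, and that the stage-two latency penalty is the empirical share of users who would already have switched away by the time the result arrives. All names here (Counts, linearized_reward, abandonment_probability, staged_reward, latency_weight) are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Counts:
    """True positives, false positives, and false negatives."""
    tp: float = 0.0
    fp: float = 0.0
    fn: float = 0.0


def f_beta_denominator(c: Counts, beta: float) -> float:
    b2 = beta * beta
    return (1 + b2) * c.tp + b2 * c.fn + c.fp


def f_beta(c: Counts, beta: float) -> float:
    """Global F-beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)."""
    denom = f_beta_denominator(c, beta)
    return (1 + beta * beta) * c.tp / denom if denom > 0 else 0.0


def linearized_reward(sample: Counts, running: Counts, beta: float) -> float:
    """First-order approximation of how much one sample moves the global
    F-beta, using partial derivatives taken at the running counts.
    Assumption: this is an illustrative linearization, not the one in the source."""
    b2 = beta * beta
    denom = f_beta_denominator(running, beta)
    if denom <= 0:
        raise ValueError("running counts are empty")
    f = f_beta(running, beta)
    d_tp = (1 + b2) * (1 - f) / denom   # dF/dTP
    d_fp = -f / denom                   # dF/dFP
    d_fn = -b2 * f / denom              # dF/dFN
    return d_tp * sample.tp + d_fp * sample.fp + d_fn * sample.fn


def abandonment_probability(latency_s: float, switch_away_times_s: list[float]) -> float:
    """Empirical fraction of observed users who switched away at or before latency_s."""
    if not switch_away_times_s:
        return 0.0
    ordered = sorted(switch_away_times_s)
    return bisect_right(ordered, latency_s) / len(ordered)


def staged_reward(detection_reward: float, latency_s: float,
                  switch_away_times_s: list[float], stage: int,
                  latency_weight: float = 1.0) -> float:
    """Stage 1: detection reward only. Stage 2: subtract a latency penalty.
    Assumption: the subtractive form and latency_weight are hypothetical."""
    if stage == 1:
        return detection_reward
    penalty = latency_weight * abandonment_probability(latency_s, switch_away_times_s)
    return detection_reward - penalty
```

Under this linearization, a false positive costs 1/beta^2 times as much as a missed bug. At beta = 1 the two errors are penalized equally, while at beta = 0.5 a false positive costs four times as much, which is consistent with the stated goal of emphasizing precision.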

A preview version of SWE-Check has been launched in Windsurf Next (shortcut: cmd+U). It will later be rolled into the official Windsurf release.

(Source: BlockBeats)

Windsurf trained a specialized bug-catching small model using RL, and in internal evaluations, it has matched Claude Opus 4.6.
