Alibaba’s Qwen3-Coder, a new open-source AI model, now rivals top proprietary systems in software engineering and agentic tasks, setting a new bar for open models.
Chinese e-commerce giant Alibaba’s latest release, Qwen3-Coder, strengthens the case that open-source models are now capable of matching proprietary AI systems, at least when it comes to software engineering tasks.
Featuring a 480B-parameter Mixture-of-Experts design and support for extended context windows, the model aims to tackle long-horizon agentic tasks that require planning, tool use and iterative feedback.
Alibaba has also released Qwen Code, an open agentic CLI tool, to showcase how the model performs in practice. Qwen3-Coder also leads open-source performance on tasks like agentic tool use and browser automation, hinting at broader ambitions beyond pure code generation.
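Like most open models, Qwen3-Coder is typically served behind an OpenAI-compatible chat-completions API, which is also how tools like Qwen Code talk to it. The sketch below shows what a minimal request might look like; the endpoint URL, model identifier and environment-variable name are assumptions for illustration, so check your provider's documentation before relying on them.

```python
# Hypothetical sketch: querying a Qwen3-Coder deployment through an
# OpenAI-compatible chat-completions endpoint. The URL, model name and
# env-var name below are assumptions, not confirmed values.
import json
import os
import urllib.request

API_URL = "https://example-provider.com/v1/chat/completions"  # assumed endpoint
MODEL = "qwen3-coder"  # assumed model identifier

def build_request(prompt: str) -> dict:
    """Build a standard OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # low temperature for deterministic code output
    }

payload = build_request("Write a Python function that reverses a string.")
print(json.dumps(payload, indent=2))

# Only attempt the network call when an API key is actually configured.
api_key = os.environ.get("QWEN_API_KEY")  # assumed env-var name
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```

Because the request shape follows the de facto OpenAI standard, the same payload works against most hosts that serve open-weight models, which is part of why open releases like this slot so easily into existing developer workflows.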
The Benchmarks Show a Closing Gap
Across coding, tool use and browser automation, Qwen3-Coder doesn’t just close the gap with closed models like Claude 4 Sonnet; it often matches them.
On the SWE-bench Verified benchmark, Qwen3-Coder clocks 67% accuracy, which rises to 69.6% with extended multi-turn interactions. That’s just below Claude 4 Sonnet’s 70.4%, but well ahead of GPT-4.1 at 54.6% and Gemini 2.5 Pro at 49.0%.
Its long-context support (a native 256K-token window, expandable to 1 million tokens), large-scale reinforcement-learning training and custom agent tooling like Qwen Code position it as a serious player in developer workflows, one that is not just open but optimised.
Qwen3-Coder now leads all open-source models on the Artificial Analysis Intelligence Index with a score of 62, nearly on par with Claude 3.7 Sonnet (62.3), as highlighted by Cline, an autonomous coding agent, on X. It outpaces DeepSeek, Kimi K2 and even closed models like GPT-3.5, reflecting its broad competence across reasoning, coding and long-context benchmarks.
Its benchmark dominance stretches further. In tasks like agentic browser use, where it scores 49.9% on WebArena, or real-world tool use with 77.5% on TAU-Bench Retail, Qwen3-Coder often leads the open pack and, occasionally, outperforms everyone.
Reactions on X Say the Quiet Part Out Loud
The launch has not gone unnoticed by the developer community: Qwen3-Coder surpassed Kimi-K2 less than two weeks after its debut.
“Despite a significant initial lead, open source models are catching up to closed source and seem to be reaching escape velocity,” Cline said in a post on X.
Others were more direct in their praise.
“It can write code better than most programmers,” Lukasz Olejnik, an assistant professor at SGH Warsaw School of Economics, said. He noted that Qwen3-Coder supports 358 programming languages and understands entire large-scale projects, adding, “Most importantly, it’s completely free and open source…Meanwhile, OpenAI was expected to release an open-source model, but that hasn’t quite happened.”
That sentiment—a mix of admiration and gentle prodding—has echoed across technical circles. With Claude, Gemini and GPT-4 still largely locked down, Qwen3-Coder has become an unlikely rallying point for the open community.
More Focus on Open Source AI Models Going Forward
There’s little doubt that Qwen3-Coder has tilted the field. On the Artificial Analysis Intelligence Index, it now leads all open models with a score of 62, just behind Claude 3.7 Sonnet and Claude Opus 4 and comfortably ahead of GPT-3.5. For an open-source model, that is noteworthy.
OpenAI’s much-awaited open-weights model, originally due in June, remains delayed. Alibaba, meanwhile, has already shipped.
On July 23, the White House released America’s AI Action Plan, which weighed in on encouraging open-source and open-weight AI models. The plan noted that these models benefit commercial and government adoption of AI because many businesses and governments hold sensitive data that they cannot send to closed-model vendors.
Furthermore, the plan stated that such models are essential for academic research, which often relies on access to a model’s weights and training data to perform scientifically rigorous experiments.
“We need to ensure America has leading open models founded on American values. Open-source and open-weight models could become global standards in some areas of business and in academic research worldwide,” the action plan paper read.
“While the decision of whether and how to release an open or closed model is fundamentally up to the developer, the Federal government should create a supportive environment for open models,” it added.
For now, it seems like Qwen and China are setting the pace for open-source coding, not GPT and the United States.