← Back to research
·5 min read·company

Tongyi DeepResearch

Tongyi DeepResearch — Alibaba's open deep research agent. RL-trained Qwen3-30B-A3B model leads benchmarks on BrowseComp, GAIA, and Humanity's Last Exam. 19.4k stars, Apache-2.0, hosted on OpenRouter with a free tier — but no new model version since the September 2025 release.

Key takeaways

  • SOTA-class open deep research agent — RL-trained Qwen3-30B-A3B (30.5B total params, 3.3B activated per token) leads benchmarks on BrowseComp, GAIA, HLE, WebWalkerQA, FRAMES, and SimpleQA
  • Fine-tuned approach vs prompt-based: fully automated synthetic data pipeline powers agentic pre-training, supervised fine-tuning, and end-to-end reinforcement learning (GRPO with token-level policy gradients)
  • 19.4k stars, Apache-2.0, ~47k HuggingFace downloads/month as of June 2026. Hosted on OpenRouter (including a free variant) and deployed in production inside Alibaba (Amap itineraries, Tongyi FaRui legal agent)
  • Momentum has cooled: the 30B-A3B weights from September 2025 remain the only release, and the GitHub repo has been quiet since late February 2026 — the technical report (updated May 2026) points to successor work still unreleased

FAQ

What is Tongyi DeepResearch?

An RL-trained deep research agent from Alibaba's Tongyi Lab. Uses a specialized Qwen3-30B-A3B model (30.5B params, 3.3B active) trained via GRPO reinforcement learning on synthetic agentic data. Leads multiple deep research benchmarks and is documented in a full technical report on arXiv.

How does it compare to GPT Researcher or dzhng/deep-research?

Those are prompt-based (use any frontier LLM). Tongyi trains a specialized model specifically for deep research tasks, achieving higher benchmark scores but requiring more setup and compute to run locally — or you can call the hosted model on OpenRouter, which offers a free variant.

Is Tongyi DeepResearch still actively developed?

Partially. As of June 2026 the September 2025 30B-A3B weights are still the only model release and repo activity stalled in February 2026, but the team updated the arXiv technical report in May 2026 and has said to expect a next generation of agentic models.

Overview

Tongyi DeepResearch is Alibaba's open-source deep research agent — a fine-tuned Qwen3-30B-A3B model (30.5 billion total parameters, only 3.3 billion activated per token) trained specifically for long-horizon, deep information-seeking tasks. It leads benchmarks across BrowseComp, GAIA, Humanity's Last Exam, WebWalkerQA, FRAMES, and SimpleQA.

Unlike prompt-based approaches (GPT Researcher, dzhng/deep-research), Tongyi takes the fine-tuned route: a fully automated synthetic data pipeline powers agentic pre-training, supervised fine-tuning, and end-to-end reinforcement learning using Group Relative Policy Optimization (GRPO) with token-level policy gradients.

Key stats (as of June 2026): 19,360 GitHub stars, 1,481 forks, Apache-2.0 license, Python. ~47,500 HuggingFace downloads in the last month.


Status & Releases

As of June 11, 2026, the September 2025 Tongyi-DeepResearch-30B-A3B weights remain the only model release — no v2 or successor checkpoint has shipped. The GitHub repo's last push was February 27, 2026, and the project publishes no tagged releases.

The team did publish a full technical report on arXiv (2510.24701) in October 2025 and updated it in May 2026, documenting the synthetic-data pipeline, agentic mid-training, and the IterResearch "Heavy" test-time-scaling mode — and the launch blog promises a "next generation of agentic models" that has not yet materialized.

Distribution has widened since launch: the model is now hosted on OpenRouter with roughly 131K context, including a free variant, alongside the original HuggingFace and ModelScope downloads.


Technical Innovation

Three key technical contributions:

  1. Fully automated synthetic data pipeline — Scalable data synthesis for agentic pre-training, SFT, and RL. No human annotation required
  2. Large-scale continual pre-training — Extends model capabilities on diverse agentic interaction data while maintaining freshness and reasoning
  3. End-to-end RL — Strictly on-policy GRPO with token-level gradients, leave-one-out advantage estimation, and selective negative sample filtering

At inference it supports two paradigms: vanilla ReAct (to measure intrinsic ability) and the IterResearch-based Heavy mode for maximum performance.

The key insight: a smaller, specialized model trained end-to-end for agentic search outperforms much larger frontier models on research benchmarks.


Pricing & Licensing

Free and open: Apache-2.0 weights on HuggingFace and ModelScope, so the only cost of self-hosting is compute — community reports put a quantized 30B-A3B comfortably on a single consumer GPU or Apple Silicon machine. For zero-setup access, OpenRouter hosts the model with ~131K context, including a free-tier variant.


Adoption

Beyond the ~47k monthly HuggingFace downloads and 19.4k stars, the model runs in production inside Alibaba: it powers personalized travel itineraries on Amap and underpins Tongyi FaRui, Alibaba's legal research agent. IBM's coverage highlighted the cost angle — Gabe Goodhart, IBM's Chief Architect of AI Open Innovation, called it "a really cool step forward" as an open LLM "that you can run on a personal workstation that measures up to a frontier research system."


Competitive Position

Strengths: SOTA-class benchmarks. Proven that fine-tuning beats prompting for research tasks. Efficient (3.3B active params). Open weights, free hosted option, real production deployments.

Weaknesses: Requires meaningful compute to run unquantized. Less flexible than prompt-based approaches (locked to Qwen architecture). Chinese-language ecosystem dominates community.


Cautions

  • No new model release since September 2025 and repo activity stalled in February 2026 — the "next generation" remains a promise, not a release
  • Benchmark leadership claims are self-reported by the Tongyi Lab team; independent replication is thin
  • Skeptics note the model is ultimately a Qwen3 MoE fine-tune, so its edge depends on the training recipe rather than a new architecture

What Developers Say

The November 2025 Hacker News thread (365 points) was broadly positive, with hands-on reports skewing favorable:

"Just tried this out with my web search mcp, extremely impressed with it. Never seen deep research this good from a model so small." — Nymbo, Hacker News

"It's a Qwen 3 MoE fine tune..." — mehdibl, Hacker News (the recurring skeptical take)

"At this point 'deep research' is more of a pattern... long-running research tasks that drive a search tool." — simonw, Hacker News, on where Tongyi fits in the category

Much of the thread was practical self-hosting discussion (llama.cpp, quantizations, consumer GPUs), which is itself a signal: developers treat it as a model you actually run, not just a benchmark artifact.


Bottom Line

Tongyi DeepResearch remains the strongest open-weights bet in autoresearch: a specialized 30B MoE that matches frontier research agents, free under Apache-2.0, cheap to run, and now one OpenRouter call away. The trade is freshness — nine months without a new checkpoint in a category where frontier hosted agents iterate monthly.

Recommended for: Teams that want self-hosted or low-cost deep research, privacy-sensitive workloads, and researchers studying agentic RL training recipes.

Not recommended for: Teams that want a turnkey hosted product with support, or LLM-flexible pipelines — prompt-based tools like GPT Researcher swap models freely; Tongyi is its model.

Outlook: Watch for the promised next-generation agentic models from Tongyi Lab — the May 2026 technical-report update suggests the research continues, but if no successor ships in 2026, prompt-based agents on newer frontier models will erode its benchmark lead.


Research by Ry Walker Research • methodology