Fullduplex /
the verticals · v13 / 17 · #alibaba #qwen-audio · 06 sections · 05 figures

Alibaba DAMO and the Qwen Audio team: the most-downloaded open audio lab closed only its flagship.

A Chinese big-tech lab has populated the middle lane between fully-open and fully-closed. As of April 2026, Qwen is the largest open audio family on Hugging Face, with more than one billion cumulative downloads. Yet Qwen3.5-Omni, released March 30, 2026, ships API-only. Read it as open-base-closed-frontier: the first concrete signal from a Chinese lab of a third option between Meta and Mistral on one side, OpenAI and Anthropic on the other.

verticals · v13 of 17 · subject profile
A Chinese big-tech lab that shipped a field-defining open audio model three years running, passed one billion downloads on Hugging Face, and then closed only the flagship. The base stays on Apache 2.0, the frontier moves behind an API. The first visible middle lane.
subject: Alibaba DAMO / Tongyi Lab · Hangzhou · Qwen, 2023– · 1B+ downloads · 200k+ derivatives · 32/36 open SoTA

1. One billion downloads, and one X post

March 3, 2026, just after 11 in the morning. A researcher named Lin Junyang posted a one-line farewell in English on X: me stepping down. bye my beloved qwen. The post drew 6.5 million views in 24 hours. He had spent seven years at Alibaba, leading its large-model Qwen team, and up to the night before the announcement he was overseeing release work for the new Qwen3.5 model.

There is a number worth pausing on here. The Qwen family that Lin held together had, as of March 2026, passed one billion cumulative downloads on Hugging Face (the model-distribution platform), with more than 200,000 derivative models. In February 2026 alone, Qwen recorded 153.6 million downloads, more than double the combined total of the next eight companies, including Meta, DeepSeek, and OpenAI. According to SCMP, this single Alibaba family accounts for more than half of global open-source AI downloads.

Narrow the lens to audio and a different set of numbers appears. Qwen3-Omni (released September 2025) is a 30B Mixture-of-Experts model. Concretely, the design places several expert subnetworks side by side and routes each input through only the few it needs, so of the 30B total parameters only about 3B are active at any moment. The GitHub repository has 3,500 stars, and a single variant has 410,000 downloads. According to the arXiv 2509.17765 technical report, it takes open-source SoTA on 32 of 36 speech and audio-visual benchmarks, with a theoretical latency of 234 ms, text dialogue in 119 languages, speech understanding in 19, and speech generation in 10.
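To make the sparse-activation arithmetic concrete, here is a minimal sketch of top-k expert routing. The dimensions and expert counts are illustrative stand-ins, not Qwen3-Omni's actual configuration; the point is only that the router touches a small fraction of the total parameters per token.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token runs through top-k of n experts.

    Dimensions are illustrative, not Qwen3-Omni's real configuration. The point
    is that most expert parameters sit idle for any given token.
    """
    def __init__(self, dim=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)   # scores each token against each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # run only the chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(4, 512)).shape)            # torch.Size([4, 512])
```

Scaled up, the same mechanism is what lets a 3B-active model carry 30B-model capacity at something close to 3B-model inference cost.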

Then on March 30, 2026, when Qwen3.5-Omni was released, the weights stopped shipping. Three variants, Plus, Flash, and Light, none with published weights at launch, all served through the API. Winbuzzer put it plainly: “Qwen3.5-Omni is closed-source.”

The thesis, placed early. Alibaba’s Qwen Audio program is the first case of a Chinese big-tech lab making visible the posture of closing only the flagship. Yet the open weights and blueprints of the Qwen3 generation underneath it are already distributed too widely to be pulled back. In restaurant terms, the signature dish has moved to a closed kitchen, but the recipes and the cooking tools have already been handed out to chefs all over town.

Three points to organize the argument.

Point 1 — The open base cannot be pulled back

The Apache 2.0 license on Qwen3-Omni (a permissive license with an explicit patent-grant clause) cannot be retracted. The weights behind those 410,000 downloads will survive as derivative models and community fine-tunes, even if Alibaba deletes the repository tomorrow.

Point 2 — The blueprints have already been distributed

Thinker-Talker (a design that splits a text-writing Thinker from an audio-emitting Talker), TMRoPE (a method that embeds time alignment between video and audio into positional encoding), and the OmniFlatten flatten curriculum (a staged curriculum that progressively folds parallel speech and text streams into a single interleaved sequence for training). Each is already public as an arXiv paper, and StepFun, Moonshot, Tencent, and FlashLabs have taken them into the spines of their own models.
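Of the three, the flatten curriculum is the easiest to show in miniature. A hedged sketch, not OmniFlatten's actual code: two parallel token streams are cut into fixed-size chunks and interleaved into one flat sequence, so a standard decoder-only model can be trained on text and speech at once. Chunk sizes here are invented for illustration.

```python
def flatten_streams(text_tokens, speech_tokens, text_chunk=2, speech_chunk=6):
    """Interleave two parallel token streams into one flat training sequence.

    Illustrative only: the chunk sizes (and the 1:3 text-to-speech ratio they
    imply) are invented, and the real curriculum stages this over several
    training phases.
    """
    flat, t, s = [], 0, 0
    while t < len(text_tokens) or s < len(speech_tokens):
        flat.extend(text_tokens[t:t + text_chunk]);     t += text_chunk
        flat.extend(speech_tokens[s:s + speech_chunk]); s += speech_chunk
    return flat

text = ["T0", "T1", "T2", "T3"]
speech = [f"S{i}" for i in range(12)]
print(flatten_streams(text, speech))
# ['T0', 'T1', 'S0', 'S1', 'S2', 'S3', 'S4', 'S5', 'T2', 'T3', 'S6', ..., 'S11']
```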

Point 3 — A closure is evidence, not proof

The Qwen3.5-Omni closed release is a single event. Whether Qwen4 and Qwen5 return to open, move to fully closed, or settle at this middle posture will become clear over the next 12 to 24 months.

fig.f7 · three timelines·········
Figure F7. Model, organization, and structure timelines converging on a single quarter, September 2025 to March 2026, with a vertical marker at March 3, 2026. The model axis runs Qwen3-Omni (9/26) → Omni-Flash (12/1) → Qwen3.5-Omni (3/30). The org axis shows Huibin moving to Meta (reported, January) and Lin plus Yu resigning (3/3, primary confirmation still pending). The struct axis marks the launch of the Foundation Model Task Force in early March 2026. The correlation is clear, but causation cannot be asserted from outside.

2. From iDST to Tongyi Lab — twelve years of compounding

For a single lab to ship this much output, the years of groundwork have to be there first.

Alibaba’s speech AI research goes back to iDST (Institute of Data Science and Technologies, an in-house research unit) founded in 2014. In October 2017, iDST was absorbed into DAMO Academy. DAMO, written 達摩院, is a long-horizon basic research institute launched under a three-year charter. Concretely, it works like an internal think tank aimed at science-grade quality rather than fast commercialization. Its founding director was Jingren Zhou, an ACM Fellow. Out of this came Paraformer (a Chinese ASR baseline) and SenseVoice (multilingual ASR).

Later, a sub-organization called FunAudioLLM released CosyVoice: v1 and v2 in 2024, v3 in 2025, all Apache 2.0, a multilingual streaming TTS line with zero-shot voice cloning (reproducing a speaker’s voice from a few seconds of sample audio). It became, in effect, the reference open-source TTS of 2024.
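For readers who want to see what "a few seconds of sample audio" means in practice, a hedged usage sketch following the pattern in the CosyVoice repository README. The checkpoint path, file names, and the exact inference_zero_shot signature should be treated as assumptions that may have shifted across v1 to v3.

```python
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice   # pattern from the repo README
from cosyvoice.utils.file_utils import load_wav

# Assumed checkpoint directory and file names; adjust to the release you use.
tts = CosyVoice('pretrained_models/CosyVoice-300M')
prompt = load_wav('speaker_sample.wav', 16000)   # a few seconds of the target voice

# Clone the prompt speaker onto new text; newer releases stream output chunks.
for i, chunk in enumerate(tts.inference_zero_shot(
        'Text to speak in the cloned voice.',    # tts_text
        'Transcript of the prompt audio.',       # prompt_text
        prompt)):
    torchaudio.save(f'cloned_{i}.wav', chunk['tts_speech'], 22050)
```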

The Qwen Audio line began in late 2023. Qwen-Audio (November 2023) connected a Whisper-derived speech encoder to the Qwen LLM. Qwen2-Audio (July 2024) released 7B weights under Apache 2.0. OmniFlatten (October 2024, paper-only, accepted to ACL 2025 Findings) introduced the flatten curriculum that folds four parallel streams into two. Qwen2.5-Omni (March 2025) brought in Thinker-Talker and TMRoPE, releasing 3B and 7B variants under Apache 2.0. Qwen3-Omni (September 2025) scaled that recipe up to a 30B MoE.
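The Qwen-Audio coupling pattern reduces to a small skeleton. This is the generic audio-LLM recipe rather than Alibaba's released code: an encoder turns the waveform into frame embeddings, a learned projector maps them into the LLM's embedding space, and the LLM attends over audio and text as one sequence. All dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class AudioLLMSkeleton(nn.Module):
    """Encoder -> projector -> LLM coupling, reduced to its skeleton."""
    def __init__(self, audio_dim=1280, llm_dim=4096):
        super().__init__()
        # Maps Whisper-style encoder frames into the LLM's embedding space.
        self.projector = nn.Linear(audio_dim, llm_dim)

    def forward(self, audio_frames, text_embeds):
        # audio_frames: (T_audio, audio_dim) from the speech encoder
        # text_embeds:  (T_text, llm_dim) from the LLM's token embedding table
        audio_embeds = self.projector(audio_frames)
        # The LLM then attends over audio and text as one sequence.
        return torch.cat([audio_embeds, text_embeds], dim=0)

skeleton = AudioLLMSkeleton()
print(skeleton(torch.randn(100, 1280), torch.randn(12, 4096)).shape)  # (112, 4096)
```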

In short, Alibaba is a lab that has produced a field-defining open audio model three years running. Even among commercial labs of comparable size, this cadence (release frequency) is rarely observed.

fig.f1 · release timeline·········
Figure F1. Qwen audio release timeline from November 2023 to March 2026. Green is Apache 2.0, yellow is paper-only, red is API-only. Between December 2025 and March 2026 only the flagship tier shifted to red.

3. The people behind the release list

The name holding the 2026 Qwen Audio program together is Lin Junyang. According to HelloChinaTech, he joined DAMO Academy in 2019 and became technical lead at the founding of Tongyi Lab (通义实验室, the organizational home of the Qwen large-model program) at the end of 2022. He is lead author on Qwen-Audio and Qwen2-Audio, and a senior co-author on the Qwen2.5-Omni technical report. Authorial continuity on this lab’s audio flagship has rested with a single researcher across three generations; for commercial AI research, that continuity is unusual.

“Junyang leaving Qwen means the end of an era.” — Wenting Zhao, Qwen researcher (X, March 3, 2026)

The context around Lin’s resignation, as far as it is visible from outside (the personnel details below draw on press reporting, with primary-source confirmation still in progress): according to TechCrunch and Bloomberg, he was leading release work for the Qwen3.5 small model until 24 hours before the resignation. Yu Bowen, reported to be the post-training lead, resigned the same day. Two months earlier, in January 2026, Qwen Code lead Huibin was reported to have moved to Meta.

Alibaba CEO Eddie Wu formally accepted the resignation and, together with Group CTO Wu Zeming and Alibaba Cloud CTO Jingren Zhou, launched a Foundation Model Task Force. Wu told SCMP that “Advancing foundation models is a core strategic priority for our future.” On the Q3 FY2026 earnings call he projected $100B in AI and cloud revenue over five years and $53B in infrastructure over three years. At that scale, Qwen Audio is no longer “one team inside a small lab.” It has been relocated to the core of a public company’s AI strategy.

On the institutional layer above this sits Jingren Zhou: formerly a Microsoft Research principal scientist, founding director of DAMO Academy in 2017, and, from March 2026, Alibaba Cloud CTO and co-chair of the Foundation Model Task Force. The FunAudioLLM sub-organization (a speech-specialist unit inside Tongyi) keeps developing CosyVoice, SenseVoice, Paraformer, and Qwen3-TTS in parallel with the Omni line. In other words, as of 2026 the “Qwen Audio team” is the joint output of three organizational layers: the Tongyi Lab speech group, the FunAudioLLM sub-team, and the Qwen large-model program.

fig.f5 · release lineage·········
Figure F5. Release lineage from Qwen-Audio (November 2023) to Qwen3.5-Omni (March 2026). Green stroke marks Apache 2.0 weight releases, yellow marks paper-only (OmniFlatten), red marks closed API. Each box lists the architectural motif, and the lower author-team layer shows continuity across three generations.

4. The four structural capabilities downstream depends on

What this lab supplies downstream is not individual models. It is the base that the Chinese-language speech AI ecosystem already treats as a given. Four capabilities, taken in turn.

First — Release cadence on the open frontier

Five field-defining open models in three years. The usual open audio lab ships one and stops, or releases irregularly every few years. Alibaba has held roughly the same pace for three years. Simon Willison wrote in March 2026: “the Qwen 3.5 family, an outstanding family of open weight models.”

Second — Scale that academia cannot reach

Qwen2.5-Omni was pretrained on 1.2 trillion tokens across 119 languages. Qwen3-Omni takes open-source SoTA on 32 of 36 benchmarks. Qwen3.5-Omni is reported by secondary coverage to support TTS in 113 languages and to have been pretrained on a corpus of “more than 100 million hours of audio and video,” with primary-source confirmation still pending. To put the size in context: academic labs and small open-source groups cannot fund the compute for pretraining at a 100-million-hour scale. An industrial lab has to fill that slot, and Alibaba has filled it for three years.
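A back-of-envelope conversion shows why, under loudly labeled assumptions: the 100-million-hour figure itself is unconfirmed, and audio-token rates vary by codec.

```python
hours = 100_000_000         # reported corpus size (secondary coverage, unconfirmed)
tokens_per_second = 25      # assumed audio-token rate; real codecs span roughly 12-75
audio_tokens = hours * 3600 * tokens_per_second
print(f"{audio_tokens:.1e} audio tokens")   # 9.0e+12, about nine trillion
```

Even at conservative token rates, the audio corpus alone would run several times the 1.2 trillion multimodal tokens disclosed for Qwen2.5-Omni.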

Third — Design templates that downstream copies

The OmniFlatten flatten curriculum, the Qwen2.5-Omni Thinker-Talker, and TMRoPE are cited as standard templates in 2024–2025 Family 2 (interleaved-flatten) papers. Concretely, StepFun Step-Audio-R1.1, Moonshot Kimi-Audio, Tencent Covo-Audio, and FlashLabs Chroma all embed these primitives in the spine of their own models. Whichever way the flagship license lands, the entire Family 2 field is already running on Alibaba’s design footprint. The Qwen2.5-Omni official blog captures the intent in a line: “Thinker functions as a large language model tasked with text generation, while Talker is a dual-track autoregressive model.”
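TMRoPE is the easiest of these primitives to show in miniature. A hedged sketch of the time-alignment idea, not the real implementation: frames from different modalities that occur at the same wall-clock time receive the same temporal position index, so rotary position encoding carries cross-modal alignment for free. The 40 ms granularity is an assumption for illustration.

```python
def temporal_ids(timestamps_s, ms_per_id=40):
    """Map real timestamps (seconds) to shared temporal position indices.

    Sketch of the alignment idea, not the real implementation: audio and
    video frames at the same wall-clock time get the same index, so the
    rotary encoding carries cross-modal time alignment. The 40 ms grid is
    an assumption for illustration.
    """
    return [int(t * 1000 // ms_per_id) for t in timestamps_s]

audio_ids = temporal_ids([i * 0.040 for i in range(6)])  # one frame per 40 ms
video_ids = temporal_ids([i * 0.200 for i in range(2)])  # one frame per 200 ms (5 fps)
print(audio_ids)  # [0, 1, 2, 3, 4, 5]
print(video_ids)  # [0, 5]: the second video frame aligns with audio frame 5
```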

Fourth — A substrate for the community

More than one billion downloads across the Qwen family, more than 200,000 derivative models. That is the largest single open model family in the world. Academic researchers, benchmark authors, and second-tier model builders can stack their work on top of this base tier. The flagship moving behind an API does not break that layer.

What this lab supplies has the character of a Linux distribution. Not an individual application, but a base others can freely build on. A commercial flagship moving behind a paid API does not reduce the substrate’s value as a public good.

fig.f2 · licensing posture matrix·········
Lab · base-tier license · flagship-tier license · posture
Alibaba (Qwen) · Apache 2.0 (Qwen3-Omni) · closed API (Qwen3.5-Omni Plus/Flash) · open-base-closed-frontier
Tencent · CC BY 4.0 (Covo-Audio) · CC BY 4.0 (Covo-Audio-Chat-FD) · open-all
ByteDance · SALMONN-omni paper · closed (Doubao, Seeduplex) · research-open, product-closed
Baichuan · community license with carve-outs · community license with carve-outs · semi-open
Moonshot (Kimi) · MIT (Kimi-Audio) · MIT (Kimi-Audio) · open-all
StepFun · Apache 2.0 (Step-Audio 2) · Step-Audio-R1.1 license SOURCE_NEEDED · open-cadence continuing
DeepSeek · permissive · permissive · open-all
Figure F2. Licensing posture matrix across major Chinese AI labs (Alibaba / Tencent / ByteDance / Baichuan / Moonshot / StepFun / DeepSeek). Two axes: base-tier license and flagship-tier license. Alibaba is the only lab where a split (green base, red flagship) is visible inside a single product family.

5. “Qwen3 is still open, so isn’t this talk of a pivot premature?”

Take the counterargument head-on. Qwen3-Omni is still open as of April 2026. Isn’t calling this an “open-to-closed pivot” speculation rather than evidence? Honestly, partly yes and partly no.

The model timeline is clean. Qwen3-Omni (September 26, 2025) was the last fully-open flagship. Qwen3-Omni-Flash (December 1, 2025) was the first closed upgrade, available only through Alibaba Cloud Model Studio, with Alibaba itself reporting a +3.2-point gain on VoiceBench. Then Qwen3.5-Omni (March 30, 2026) marks a generational break: Plus and Flash are API-only, and the license for Light is not disclosed at the time of writing. Two consecutive flagship generations closing is evidence.

The organizational timeline is murkier. Caixin Global and VentureBeat reported Lin’s resignation. A colleague’s “end of an era” post and another colleague’s post reading “I know leaving wasn’t your choice” are the externally visible record. These are observations from outside; the causal chain of decisions inside Alibaba is not visible. Eddie Wu, for his part, told SCMP “We will further scale up investment in AI research and development,” committing publicly to expansion rather than contraction.

The structural timeline is consistent. In March 2026, Tongyi Lab was reorganized from vertical integration (the Qwen team moving as one unit) to a horizontal functional split (pre-training, post-training, visual, and image generation). The Foundation Model Task Force was placed above that. Closure of the license, the reported departures, and the functional split all fall within the same six-month window. The timing is consistent; consistency is not causation.

Three plausible hypotheses for the closure. First, commercial pressure. Inside Alibaba Cloud’s AI revenue line, Qwen3.5-Omni Flash at $0.10/$0.80 per 1M tokens works as a monetization vehicle. Second, Chinese generative-AI regulation. The CAC Interim Measures require a security assessment for services with “public opinion attributes,” and a closed API concentrates responsibility inside the filing envelope. Third, the January 2026 easing of US export controls on H200 (not a reason to close, but a background tailwind that makes running a frontier API at scale economically workable). Read plainly, commercial pressure is first, regulation second, export-control easing a backdrop.
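To make the monetization hypothesis concrete, a rough cost sketch. Two loud assumptions, not drawn from Alibaba's documentation: the $0.10/$0.80 pair is read as input/output pricing per million tokens, and the per-second token rates for a voice session are invented.

```python
# Rough cost of a hypothetical 10-minute voice session on Qwen3.5-Omni Flash.
# Assumptions, not Alibaba's documentation: $0.10/$0.80 reads as input/output
# per 1M tokens; audio streams at ~25 tokens/s in and ~15 tokens/s out.
IN_PRICE, OUT_PRICE = 0.10, 0.80              # USD per 1M tokens
seconds = 10 * 60
tokens_in, tokens_out = seconds * 25, seconds * 15
cost = tokens_in / 1e6 * IN_PRICE + tokens_out / 1e6 * OUT_PRICE
print(f"${cost:.4f} per session")             # $0.0087
```

At fractions of a cent per ten-minute session, the vehicle only pays at Alibaba Cloud's traffic scale, which is consistent with ranking commercial pressure first.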

What matters is posture, not a single-quarter event. Alibaba has moved from open-weights-by-default to open-base-closed-frontier. This is the first clear signal from a Chinese big-tech lab, which means a new middle lane has been populated between the fully-open posture of Meta and Mistral and the fully-closed posture of OpenAI and Anthropic. DeepSeek is open-all, ByteDance Doubao is all-closed, Tencent Covo-Audio holds CC BY 4.0 open, Moonshot Kimi-Audio holds MIT open, StepFun Step-Audio 2 holds Apache 2.0 open. Alibaba is the only one that has made an internal split visible.

fig.f4 · open-to-closed bifurcation·········
Figure F4. Open-to-closed bifurcation inside the Alibaba audio stack. The X axis is generation, the Y axis is license posture (permissive at the top, closed at the bottom). The fork point is Qwen3-Omni (September 2025). The base tier stays on the permissive line, the flagship tier drops to the closed line.

6. The footprint left behind, and the signals to watch for the next five years

Place the argument so far inside the broader map of the speech-to-speech industry.

Within the four-family taxonomy organized in Article 03, Family 2 (interleaved-flatten) is the most populated family, and its spine has been built by the Alibaba line (OmniFlatten → Qwen2.5-Omni → Qwen3-Omni → Qwen3.5-Omni). When the top of the spine closes, the open-frontier torch is redistributed to StepFun Step-Audio-R1.1, Moonshot Kimi-Audio, Tencent Covo-Audio, and FlashLabs Chroma. The open frontier is not destroyed. It is reshaped.

On benchmark reproducibility, this adds one more entry to the “commercial API dependency” problem that Articles 07 and 08 foregrounded. Full-Duplex-Bench v1 through v3, Artificial Analysis S2S, and VocalBench-ZH have all used the open weights of Qwen3-Omni as a reference. Once Qwen3.5-Omni becomes API-only, the usual reproducibility costs (rate limits, version changes, regional availability) come with it. Third-party evaluators will have to choose whether to anchor Chinese-language benchmarks to Qwen3-Omni (frozen at September 2025) or Qwen3.5-Omni API (a moving target).
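The practical difference between the two anchors fits in a few lines. A hedged sketch: the open anchor can be pinned to an immutable repository revision with huggingface_hub, while the closed anchor can be pinned only to a model-name string whose behavior may drift. The repo id and revision hash below are placeholders for illustration.

```python
from huggingface_hub import snapshot_download

# Open-weights anchor: pin the exact snapshot a benchmark ran against.
# Repo id and revision are placeholders, not a real commit hash.
local_dir = snapshot_download(
    "Qwen/Qwen3-Omni-30B-A3B-Instruct",  # assumed repo id for illustration
    revision="0123abc",                   # immutable: same bytes forever
)

# Closed-API anchor: the only available pin is a model-name string, and the
# behavior behind it can change without notice. (Hypothetical request shape.)
payload = {"model": "qwen3.5-omni-flash", "input": "..."}
```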

On training-data transparency, the arguments in Article 04 and Article 10 intersect with Alibaba’s licensing bifurcation. Qwen2.5-Omni disclosed 1.2 trillion tokens of multimodal data in its technical report, with no breakdown of audio hours. The “more than 100 million hours” figure for Qwen3.5-Omni is from secondary coverage, with primary confirmation pending.

The insight worth keeping: open-base-closed-frontier is the first concrete signal from a Chinese lab of a “third option” populated between the Meta and Mistral playbook and the OpenAI and Anthropic playbook. Alibaba’s move has defined how to keep the base tier (Qwen3-Omni Apache 2.0, Qwen3-TTS, CosyVoice v3) as a public good for the community while moving only the flagship behind the API.

Three signals to watch over the next five years.

Signal 1 — Whether Qwen4 and Qwen5 keep an open base

As of April 2026, only the Qwen3.5 generation is closed; Qwen3 and its sibling Qwen3-TTS repositories remain open. If Qwen4 releases its base tier openly, the middle posture is a stable strategy. If not, it was transitional.

Signal 2 — Whether other Chinese labs follow or collect at the extremes

StepFun, Moonshot, and Tencent currently keep their flagships open, while ByteDance Doubao is closed. The distribution one year from now will show whether Alibaba’s posture is contagious.

Signal 3 — Whether a Fisher-scale dialogue corpus originates inside China

Alibaba has demonstrated that the scale (more than 100 million hours) and the architecture (Thinker-Talker, TMRoPE, flatten curriculum) are industrially reachable. The rate-limiting step is whether a full-duplex dialogue data layer follows under licenses compatible with open-weights training.

Alibaba’s three-year-long open-weights wager was a bet that architectural infrastructure compounds. Qwen3-Omni Apache 2.0, the most-copied Family 2 template, more than one billion downloads, FunAudioLLM still shipping Apache 2.0 releases. The payoff at the base tier holds regardless of where the flagship lands. What changed in March 2026 is the height of the open floor, not its existence.

The line Andrej Karpathy wrote about DeepSeek on X in December 2024 fits Alibaba’s position too: “frontier-grade LLM trained on a joke of a budget.” The fact that the open research frontier moves on a different axis from the capital and compute of Western big tech is the precondition for interpreting the next five years of Chinese big-tech labs.

Benchmark collaboration. Fullduplex.ai is building multilingual live evaluation, paralinguistic audio-level scoring, and reproducibility protocols that accommodate both open-weights and closed-API STS models. If your lab or team is designing evaluation infrastructure that incorporates Qwen3-Omni open weights or the Qwen3.5-Omni closed API as a reference, we would like to collaborate on shared test sets and reproducible measurement protocols. Contact hello@fullduplex.ai.