Qwen 3.5 and 3.6: Alibaba and the ambition to cover every use case

Alibaba's strategy with the Qwen series differs from that of other AI labs. While Meta releases three models, DeepSeek two, and Mistral one, Alibaba in 2026 covers everything from mobile devices, with models under 1 billion parameters, to datacenter servers, with 397 billion. All of them belong to the same family, share the same base architecture, and are released under the Apache 2.0 license.

Qwen 3.5: A complete family

Qwen 3.5 was released in March 2026 with an unusual proposition: eight model sizes in a single release. The small models (0.8B, 2B, 4B, and 9B) shipped in the same cycle as the 397B-A17B flagship, which has 397 billion total parameters, of which 17 billion are active per token through mixture-of-experts (MoE) routing.
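The gap between total and active parameters is what makes a 397B model practical to serve. A rough sketch of the arithmetic, assuming the common rule of thumb of about 2 FLOPs per active parameter per generated token; it ignores attention over the context and routing overhead, and the "dense 397B" line is a hypothetical comparison point, not a real model:

```python
# Back-of-the-envelope compute estimate for a sparse MoE model.
# Assumption: ~2 FLOPs per active parameter per token (one multiply and one
# add per weight). Attention over the context and routing costs are ignored.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_params_b / total_params_b

def gflops_per_token(params_b: float) -> float:
    """Approximate forward-pass GFLOPs per token for `params_b` billion parameters."""
    return 2.0 * params_b

total_b, active_b = 397, 17  # Qwen 3.5 flagship: 397B total, 17B active
print(f"active share: {active_fraction(total_b, active_b):.1%}")
print(f"MoE:   ~{gflops_per_token(active_b):.0f} GFLOPs/token")
print(f"dense: ~{gflops_per_token(total_b):.0f} GFLOPs/token (hypothetical dense 397B)")
```

This prints an active share of 4.3%, about 34 GFLOPs per token for the MoE model versus about 794 for a dense model of the same total size: per-token compute tracks the active parameters, not the total.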

The logic is that of a platform: a company that adopts Qwen 3.5 can use the 9B model for local inference at the edge, the 27B or 35B for mid-sized on-premises servers, and the 397B via API for high-complexity tasks, all within one model family, with consistent behavior and fine-tuning recipes that carry over across sizes.

Hybrid architecture

Qwen 3.5's technical differentiator is its hybrid design: Gated Delta Networks, a form of linear attention, combined with a sparse MoE system. Standard attention compares each token with every previous token, so its cost grows quadratically with context length; linear attention layers instead update a fixed-size state for each new token, which curbs that quadratic growth. This is critical for models that support up to 1 million tokens.
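The core of a Gated Delta Network layer is a recurrent update to a fixed-size memory matrix. The sketch below assumes the gated delta rule as published for Gated DeltaNet, S_t = alpha_t * S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T with output o_t = S_t q_t, in a simplified single-head form (no normalization, gates supplied by the caller). It is an illustration of the technique, not Alibaba's implementation; production kernels compute the same recurrence in a chunked, parallel form.

```python
# Minimal single-head sketch of the gated delta rule:
#   S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
#   o_t = S_t q_t
# S is a fixed-size (d_v x d_k) memory, so the work per token does not depend
# on how many tokens came before. Simplifications (assumptions): no key/query
# normalization, scalar gates passed in directly, plain sequential loop.

Matrix = list[list[float]]
Vector = list[float]

def gated_delta_step(S: Matrix, k: Vector, v: Vector, q: Vector,
                     alpha: float, beta: float) -> tuple[Matrix, Vector]:
    d_v, d_k = len(S), len(k)
    # S k: the value the memory currently returns for key k
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Decay the memory (alpha), erase the old value at k (beta * Sk k^T),
    # then write the new value (beta * v k^T).
    S_new = [[alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
              for j in range(d_k)] for i in range(d_v)]
    # Read out with the query vector.
    o = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, o

def run(seq, d_k: int, d_v: int) -> tuple[Matrix, list[Vector]]:
    """Process (k, v, q, alpha, beta) tuples in order with a constant-size state."""
    S = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for k, v, q, alpha, beta in seq:
        S, o = gated_delta_step(S, k, v, q, alpha, beta)
        outputs.append(o)
    return S, outputs

key = [1.0, 0.0]
state, outs = run([(key, [3.0, 4.0], key, 1.0, 1.0),
                   (key, [5.0, -1.0], key, 1.0, 1.0)], d_k=2, d_v=2)
print(outs)  # [[3.0, 4.0], [5.0, -1.0]]
```

Writing a second value under the same key replaces the first one instead of adding to it; that "delta" correction is what distinguishes this rule from plain linear attention. The state stays 2 x 2 however long the sequence gets, while the gate alpha lets the layer fade out old memory.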

All models in the family are natively multimodal, processing text, images, and video through early fusion of multimodal tokens: they were trained on all modalities from the start rather than receiving vision as an add-on module. The family also supports 201 languages and dialects.
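What early fusion means can be illustrated with a toy: image patches are projected into the same embedding space as text tokens and placed in one sequence, so a single backbone sees both modalities from its first layer. Everything here (the embedding table, the projection matrix, the dimensions) is hypothetical, and a real model would typically run patches through a learned vision encoder first; this is not Qwen's actual pipeline.

```python
# Toy sketch of early fusion. All tables, weights, and sizes are hypothetical.

D_MODEL = 4  # hypothetical embedding width shared by all modalities

# Hypothetical text embedding table: token id -> vector of length D_MODEL
TEXT_EMB = {tid: [0.1 * (tid + j) for j in range(D_MODEL)] for tid in range(10)}

def embed_text(token_ids):
    return [TEXT_EMB[t] for t in token_ids]

def embed_patches(patches, W):
    """Project each flattened patch (length P) with W (D_MODEL x P) into D_MODEL dims."""
    return [[sum(W[i][p] * patch[p] for p in range(len(patch))) for i in range(D_MODEL)]
            for patch in patches]

def fuse(segments, W):
    """Build one interleaved sequence from ('text', ids) and ('image', patches) segments."""
    sequence, modality = [], []
    for kind, data in segments:
        vecs = embed_text(data) if kind == "text" else embed_patches(data, W)
        sequence.extend(vecs)
        modality.extend([kind] * len(vecs))
    return sequence, modality  # every vector has length D_MODEL

# Hypothetical projection for 6-value patches: keeps the first D_MODEL values.
W = [[1.0 if i == p else 0.0 for p in range(6)] for i in range(D_MODEL)]
seq, mod = fuse([("text", [1, 2]), ("image", [[0.5] * 6, [1.0] * 6]), ("text", [3])], W)
print(mod)  # ['text', 'text', 'image', 'image', 'text']
```

The point of the design is that nothing downstream distinguishes "vision tokens" from "text tokens": the same layers attend over both, as opposed to bolting a separate vision module onto a finished text model.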

Benchmarks

The 9B model scores 70.1 on MMMU-Pro (a visual reasoning benchmark), 22.5% above GPT-5-Nano on the same benchmark. The 397B-A17B flagship competes with frontier closed models on reasoning and agentic tasks.

Qwen 3.6: Focus on code and agents

In April 2026, Alibaba released Qwen 3.6, including the 3.6-35B-A3B model (35 billion total parameters, 3 billion active per token). It is built for coding and agentic tasks rather than general-purpose use.

The 3.6-35B-A3B scores 51.5 on Terminal-Bench 2.0 and 73.4 on SWE-Bench Verified, within the competitive range of much larger coding models. It is released under Apache 2.0 and can run on a single consumer GPU with enough memory: only 3 billion parameters are active per token, which keeps generation fast, but all 35 billion must still be loaded, so consumer setups usually rely on quantized weights.
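The MoE design cuts compute, not memory: every expert must be resident even though only about 3 billion parameters run per token. A rough weight-memory estimate, assuming weights only (no KV cache, activations, or runtime overhead) and idealized bit widths per parameter:

```python
# Rough weight-memory estimate for a MoE model.
# Assumptions: weights only; KV cache, activations, and runtime overhead are
# ignored, and each format is treated as exactly `bits` per parameter.

def weight_memory_gb(params: int, bits: int) -> float:
    return params * bits / 8 / 1e9

TOTAL = 35_000_000_000   # Qwen 3.6-35B-A3B: all of these must be loaded
ACTIVE = 3_000_000_000   # ...but only these are used for each token

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gb(TOTAL, bits):.1f} GB of weights")
print(f"active per token: {ACTIVE / TOTAL:.1%} of parameters")
```

The estimate gives about 70 GB at BF16, 35 GB at 8 bits, and 17.5 GB at 4 bits, with 8.6% of parameters active per token. Only the 4-bit figure leaves headroom for the KV cache on GPUs in the 24 to 32 GB range, which is why "a single GPU of adequate memory" in practice usually means a quantized checkpoint.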

Qwen 3.6-Plus, the larger model in the 3.6 family, has a default context window of 1 million tokens and agentic benchmark scores comparable to Claude Opus 4.5. Its API price is US$ 0.38 per million input tokens, versus US$ 2.00 per million input tokens for Gemini 3.1 Pro: roughly 5x cheaper for similar performance on many tasks.
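To see what the price gap means for a real budget, here is a small input-cost calculation using the two prices quoted for Qwen 3.6-Plus and Gemini 3.1 Pro, assuming both are per million input tokens. Output-token pricing, caching discounts, and tiered pricing are not modeled, and the monthly volume is a hypothetical workload.

```python
# Input-token cost comparison using the per-million-token prices quoted in the text.
# Assumptions: input tokens only; the monthly volume below is hypothetical.

PRICE_PER_M_INPUT = {          # US$ per 1M input tokens
    "Qwen 3.6-Plus": 0.38,
    "Gemini 3.1 Pro": 2.00,
}

def input_cost(model: str, input_tokens: int) -> float:
    return PRICE_PER_M_INPUT[model] * input_tokens / 1_000_000

monthly_tokens = 500_000_000  # hypothetical workload: 500M input tokens/month
for model in PRICE_PER_M_INPUT:
    print(f"{model}: ${input_cost(model, monthly_tokens):,.2f}")
ratio = PRICE_PER_M_INPUT["Gemini 3.1 Pro"] / PRICE_PER_M_INPUT["Qwen 3.6-Plus"]
print(f"price ratio: {ratio:.2f}x")
```

For this volume the input bill is about $190 versus $1,000 per month, a ratio of about 5.26x, consistent with the roughly 5x figure. A fair comparison for a specific application would also need output-token prices and the actual input/output mix.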

What makes Alibaba different

Two aspects set the Qwen strategy apart from the rest.

The first is the breadth of language support. At 201 languages, Qwen covers more than any other frontier model. For applications in Asian, Middle Eastern, and African markets, where other models' support for local languages is often superficial, this gives the Qwen models a practical advantage.

The second is the consistency of the family. Most competitors release models with distinct architectures for each market segment — one model for edge, another for cloud, another for coding. Alibaba maintains a coherent family with consistent behavior and training, which simplifies the workflow for anyone who needs multiple sizes for different deployment targets.

The strategic context

Like DeepSeek, Alibaba operates under American export restrictions on advanced hardware. The architectural efficiency of Qwen 3.5 and 3.6 — especially in the combination of linear attention with MoE for long contexts — is in part a response to these restrictions.

The result: models that are competitive with the state of the art while using less compute per inference. For those running these models in production, that translates into lower cost and higher throughput per available GPU.

Apache 2.0 across the entire family removes legal barriers to enterprise adoption. Llama 4's license carries restrictions for the European Union, while DeepSeek's MIT license is similarly permissive; alongside DeepSeek, the Qwen 3.5/3.6 models are among the most legally straightforward options for global use without regional restrictions.

Position in the ecosystem

As of May 2026, the Qwen series occupies a specific niche: it does not top any single benchmark, but it is one of the few options that offers the full range of sizes, native multimodality, extensive language support, and a truly open license in a single family. For those who need broad coverage on a unified platform, it is the lowest-friction choice available.