Llama 4 and DeepSeek V4: open source has truly reached the frontier

June 11, 2026

In April 2025, Meta launched Llama 4 Scout and Llama 4 Maverick: its first natively multimodal open models, built on a Mixture of Experts (MoE) architecture, with Scout offering a 10 million token context window, the largest of any publicly available model at launch. In April 2026, DeepSeek launched V4 with 1.6 trillion total parameters and 49 billion active, the largest open source model to date by total parameter count. The 2024 thesis that "open source is 2 years behind closed" had to be revised.

Llama 4: the Meta family

Llama 4 arrived with two models available and a third in development.

Scout has 17 billion active parameters and 16 experts (109B total). Its 10 million token context window is the most striking figure of the launch: enough to process an entire research library, a company's complete code repository, or several years of meeting transcripts in a single call. With INT4 quantization, it fits on a single H100 GPU. It is natively multimodal, accepting text and image input.
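The single-GPU claim can be checked with back-of-the-envelope arithmetic. The sketch below counts weight memory only and assumes the 80 GB H100 variant; the KV cache and activations, which grow with context length, are deliberately left out, so treat the result as a lower bound. The function name is hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return total_params * bits_per_param / 8 / 1e9

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant

SCOUT_TOTAL_PARAMS = 109e9  # total parameters quoted in this article

scout_int4_gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, 4)   # INT4: 4 bits per weight
scout_bf16_gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, 16)  # BF16: 16 bits per weight

fits_int4 = scout_int4_gb < H100_MEMORY_GB
fits_bf16 = scout_bf16_gb < H100_MEMORY_GB

print(f"INT4 weights: {scout_int4_gb:.1f} GB, fits on one H100: {fits_int4}")
print(f"BF16 weights: {scout_bf16_gb:.1f} GB, fits on one H100: {fits_bf16}")
```

At 4 bits per weight the model needs roughly 54.5 GB, which leaves headroom on an 80 GB card; at 16 bits it would need about 218 GB and would have to be split across several GPUs.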

Maverick has the same 17B active parameters, but with 128 experts and 400 billion total parameters. Its context window is 1 million tokens. On LMArena, a blind comparison of human preferences, an experimental chat version scored an Elo of 1,417, surpassing GPT-4o and Gemini 2.0 Flash. On GPQA Diamond it scored 69.8%.

Behemoth, with 288 billion active parameters and an estimated 2 trillion total, was announced in April 2025 as still in training. As of April 2026, it had not been publicly released; instabilities in MoE routing at scale have been reported as a factor in the delay.

License: The Llama 4 Community License allows commercial use, but organizations with more than 700 million monthly active users must request a separate license from Meta. Attribution is mandatory. Use of vision capabilities is restricted for entities domiciled in the European Union.

DeepSeek V4: the frontier at open cost

DeepSeek V4 is DeepSeek's most ambitious technical effort to date. V4-Pro has 1.6 trillion total parameters with 49 billion active per token, an activation ratio of about 3%, the lowest of the models covered in this article. V4-Flash has 284 billion total parameters with 13 billion active.
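The activation ratio is simply active parameters divided by total parameters. As a quick check, the sketch below computes it for the released models using only the figures quoted in this article; the function and variable names are illustrative.

```python
# (model name, active parameters in billions, total parameters in billions),
# all taken from the figures quoted in this article.
MODELS = [
    ("Llama 4 Scout", 17, 109),
    ("Llama 4 Maverick", 17, 400),
    ("DeepSeek V4-Pro", 49, 1600),
    ("DeepSeek V4-Flash", 13, 284),
    ("Qwen 3.5-397B", 17, 397),
]

def activation_ratio(active_b: float, total_b: float) -> float:
    """Fraction of the model's parameters used for each token."""
    return active_b / total_b

ratios = {name: activation_ratio(active, total) for name, active, total in MODELS}

# Print from the sparsest model to the densest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<18} {ratio:6.1%}")

sparsest_model = min(ratios, key=ratios.get)
```

A lower ratio means less compute per token relative to model size, but it does not reduce the memory needed to hold all of the weights.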

The main architectural innovation is Compressed Sparse Attention (CSA): tokens are compressed into summary representations, and each new token attends only to the top-k most relevant summaries instead of the entire sequence. This keeps memory consumption manageable at a context of 1 million tokens. The V4-Pro-Max scored 80.6% on SWE-Bench Verified, a benchmark of autonomous bug resolution in real code, the highest score reported at the time.
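The idea of compressing and then selecting can be illustrated with a toy example. This is a sketch of the general technique, not DeepSeek's implementation: mean pooling as the compression step, the block size, and the names `compress_blocks` and `sparse_attend` are all assumptions made for illustration.

```python
import math

def compress_blocks(vectors, block_size):
    """Mean-pool consecutive vectors into one summary vector per block."""
    summaries = []
    for start in range(0, len(vectors), block_size):
        block = vectors[start:start + block_size]
        dim = len(block[0])
        summaries.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return summaries

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_attend(query, keys, values, block_size, top_k):
    """Attend only to the top_k compressed blocks that best match the query."""
    key_summaries = compress_blocks(keys, block_size)
    value_summaries = compress_blocks(values, block_size)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, ks)) / scale for ks in key_summaries]
    # Keep the indices of the top_k highest-scoring summaries; ignore the rest.
    selected = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in selected])
    dim = len(values[0])
    output = [
        sum(w * value_summaries[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return output, sorted(selected)

# Eight 2-dimensional tokens, compressed into four blocks of two tokens each.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.5, 0.0], [0.5, 0.0]]
values = keys  # reuse the keys as values to keep the toy example small
attended, picked_blocks = sparse_attend([1.0, 0.0], keys, values, block_size=2, top_k=2)
print(picked_blocks, attended)
```

With a fixed `top_k`, the cost of the final weighted sum no longer depends on sequence length, and scoring runs over one summary per block rather than one entry per token.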

Price and license: V4-Flash costs US$ 0.14 per million input tokens; V4-Pro costs US$ 1.74 per million. Both are released under MIT or Apache 2.0, which permit commercial use, modification, and redistribution at no charge. V4 was trained on Huawei Ascend hardware instead of NVIDIA GPUs, demonstrating that dependence on American infrastructure can be circumvented.
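To put the per-token prices in perspective, the sketch below estimates a monthly bill for input tokens only. The workload of 1 billion input tokens per month is a hypothetical figure chosen for illustration, and output token prices are not quoted in this article, so they are left out.

```python
# Input prices in US$ per million tokens, as quoted in this article.
PRICE_PER_MILLION_INPUT_USD = {
    "V4-Flash": 0.14,
    "V4-Pro": 1.74,
}

def input_cost_usd(model: str, input_tokens: int) -> float:
    """Cost of the input tokens alone; output tokens are billed separately."""
    return input_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_USD[model]

MONTHLY_INPUT_TOKENS = 1_000_000_000  # hypothetical workload: 1B input tokens per month

monthly_costs = {
    model: input_cost_usd(model, MONTHLY_INPUT_TOKENS)
    for model in PRICE_PER_MILLION_INPUT_USD
}

for model, cost in monthly_costs.items():
    print(f"{model}: US$ {cost:,.2f} per month for input tokens")
```

At these prices, the hypothetical workload costs about US$ 140 per month on V4-Flash and about US$ 1,740 on V4-Pro, before output tokens are counted.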

Qwen 3.5: Alibaba's bet

Alibaba entered 2026 with Qwen 3.5-397B, an MoE model with 17 billion active parameters and 512 experts. The native context is 262 thousand tokens, extensible to 1 million. The model edges out GPT-5.2 on IFBench, an instruction-following benchmark, 76.5 to 75.4. The much smaller Qwen 3.5-9B (only 9B parameters) surpasses GPT-OSS-120B on GPQA Diamond: 81.7% vs. 71.5%.

The license is Apache 2.0, with the only restriction being an approval process for operators with more than 100 million monthly users.

What changed in open source competitiveness

The difference between open source and closed models is now more nuanced than "better vs. worse." On pure mathematics benchmarks (AIME), DeepSeek V3.2 scores 96.0%, comparable to GPT-5.2. In instruction following, Qwen 3.5 surpasses OpenAI's proprietary models. In cost, self-hosted inference can be 10 to 100 times cheaper than closed APIs.

The gap persists in complex agentic tasks (benchmarks like Terminal-Bench and SWE-Bench Pro), in safety and alignment (open source models have less public validation), and in advanced multimodality (native video is still limited in the open ecosystem).

The strategic implication for technology managers is clear: for high-volume workloads, regulated privacy requirements, or deep vertical customization, self-hosting frontier open source models has gone from an experimental option to a genuinely competitive alternative.
