Llama 4: Meta redefines open source with native MoE and multimodality

Meta has released significant open models before: Llama 2 and Llama 3 both had real impact on the community. But Llama 4, released in April 2025, represents a shift in scale and architecture that goes beyond the previous versions. It is Meta's first family of open models with a native Mixture of Experts (MoE) architecture and multimodal capability built in from training onward.

The Llama 4 family

Llama 4 arrived with three models, at different stages of availability.

Scout has 109 billion total parameters, of which 17 billion are active per token, with 16 experts in the MoE. It is the model designed to run on a single NVIDIA H100 GPU (with Int4 quantization, according to Meta), which makes it accessible to those who don't have cluster infrastructure. Its context window is 10 million tokens, the largest among open models at the time of release.

Maverick has 400 billion total parameters with the same 17 billion active per token, but with 128 experts in the MoE routing. It requires a DGX H100 system or an equivalent multi-GPU setup. On the multimodal benchmarks Meta published at launch, it surpasses GPT-4o and Gemini 2.0 Flash.

Behemoth is in a different category: 2 trillion total parameters and 288 billion active. It has not yet been made publicly available — it was announced mainly as a "teacher model", used to improve Scout and Maverick via co-distillation. When — and if — it is released, it will represent the largest open source model ever made available.
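The hardware requirements above follow from simple arithmetic on weight size. The sketch below is a rough estimate, not a deployment guide: it assumes the 80 GB variant of the H100, counts only the weights (no KV cache, activations, or runtime overhead), and uses the parameter counts quoted in this article. The function names are illustrative.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores KV cache, activations, and runtime overhead, so real needs are higher.

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant

# Total parameter counts as quoted in this article.
MODELS = {
    "Scout": 109e9,
    "Maverick": 400e9,
    "Behemoth": 2e12,
}

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params: float, fmt: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM[fmt] / 1e9


def fits_on_one_h100(total_params: float, fmt: str) -> bool:
    """True if the weights alone are smaller than one H100's memory."""
    return weight_memory_gb(total_params, fmt) < H100_MEMORY_GB


if __name__ == "__main__":
    for name, params in MODELS.items():
        for fmt in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, fmt)
            fits = fits_on_one_h100(params, fmt)
            print(f"{name:9s} {fmt:5s} {gb:8.1f} GB  fits one H100: {fits}")
```

Under these assumptions, Scout's weights fit on one card only in a 4-bit format, while Maverick exceeds a single card in every format listed, which is consistent with the multi-GPU requirement.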

Why MoE matters here

Llama 3 models were dense: every token processed activated all the parameters. Llama 4 changed that: with MoE, Scout processes each token using only 17 billion of its 109 billion parameters, about 16% of the total. The compute cost per token drops accordingly, although all 109 billion parameters must still be held in memory.
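To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is a toy illustration of the general MoE technique, not Meta's implementation: the experts are scalar functions, the router logits are supplied by hand, and TOP_K = 1 is an assumption chosen for simplicity.

```python
import math

NUM_EXPERTS = 16  # Scout's expert count, per the article
TOP_K = 1         # assumption: number of experts the router selects per token

# Toy experts: each just scales its input by a different factor.
EXPERTS = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight)] for the top_k experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_logits, top_k=TOP_K):
    """Run only the selected experts and return their weighted sum."""
    out = 0.0
    for idx, weight in route(router_logits, top_k):
        out += weight * experts[idx](token)
    return out


if __name__ == "__main__":
    logits = [0.0] * NUM_EXPERTS
    logits[3] = 5.0  # the router strongly prefers expert 3 for this token
    print(route(logits))
    print(moe_layer(2.0, EXPERTS, logits))
```

The point of the sketch is that the other 15 experts are never called for this token: their parameters exist in memory but contribute no compute.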

For those who use models in production, especially at high volumes, this difference has a direct impact on cost per token and on system throughput. Scout was designed specifically to be viable on single-GPU hardware in a way that Llama 3 70B was not, while also delivering superior performance.
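A back-of-the-envelope comparison shows the size of the effect. The sketch below relies on the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token; that approximation is an assumption here, and it ignores attention cost, memory bandwidth, and batching, all of which affect real throughput.

```python
# Approximate forward-pass compute per token: ~2 FLOPs per active parameter.
# This is a rule of thumb, not a measurement.

SCOUT_TOTAL = 109e9
SCOUT_ACTIVE = 17e9
LLAMA3_70B_ACTIVE = 70e9  # dense model: every parameter is active


def forward_flops_per_token(active_params: float) -> float:
    """Estimated forward-pass FLOPs needed to process one token."""
    return 2.0 * active_params


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_params / total_params


if __name__ == "__main__":
    scout = forward_flops_per_token(SCOUT_ACTIVE)
    dense = forward_flops_per_token(LLAMA3_70B_ACTIVE)
    print(f"Scout active fraction: {active_fraction(SCOUT_ACTIVE, SCOUT_TOTAL):.1%}")
    print(f"Scout:       {scout / 1e9:.0f} GFLOPs per token")
    print(f"Llama 3 70B: {dense / 1e9:.0f} GFLOPs per token")
    print(f"Dense / MoE compute ratio: {dense / scout:.2f}x")
```

By this estimate Scout needs roughly a quarter of the per-token compute of a 70B dense model, even though it stores more parameters in total.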

Native multimodality

"Native" here has a specific meaning: Scout and Maverick were trained on text and image data from the start, they did not have visual capability added via later fine-tuning. This tends to result in better integration between the modalities — the model reasons about images the same way it reasons about text, without the architectural separation of models that received vision as an add-on.

Both were trained on data covering 200 languages, with in-depth support for 12 of them, including Arabic, Spanish, German, and Hindi. According to Meta's model card, Scout's training corpus totals roughly 40 trillion tokens, while Maverick's is smaller at roughly 22 trillion.

The 10 million token window

Scout's 10 million token context was the largest among open models at release. To put it in perspective: at roughly 0.75 words per token, 10 million tokens is approximately 7.5 million words, the equivalent of dozens of full-length books or the entire codebase of a medium-sized project.
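The conversion behind that figure can be sketched in a few lines. Both constants are assumptions: 0.75 words per token is a common rule of thumb for English text and varies by tokenizer and language, and 90,000 words is a hypothetical length for a typical full-length book.

```python
WORDS_PER_TOKEN = 0.75   # rule of thumb for English; varies by tokenizer and language
WORDS_PER_BOOK = 90_000  # assumption: a typical full-length book


def tokens_to_words(tokens: int) -> float:
    """Approximate word count for a given number of tokens."""
    return tokens * WORDS_PER_TOKEN


def words_to_books(words: float) -> float:
    """Approximate number of books for a given word count."""
    return words / WORDS_PER_BOOK


if __name__ == "__main__":
    words = tokens_to_words(10_000_000)
    print(f"{words:,.0f} words, about {words_to_books(words):.0f} books")
```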

In practice, this opens up use cases that were previously exclusive to proprietary APIs with premium pricing: analysis of complete documentation, ingestion of large codebases, reasoning over extensive datasets in a single call.
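Before sending a whole codebase in one call, it helps to check whether it plausibly fits. The sketch below is a rough pre-check only: it assumes about 4 characters per token, which is a heuristic rather than the model's actual tokenizer, and the file suffixes and output reserve are hypothetical values chosen for illustration.

```python
from pathlib import Path

CONTEXT_WINDOW_TOKENS = 10_000_000        # Scout's window, per the article
CHARS_PER_TOKEN = 4                       # rough heuristic; not a real tokenizer
SOURCE_SUFFIXES = {".py", ".md", ".txt"}  # hypothetical selection of file types


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_tree_tokens(root: str) -> int:
    """Sum the estimated tokens of all matching files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits_in_context(tokens: int, reserve_for_output: int = 8_192) -> bool:
    """True if the input plus a reserve for the reply fits in the window."""
    return tokens + reserve_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    tokens = estimate_tree_tokens(".")
    print(f"~{tokens:,} tokens; fits in context: {fits_in_context(tokens)}")
```

For a real decision, the estimate should be replaced with a count from the model's own tokenizer, since code and non-English text can tokenize very differently from this heuristic.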

Licensing: the point to watch

Llama 4 is distributed under the Llama 4 Community License Agreement, which allows commercial use for most companies. But there are two important restrictions that set it apart from truly open licenses like MIT.

Companies with more than 700 million monthly active users need a special license from Meta. And, at the time of release, individuals domiciled in the European Union and companies with their principal place of business there were not granted rights to use or distribute the multimodal models, a restriction with significant practical implications for global operations.

This places Llama 4 in a different category from DeepSeek V4 (MIT) and Gemma 4 (Apache 2.0) in terms of unrestricted freedom of use. For most companies, it is not a problem. For platform-scale operations or those with significant European presence, it requires legal analysis.

Meta's strategic positioning

Meta does not release open models out of altruism. The strategy is consistent: by establishing Llama as the foundation of the open source ecosystem, Meta ensures that its hardware (MTIA), its AI products (Meta AI), and its infrastructure (PyTorch) remain central to global AI development.

Llama 4 Scout running on an H100, with 10M context and native multimodality, is the most convincing version of that argument Meta has ever made. The model is not just good enough for production use — for many use cases, it is the best available.