The Release
Meta is introducing Llama 4 Scout and Llama 4 Maverick, its first open-weight, natively multimodal models, offering industry-leading context length (Scout supports up to 10 million tokens) and the company's first built on a mixture-of-experts (MoE) architecture.
An MoE architecture is more computationally efficient for both training and inference: it splits each layer into many smaller, specialized "expert" subnetworks and routes each input token to only a few of them, so most of the model's parameters sit idle on any given query. Maverick, for example, has 400 billion total parameters, but only 17 billion are active per token, spread across 128 experts. (Parameter count roughly corresponds to a model's capacity for problem-solving.)
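The routing idea can be sketched in a few lines. This is a toy illustration, not Meta's actual implementation: the expert count, top-k value, gating function, and "experts" themselves are all placeholder choices made for clarity.

```python
import math
import random

NUM_EXPERTS = 8   # illustrative; Maverick reportedly uses 128
TOP_K = 2         # experts activated per token (toy choice)
DIM = 4           # toy hidden dimension

random.seed(0)

# Each "expert" here is a stand-in for a feed-forward subnetwork;
# we use simple elementwise scaling so the example stays self-contained.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

# Gate: one weight vector per expert; score = dot(gate_weights, token).
gate = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def moe_layer(token):
    # Score every expert for this token, then keep only the top-k.
    scores = [sum(w * v for w, v in zip(g, token)) for g in gate]
    chosen = sorted(range(NUM_EXPERTS), key=scores.__getitem__, reverse=True)[:TOP_K]
    # Softmax over the chosen experts' scores to get mixing weights.
    exps = [math.exp(scores[i]) for i in chosen]
    total = sum(exps)
    out = [0.0] * DIM
    for i, e in zip(chosen, exps):
        y = experts[i](token)  # only the chosen experts actually execute
        out = [o + (e / total) * v for o, v in zip(out, y)]
    return out, chosen

output, active = moe_layer([0.5, -0.2, 0.1, 0.9])
```

The key property is visible in `active`: it always holds exactly `TOP_K` expert indices, so per-token compute scales with the active parameters rather than the total parameter count.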
The Competitive Edge
According to Meta's internal testing, Maverick, which the company positions for "general assistant and chat" use cases such as creative writing, outperforms models like OpenAI's GPT-4o and Google's Gemini 2.0 on certain coding, reasoning, multilingual, long-context, and image benchmarks.
There are three models in the Llama 4 family: Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth, the last of which was still in training at announcement. All were trained on "large amounts of unlabeled text, image, and video data" to give them "broad visual understanding," Meta says.
The Licensing Question
Some developers may take issue with the Llama 4 license. Users and companies "domiciled" or with a "principal place of business" in the EU are prohibited from using or distributing the models, likely the result of governance requirements imposed by the region's AI and data privacy laws.
My Take: Llama 4 signals that open-weight models are no longer merely competing with proprietary ones; they are becoming a replacement architecture. The MoE design is smart engineering: fewer active parameters per token means cheaper inference on commodity hardware. Meta's EU restrictions read as pragmatic capitulation to regulators, not strategic choice. For developers, this is significant: production-grade multimodal models without vendor lock-in or per-token pricing.