Mistral Medium 3.5 and Grok 4.3: When Efficiency Becomes a Competit...

Not every advance in LLMs needs to come from trillion-parameter models. Late April and early May 2026 brought two releases that bet on a different premise: frontier-level performance at substantially lower cost, with architectures that make deliberate choices about what to optimize.

Mistral Medium 3.5: The Best Open Dense Model for Code

The Mistral Medium 3.5, released on May 2, 2026, is a bet against the tide. While virtually every major model in 2026 uses Mixture of Experts, the Mistral Medium 3.5 is a dense model — 128 billion total parameters, all active on every inference.

The choice is not architectural naivety. It is a product decision: dense models have more predictable behavior across varied hardware, more consistent per-inference latency, and none of the edge cases that arise when MoE routing activates unusual experts. For self-hosted deployment, that has real operational value.

What the Medium 3.5 Delivers

The benchmark numbers are precise about where the model sits. On SWE-Bench, the Medium 3.5 scores 77.6% — better than any available open-source dense model. It falls below Claude Sonnet 4.6 and DeepSeek V4 Pro among higher-capacity models, but at half the cost of Sonnet.

The Medium 3.5 runs on four GPUs. It does not require a cluster of 8 or 16 GPUs like the larger MoE models. For operations that lack datacenter-scale infrastructure but want real code performance in production, that is the pitch: 77.6% SWE-Bench on hardware that fits in a four-GPU server.

Specifications:
128 billion parameters (dense)
256,000 token context window
$1.50 per million input tokens via API
Available as open weights under a modified MIT license

The modified MIT license matters: it allows commercial use without restrictions in most cases, with specific attribution requirements. More open than the Meta Llama license, less open than pure MIT.

Positioning in the Ecosystem

The Medium 3.5 replaces Devstral 2 and Magistral in Mistral's lineup, consolidating chat, reasoning, and code capabilities into a single model. Mistral explicitly stated it will no longer maintain separate models for each function — Medium 3.5 is the unified flagship.

For engineering teams that prefer a single production model for varied tasks — rather than routing between multiple specialized models — this simplifies system architecture.

Grok 4.3: Native Reasoning and Aggressive Pricing

Grok 4.3, from xAI, entered beta on April 17, 2026, with general API access from May 1. It is the most significant xAI release since Grok 4.20.

What Changed from 4.20

The 4.3 incorporates native reasoning — the model "thinks" before responding, similar to the approach of DeepSeek R1 and OpenAI o3. Reasoning is integrated into standard inference, not a separate mode that needs to be explicitly activated.

The context window is 1 million tokens. Native video input — the ability to directly process video files, not just static images — sets the 4.3 apart from most competitors that still process video via extracted frames.

Key benchmarks:
Intelligence Index: 53 (market median: 35)
CaseLaw v2: first place among all tested models
CorpFin: first place among all tested models
300+ Elo gain in GDPval-AA versus Grok 4.20

CaseLaw and CorpFin are legal and financial reasoning benchmarks, respectively. Leading those categories signals relevant specialization for specific professional sectors.

Pricing and Access

Grok 4.3 via API is priced at $1.25 per million input tokens — aggressively below GPT-5.5 and Claude Opus 4.7, and competitive with Gemini 3.1 Pro. The model has no open weights; it is accessible exclusively through xAI's API.

xAI does not have the integration ecosystem of OpenAI or Anthropic, but the pricing and benchmarks in legal and financial reasoning create a clear niche: professional services firms that need sophisticated reasoning at controlled cost.

The Logic of Efficiency

What Mistral Medium 3.5 and Grok 4.3 share is a positioning that does not compete directly with GPT-5.5 or Claude Opus 4.7 at the absolute performance ceiling. They compete at the second tier — models that deliver 85-90% of frontier performance at 30-50% of the cost.

For most production use cases, that second tier is sufficient. The difference between 87% and 77% on SWE-Bench matters for software engineering automation at scale. For document analysis, content generation, customer support, and most enterprise workflows, it does not.

Pricing acting as a form of technical competition is one of the most important dynamics of 2026. It is not just the best models that reshape markets — it is the good-enough models at prices that make adoption irresistible.

Mistral Medium 3.5 and Grok 4.3: When Efficiency Becomes a Competitive Advantage