The Headline Is Rarely the Strongest Model
When a lab announces a new model and the press runs with it, owners are tempted to assume the name in the headline is the most capable thing available. It usually is not. Anthropic's recent attention went to Sonnet 5, but Sonnet 5 is the balanced mid tier of the Claude family. The flagship, the most capable tier, is Opus 4.8. Below Sonnet sits Haiku, the fast and low-cost tier, alongside Fable 5, a specialized member of the same family. OpenAI ships the same shape: GPT-5.6 arrives as Sol, the flagship, Terra, the balanced tier positioned at roughly half the cost of the prior generation at similar performance, and Luna, the fast and lowest-cost tier.
This is now the standard structure across every serious lab, and it exists for a reason. A single model cannot be the strongest, the cheapest, and the fastest at once. So labs split the family into tiers and let buyers choose. The headline tends to land on whichever release is most newsworthy or most widely deployed, which is often the mid tier rather than the flagship. Reading the press will tell you what launched. It will not tell you which tier belongs in your business.
Reserve the Flagship, Run the Mid Tier, Scale the Fast Tier
The flagship tier earns its premium on genuinely hard problems: dense legal or financial reasoning, multi-step analysis where one wrong assumption invalidates the output, complex code, and work where a mistake is expensive to catch later. For that class of task, paying for Opus 4.8 or Sol is the cheap option, because the cost of a weak answer dwarfs the cost of the better model. This is where you do not economize.
The mid tier is where most of the day actually runs. Sonnet 5 or Terra will handle drafting, summarizing, customer replies, research synthesis, and the steady stream of routine knowledge work to a standard most teams will not be able to distinguish from the flagship. The fast tier then takes the high-volume, low-stakes load, the classification and tagging and bulk processing that runs thousands of times an hour, where speed and cost per call matter more than the last few points of capability. Match the tier to the job and you spend money where it changes the outcome.
How an Owner Should Actually Choose
You do not need to track version numbers to make this decision well. You need a short rule. Ask what happens if the answer is wrong. If a wrong answer is expensive, slow to detect, or hard to reverse, route the task to the flagship. If a wrong answer is cheap to spot and fix, the mid tier is the right call and the savings are real. If the task runs at high volume and each call is low stakes, the fast tier is built for exactly that. The same logic holds whether you standardize on the Claude family or the GPT family, because both ship the same three roles.
The trap is buying by name. Standardizing everything on the flagship means overpaying across thousands of routine calls for capability that never gets used. Standardizing everything on the model in the headlines means quietly under-powering the handful of hard problems that justified bringing AI in at all. Neither is a strategy. The owners who get this right treat the tier as a deliberate choice per workload, the same way they would never put the senior partner on the photocopying or hand the merger to the intern.
Read next: When You Cannot Buy The Best Model · Claude Science And Your R&D Diligence