What DeepSeek announced
On 30 June 2026 DeepSeek said the official version of V4 will ship in mid July, graduating the preview that has been available since 24 April, as reported by TechNode. The headline feature is not a benchmark. It is a price mechanism: for the first time on a major AI API, tokens will cost different amounts at different times of day, with rates doubling during the daily windows of 9:00 to 12:00 and 14:00 to 18:00, which correspond to Chinese business hours, and off-peak pricing unchanged.
The models themselves are substantial: a 1 million token context window becomes standard across the lineup, V4-Pro is a 1.6 trillion parameter mixture-of-experts design with 49 billion active parameters, and V4-Flash a 284 billion parameter model with 13 billion active. DeepSeek's documentation adds a hard deadline: the older deepseek-chat and deepseek-reasoner endpoints become inaccessible after 24 July, so existing integrations must migrate whether they like the new meter or not.
Why a model lab is pricing like a power company
Time-of-day pricing exists in one kind of market: fixed capacity, fluctuating demand. Power grids invented it because storage was expensive and peak demand set the size of the whole system. That an AI lab now reaches for the same tool is an admission worth more than any keynote: inference capacity is finite, GPUs do not queue politely, and the marginal token at 10:30 on a Tuesday costs the operator more than the same token at midnight.
It also breaks a comfortable assumption. The industry has spent two years telling buyers that intelligence gets cheaper every quarter. Per token, that remains true. But the new mechanism means the price of the same request is no longer a constant, and budget owners who planned on flat unit costs now own a small energy-trading problem. Once one vendor demonstrates that customers accept surge pricing, others have every incentive to follow.
The European clock advantage
For European buyers, the geography of the peak windows is unusually kind. The reported peak hours fall at 3:00 to 6:00 and 8:00 to 12:00 Central European summer time, and 2:00 to 5:00 and 7:00 to 11:00 in London and Lisbon. From noon in Frankfurt or Paris, the entire working afternoon and evening run off-peak. A European company using DeepSeek pays the discounted rate for most of its business day, while a Chinese competitor pays double during its own.
The practical move is architectural, not contractual: separate latency-critical calls from deferrable ones. Nightly batch jobs, embeddings, re-indexing, evaluation runs and report generation can be scheduled into off-peak windows with a queue and a cron entry. That discipline is worth building even if you never use DeepSeek, because time-of-day pricing has now been demonstrated, and your own vendor's version of it is a product-management meeting away.
What to do before mid July
Three actions fit in the two weeks before release. First, anyone running the retiring deepseek-chat or deepseek-reasoner endpoints needs a migration plan before 24 July, tested, not planned. Second, teams using any metered AI API should tag their workloads deferrable or interactive now, so scheduling is a config change later. Third, whoever owns the AI budget should model spend under a two-tier price and ask each vendor one question at renewal: do you commit to time-independent pricing for the term of this contract, or not. The answer, either way, is information.
Read next: Nvidia Now Earns Rent on Its Own Chips · OpenAI Offers Washington a Stake



