Question of the Day
One question per day to look beyond the headlines.
Where do DeepSeek V4-Pro’s price cuts actually come from: cheaper input tokens, cache hits, or MoE routing efficiency?
Take-away: DeepSeek’s cuts come from pricing the ingestion path separately. Cheap prompt tokens plus a deeply discounted cache-hit tier monetize reuse, not MoE routing.
DeepSeek V4-Pro’s price cuts come primarily from cheaper input tokens and cache hits. DeepSeek has cut API prices by up to 90%, concentrating the reductions on input prompts and cache hits, which lowers the cost per million tokens [2]. The input cache-hit tier in particular has been reduced to one-tenth of the list price, layered on top of a prior 75% discount, so the largest savings go to workloads that reuse prompt content [1], [3].
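To see why a discounted cache-hit tier matters more as prompts get reused, here is a minimal Python sketch of the blended input price. The list price of 1.00 per million tokens is a placeholder, not DeepSeek's published rate, and the function name blended_input_price is made up for illustration. The sketch also assumes that cache misses are billed at the full list price and cache hits at one-tenth of it.

```python
def blended_input_price(miss_price, hit_price, hit_rate):
    """Average price per million input tokens for a given cache-hit rate.

    miss_price: price per million input tokens on a cache miss
    hit_price:  price per million input tokens on a cache hit
    hit_rate:   fraction of input tokens served from cache (0.0 to 1.0)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Placeholder list price per million input tokens (an assumption, not a real rate).
LIST_PRICE = 1.00
# Cache-hit tier at one-tenth of the list price, as described in the answer.
HIT_PRICE = LIST_PRICE / 10

if __name__ == "__main__":
    for hit_rate in (0.0, 0.5, 0.9):
        price = blended_input_price(LIST_PRICE, HIT_PRICE, hit_rate)
        print(f"hit rate {hit_rate:.0%}: {price:.2f} per million input tokens")
```

Under these placeholder numbers, a workload with a 90% hit rate pays roughly a fifth of the list price for input, which is the reuse effect the take-away describes.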
- Deep|DeepSeek V4: The Inflection Point for Large-Scale NAND-Based KV Cache (fundaai.substack.com)
- DeepSeek slashes API prices by 90% as AI-mad enterprises embrace 'tokenmaxxing' - SDxCentral (sdxcentral.com)
- DeepSeek Cuts Prices Aggressively on V4 Rollout As New Model Fails to Wow Market - Tekedia (tekedia.com)