Where do DeepSeek V4-Pro’s price cuts actually come from: cheaper input tokens, ...

2026-05-01

Question of the day · 2026-05-02

One question per day to look beyond the headlines.

Where do DeepSeek V4-Pro’s price cuts actually come from: cheaper input tokens, cache hits, or MoE routing efficiency?

Take-away: DeepSeek’s cuts come from pricing the ingestion path separately. Cheap prompt tokens plus a deeply discounted cache-hit tier monetize reuse, not MoE routing.

DeepSeek V4-Pro’s price cuts come primarily from cheaper input tokens and cache hits. DeepSeek has cut API prices by up to 90%, concentrating the reductions on input prompts and cache hits, which lowers the cost per million tokens [2]. The input cache-hit tier in particular now costs one-tenth of the list price, a cut layered on top of a prior 75% discount [1], [3]. The strategy, in other words, is to make the input side, and above all repeated input served from cache, as cheap as possible.
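To see why the cache-hit tier matters more than the headline list price, here is a minimal sketch of a blended input cost. It assumes a placeholder list price (`LIST_INPUT_PRICE`, not DeepSeek's published rate) and takes the one-tenth cache-hit ratio from the sources; the function `blended_input_cost` and its parameters are hypothetical, for illustration only.

```python
# Placeholder price: NOT DeepSeek's actual rate, used only to illustrate the arithmetic.
LIST_INPUT_PRICE = 1.00    # assumed USD per million input tokens on a cache miss
CACHE_HIT_FRACTION = 0.10  # cache-hit tier priced at one-tenth of list, per the sources


def blended_input_cost(million_tokens: float, hit_rate: float,
                       list_price: float = LIST_INPUT_PRICE,
                       hit_fraction: float = CACHE_HIT_FRACTION) -> float:
    """Cost of an input workload where `hit_rate` of the tokens are served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    miss_cost = (1.0 - hit_rate) * list_price          # tokens billed at list price
    hit_cost = hit_rate * list_price * hit_fraction    # tokens billed at the cache-hit tier
    return million_tokens * (miss_cost + hit_cost)


if __name__ == "__main__":
    # Compare the same 100M-token workload at different cache-hit rates.
    for rate in (0.0, 0.5, 0.9):
        cost = blended_input_cost(100, rate)
        print(f"hit rate {rate:.0%}: ${cost:.2f} for 100M input tokens")
```

Because the cache-hit price is a fixed fraction of list, the effective input price for hit rate `h` is `(1 - h) + h/10` of list: about 55% of list at a 50% hit rate and 19% at 90%, whatever the list price is. That is the sense in which the discount monetizes reuse.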

Sources · 2026-05-03

[1] Deep|DeepSeek V4: The Inflection Point for Large-Scale NAND-Based KV Cache. fundaai.substack.com
[2] DeepSeek slashes API prices by 90% as AI-mad enterprises embrace 'tokenmaxxing'. SDxCentral, sdxcentral.com
[3] DeepSeek Cuts Prices Aggressively on V4 Rollout As New Model Fails to Wow Market. Tekedia, tekedia.com
