
Question of the Day · 2026-04-22

One question per day to look beyond the headlines.

Why is Google splitting TPU 8 into a training chip (8t) and an inference chip (8i)?

Take-away: Training and inference stress different bottlenecks. Training needs pod-scale compute; inference needs low-latency memory access (HBM and on-chip SRAM). Splitting the TPU lets each chip optimize performance per dollar for its workload.

Google is splitting the TPU 8 into separate training and inference chips, the TPU 8t and TPU 8i, because the two AI workloads have different bottlenecks, and specializing each chip improves efficiency and scalability in production deployments [2]. The TPU 8t targets compute-intensive training, delivering roughly 3x the compute performance per pod of its predecessor [2], [3]. The TPU 8i targets latency-sensitive inference, with high-bandwidth memory and more on-chip SRAM to keep model weights close to the compute units [2]. By serving the distinct demands of training and inference with separate designs, Google aims for an 80% improvement in performance per dollar over previous generations [1], [3].
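The split follows from a roofline-style argument: a workload is compute-bound or memory-bound depending on whether its arithmetic intensity (FLOPs per byte moved) exceeds the machine's balance point. The sketch below illustrates this with entirely hypothetical numbers; it does not reflect real TPU 8 specifications.

```python
# Roofline-style sketch: why training and inference stress different limits.
# All figures below are illustrative placeholders, not real TPU specs.

def bound_by(flops: float, bytes_moved: float,
             peak_flops_per_s: float, mem_bw_bytes_per_s: float) -> str:
    """Classify a workload as compute-bound or memory-bound.

    A workload is compute-bound when its arithmetic intensity
    (FLOPs per byte moved) is at least the machine balance point
    (peak FLOP/s divided by memory bandwidth in bytes/s).
    """
    intensity = flops / bytes_moved                  # FLOPs per byte
    balance = peak_flops_per_s / mem_bw_bytes_per_s  # machine balance point
    return "compute-bound" if intensity >= balance else "memory-bound"

# Hypothetical accelerator: 1e15 FLOP/s peak, 1e12 B/s HBM bandwidth,
# giving a balance point of 1000 FLOPs per byte.
PEAK, BW = 1e15, 1e12

# A large-batch training step reuses each weight byte many times.
print(bound_by(flops=8e12, bytes_moved=2e9,
               peak_flops_per_s=PEAK, mem_bw_bytes_per_s=BW))
# -> compute-bound

# Single-token inference decode reads every weight byte for few FLOPs.
print(bound_by(flops=2e9, bytes_moved=2e9,
               peak_flops_per_s=PEAK, mem_bw_bytes_per_s=BW))
# -> memory-bound
```

Under these assumed numbers, training saturates the compute units (favoring a pod-scale chip like the 8t), while inference decode is limited by how fast weights can be streamed from memory (favoring the 8i's HBM and SRAM emphasis).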

Sources · 2026-04-23