Infrastructure · intermediate

TPU (Tensor Processing Unit)

TPUs are Google's custom AI accelerators, designed specifically for the matrix and reduction operations of neural networks. Used to train Gemini and large parts of Google's AI stack.

Published May 29, 2026

Explanation

TPUs differ from GPUs in their use of large systolic arrays for matrix multiplication and a focus on training-friendly numerics (bfloat16). They are available externally only through Google Cloud (TPU v4, v5p, v5e, Trillium).

Workloads compiled for JAX or TensorFlow tend to run very efficiently on TPUs; PyTorch support via PyTorch/XLA is increasingly viable but historically less polished.

Examples

Gemini trained on TPU v5p pods.
JAX-based research labs frequently target TPU.

Frequently asked

What is TPU?

TPUs are Google's custom AI accelerators, designed specifically for the matrix and reduction operations of neural networks. Used to train Gemini and large parts of Google's AI stack.

What is an example of tpu?

Gemini trained on TPU v5p pods.

How is TPU related to GPU?

TPU and GPU are both infrastructure concepts. GPUs are the parallel processors that train and run nearly every modern AI model. Their throughput on matrix multiplication is what makes deep learning practical.

Is TPU considered intermediate?

TPU is generally considered intermediate-level material in the AI and LLM space.

GPUInfrastructure

GPUs are the parallel processors that train and run nearly every modern AI model. Their throughput on matrix multiplication is what makes deep learning practical.

InferenceInference

Inference is what happens when you actually run a trained model on new input. For LLMs that means generating tokens one at a time, with sampling and a KV cache.

PretrainingTraining

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.

Side-by-side comparisons

Sources

Google Cloud TPU