
Question of the Day

2026-04-30

One question per day to look beyond the headlines.

Which part of an AI model’s “value” does distillation copy: weights, data, or just behavior?

Take-away: Distillation captures a teacher model's decision surface (its routing/logit patterns) in a new student's weights, so the "value" transfers via behavior, not by copying weights or original data.

Distillation copies the behavior and intelligence of the original model rather than its specific weights or data [1]. It transfers the routing intelligence of a larger teacher model into a smaller student model, aiming to preserve the quality of the teacher's actions and decisions [1]. Explicit training data may be synthesized along the way, but the core idea is to replicate the refined behavior patterns embodied in the teacher model [1].
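To make "copying behavior" concrete, here is a minimal NumPy sketch of the standard soft-label distillation loss: the student is trained to match the teacher's temperature-softened output distribution, not the teacher's weights. The temperature `T` and the `T**2` scaling follow common knowledge-distillation practice; they are illustrative defaults, not values from the source.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax. Higher T spreads probability mass,
    # exposing the teacher's relative preferences among "wrong" classes.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    # The T**2 factor keeps gradient magnitudes comparable across T.
    p = softmax(teacher_logits, T)  # teacher soft targets (fixed)
    q = softmax(student_logits, T)  # student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return (T ** 2) * kl.mean()
```

The loss is zero exactly when the student's logits reproduce the teacher's output distribution, which is why distillation is said to transfer behavior: any student weights that make this loss small are acceptable, regardless of how the teacher internally computed its logits.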

Sources · 2026-05-01