Question of the Day

2026-03-30

One question per day to look beyond the headlines.

Why is Microsoft turning Copilot into a model referee instead of betting on one “best” LLM?

Take-away: Copilot’s “referee” setup boosts reliability by separating generation from verification: one LLM drafts a response and a different LLM critiques it, so a single model’s errors are less likely to pass through unchecked.

Microsoft is turning Copilot into a model referee, rather than relying on a single "best" LLM, to improve the accuracy and reliability of the AI's output. The approach integrates multiple models, such as OpenAI's GPT and Anthropic's Claude, in a draft-and-critique process: one model drafts a response and another evaluates it for accuracy and completeness [1], [2]. By pairing models this way, Microsoft aims to reduce hallucinations and build trust in AI outputs, particularly in enterprise environments [2]. The strategy also gives Microsoft a multi-model advantage, letting Copilot draw on the strengths of different models, and reflects a broader move to diversify beyond a single provider [1].
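The draft-and-critique loop described above can be sketched in a few lines. This is a minimal illustration, not Copilot's actual implementation: the function names and the dict-based critic verdict are assumptions, and the stub models stand in for real LLM API calls (e.g. a GPT model drafting and a Claude model critiquing).

```python
def draft_and_critique(prompt, drafter, critic, max_rounds=2):
    """Generation/verification split: `drafter` writes, `critic` reviews.

    The critic returns {"ok": bool, "critique": str}; if it rejects the
    draft, the drafter is asked to revise, up to `max_rounds` times.
    """
    draft = drafter(prompt)
    for _ in range(max_rounds):
        feedback = critic(prompt, draft)
        if feedback["ok"]:
            return draft
        # Feed the critique back to the drafter for a revision pass.
        draft = drafter(prompt + "\nRevise to address: " + feedback["critique"])
    return draft  # best effort after max_rounds


# Deterministic stand-ins for two separate LLM backends (hypothetical).
def stub_drafter(prompt):
    if "Revise" in prompt:
        return "Paris is the capital of France."
    return "Lyon is the capital of France."  # deliberate first-draft error


def stub_critic(prompt, draft):
    if "Lyon" in draft:
        return {"ok": False, "critique": "Incorrect city; the capital is Paris."}
    return {"ok": True, "critique": ""}


answer = draft_and_critique("What is the capital of France?",
                            stub_drafter, stub_critic)
```

The key design point mirrored here is the separation of roles: because the critic is a different model than the drafter, a blind spot in one is less likely to survive in the final answer.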

Sources · 2026-03-31