Question of the Day
One question per day to look beyond the headlines.
Why is Microsoft turning Copilot into a model referee instead of betting on one “best” LLM?
Take-away: Copilot’s “referee” setup boosts reliability by separating generation from verification: one LLM drafts and a different LLM critiques the draft, so the two don’t share the same errors.
Microsoft is turning Copilot into a model referee, rather than relying on a single "best" LLM, to improve the accuracy and reliability of the AI's output. The approach pairs models from different providers, such as OpenAI's GPT and Anthropic's Claude, in a critique process: one model drafts a response and another evaluates it for accuracy and completeness [1], [2]. Because the models are trained independently, they are unlikely to make identical mistakes, which helps Microsoft reduce hallucinations and build trust in AI outputs, particularly in enterprise environments [2]. The strategy also gives Microsoft a multi-model advantage, letting Copilot draw on the strengths of each model while reflecting a broader move to diversify beyond a single provider [1].
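The draft-and-critique loop described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: `drafter` and `critic` are hypothetical stand-ins for calls to two different model providers, and the toy critique logic exists only to show the control flow of separating generation from verification.

```python
def drafter(prompt: str) -> str:
    # Hypothetical stand-in for the drafting model (e.g. a GPT API call).
    return f"DRAFT: answer to '{prompt}'"

def critic(prompt: str, draft: str) -> dict:
    # Hypothetical stand-in for the critiquing model (e.g. a Claude API call).
    # Returns a verdict plus feedback; here a toy completeness check.
    ok = "answer" in draft
    return {"approved": ok, "feedback": "" if ok else "Draft lacks an answer."}

def referee_pipeline(prompt: str, max_rounds: int = 2) -> str:
    """One model drafts; a different model verifies before release."""
    draft = drafter(prompt)
    for _ in range(max_rounds):
        verdict = critic(prompt, draft)
        if verdict["approved"]:
            return draft
        # Feed the critique back to the drafter for a revision.
        draft = drafter(prompt + " | revise per: " + verdict["feedback"])
    return draft  # best effort after max_rounds

print(referee_pipeline("Why separate generation from verification?"))
```

The key design point is that the critic never sees its own draft: because drafter and critic would be different models in practice, a hallucination by one is less likely to be rubber-stamped by the other.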