Which AI Hallucinates Least? GPT-5, Claude, Gemini Tested
New benchmark data shows GPT-5 leads with an 8% hallucination rate, but the gaps are narrowing. Here's what each model gets wrong.
Meta's open-weights model outperforms OpenAI's flagship on HumanEval and MATH benchmarks. Anyone can run it locally.
The new model scores higher than PhD-level humans on medical, legal, and scientific reasoning tests. Sam Altman warns the next version will be 'qualitatively different.'
The new model tops GPT-5 on 14 out of 15 benchmarks. Researchers say benchmarks are broken anyway.
GPT-5 Pro scores 18.3% on the new benchmark. The previous version? 70.2%. Francois Chollet's test exposes what AI still can't do — and it's not what you'd expect.
An in-depth comparison of the three leading AI models across benchmarks, capabilities, and real-world use cases