Which AI Hallucinates Least? GPT-5, Claude, Gemini Tested
New benchmark data shows GPT-5 leads with an 8% hallucination rate, but the gaps are narrowing. Here's what each model gets wrong.
Meta's open-weights model outperforms OpenAI's flagship on HumanEval and MATH benchmarks. Anyone can run it locally.
The new model scores higher than PhD-level humans on medical, legal, and scientific reasoning tests. Sam Altman warns the next version will be 'qualitatively different.'
The new model tops GPT-5 on 14 out of 15 benchmarks. Researchers say benchmarks are broken anyway.
GPT-5 Pro scores 18.3% on the new benchmark. The previous version? 70.2%. Francois Chollet's test exposes what AI still can't do — and it's not what you'd expect.
An in-depth comparison of the three leading AI models across benchmarks, capabilities, and real-world use cases