How an Independent Benchmark Team Turned 4-of-40 Models Passing Hard QA into a Majority Win by March 2026
https://www.livebinders.com/b/3698939?tabid=832fa6b6-886d-c247-10d7-743378e56a30
How an independent benchmarking lab discovered only 4 of 40 models beat coin flip on "hard" questions

In late 2025, an independent benchmarking group (OpenBench Labs) published a reproducible evaluation showing that, on a 1,000-item "hard