N-003 Leaderboard 2026-05-04
V11 Leaderboard Published Across 80 Complete CLI Tasks
The homepage leaderboard now reflects V11 pass@1/pass@3 results: GPT-5.5 leads at 61.7% pass@1, followed by GPT-5.3-codex and Opus 4.6.
Newsroom
Product updates, benchmark announcements, and community events.
The benchmark repository has grown to 87 merged quantitative finance tasks, with the full 90-task milestone now in sight.
Join our weekly discussion to talk about benchmark progress, quantitative finance tasks, and upcoming evaluation updates.