It’s getting harder to measure just how good AI is getting

Kelsey Piper

Vox, January 12, 2025

Abstract

The rapid advancement of artificial intelligence capabilities has made traditional benchmarking methods increasingly obsolete. As AI systems saturate established tests like GPQA, MMLU, and ARC-AGI and exceed human performance on them, measuring further progress becomes harder. Three key factors are driving AI's transformative impact: falling operating costs, better human-AI interfaces, and stronger reasoning capabilities. OpenAI's o3 model exemplifies this progress, reaching performance levels that undercut claims that AI development has hit a wall. Rather than slowing down, AI progress has become less visible to casual observers, because improvements now occur in specialized domains beyond most people's expertise. This shift in measurability coincides with AI's growing ability to automate complex intellectual work, though concerns persist about how responsibly this technological transition will be managed. Together, cheaper operation, better interfaces, and stronger capabilities suggest that AI will continue to advance significantly, whatever the outcome of debates about scaling laws. - AI-generated abstract.
