AI & Tech·May 27, 2026·1 source verified

New Benchmark Measures AI Agents on Tasks Humans Actually Want Delegated, Not Just Economic Value

Summarised by Relevant News AI · Read time: 3 min

Researchers introduce JobBench, a benchmark that evaluates AI agents on 130 real-world tasks across 35 occupations, chosen according to what workers identify as high priority for delegation rather than by pure economic replacement value. Across the 36 models tested, even top performers such as Claude Opus reach only 45.9% accuracy, suggesting that agents currently fall short of practical workplace deployment.

Why it matters: This research reframes how the industry should develop occupational AI, putting human empowerment and augmentation ahead of replacement narratives. That shift could influence product strategy and stakeholder adoption.

All sources

arXiv cs.AI