Working Paper · Preliminary
Exposure Without Performance: Does Theoretical AI Exposure Predict Actual AI Capability? Evidence from the United States
International Finance Corporation, World Bank Group
Abstract. Theoretical AI-exposure scores, built from expert or model ratings of how amenable an occupation's tasks are to automation, are now ubiquitous in labor-market research, where they serve as proxies for where AI will actually have an impact. Yet these scores are rarely validated against measured AI capability. This paper asks whether theoretical exposure predicts actual AI performance across U.S. occupations, using observed AI task-success rates as the outcome. I find that higher theoretical exposure, measured by GAIA-E, predicts lower measured performance: a one-unit increase in GAIA-E is associated with a 17.47-point decline in task success (p < 0.001). In a horse race against the Suitability-for-Machine-Learning (SML) measure, GAIA-E remains strongly negative while SML is insignificant (p = 0.467). The exposure–performance gap is concentrated in occupations whose tasks are nominally "exposed" to generative AI but where current systems do not yet perform. This suggests that exposure indices capture theoretical surface area rather than realized capability, with direct implications for any study that uses exposure as a treatment.
Figure 1 · GAIA-E vs. AEI task success
Each dot is an occupation. Regression line overlaid.
Source: data/gaia_occupations.csv · GAIA-E (gaia_e ×100) vs. AEI task success (aei_task_success)
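The fitted line in Figure 1 can be reproduced in outline as follows. This is a minimal sketch, not the paper's estimation code. It assumes the CSV has a header row with the columns gaia_e and aei_task_success named in the source line above and no missing values. The helper names load_xy and ols_line are hypothetical, and the four data points are synthetic, chosen only to make the arithmetic easy to check.

```python
import csv
from statistics import fmean


def load_xy(path="data/gaia_occupations.csv"):
    """Read exposure (rescaled x100) and task success from the CSV.

    Assumes a header row with gaia_e and aei_task_success and no blanks.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    x = [100 * float(r["gaia_e"]) for r in rows]
    y = [float(r["aei_task_success"]) for r in rows]
    return x, y


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic illustration only; these are NOT values from the paper's data.
gaia_e_x100 = [10.0, 20.0, 30.0, 40.0]
task_success = [80.0, 70.0, 60.0, 50.0]
intercept, slope = ols_line(gaia_e_x100, task_success)
```

With the real file, the same call would be ols_line(*load_xy()). The slope is expressed per unit of whatever scale the exposure variable is on, so rescaling gaia_e by 100 divides the slope by 100.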
Figure 2 · Horse race — exposure vs. SML
Regression coefficients on AEI task success. Teal = significant, gray = not significant. Bars show 95% confidence intervals.
Coefficients are taken from the paper's regressions. Joint-specification values are approximate (preliminary).
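A horse race of this kind amounts to regressing task success on both measures at once and comparing the two slopes and their intervals. The sketch below shows one way to compute them with an intercept and two regressors. It is an illustration under stated assumptions, not the paper's specification: the function name horse_race and the key "sml" are hypothetical, the interval uses a normal approximation (1.96 standard errors) rather than a t critical value, no controls or robust errors are included, and the data are synthetic.

```python
from math import sqrt
from statistics import fmean


def horse_race(y, x1, x2, z=1.96):
    """OLS of y on an intercept, x1 and x2.

    Returns {name: (coef, ci_low, ci_high)} for the two slopes, with a
    normal-approximation interval coef +/- z * standard error.
    """
    n = len(y)
    my, m1, m2 = fmean(y), fmean(x1), fmean(x2)
    # Demeaning absorbs the intercept, leaving a 2x2 system for the slopes.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("regressors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # Residual variance with n - 3 degrees of freedom (intercept + 2 slopes).
    ssr = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    sigma2 = ssr / (n - 3)
    se1 = sqrt(sigma2 * s22 / det)
    se2 = sqrt(sigma2 * s11 / det)
    return {
        "gaia_e": (b1, b1 - z * se1, b1 + z * se1),
        "sml": (b2, b2 - z * se2, b2 + z * se2),
    }


# Synthetic illustration only: y depends on x1 and not at all on x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [5.0 - 2.0 * a for a in x1]
result = horse_race(y, x1, x2)
```

A regressor is conventionally called significant at the 5% level when its interval excludes zero, which is the distinction the teal and gray bars encode.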
Figure 3 · The exposure–performance gap by occupation group
Mean GAIA-E vs. mean AEI task success within each major group. The wider the gap, the more theory overpredicts.
Source: data/gaia_occupations.csv · group means of gaia_e (×100) and aei_task_success
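The group comparison in Figure 3 reduces to two means per group and their difference. A minimal sketch follows, assuming each occupation is a dict with gaia_e and aei_task_success as in the source line above. The grouping column name major_group and the function name group_gap are hypothetical, and the rows are synthetic.

```python
from collections import defaultdict
from statistics import fmean


def group_gap(rows, group_key="major_group"):
    """Return {group: (mean_exposure_x100, mean_success, gap)}."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r[group_key]].append((100 * r["gaia_e"], r["aei_task_success"]))
    out = {}
    for group, pairs in buckets.items():
        mean_exposure = fmean([p[0] for p in pairs])
        mean_success = fmean([p[1] for p in pairs])
        # Positive gap: theoretical exposure exceeds measured success.
        out[group] = (mean_exposure, mean_success, mean_exposure - mean_success)
    return out


# Synthetic illustration only; group labels and values are made up.
rows = [
    {"major_group": "A", "gaia_e": 0.5, "aei_task_success": 40.0},
    {"major_group": "A", "gaia_e": 0.75, "aei_task_success": 50.0},
    {"major_group": "B", "gaia_e": 0.25, "aei_task_success": 30.0},
]
gaps = group_gap(rows)
```

These are unweighted means across occupations; weighting by employment would be a different design choice and is not implied by the figure's source line.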
Figure 4 · Distribution of the exposure–performance gap
Histogram of (GAIA-E − AEI task success) by occupation. The right tail contains occupations where theory overpredicts performance.
Source: data/gaia_occupations.csv · gaia_e (×100) − aei_task_success
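The histogram's underlying quantity is the per-occupation difference given in the source line above. As a rough sketch, the gap and its binned counts could be computed as below. The function name gap_histogram and the bin width of 10 points are hypothetical choices, and the inputs are synthetic.

```python
import math
from collections import Counter


def gap_histogram(gaia_e, aei_task_success, bin_width=10.0):
    """Return (gaps, counts), where counts maps each bin's left edge to a count."""
    gaps = [100 * e - s for e, s in zip(gaia_e, aei_task_success)]
    counts = Counter(math.floor(g / bin_width) * bin_width for g in gaps)
    return gaps, dict(sorted(counts.items()))


# Synthetic illustration only; these are NOT values from the paper's data.
exposure = [0.875, 0.5, 0.25, 0.75]
success = [40.0, 45.0, 50.0, 80.0]
gaps, hist = gap_histogram(exposure, success)

# Share of occupations where exposure exceeds measured success.
share_overpredicted = sum(g > 0 for g in gaps) / len(gaps)
```

Each bin covers the half-open interval from its left edge up to, but not including, the next edge.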
Key findings
Finding 1
β = −17.47 (p < 0.001)
Higher theoretical AI exposure (GAIA-E) predicts lower measured AI task success. Exposure and capability move in opposite directions.
Finding 2
SML: p = 0.467
The Suitability-for-Machine-Learning measure does not significantly predict task success. In the joint specification its coefficient collapses toward zero while GAIA-E stays negative.
Finding 3
Gap is generative-AI specific
The exposure–performance gap concentrates in occupations rated exposed to generative AI, where today's systems do not yet deliver. Exposure captures theoretical surface area, not realized capability.
Implications
A large and growing literature uses theoretical AI-exposure scores as the treatment variable when estimating AI's effect on wages, employment, and productivity. If exposure is negatively related, or simply unrelated, to what AI can currently do, then exposure-based designs may be mismeasuring the shock, attenuating or even sign-flipping estimated effects. The results argue for validating exposure measures against observed capability before treating them as proxies for impact, and for pairing exposure with usage- and performance-based measures (as GAIA does) rather than relying on any single lens.
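The attenuation and sign-flip concern can be illustrated with a stylized simulation. This is not the paper's model: it assumes an outcome driven by a true capability shock with effect beta, and an exposure proxy whose correlation with that shock is rho, both with unit variance. Under those assumptions the slope from regressing the outcome on the proxy approaches beta * rho in large samples, so a weakly correlated proxy shrinks the estimate and a negatively correlated proxy reverses its sign. All names (proxy_slope, beta, rho) are hypothetical and the data are simulated.

```python
import random
from math import sqrt
from statistics import fmean


def proxy_slope(beta, rho, n=20000, seed=0):
    """Slope from regressing the outcome on a proxy with correlation rho."""
    rng = random.Random(seed)
    # True capability shock (unobserved by the researcher in this story).
    c = [rng.gauss(0, 1) for _ in range(n)]
    # Exposure proxy: unit variance, correlation rho with the true shock.
    e = [rho * ci + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for ci in c]
    # Outcome responds to the true shock, not to the proxy.
    y = [beta * ci + rng.gauss(0, 1) for ci in c]
    me, my = fmean(e), fmean(y)
    sxy = sum((a - me) * (b - my) for a, b in zip(e, y))
    sxx = sum((a - me) ** 2 for a in e)
    return sxy / sxx


# Simulated illustration only: the true effect is beta = 1 in both runs.
attenuated = proxy_slope(beta=1.0, rho=0.5)   # estimate shrinks toward zero
flipped = proxy_slope(beta=1.0, rho=-0.5)     # estimate takes the wrong sign
```

The magnitudes here depend entirely on the assumed rho and say nothing about the size of any bias in existing exposure-based studies.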