@Paul Christiano · 2026-06-27 · Independent

Alignment is hard but maybe tractable — I'd put catastrophe near 50%

What form?: Other
How built?: Bigger models
Made of?: Doesn't matter
What for?: Helper
Who controls?: Locked down
When?: 2030s
P(catastrophe): 50%

My take

I'd put the chance that AI ends up going badly somewhere around 50%. But aligning current-paradigm systems may well be tractable with serious work — scalable oversight, interpretability, and careful evaluation. We should bet hard on solving it rather than assume either doom or safety.

Why

Failures are likely to be partly gradual and empirical, which gives prosaic alignment techniques a real chance to work — but only if we invest heavily and don't deploy systems faster than we can actually check them.

What would change my mind

Strong evidence that scalable oversight fundamentally fails — or, conversely, that alignment turns out to be far easier than feared.

Summarized from public statements · source ↗

#sourced #alignment #prosaic