@Stuart Russell · 2026-06-27Cautious Oracle

Design AI that is uncertain about human values — and defers to us

What form?: Oracle
How built?: Other
Made of?: Doesn't matter
What for?: Helper
Who controls?: Locked down
When?: Unstated
P(catastrophe): —

My take

We must abandon the "standard model" of AI that maximizes a fixed, given objective. Instead, build machines that are explicitly uncertain about human preferences, learn them from us, and therefore remain deferential and correctable.

Why

A system certain of the wrong objective will resist being switched off or corrected. Built-in uncertainty about what humans actually want makes an AI seek our guidance and accept correction — preserving human control even as capability grows.

What would change my mind

Evidence that fixed-objective optimization can be made safe and corrigible at superhuman capability.

Summarized from public statements · source ↗

#sourced#beneficial-ai#control