AGI·LEDGER
The Ledger
@Stuart Russell · 2026-06-27Cautious Oracle

Design AI that is uncertain about human values — and defers to us

What form?
Oracle
How built?
Other
Made of?
Doesn't matter
What for?
Helper
Who controls?
Locked down
When?
Unstated
P(catastrophe)

My take

We must abandon the "standard model" of AI that maximizes a fixed, given objective. Instead, build machines that are explicitly uncertain about human preferences, learn them from us, and therefore remain deferential and correctable.

Why

A system certain of the wrong objective will resist being switched off or corrected. Built-in uncertainty about what humans actually want makes an AI seek our guidance and accept correction — preserving human control even as capability grows.

What would change my mind

Evidence that fixed-objective optimization can be made safe and corrigible at superhuman capability.

Summarized from public statements · source ↗
#sourced#beneficial-ai#control