We must abandon the "standard model" of AI, in which a machine maximizes a fixed, given objective. Instead, we should build machines that are explicitly uncertain about human preferences, learn those preferences from us, and therefore remain deferential and correctable.
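A system that is certain of the wrong objective has no reason to let itself be switched off or corrected, because any interruption only lowers its expected score on that objective. Built-in uncertainty about what humans actually want changes the incentive: human guidance and correction become evidence about the true objective, so the AI seeks them out and accepts them, which preserves human control even as capability grows.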
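The incentive argument above can be illustrated with a small numerical sketch. It is a toy model under stated assumptions, not a definitive formalization: the machine's belief about how much humans value a proposed action is a list of equally likely utility values, switching off is worth 0, and the human is assumed to know the true value and to allow the action only when it is positive. All function names here are hypothetical and chosen for illustration.

```python
from statistics import mean


def value_of_acting(belief):
    """Expected utility if the machine acts without consulting anyone."""
    return mean(belief)


def value_of_switching_off():
    """Utility of doing nothing (assumed to be 0 in this toy model)."""
    return 0.0


def value_of_deferring(belief):
    """Expected utility if the machine lets the human decide.

    Assumption: the human knows the true utility u and permits the
    action only when u > 0, so each outcome is worth max(u, 0).
    """
    return mean([max(u, 0.0) for u in belief])


def best_choice(belief):
    """Return the highest-valued option and the value of every option.

    Ties go to the option listed first, so a machine that gains nothing
    from deferring simply acts.
    """
    options = {
        "act": value_of_acting(belief),
        "switch_off": value_of_switching_off(),
        "defer": value_of_deferring(belief),
    }
    return max(options, key=options.get), options


# A machine certain that the action is good gains nothing by deferring.
certain_choice, certain_values = best_choice([2.0])

# A machine unsure whether the action is harmful (-1) or good (+2)
# strictly prefers to let the human decide.
uncertain_choice, uncertain_values = best_choice([-1.0, 2.0])
```

In this sketch the certain machine chooses to act, since deferring is worth exactly the same to it as acting, so it has no incentive to keep the human in the loop. The uncertain machine chooses to defer, since deferring is worth 1.0 against 0.5 for acting alone. More generally, deferring is never worth less than the better of acting or switching off, because the average of max(u, 0) is at least max(average of u, 0), and the gap opens up only when the machine is genuinely unsure about the sign of u. The conclusion depends on the assumption that the human decides correctly; a human who errs often enough would weaken it.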
Evidence that fixed-objective optimization can be made safe and corrigible at superhuman capability.