Rank #5 ITN Score: 21/30

Power-Seeking AI Systems

Advanced AI systems may develop instrumental goals around self-preservation and resource acquisition, undermining human oversight and potentially leading to irreversible loss of control.

Advanced AI systems optimizing for misspecified objectives could pursue instrumental sub-goals (resource acquisition, self-preservation, goal-content integrity) in ways that undermine human oversight and permanently foreclose human control. This report explains the theoretical basis for the alignment problem, surveys the current state of alignment research, and argues for urgent, well-funded work on scalable oversight and interpretability.
