Rank #5 ITN Score: 21/30

Power-Seeking AI Systems

Advanced AI systems may develop instrumental goals around self-preservation and resource acquisition, undermining human oversight and potentially leading to irreversible loss of control.

Scale of the Problem

Less than $100M/year is spent on AI alignment research vs. $100B+ on AI capabilities.

Potentially existential. Toby Ord's The Precipice estimates roughly 10 percent existential risk this century from unaligned AI. Surveys of ML researchers give median estimates of several percent probability of human extinction or permanent disempowerment from AI. Even modest probabilities of an unrecoverable outcome make this one of the largest-scale risks we face.

Why This Is Pressing

Frontier AI systems already display rudimentary strategic and deceptive behavior in controlled evaluations, and capabilities are improving faster than alignment techniques. As systems become more agentic and are given more autonomy over code, money, and real-world actions, the risk of misaligned AI pursuing goals at civilization scale grows. Leading forecasters place non-trivial probability on existential catastrophe from misaligned AI this century, with several estimates above 10 percent. The pace of scaling continues to outstrip safety research, and decisions made in the next few years about training, deployment, and governance may shape outcomes for the long term.

Why It's Neglected

The technical alignment workforce is estimated at 1,000 to 2,000 full-time researchers globally, compared with tens of thousands in capabilities work at frontier labs. Alignment teams inside labs are usually a small share of headcount. Independent safety organizations such as MIRI, Redwood Research, ARC, and Apollo Research are small relative to the scale of the risk.

Can It Be Solved?

Mechanistic interpretability, scalable oversight, adversarial training, and dangerous-capability evaluations are producing real progress. Governance wins such as the UK and US AI Safety Institutes, the EU AI Act, and voluntary lab commitments show that policy action is viable. However, alignment remains unsolved and may be hard to solve on the timelines implied by current scaling trends.

Research & Solutions

The Alignment Problem: Why Goal Misspecification Poses an Existential Risk

WorldProblems Solved

What You Can Do

Pursue technical alignment research at Anthropic, Google DeepMind, Redwood Research, MIRI, or Apollo Research. Build a career in AI governance and policy. Work on information security at frontier labs. Support field-building through BlueDot Impact, MATS, or ARENA. Donate to the Long-Term Future Fund or Open Philanthropy's AI safety program.

What this problem still needs

Systematic reviews, cost-benefit analyses, policy proposals, field research

Priority Score

Breakdown

Importance: 10

Tractability: 5

Neglectedness: 6
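The three component scores in the breakdown add up to the 21/30 ITN score shown at the top of the page. The short sketch below reproduces that arithmetic. It assumes the total is a plain unweighted sum and that each component is scored out of 10; the page does not state either point, and the names `scores`, `MAX_PER_COMPONENT`, and `itn_total` are hypothetical.

```python
# Component scores as listed in the breakdown.
scores = {"importance": 10, "tractability": 5, "neglectedness": 6}

# Assumption: each component is scored out of 10 (not stated on the page).
MAX_PER_COMPONENT = 10


def itn_total(components):
    """Return the unweighted sum of the component scores (assumed formula)."""
    return sum(components.values())


total = itn_total(scores)
maximum = MAX_PER_COMPONENT * len(scores)
print(f"ITN score: {total}/{maximum}")  # prints "ITN score: 21/30"
```

If the site weights the components differently, the same total could arise from another formula, so this is only a consistency check on the published numbers.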
