
Adapting Trust and Safety blueprints
to AI Safety for Mitigating P(Doom)
AI Safety could reduce P(Doom)!
In my view, the possibility of an AI-induced existential catastrophe—a scenario where many frontier AI developers' executives and experts now estimate a sobering 10-30% chance of P(Doom)—represents the most profound challenge of our time. The field of AI Safety is dedicated to reducing this probability. However, as AI capabilities advance, I believe safety cannot remain a purely theoretical or technical discipline; it must become an operational one. I believe the established field of Trust & Safety (T&S)—forged over decades of managing adversarial risk on global platforms—provides the essential blueprint to adapt and evolve the processes, policies, and personnel required to make AI Safety practical, scalable, and effective in the real world.
Deconstructing 'Doom'
To get a handle on this, I find it helpful to break "P(Doom)" down into distinct risk pathways. This includes the classic "AI takeover" scenario, often called a Decisive Existential Risk, where a runaway intelligence rapidly surpasses human capabilities. But it also includes more gradual pathways. Here’s a look at the primary vectors through which advanced AI could lead to a catastrophic outcome, based on my evaluation.
Malicious Use
Intentional use of AI by human actors to cause harm, such as designing bioweapons, launching cyberattacks, or deploying autonomous weapons.
AI Race Dynamics
Competitive pressures forcing nations and corporations to cut corners on safety, leading to premature deployment of unsafe systems.
Organizational Risks
Failures of human process and safety culture within AI labs, such as accidental model leaks, theft, or systemic underinvestment in safety.
Emergent & Agentic Risks
The "alignment problem": losing control of AIs that act as autonomous agents with emergent goals (e.g., power-seeking, deception).
Evolving the AI Safety Playbook
From where I stand, the convergence of T&S and AI Safety isn't just a theory; it's an active process driven by necessity. I see adapting T&S frameworks and evolving them to manage the risks of frontier AI models, transforming AI Safety into an empirical, operational discipline.
The P(Doom) Mitigation Profile
Select a T&S function to see its effectiveness profile against the four primary catastrophic risk vectors.
The Human Element
Beyond processes and frameworks, I believe the most valuable asset in this convergence is the T&S professional. Their unique skills and adversarial mindset are essential for building safe AI.
Adversarial Thinking
Trained to think like an attacker, anticipating how systems can be abused or exploited. Indispensable for effective AI red teaming.
Risk Triage
Skilled in rapidly identifying, categorizing, and prioritizing a constant influx of diverse risks, from immediate harms to long-term threats.
Policy Nuance
Expertise in crafting robust yet flexible rules for complex systems, essential for developing the "constitutions" for agentic AI.
Crisis Management
Experienced in managing high-stakes digital crises, vital for preparing for AI safety incidents that could unfold at unprecedented speed.
Scaled Systems Intuition
A deep, intuitive understanding of how small changes can cascade into systemic failure in large, complex socio-technical systems.
Socio-Technical View
Views safety not just as a technical problem, but as an interaction between technology, people, policies, and incentives.
A Strategic Roadmap
To truly capitalize on this convergence, it's clear that we need deliberate action from everyone involved. The ultimate goal shouldn't be a one-time "solution" to alignment, but rather the creation of a permanent, adaptive, global T&S-like function for AI.
For AI Labs
- Elevate T&S to a core strategic function.
- Establish permanent, empowered red teams.
- Invest in socio-technical safety research.
For Policymakers
- Consult T&S veterans for practical governance.
- Standardize incident reporting and analysis.
- Promote multi-stakeholder collaboration.
For the T&S Community
- Expand the T&S mandate to include agentic AI.
- Develop specialized AI safety curricula.
- Build bridges with AI Safety Institutes.