Adapting Trust and Safety Blueprints to AI Safety for Mitigating P(Doom)

AI Safety could reduce P(Doom)!

In my view, the possibility of an AI-induced existential catastrophe, a scenario to which many executives and experts at frontier AI developers now assign a sobering 10-30% probability, represents the most profound challenge of our time. The field of AI Safety is dedicated to reducing this probability. However, as AI capabilities advance, I believe safety cannot remain a purely theoretical or technical discipline; it must become an operational one. I believe the established field of Trust & Safety (T&S), forged over decades of managing adversarial risk on global platforms, provides the essential blueprint for adapting and evolving the processes, policies, and personnel required to make AI Safety practical, scalable, and effective in the real world.

Deconstructing 'Doom'

To get a handle on this, I find it helpful to break "P(Doom)" down into distinct risk pathways. This includes the classic "AI takeover" scenario, often called a Decisive Existential Risk, where a runaway intelligence rapidly surpasses human capabilities. But it also includes more gradual pathways. Here’s a look at the primary vectors through which advanced AI could lead to a catastrophic outcome, based on my evaluation.

Malicious Use

Intentional use of AI by human actors to cause harm, such as designing bioweapons, launching cyberattacks, or deploying autonomous weapons.

THREAT PROFILE
Impact: Catastrophic
Likelihood: High
T&S Tractability: High

AI Race Dynamics

Competitive pressures forcing nations and corporations to cut corners on safety, leading to premature deployment of unsafe systems.

THREAT PROFILE
Impact: Catastrophic
Likelihood: High
T&S Tractability: Medium

Organizational Risks

Failures of human process and safety culture within AI labs, such as accidental model leaks, theft, or systemic underinvestment in safety.

THREAT PROFILE
Impact: High
Likelihood: Medium
T&S Tractability: High

Emergent & Agentic Risks

The "alignment problem": losing control of AIs that act as autonomous agents with emergent goals (e.g., power-seeking, deception).

THREAT PROFILE
Impact: Existential
Likelihood: Unknown
T&S Tractability: Medium
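
To make this comparison concrete, here is a minimal sketch, in Python, of how these four vectors could sit in a simple risk register and be triaged programmatically. The ordinal scales and the priority heuristic are my own illustrative assumptions, not an established scoring standard.

from dataclasses import dataclass

# Ordinal scales for the qualitative ratings above; the numeric values
# are illustrative assumptions, not an established standard.
IMPACT = {"High": 2, "Catastrophic": 3, "Existential": 4}
LIKELIHOOD = {"Unknown": 1, "Medium": 2, "High": 3}
TRACTABILITY = {"Medium": 2, "High": 3}

@dataclass
class RiskVector:
    name: str
    impact: str        # severity if the pathway materializes
    likelihood: str    # how probable the pathway appears today
    tractability: str  # how much leverage existing T&S practice offers

    def priority(self) -> int:
        # Crude triage heuristic: severity x likelihood, weighted by
        # how tractable the risk is for T&S-style interventions.
        return (IMPACT[self.impact]
                * LIKELIHOOD[self.likelihood]
                * TRACTABILITY[self.tractability])

REGISTER = [
    RiskVector("Malicious Use", "Catastrophic", "High", "High"),
    RiskVector("AI Race Dynamics", "Catastrophic", "High", "Medium"),
    RiskVector("Organizational Risks", "High", "Medium", "High"),
    RiskVector("Emergent & Agentic Risks", "Existential", "Unknown", "Medium"),
]

for risk in sorted(REGISTER, key=RiskVector.priority, reverse=True):
    print(f"{risk.name:<26} priority={risk.priority()}")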

Evolving the AI Safety Playbook

From where I stand, the convergence of T&S and AI Safety isn't just a theory; it's an active process driven by necessity. I see labs adapting and evolving T&S frameworks to manage the risks of frontier AI models, transforming AI Safety into an empirical, operational discipline.

The P(Doom) Mitigation Profile

Each core T&S function, from policy development and red teaming to incident response, has a distinct effectiveness profile against the four primary catastrophic risk vectors outlined above.
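
As a rough sketch of the structure behind such a profile, the Python mapping below pairs a few T&S functions with effectiveness ratings against each of the four vectors. The specific function names and ratings are my own assumptions for illustration, not measured results.

# Illustrative mapping from T&S functions to their rough effectiveness
# ("low" / "medium" / "high") against the four risk vectors above.
# The functions listed and the ratings are assumptions for illustration.
MITIGATION_PROFILE = {
    "Policy Development": {
        "Malicious Use": "high",
        "AI Race Dynamics": "low",
        "Organizational Risks": "medium",
        "Emergent & Agentic Risks": "medium",
    },
    "Red Teaming": {
        "Malicious Use": "high",
        "AI Race Dynamics": "medium",
        "Organizational Risks": "high",
        "Emergent & Agentic Risks": "medium",
    },
    "Incident Response": {
        "Malicious Use": "medium",
        "AI Race Dynamics": "low",
        "Organizational Risks": "high",
        "Emergent & Agentic Risks": "medium",
    },
}

def profile_for(ts_function: str) -> dict[str, str]:
    """Return the effectiveness profile for a single T&S function."""
    return MITIGATION_PROFILE[ts_function]

print(profile_for("Red Teaming"))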

The Human Element

Beyond processes and frameworks, I believe the most valuable asset in this convergence is the T&S professional, whose unique skills and adversarial mindset are essential for building safe AI.

Adversarial Thinking

Trained to think like an attacker, anticipating how systems can be abused or exploited. Indispensable for effective AI red teaming.

Risk Triage

Skilled in rapidly identifying, categorizing, and prioritizing a constant influx of diverse risks, from immediate harms to long-term threats.

Policy Nuance

Expertise in crafting robust yet flexible rules for complex systems, essential for developing the "constitutions" for agentic AI.

Crisis Management

Experienced in managing high-stakes digital crises, vital for preparing for AI safety incidents that could unfold at unprecedented speed.

Scaled Systems Intuition

A deep, intuitive understanding of how small changes can cascade into systemic failure in large, complex socio-technical systems.

Socio-Technical View

Views safety not just as a technical problem, but as an interaction between technology, people, policies, and incentives.

A Strategic Roadmap

To capitalize on this convergence, we need deliberate action from everyone involved. The ultimate goal shouldn't be a one-time "solution" to alignment, but the creation of a permanent, adaptive, global T&S-like function for AI.

For AI Labs

  • Elevate T&S to a core strategic function.
  • Establish permanent, empowered red teams.
  • Invest in socio-technical safety research.

For Policymakers

  • Consult T&S veterans for practical governance.
  • Standardize incident reporting and analysis (a hypothetical report schema is sketched after this list).
  • Promote multi-stakeholder collaboration.
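
On the incident-reporting point above, shared analysis only works if reports share a common shape. Here is a minimal sketch of what such a schema could look like; every field name is a hypothetical assumption of mine, not an existing regulatory or industry standard.

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical schema for a standardized AI safety incident report.
# Field names are illustrative assumptions, not an existing standard.
@dataclass
class AIIncidentReport:
    incident_id: str
    reported_at: datetime
    system: str                # which model or deployment was involved
    risk_vector: str           # e.g. "Malicious Use", "Emergent & Agentic Risks"
    severity: str              # e.g. "low", "high", "catastrophic"
    summary: str               # what happened, in plain language
    detection_method: str      # red team, user report, automated monitoring, etc.
    mitigations: list[str] = field(default_factory=list)
    shared_with: list[str] = field(default_factory=list)  # regulators, other labs, AI Safety Institutes

# Example report (entirely fictional):
report = AIIncidentReport(
    incident_id="2025-0001",
    reported_at=datetime.now(timezone.utc),
    system="hypothetical frontier model",
    risk_vector="Malicious Use",
    severity="high",
    summary="Jailbreak prompt elicited detailed harmful instructions.",
    detection_method="external red team",
    mitigations=["refusal training update", "prompt filter rule"],
    shared_with=["internal safety board"],
)
print(report.incident_id, report.risk_vector, report.severity)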

For the T&S Community

  • Expand the T&S mandate to include agentic AI.
  • Develop specialized AI safety curricula.
  • Build bridges with AI Safety Institutes.