Global coordination problems via game theory

Why AGI is not like climate change

July 24, 2025

Most global coordination problems get easier with time. Overfishing and climate change look like Prisoner’s Dilemmas in the short run, but the game transforms once mutual defection has done enough damage to make cooperation visibly worth it. The hard part of governance is buying enough time and trust for that transformation to happen.

AGI breaks the pattern. The conditions that drive the transformation elsewhere — depleting resources, repeated interactions, visible consequences — don’t apply, or apply too slowly. What’s left is a permanent Prisoner’s Dilemma with a defection prize unlike anything we’ve faced, and a misalignment between individual players and humanity that is built into the payoff structure itself.

The rest of this post makes the asymmetry visible using ordinal payoffs (1 = Worst, 4 = Best), with each cell written as (Player A, Player B, Humanity). Adding humanity as a third payoff is the trick: humanity makes no moves of its own, but tracking its ranking lets us see exactly where individual incentives and collective welfare come apart. (The rankings illustrate the direction of incentives, not interpersonally comparable utilities.)

The Standard Pattern: From Short-Term Dilemma to Long-Term Cooperation

The core of most global coordination challenges lies in overcoming short-term incentives to secure a better long-term future. The goal of governance is to build the trust needed to make this shift.

Example 1: Overfishing

In the short term, the game is a Prisoner’s Dilemma. The immediate incentive is to overfish for a quick profit, especially if you fear others will do the same. This logic pushes everyone toward a (2, 2, 1) tragedy of the commons.

	Others: Fish Sustainably	Others: Overfish
You: Fish Sustainably	(3, 3, 3)	(1, 4, 2)
You: Overfish	(4, 1, 2)	(2, 2, 1) - Mutual Ruin

In the long term, the game becomes a Stag Hunt. A healthy, sustainable fishery is vastly more valuable than any short-term gain. The (4, 4, 4) outcome of mutual cooperation becomes the best prize for the players and for Humanity.

	Others: Fish Sustainably	Others: Overfish
You: Fish Sustainably	(4, 4, 4) - Sustainable Bounty	(1, 3, 2)
You: Overfish	(3, 1, 2)	(2, 2, 1)

This transformation has a concrete mechanism: repeated interactions let cooperators punish defectors, and the resource itself degrades as defection accumulates, shrinking the defector’s prize over time (an overfished sea pays nobody). The entire challenge of fisheries governance is creating institutions (treaties, quotas, enforcement) that give players the confidence to aim for the long-term (4, 4, 4) prize before the resource collapses.
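
The shift between the two overfishing tables can be checked mechanically. Here is a minimal Python sketch that names each game from the row player's payoff ordering. The function name `classify`, the "C"/"D" move labels, and the temptation/reward/punishment/sucker variable names are my own shorthand rather than anything in the tables, and the orderings are the usual textbook ones (Prisoner's Dilemma: temptation > reward > punishment > sucker; Stag Hunt: reward > temptation >= punishment > sucker).

```python
# Row player's ordinal payoffs, copied from the two overfishing tables above.
# Keys are (your move, others' move); "C" = fish sustainably, "D" = overfish.
SHORT_TERM = {("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 4, ("D", "D"): 2}
LONG_TERM = {("C", "C"): 4, ("C", "D"): 1, ("D", "C"): 3, ("D", "D"): 2}


def classify(row_payoffs):
    """Name a symmetric 2x2 game from the row player's payoff ordering."""
    reward = row_payoffs[("C", "C")]      # both cooperate
    sucker = row_payoffs[("C", "D")]      # you cooperate, others defect
    temptation = row_payoffs[("D", "C")]  # you defect, others cooperate
    punishment = row_payoffs[("D", "D")]  # both defect
    if temptation > reward > punishment > sucker:
        return "Prisoner's Dilemma"
    if reward > temptation >= punishment > sucker:
        return "Stag Hunt"
    return "other"


print(classify(SHORT_TERM))  # Prisoner's Dilemma
print(classify(LONG_TERM))   # Stag Hunt
```

The only thing that differs between the tables is which of the top two ranks belongs to mutual cooperation, and that single swap is what flips the classification.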

Example 2: Climate Change

Climate change follows the same pattern on a global scale.

In the short term, it is a Prisoner’s Dilemma. The incentive is to pollute to avoid costly abatement (the “free-rider” problem), pushing nations toward a (2, 2, 1) outcome of a worsening climate.

	Others: Abate Pollution	Others: Pollute
Your Nation: Abate Pollution	(3, 3, 3)	(1, 4, 2)
Your Nation: Pollute	(4, 1, 2)	(2, 2, 1) - Status Quo Deterioration

In the long term, it becomes a Stag Hunt. The catastrophic costs of a runaway climate make a stable planet the ultimate prize, so the (4, 4, 4) outcome becomes the best available result for every nation and for Humanity.

	Others: Abate Pollution	Others: Pollute
Your Nation: Abate Pollution	(4, 4, 4) - Stable Climate	(1, 3, 2)
Your Nation: Pollute	(3, 1, 2)	(2, 2, 1)

The same mechanism is at work — accumulating damage, repeated rounds, and a shrinking prize for defection as the underlying system degrades. The diplomatic effort is focused on convincing nations to abandon the short-term game and coordinate for the superior long-term prize.
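
One standard way to formalize "repeated rounds let cooperators punish defectors" is the grim trigger strategy from repeated-game theory: cooperate until the other side defects, then defect forever. The sketch below is an illustration under strong assumptions, not a model of actual climate or fisheries negotiations. It treats the ordinal ranks as if they were cardinal payoffs, assumes the game repeats indefinitely with a discount factor d between 0 and 1, and uses a hypothetical function name.

```python
def grim_trigger_threshold(temptation, reward, punishment):
    """Smallest discount factor at which grim trigger sustains cooperation.

    Cooperating forever is worth reward / (1 - d). Defecting once and being
    punished forever is worth temptation + d * punishment / (1 - d).
    Requiring the first to be at least the second gives
    d >= (temptation - reward) / (temptation - punishment).
    """
    return (temptation - reward) / (temptation - punishment)


# Short-term ranks from the tables above, treated as cardinal numbers
# (an assumption made only for illustration).
print(grim_trigger_threshold(4, 3, 2))    # 0.5

# As the underlying system degrades, the defector's prize shrinks toward
# the cooperative payoff, and cooperation needs less patience to hold.
print(grim_trigger_threshold(3.5, 3, 2))  # roughly one third
```

With the short-term ranks, cooperation holds only if players weight the future at 0.5 or more. A smaller temptation payoff lowers that bar, which is one way to read the "shrinking prize" part of the mechanism.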

Why AGI Is Different: A Permanent Dilemma

The transformations that rescue other coordination problems don’t have an obvious analogue for AGI:

No resource depletion. Overfishing destroys the fishery, shrinking the defector’s prize. AGI capability doesn’t deplete with use — if anything, a first-mover compounds their lead.
No repeated rounds. Climate cooperation accrues over decades of small, reversible decisions. The most pessimistic AGI race models look more like a single round: whoever gets there first locks in their lead.
No visible degradation. A collapsing fishery is something everyone can see. A capabilities lead, an alignment failure mode, or an internal safety culture is largely invisible to the other player.

So the same payoffs stay on the table at long horizons:

The AGI Payoff Matrix (Player A, Player B, Humanity)

	Player B: Cooperate	Player B: Defect
Player A: Cooperate	(3, 3, 4) - Mutual Safety	(1, 4, 2) - Sucker & Temptation
Player A: Defect	(4, 1, 2) - Temptation & Sucker	(2, 2, 1) - Mutual Ruin

The misalignment is structural

Compare mutual cooperation across the three games:

Long-term overfishing and climate: (4, 4, 4) — players and humanity all get their best outcome simultaneously.
AGI: (3, 3, 4) — humanity’s best, but each player still sees a higher (4) they could have grabbed by defecting unilaterally.

That gap is the misalignment, made structural. Even at the cooperative outcome (which, in a Prisoner’s Dilemma, is not an equilibrium), the players’ preferences don’t line up with humanity’s, and the game is built so that they can’t. The arrangement that’s best for humanity is one each player can plausibly tell themselves they’re underselling.
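
To make the comparison concrete, the sketch below finds the cell humanity ranks highest in each game and asks whether either player would want to leave it unilaterally. The helper names (`is_stable`, `humanity_best`) and the "C"/"D" labels are illustrative choices of mine; the payoffs are copied from the tables in this post.

```python
# Each cell is (Player A, Player B, Humanity), copied from the tables above.
LONG_TERM_COMMONS = {
    ("C", "C"): (4, 4, 4), ("C", "D"): (1, 3, 2),
    ("D", "C"): (3, 1, 2), ("D", "D"): (2, 2, 1),
}
AGI = {
    ("C", "C"): (3, 3, 4), ("C", "D"): (1, 4, 2),
    ("D", "C"): (4, 1, 2), ("D", "D"): (2, 2, 1),
}
MOVES = ("C", "D")


def is_stable(game, a, b):
    """True if neither player gains by unilaterally switching moves."""
    a_ok = all(game[(a, b)][0] >= game[(alt, b)][0] for alt in MOVES)
    b_ok = all(game[(a, b)][1] >= game[(a, alt)][1] for alt in MOVES)
    return a_ok and b_ok


def humanity_best(game):
    """The cell that maximizes the third (Humanity) payoff."""
    return max(game, key=lambda cell: game[cell][2])


for name, game in (("long-term commons", LONG_TERM_COMMONS), ("AGI", AGI)):
    cell = humanity_best(game)
    print(name, cell, is_stable(game, *cell))
```

In the long-term commons game, humanity's best cell is also one nobody wants to leave. In the AGI game it is not, because each player can move from 3 to 4 by defecting.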

This is the deeper version of the AI alignment problem. Aligning the model to its operator isn’t enough if the operators themselves aren’t aligned to humanity. See here for more on this argument.

The dominant strategy

Given those payoffs, no matter what Player B does, Player A’s best response is to Defect (4 > 3; 2 > 1). Defection is dominant for both players, and two rational players land at (2, 2, 1) — a high-risk race that is the worst possible outcome for Humanity. Unlike fisheries or climate, no shift in time horizon changes this; the (4) temptation doesn’t shrink as the game plays out.
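
The dominance argument can be verified by brute force. This sketch checks, for each player, whether one move strictly beats the alternative against every possible reply. The function names and the 0/1 player indexing are illustrative, and the payoffs are copied from the AGI matrix above.

```python
# Each cell is (Player A, Player B, Humanity), copied from the AGI matrix.
AGI = {
    ("C", "C"): (3, 3, 4), ("C", "D"): (1, 4, 2),
    ("D", "C"): (4, 1, 2), ("D", "D"): (2, 2, 1),
}
MOVES = ("C", "D")


def payoff(game, player, own, other):
    """Payoff to `player` (0 = A, 1 = B) for playing `own` against `other`."""
    cell = (own, other) if player == 0 else (other, own)
    return game[cell][player]


def strictly_dominant(game, player):
    """Return the move that beats every alternative against every reply, or None."""
    for move in MOVES:
        if all(
            payoff(game, player, move, reply) > payoff(game, player, alt, reply)
            for alt in MOVES if alt != move
            for reply in MOVES
        ):
            return move
    return None


a_move, b_move = strictly_dominant(AGI, 0), strictly_dominant(AGI, 1)
print(a_move, b_move, AGI[(a_move, b_move)])  # D D (2, 2, 1)
```

Both players' dominant move is Defect, and the cell they land in carries Humanity's lowest rank.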

Why the prize looks absolute

The defection prize sustains its perceived (4) value because of what AGI is believed to deliver:

Existential security. The first actor to build a controllable superintelligence could end strategic competition globally and permanently.
Economic singularity. Whoever controls the first AGI captures most or all of the value created by automation.
Fear of irrelevance. The (1) is not just a strategic setback; it’s the risk of your nation, culture, or company being permanently sidelined.

These claims are contested in detail, but they don’t have to be true to drive the game. They only have to be believed by the people making the build-or-cooperate decision.

The escape hatch: the prize may not be real

The strongest counterargument to this whole frame is that the unilateral (4) is conditional on solving alignment, and racing makes solving alignment less likely. If a racer is more likely to lose control of what they build, the true expected payoff for “Defect while the other cooperates” is closer to (1) than (4) for the racer too — and the matrix starts to collapse toward a Stag Hunt.

This is exactly the belief update AI safety research is trying to force into common knowledge. The game doesn’t have to stay a permanent Prisoner’s Dilemma; it stays that way only as long as both players believe they can race and win. The transformation here isn’t automatic the way it is in fisheries — it has to come from evidence and persuasion rather than from the resource depleting on its own.
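
A back-of-the-envelope version of this argument, with loud caveats: it treats the ordinal ranks as cardinal numbers, assumes a lone racer either keeps control (payoff 4) or loses it (payoff 1), holds every other cell fixed, and introduces a hypothetical parameter `p_control` for the racer's probability of keeping control. None of these numbers come from data.

```python
def expected_defection_payoff(p_control, win=4.0, lose_control=1.0):
    """Expected payoff for racing alone, given the chance of keeping control."""
    return p_control * win + (1.0 - p_control) * lose_control


def cooperation_is_best_reply(p_control, reward=3.0):
    """True once racing against a cooperator is worth less than cooperating."""
    return expected_defection_payoff(p_control) < reward


# Sweep a few illustrative beliefs about control (assumed values, not estimates).
for p in (0.9, 0.7, 0.5, 0.2):
    print(p, round(expected_defection_payoff(p), 2), cooperation_is_best_reply(p))
```

Under these assumptions the temptation disappears once `p_control` falls below 2/3, since 4p + (1 - p) < 3 exactly when p < 2/3. The threshold itself is an artifact of the stand-in numbers. The point is only that a belief about control risk, rather than any change in the physical game, is what moves the racer's best reply.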

What about nuclear weapons?

The obvious objection: nuclear weapons share most of these features. A permanent prize, existential stakes, a first-mover advantage, no Stag Hunt transformation — and yet we got partial cooperation through MAD, the NPT, and test ban treaties.

So why is AGI different from nukes? The honest answer is that AGI is nukes with several of the stabilizers missing:

No second strike. MAD works because retaliation is credible — you can absorb a first strike and respond. If Player A reaches aligned superintelligence first, Player B has no analogous retaliation capacity.
Hard to verify. Nuclear tests produce seismic, atmospheric, and supply-chain signatures. AGI development happens in datacenters and ships as model weights; verification regimes are still largely speculative.
Faster timelines. Decades of nuclear arms racing left room to build diplomatic infrastructure around it. AGI timelines may compress this to years.
Self-destruction, not deterrence. The MAD-equivalent for AGI isn’t “I retaliate if you launch” but “your own creation destroys you” — which deters less because it’s probabilistic and contested.

AGI isn’t unique in being a Prisoner’s Dilemma over an existential prize. What sets it apart is that the institutional and physical stabilizers that turned nukes into a (precarious) equilibrium aren’t in place yet.

What could change the game

If the transformation doesn’t happen on its own, what could induce it?

Verifiable compute monitoring. The closest historical analogue is nuclear test verification — making racing detectable changes the payoff for unilateral defection. Compute is one of the few legible inputs to AGI development.
Shared evidence of alignment failure. Concrete demonstrations that powerful models fail in ways their operators didn’t intend collapse the belief that the racer cleanly captures (4).
Mutual vulnerability. Cyberattacks, model exfiltration, and open-weights diffusion mean a “winner” is unlikely to stay a winner. Making this legible shifts the perceived durability of the prize.
Common-knowledge constraints. Treaty-style commitments on training compute or specific capabilities, even partial ones, change the equilibrium by making restraint common knowledge: each player needs to know not only that the other is committed, but that the other knows it is committed too.

None of these guarantees a transformation. Each chips at one of the assumptions holding the Prisoner’s Dilemma in place.

The shape of the problem

Most global coordination challenges resolve when actors come to see a larger mutual prize and shift from Prisoner’s Dilemma to Stag Hunt. AGI doesn’t have the resource dynamics or repeated interactions that drive that shift elsewhere, and unlike nuclear weapons, the stabilizers that produced even precarious cooperation aren’t here. The misalignment between players and humanity is built into the payoff structure itself — and the game stays a Prisoner’s Dilemma until evidence about the real expected value of defection becomes common knowledge among the people racing.