Agency — a derivation

Four attempts, a categorical pivot, and the system-level definition


This is the derivation companion (Post 1b) to the agency post (Post 1). That post states the result — $K_{system}$, the system-level agency definition — and shows how it behaves on the failure modes of competing objectives (utilitarian sum, negative utilitarianism, Rawlsian maximin). This post walks the path that produced the definition: four attempts at scoring an individual’s agency, the failure that forces a categorical move, and the system-level definition that follows.

It exists separately because the derivation is interesting on its own terms. The dead ends teach things the final definition doesn’t. Read it if you want the reasoning; skip it if you want the result.

Where in the series: $K_{system}$ is the candidate aggregation target proposed by the procedural framework (Post 0). Whether mechanisms exist that approximate it is the work of Post 2. What pursues the target once specified is the work of Post 3. Readers wanting the empirical motivation for the whole project should start with the why-care post.

The search for a formal definition: an example with commuters

To make this concrete, let’s use a simple multi-agent universe: a city grid with commuters. We will start by trying to define an individual’s agency.

Attempt 1: Agency as Immediate Options

tldr: count the current actions available

A simple start is to define a commuter’s agency as the number of roads available from their current position. If they are at a four-way intersection, their agency is 4; on a one-way street, it’s 1. This is intuitive but myopic; it fails to capture any sense of long-term potential. A road leading to a dead-end is not as valuable as one leading to an open highway.
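In code, Attempt 1 is a one-liner. A minimal sketch, assuming the grid is just a set of open cells (the layout and helper names here are illustrative, not from the post):

```python
# Attempt 1: agency as the number of immediate options.
# Toy setup: a 10x10 open grid; the representation is an assumption.
OPEN_CELLS = {(x, y) for x in range(10) for y in range(10)}
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # N, S, E, W

def immediate_options(pos):
    """Count the roads leaving the current cell."""
    x, y = pos
    return sum((x + dx, y + dy) in OPEN_CELLS for dx, dy in MOVES)

print(immediate_options((5, 5)))  # 4: a four-way intersection
print(immediate_options((0, 0)))  # 2: a corner
```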

Attempt 2: Agency as Future Trajectories

tldr: count the possible unique futures

A better model is to consider a commuter’s potential futures. We could define their agency as the total number of unique paths they could take from their current position. For a commuter on a 10x10 grid, this would be a vast number of possible routes.
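A sketch of the count, with one assumption the prose doesn't need: a horizon cap, since the number of paths on an open grid is unbounded:

```python
# Attempt 2: agency as the number of distinct futures.
# We count simple paths (no revisits) up to a horizon; the cap is
# our addition, purely to keep the count finite.
OPEN_CELLS = {(x, y) for x in range(10) for y in range(10)}
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def count_paths(pos, horizon, visited=None):
    visited = (visited or set()) | {pos}
    if horizon == 0:
        return 1
    total = 1  # stopping here counts as one possible future
    for dx, dy in MOVES:
        nxt = (pos[0] + dx, pos[1] + dy)
        if nxt in OPEN_CELLS and nxt not in visited:
            total += count_paths(nxt, horizon - 1, visited)
    return total

print(count_paths((5, 5), horizon=6))  # grows explosively with the horizon
```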

However, this has a critical flaw: it treats all futures as equally plausible, ignoring the inherent probabilities of the environment. Imagine some roads are perfectly paved highways, while others are treacherous gravel paths with a 50% chance of causing a flat tire on any given block. A simple count would treat a 10-mile path on the highway and a 10-mile path on the gravel as equivalent futures. But in reality, the gravel path is far less likely to be completed successfully. The model counts all theoretical possibilities without grounding them in the probabilistic nature of the world.

Attempt 3: Agency as Probabilistic Futures

tldr: weight possible futures by their likelihood

To fix this, we can weight each future path by its probability of successful completion. The 10-mile highway path might have a 99.9% probability, while the 10-mile gravel path’s probability would be astronomically lower. This model might correctly predict a commuter is most likely to end up near a highway exit.
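The change from Attempt 2 is one line: sum success probabilities instead of counting paths. A sketch with made-up per-block numbers for the two surfaces:

```python
# Attempt 3: weight each future by its completion probability.
# The per-block success rates are illustrative assumptions.
P_BLOCK = {"highway": 0.999, "gravel": 0.5}

def path_success(surfaces):
    """Probability of completing a path, block by block."""
    p = 1.0
    for s in surfaces:
        p *= P_BLOCK[s]
    return p

highway_path = ["highway"] * 10
gravel_path = ["gravel"] * 10

print(path_success(highway_path))  # ~0.990
print(path_success(gravel_path))   # ~0.001: same length, very different future

# Attempt 3's score: sum of probabilities, not a raw count of paths.
print(path_success(highway_path) + path_success(gravel_path))
```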

This is a significant improvement, as it reflects real-world constraints. However, the definition is still passive. It doesn’t capture the commuter’s intent. What if their goal is to get to a remote cabin only accessible by the gravel road? The model measures likely futures, not the power to override the likely future to achieve a specific goal.

Attempt 4: Agency as Achievable Goals

tldr: measure the number of goals achievable

This leads to our most sophisticated individual-level definition. Let’s define agency as the size of the set of goals an individual can achieve. This is closely related to the concept of controllability in control theory — the ability to steer the future toward a desired objective.

The definition is powerful because it is active, forward-looking, and purposeful. And yet it fails for the most fundamental reason: an individual’s set of achievable goals is not an independent property of that individual.

Consider the goal: “Park in the last open spot at the destination.” Your ability to achieve this goal—your control over this outcome—is entirely contingent on whether another commuter shares the same goal. Your agency can be created or destroyed purely by a change in someone else’s intent.
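A toy version makes the dependence explicit (the coin flip is an assumption, not a model of parking):

```python
# Attempt 4's failure mode: achievable goals are not a property of
# the individual alone. All numbers here are illustrative.

def p_get_last_spot(rival_wants_same_spot):
    """P(commuter A parks in the last open spot).

    Alone, A succeeds for sure; if the rival races for the same
    spot, assume a fair coin flip decides it.
    """
    return 0.5 if rival_wants_same_spot else 1.0

print(p_get_last_spot(rival_wants_same_spot=False))  # 1.0: full control
print(p_get_last_spot(rival_wants_same_spot=True))   # 0.5: agency halved
# Nothing about A changed; only the rival's intent did.
```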

This is the core insight. Because an individual’s agency is so fundamentally entangled with the goals and strategies of the entire system, any attempt to calculate it as a single, isolated number is a futile exercise. The object of measurement is wrong.

This forces our conclusion: a meaningful definition of agency cannot be a score assigned to an individual. It must be a measure of the entire system’s capacity to handle the interacting, and often conflicting, goals of all its members.

A more robust definition: agency as a system’s capacity

This leads us to a final, more holistic definition. Instead of trying to calculate an isolated score for each individual, we will define a single agency score for the entire social system. This score measures the system’s overall capacity to enable its members to achieve their goals.

The logic is as follows:

  1. First, we imagine a “joint goal vector,” $\gamma = (g_1, g_2, …, g_n)$. This is a specific combination of goals, one for each person in the society (e.g., commuter 1 wants to get to the hospital, commuter 2 to the airport, etc.). We will “stress test” our society by seeing how well it can satisfy this vector.
  2. For this given goal vector, the $N$ individuals play a non-cooperative game. Each person $i$ acts independently, choosing a strategy $\sigma_i$ to maximize the probability of achieving their own goal $g_i$.
  3. This is not a simple optimization problem. The stable outcome of these simultaneous, interacting choices is a Nash Equilibrium. This is a strategy profile $\sigma^*(\gamma)$ where no one can improve their outcome by unilaterally changing their strategy. It’s the point where everyone is doing the best they can, given what everyone else is doing.
  4. At this equilibrium, we can define a “success score” for this joint goal. We define success as an outcome where everyone succeeds, which mathematically is the product of their individual success probabilities: $\prod_i P(g_i \mid \sigma^*(\gamma))$.
  5. Finally, the total agency of the system, $K_{system}$, is the sum of these success scores over every possible joint goal vector the society could ever face.

This gives us a measure of the society’s general, goal-achieving power.
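The whole construction fits in a few lines for a toy system: two commuters, two one-car parking spots, and a coin flip on contested spots (all numbers are illustrative assumptions). The equilibrium search is a brute-force check of the no-profitable-deviation condition:

```python
# K_system on a toy system: two commuters, two parking spots.
from itertools import product

SPOTS = ["A", "B"]  # each spot holds one car

def p_success(i, goals, strats):
    """P(commuter i parks at their goal spot), given both strategies."""
    if strats[i] != goals[i]:
        return 0.0   # drove somewhere else entirely
    if strats[0] == strats[1]:
        return 0.5   # contested spot: assume a fair coin flip
    return 1.0       # uncontested: success guaranteed

def pure_nash(goals):
    """Pure-strategy profiles with no profitable unilateral deviation."""
    eqs = []
    for strats in product(SPOTS, repeat=2):
        deviations = (
            p_success(i, goals, strats[:i] + (dev,) + strats[i + 1:])
            > p_success(i, goals, strats)
            for i in (0, 1) for dev in SPOTS)
        if not any(deviations):
            eqs.append(strats)
    return eqs

k_system = 0.0
for goals in product(SPOTS, repeat=2):   # every joint goal vector
    eq = pure_nash(goals)[0]             # take one equilibrium
    k_system += p_success(0, goals, eq) * p_success(1, goals, eq)

print(k_system)  # 2.5 = 1 + 1 + 0.25 + 0.25 on this toy grid
```

Conflicting goal vectors contribute 0.25 each, compatible ones contribute 1 each; the system's score already reflects how much goal-conflict its structure forces.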

The formal definition

The total agency of a social system is:

\[K_{system} = \sum_{\gamma \in G_{joint}} \left[ \prod_i P(g_i \mid s_t, \sigma^*(\gamma)) \right]\]

Where $s_t$ is the current state of the system and $\sigma^*(\gamma)$ is the Nash Equilibrium strategy profile satisfying the following system of $N$ conditions: for every player $i$, their strategy $\sigma_i^*$ in the profile $\sigma^*(\gamma)$ must solve their personal optimization problem:

\[\sigma_i^* = \underset{\sigma_i}{\mathop{\text{argmax}}} P(g_i \mid s_t, \sigma_i, \sigma_{-i}^*)\]
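The fixed-point character of this condition suggests a procedure: iterate each player's argmax until nobody moves. A sketch with an illustrative probability table (simultaneous best response can cycle in general; it happens to converge on this table):

```python
# The equilibrium condition as code: iterate best responses (the
# argmax above) until the profile stops changing.
# P[i][(s0, s1)] stands in for P(g_i | s_t, s0, s1) at a fixed
# joint goal vector; the numbers are illustrative.
P = [
    {("a", "a"): 0.5, ("a", "b"): 1.0, ("b", "a"): 0.0, ("b", "b"): 0.0},
    {("a", "a"): 0.5, ("a", "b"): 0.0, ("b", "a"): 1.0, ("b", "b"): 0.0},
]
STRATS = ["a", "b"]

def best_response(i, profile):
    """argmax over player i's strategies, holding the others fixed."""
    def swap(s):
        p = list(profile)
        p[i] = s
        return tuple(p)
    return max(STRATS, key=lambda s: P[i][swap(s)])

profile = ("b", "b")  # arbitrary starting point
while True:
    updated = tuple(best_response(i, profile) for i in range(2))
    if updated == profile:  # fixed point: no one can improve alone
        break
    profile = updated

print(profile)  # ('a', 'a'): a pure Nash equilibrium of this toy game
```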

A parallel in artificial intelligence

Interestingly, this definition of agency has a strong parallel to formal definitions of intelligence in the AI community. One of the best-known is the Legg-Hutter definition of universal intelligence:¹

\[\Upsilon(\sigma) = \sum_{\mu \in E} 2^{-K(\mu)} V(\mu, \sigma)\]

The parallel is clear. Both $K_{system}$ and $\Upsilon$ sum an agent's (or a system's) performance over a vast space of possible goals or environments. We have defined agency as a measure of a society's collective problem-solving capacity: its ability to empower its citizens to achieve their chosen objectives.
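Lining the two up term by term (the explicit weight of $1$ below is our gloss, marking the one structural difference: $K_{system}$ does not discount goal vectors by complexity):

\[
\underbrace{\sum_{\mu \in E}}_{\text{environments}} \underbrace{2^{-K(\mu)}}_{\text{complexity weight}} \, \underbrace{V(\mu, \sigma)}_{\text{agent's performance}} \quad \longleftrightarrow \quad \underbrace{\sum_{\gamma \in G_{joint}}}_{\text{joint goals}} \underbrace{1}_{\text{uniform weight}} \cdot \underbrace{\prod_i P(g_i \mid s_t, \sigma^*(\gamma))}_{\text{system's performance}}
\]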

Embedding fairness in the definition

The product operator $\prod_i P(g_i)$ does load-bearing work. It defines a “successful outcome” for a given joint goal vector as one of mutual success — a single $P(g_i) = 0$ collapses the score for that vector to zero, embedding non-domination directly. This is the Nash Social Welfare Function, a standard object in social choice.²,³
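A three-person example with made-up numbers shows the veto at work:

\[
P = (0.9,\ 0.9,\ 0): \qquad \sum_i P_i = 1.8 \ \ \text{(looks healthy)}, \qquad \prod_i P_i = 0 \ \ \text{(vetoed)}.
\]

A sum-based score would happily trade the third person's total failure for the first two's comfort; the product cannot.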

The product is not strictly better than the sum. The companion agency post discusses the gladiator problem (where sum tolerates crushing minorities), the floor sensitivity of the product (which inherits Rawlsian-flavoured behaviour), and the connection between Nash-product and negative-utilitarian-sum under $s_i = -\log u_i$. Those are the what-vs-how questions. This post is about how we arrive at the what.

On the equality of goals

This framework is neutral about which goals people should have. The sum $\sum_{\gamma \in G_{joint}}$ treats every combination of human aspiration as equally worthy of consideration.

However, the consequences of these goals are not treated equally by the system’s dynamics. This influence is captured in the Nash Equilibrium $\sigma^*(\gamma)$. Your goal to become a doctor creates positive externalities that alter the equilibrium, making it easier for others to achieve their goals (e.g., “survive an illness”). Conversely, consider an individual whose goal is to “get to work as fast as possible by driving recklessly.” Their actions would create negative externalities, drastically lowering the success probabilities $P(g_i)$ for countless other commuters. Our $K_{system}$ formula inherently penalizes societies that allow such destructive goals to destabilize the system and harm others.

Where this lands

$K_{system}$ is not directly computable for any real society. The sum runs over an effectively infinite space of joint goal vectors, and the inner game has billions of players. It is a conceptual objective function — a target that lets us study toy systems formally and ask which mechanisms approximate it.

The agency post takes this definition as input and shows how it behaves on the standard objections to utilitarian, Rawlsian, and capabilities-based objectives. Post 2 takes up the next question — what kind of mechanism aggregates preferences toward $K_{system}$ — and frames political philosophy as mechanism design:

\[m^* = \underset{m}{\text{argmax}} \; K_{system}(m)\]

The best society $m^*$ is the one whose rules induce a game whose Nash equilibria yield the highest capacity for collective goal-pursuit, across all possible goal distributions. Whether democracy-shaped mechanisms approximate $m^*$ is the question that post takes up.
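On the toy parking system from the derivation, this argmax can be run exactly. The two candidate mechanisms below (a fair coin flip on contested spots versus strict priority for commuter 0) are illustrative assumptions; the point is that the fairer tie-break wins under the Nash product:

```python
# Mechanism design as argmax: score two parking rules by K_system
# on the toy two-commuter system and keep the better one.
from itertools import product

SPOTS = ["A", "B"]

def p_success(i, goals, strats, rule):
    if strats[i] != goals[i]:
        return 0.0
    if strats[0] == strats[1]:        # contested spot
        if rule == "coin_flip":
            return 0.5
        if rule == "priority_0":      # commuter 0 always wins ties
            return 1.0 if i == 0 else 0.0
    return 1.0

def k_system(rule):
    total = 0.0
    for goals in product(SPOTS, repeat=2):
        # Brute-force one pure Nash equilibrium for this goal vector.
        for strats in product(SPOTS, repeat=2):
            stable = all(
                p_success(i, goals, strats, rule)
                >= p_success(i, goals, strats[:i] + (dev,) + strats[i + 1:], rule)
                for i in (0, 1) for dev in SPOTS)
            if stable:
                total += (p_success(0, goals, strats, rule)
                          * p_success(1, goals, strats, rule))
                break
    return total

rules = ["coin_flip", "priority_0"]
print({r: k_system(r) for r in rules})  # {'coin_flip': 2.5, 'priority_0': 2.0}
print(max(rules, key=k_system))         # m* = 'coin_flip'
```

Strict priority zeroes out every goal vector where the two commuters collide, because the product vetoes outcomes where someone is guaranteed to fail; randomized rationing keeps everyone's success probability positive and scores higher.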


  1. Legg, S., & Hutter, M. (2007). Universal Intelligence: A Definition of Machine Intelligence. Minds and Machines, 17(4), 391-444. 

  2. The formal study of social welfare functions was pioneered by Kenneth Arrow in his 1951 book Social Choice and Individual Values.

  3. This formulation is known as the Nash Social Welfare Function, derived from the principles in John Nash's 1950 paper, The Bargaining Problem.