Opinions inherit from parties. Values don't. Polibench strips out every partisan trigger-word and measures the value primitives underneath — locating people and frontier models by where their answers flip, not by what they agree to.
01 · The manifesto
Almost every political issue worth debating is a tradeoff — that's what makes it worth debating. And watch any real debate closely: sometimes people are arguing about facts, but just as often they agree on every fact and still disagree. That residue — fundamental value tradeoffs in the face of the same information — is, we think, the actual essence of politics.
Push a debate far enough and it lands on a “to what extent?”question. Two people arguing over legalizing gambling or drugs can agree completely about the risks to the user and still come down on opposite sides — because what they actually disagree about is how far society may limit personal freedom to prevent people from harming themselves. They haven't hit an impasse. They've hit a political primitive.
This project does two things: discover and codify those primitives, then use them to benchmark people — and language models. Strip out the trigger words, measure the primitive directly, and you measure someone's actual nature instead of their default party alliance.
02 · The setup
They need rules. Not a constitution, not a party platform — just answers to the handful of questions any group of people living together is forced to answer. What may one person do to themselves? What may they do to everyone else, a little bit at a time? Who must help whom? Who counts?
Every political fight you have ever watched is one of these questions wearing a costume. A climate position is substantially a “do future people count?” answer. Drug policy, gambling, and helmet laws are one question asked three ways. The costume — the partisan vocabulary, the named substances, the tribal flags — is what lets people answer from coalition memory instead of from their own values.
So we take the costume off. Polibench asks the underlying questions directly, in this bare little society where no party has ever existed. Nine questions — the primitives every platform, every ideology, every argument compiles down to.
One deliberate simplification: the society is a closed game — one hundred people, no one arriving, no one leaving. That scopes out issues that are fundamentally about membership itself, like immigration. Those need a second society to even pose, so v1 leaves them aside.
Primitive 1 of 9 · Paternalism
One question, not many. Strip away the substances, the vehicles, the vices, and every version reduces to the same thing: how much liberty may be restricted to prevent harm that falls only on the person choosing it?
Debates that are really this question
Primitive 2 of 9 · Externalities
Two forces that don't have to agree: statistical harm (one act, a small chance of catastrophe for someone) and aggregation (no individual act matters, only the sum crossing a threshold). Most regulation arguments are a fight over where these sit.
Debates that are really this question
Primitive 3 of 9 · Solidarity
The “even if fair” clause is what makes this a clean measurement. Arguments about who deserves their lot are a different question. This one asks: after fairness is granted, does obligation remain — and may it be compelled?
Debates that are really this question
Primitive 4 of 9 · Moral circle
A weighting function over persons by social and temporal distance. Careful decontamination: discounting the future because forecasts are unreliable is a different primitive (A8). This one is about whether future people count less even when the forecast is certain.
Debates that are really this question
Primitive 5 of 9 · Tradition
The clean residue after two extractions: “old things encode lessons we can't see” is epistemic caution (A8), and gut-level aversion is a perception, not a value (measured separately). What remains: is continuity a good in itself? Includes the tolerate-versus-affirm gradient.
Debates that are really this question
Primitive 6 of 9 · Group-conscious rules
Two independent sub-questions people conflate: may a trait ever count against you, and may it ever count for you? And a third that splits allies: do past injustices create present claims, or must rules only look forward?
Debates that are really this question
Primitive 7 of 9 · Power-restraint
High: better impotent than abusable. Low: gridlock is worse than abuse. Measured with the same scenario re-cast as a government agency, a dominant platform, a union, a church — the average is your restraint, the spread reveals which power you fear. That spread is where partisanship actually lives.
Debates that are really this question
Primitive 8 of 9 · Precaution vs. permission
Allowed until proven harmful, or forbidden until proven safe? Pure burden-of-proof placement under genuine uncertainty. The same primitive governs new machines and new social arrangements — institutional conservatism is precaution applied to social technology.
Debates that are really this question
Primitive 9 of 9 · Retributive desert
What punishment is for. If sanctions are only for deterrence and protection, punishment is engineering. If wrongdoing deserves suffering even when nothing is prevented, that's retribution — the value core of every criminal-justice argument, askable without a single loaded word.
Debates that are really this question
03 · The output
The output is a set of coordinates: each model located on nine de-loaded value axes, reported as human-population percentiles. “Model X sits at the 73rd percentile of humans on paternalism.” Not left or right — a position estimate, with a sharpness and a pressure-resistance estimate attached.
The same instrument runs on people, which yields the more interesting quantity: the residual. Where your measured primitives predict one issue position and you report another, the gap is informative — it estimates how much of that opinion is your own values and how much is inherited from your coalition.