Nash equilibrium & the math of GTO

GTO poker rests on one central idea from game theory — the Nash equilibrium — and one family of algorithms that makes it computable for a game as large as poker — regret minimization. You do not need the math to use a solver, but understanding it tells you exactly what a solution is and why you can trust it.

Nash equilibrium

A Nash equilibrium is a set of strategies, one per player, where no player can do better by changing only their own strategy. Everyone is already playing a best response to everyone else, so nobody has an incentive to move. The concept was introduced by mathematician John Nash in 1950 and is one of the foundational results in game theory.
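The definition can be checked mechanically on a small game. The sketch below uses rock-paper-scissors rather than poker, and the names `RPS`, `expected_payoff`, and `is_nash` are illustrative, not part of any solver. It tests whether either player could gain by switching to a pure action while the other holds still. Checking pure deviations is enough because a player's expected payoff is linear in their own mix.

```python
# Payoff to the row player; the column player receives the negative (zero-sum).
# Rows and columns are ordered rock, paper, scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def expected_payoff(payoff, row_mix, col_mix):
    """Row player's expected payoff when both players use mixed strategies."""
    return sum(payoff[i][j] * row_mix[i] * col_mix[j]
               for i in range(len(row_mix))
               for j in range(len(col_mix)))


def is_nash(payoff, row_mix, col_mix, tol=1e-9):
    """True if neither player gains by deviating alone."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    value = expected_payoff(payoff, row_mix, col_mix)
    # Best the row player could do against the fixed column mix.
    best_row = max(sum(payoff[i][j] * col_mix[j] for j in range(n_cols))
                   for i in range(n_rows))
    # Best the column player could do (lowest row payoff) against the fixed row mix.
    best_col = min(sum(payoff[i][j] * row_mix[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row <= value + tol and best_col >= value - tol


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]
print(is_nash(RPS, uniform, uniform))      # both mix evenly: an equilibrium
print(is_nash(RPS, always_rock, uniform))  # the column player would switch to paper
```

The second call fails the check because the column player has a profitable deviation, even though the row player does not.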

In a two-player zero-sum game (which heads-up poker is, since one player's win is the other's loss), a Nash equilibrium has a powerful property: playing it guarantees you at least the value of the game in expectation, regardless of what your opponent does. Averaged over both seats, and ignoring rake, that value is zero, so an equilibrium strategy cannot lose in expectation. This is the precise mathematical meaning of "unexploitable."
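In symbols, the guarantee can be written as follows. The notation is an assumption introduced here for illustration: $u_1$ is player 1's expected payoff, $(\sigma_1^*, \sigma_2^*)$ is an equilibrium, and $v$ is the value of the game to player 1.

```latex
u_1(\sigma_1^*, \sigma_2) \;\ge\; v \quad \text{for every } \sigma_2,
\qquad
u_1(\sigma_1, \sigma_2^*) \;\le\; v \quad \text{for every } \sigma_1,
\qquad
v \;=\; \max_{\sigma_1} \min_{\sigma_2} u_1(\sigma_1, \sigma_2)
  \;=\; \min_{\sigma_2} \max_{\sigma_1} u_1(\sigma_1, \sigma_2).
```

When the two seats are played equally often and there is no rake, the seat-averaged value is zero, which is where "cannot lose in expectation" comes from.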

Note

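"Zero-sum" means the money won by one player equals the money lost by the other. Heads-up poker is zero-sum (ignoring rake). A multi-way pot still sums to zero across all players, but it is no longer a two-player game, so the guarantee above does not carry over. That is part of why multiplayer GTO is harder to define.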

Exploitability: measuring distance from GTO

Because solvers only approximate a true equilibrium, we need a way to measure how close a strategy is. That measure is exploitability: how much a best-responding opponent wins against your strategy, in expectation, beyond what they would win against the equilibrium. It is usually quoted in milli-big-blinds per hand (mbb/h) or as a percentage of the pot.
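The two units are simple rescalings of the same quantity. A minimal sketch, with hypothetical function names and made-up example numbers (0.005 big blinds per hand, a 5 bb pot):

```python
def to_mbb_per_hand(bb_per_hand):
    """Convert big blinds per hand to milli-big-blinds per hand."""
    return bb_per_hand * 1000.0


def to_percent_of_pot(bb_per_hand, pot_bb):
    """Express an exploitability in big blinds as a percentage of the pot."""
    return 100.0 * bb_per_hand / pot_bb


# Illustrative numbers only, not taken from any real solve.
print(to_mbb_per_hand(0.005))         # about 5 mbb/h
print(to_percent_of_pot(0.005, 5.0))  # about 0.1 percent of a 5 bb pot
```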

Exploitability of zero = a perfect, unbeatable equilibrium.
Low exploitability = very close to GTO; a tiny edge remains for a perfect counter.
High exploitability = a strategy with real, attackable weaknesses.

A solver works by driving exploitability down over many iterations; the decrease is not necessarily smooth from one iteration to the next. When you stop a solve at a target accuracy, that target is the residual exploitability of the solution. (ART/GTO's default target and how to read it are covered in Running a solve.)
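To make the definition concrete, here is a minimal sketch that computes exploitability in rock-paper-scissors, a zero-sum game whose value is zero. It is a toy stand-in for poker, and `exploitability` is a hypothetical helper, not a solver API. The function finds the opponent's best pure reply to a fixed strategy and reports what that reply earns.

```python
# Payoff to the row player; rows and columns are rock, paper, scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def exploitability(payoff, row_mix):
    """What a best-responding column player wins against row_mix.

    Assumes a zero-sum matrix game whose value is 0, so the best-response
    payoff is itself the distance from equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # The column player's payoff for each pure reply is minus the row payoff.
    return max(-sum(payoff[i][j] * row_mix[i] for i in range(n_rows))
               for j in range(n_cols))


print(exploitability(RPS, [1 / 3, 1 / 3, 1 / 3]))  # 0.0: the equilibrium
print(exploitability(RPS, [0.5, 0.25, 0.25]))      # 0.25: paper beats the extra rock
print(exploitability(RPS, [1.0, 0.0, 0.0]))        # 1.0: always-rock loses every time to paper
```

In poker the best response is computed over the whole game tree rather than a 3x3 matrix, but the quantity being measured is the same.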

Regret minimization

The practical engine behind modern solvers is regret minimization. The idea is intuitive:

Play the game against yourself for many iterations.
At each decision point, track regret — how much better you would have done had you taken a different action.
Shift future play toward the actions you regret not having taken more.
Average your strategy over all iterations.

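A standard result in game theory guarantees that in a two-player zero-sum game, if both players' average regret shrinks toward zero, their average strategies converge to a Nash equilibrium. It is the average strategy, not the final iteration's strategy, that converges. You never solve the equilibrium directly; you let it emerge from self-play.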
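The four steps above can be sketched on rock-paper-scissors using regret matching, one common regret-minimizing rule (play each action in proportion to its positive cumulative regret). This is an illustrative toy, not ART/GTO's code; the function names and the biased starting regret are assumptions chosen so the run does not begin at the equilibrium.

```python
# Payoff for playing action a against action b (the game is symmetric).
# Actions are ordered rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def regret_matching(regret_sum):
    """Mix over actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    n = len(regret_sum)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def self_play(iterations, initial_regret=(1.0, 0.0, 0.0)):
    """Return each player's average strategy after regret-matching self-play."""
    # Player 0 starts out biased toward rock so the dynamics are visible.
    regrets = [list(initial_regret), [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            action_values = [sum(PAYOFF[a][b] * opponent[b] for b in range(3))
                             for a in range(3)]
            current_value = sum(strategies[player][a] * action_values[a]
                                for a in range(3))
            for a in range(3):
                # Regret: how much better action a would have done.
                regrets[player][a] += action_values[a] - current_value
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


for mix in self_play(20000):
    print([round(p, 3) for p in mix])  # each entry ends up near 1/3
```

The per-iteration strategies keep cycling (rock, then paper, then scissors), which is why the average is what gets reported.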

Counterfactual regret

Poker is too big to track regret over whole strategies, so the key innovation was counterfactual regret: measuring regret locally, at each individual decision point (information set), weighted by the probability that the opponent and chance play to reach it. The sum of these local regrets bounds overall regret, so minimizing counterfactual regret at every information set provably minimizes overall regret. That made equilibrium approximation tractable for games with billions of decision points.
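A compact way to see counterfactual regret at work is Kuhn poker, a three-card toy game often used to teach CFR. The sketch below is a plain (undiscounted) CFR written for illustration only; it is not ART/GTO's implementation, and every name in it is hypothetical. Each player antes 1, may bet 1, and cards are numbered 1 (Jack) to 3 (King). All six deals are equally likely, so the constant chance probability is left out.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check or fold), b = bet (bet or call)


class Node:
    """Regret and strategy accumulators for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]
        self.strategy = [0.5, 0.5]  # current strategy, refreshed once per iteration

    def refresh(self):
        # Regret matching: play in proportion to positive cumulative regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        self.strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_payoff(cards, history):
    """Payoff to the player who would act next, or None if play continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    wins = cards[player] > cards[1 - player]
    if history.endswith("pp"):  # check, check: showdown for the antes
        return 1 if wins else -1
    if history.endswith("bp"):  # bet, fold: the bettor (next to act) wins the ante
        return 1
    if history.endswith("bb"):  # bet, call: showdown for ante plus bet
        return 2 if wins else -2
    return None


def cfr(nodes, cards, history, reach):
    """Walk one deal; return the value for the player to act at `history`."""
    payoff = terminal_payoff(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = nodes.setdefault(f"{cards[player]}{history}", Node())
    values = []
    for a, action in enumerate(ACTIONS):
        next_reach = list(reach)
        next_reach[player] *= node.strategy[a]
        # The sign flips because the turn passes to the other player.
        values.append(-cfr(nodes, cards, history + action, next_reach))
    node_value = sum(p * v for p, v in zip(node.strategy, values))
    for a in range(2):
        # Counterfactual regret: weighted by the opponent's reach only.
        node.regret_sum[a] += reach[1 - player] * (values[a] - node_value)
        # Average strategy: weighted by the acting player's own reach.
        node.strategy_sum[a] += reach[player] * node.strategy[a]
    return node_value


def train(iterations):
    """Run CFR over all six deals per iteration; return the node table."""
    nodes = {}
    deals = list(permutations([1, 2, 3], 2))
    for _ in range(iterations):
        for cards in deals:
            cfr(nodes, cards, "", [1.0, 1.0])
        for node in nodes.values():
            node.refresh()
    return nodes


def average_value(nodes, cards, history=""):
    """Value of the average strategy profile for the player to act."""
    payoff = terminal_payoff(cards, history)
    if payoff is not None:
        return payoff
    mix = nodes[f"{cards[len(history) % 2]}{history}"].average_strategy()
    return sum(-p * average_value(nodes, cards, history + a)
               for p, a in zip(mix, ACTIONS))


def game_value(nodes):
    """First player's expected value under the average strategies."""
    deals = list(permutations([1, 2, 3], 2))
    return sum(average_value(nodes, cards) for cards in deals) / len(deals)
```

A short run such as `nodes = train(2000)` followed by `game_value(nodes)` should land close to -1/18, the known value of Kuhn poker for the first player, and `nodes["3b"].average_strategy()` shows the King calling a bet almost always.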

This is the algorithm called Counterfactual Regret Minimization (CFR), introduced in 2007. Every major poker solver since — including ART/GTO — descends from it. ART/GTO specifically uses Discounted CFR (DCFR), which down-weights early, noisy iterations so the solve converges faster. The algorithms page covers ART/GTO's implementation; the research GTO is founded on traces the full academic history.
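The "down-weighting" in DCFR is a set of multipliers applied to the accumulated totals after each iteration. The sketch below follows the formulas and default parameters from the DCFR paper by Brown and Sandholm (alpha = 1.5, beta = 0, gamma = 2). Whether ART/GTO uses these exact values is not stated here, so treat them as an assumption; the function name is hypothetical.

```python
def dcfr_discounts(t, alpha=1.5, beta=0.0, gamma=2.0):
    """Multipliers applied after iteration t (t >= 1) in Discounted CFR.

    Returns a tuple of three factors: one for accumulated positive regrets,
    one for accumulated negative regrets, and one for the accumulated
    average-strategy contributions.
    """
    positive_regret = t ** alpha / (t ** alpha + 1)
    negative_regret = t ** beta / (t ** beta + 1)
    average_strategy = (t / (t + 1)) ** gamma
    return positive_regret, negative_regret, average_strategy


for t in (1, 10, 100):
    pos, neg, avg = dcfr_discounts(t)
    print(t, round(pos, 4), round(neg, 4), round(avg, 4))
```

With these parameters the discount on positive regrets and on the average strategy fades toward 1 as `t` grows, so early iterations count for less and less, while negative regrets are halved every iteration.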

The pieces, together

Term	What it means
Nash equilibrium	Strategy set where no one can gain by changing alone — the definition of GTO
Zero-sum	One player's gain equals the other's loss; true of heads-up poker
Exploitability	How much a perfect opponent could win against you — distance from GTO
Regret	How much better an alternative action would have done in hindsight
Counterfactual regret	Regret measured locally at one decision point, weighted by the opponent's and chance's probability of reaching it
CFR	The algorithm that minimizes counterfactual regret to approximate equilibrium
DCFR	A faster, discounted variant of CFR — the algorithm ART/GTO uses

Key takeaways

A Nash equilibrium is the formal definition of GTO: no profitable unilateral deviation exists.
In heads-up (zero-sum) poker, the equilibrium is unexploitable: averaged over both seats, it cannot lose in expectation.
Exploitability measures how far a strategy is from GTO; solvers drive it toward zero.
CFR approximates the equilibrium through self-play and counterfactual regret, which is what made poker solvable.
ART/GTO uses Discounted CFR, a modern, faster member of the CFR family.