How the solver works

You do not need this page to use ARTGTO. But understanding what happens during a solve helps you choose good settings, judge when a solution is "done", and trust the numbers you see. No math background needed.

The big idea: learning from regret

ARTGTO uses an algorithm family called CFR (Counterfactual Regret Minimization). The specific variant is DCFR (Discounted CFR). Here is the idea in plain terms.

The solver plays the spot against itself, over and over. Both players start with no plan at all. After each pass through the game tree, the solver asks, at every decision point and for every hand:

"How much better would this hand have done with a different action than the one my current strategy chose?"

That difference is called regret. If betting would have earned more than checking, the bet accumulates positive regret. Next pass, actions with positive regret get played more often — in proportion to how much regret they have. Actions that keep disappointing get played less, eventually never.
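The "in proportion to regret" rule above is commonly called regret matching. Here is a minimal Python sketch of it for a single decision point. The function name `strategy_from_regrets` and the plain-list layout are illustrative assumptions, not ARTGTO internals, and the even split when no action has positive regret is a common convention rather than something this page specifies.

```python
def strategy_from_regrets(regrets):
    """Turn accumulated regrets into action probabilities (regret matching)."""
    # Only positive regret counts; actions that keep disappointing get zero weight.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        # No action has positive regret yet: play every action equally often.
        return [1.0 / len(regrets)] * len(regrets)
    # Each action is played in proportion to its share of the positive regret.
    return [p / total for p in positive]


# Example: bet has regret 3, check has regret 1, fold has regret -2.
print(strategy_from_regrets([3.0, 1.0, -2.0]))
```

In this example the bet holds three quarters of the positive regret, so it is played three quarters of the time; the fold, with negative regret, is not played at all.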

Both players adapt to each other at the same time. Player A's new strategy creates new problems for player B, B adapts, which creates new problems for A, and so on. Over many passes, this argument settles down: each strategy becomes the best possible answer to the other. That stable point is the equilibrium — the GTO strategy you study.

Two refinements make DCFR much faster than plain CFR:

Discounting. The first passes are noisy — both players are still playing badly, so the regrets they generate are based on bad opposition. DCFR shrinks the weight of old regrets over time, so early mistakes wash out quickly instead of haunting the strategy for thousands of passes.
Alternating updates. Instead of both players updating at once, they take turns: one player updates its strategy while the other holds still, then they swap. Each player always reacts to the opponent's latest strategy, not a stale one. Measured on poker trees, this converges 1.3 to 1.8 times faster than updating both players at once.
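To make the discounting idea concrete, here is a small Python sketch of the weighting scheme described in the published DCFR paper (Brown and Sandholm): after pass t, accumulated positive regrets are multiplied by t^α / (t^α + 1), negative regrets by t^β / (t^β + 1), and contributions to the average strategy by (t / (t + 1))^γ. The exponents 1.5, 0 and 2 used as defaults are the values reported in that paper. This page does not state which values ARTGTO uses, so treat the numbers and function names as assumptions.

```python
def discount_factor(t, exponent):
    """Weight applied to a running total after pass t (t starts at 1)."""
    return t ** exponent / (t ** exponent + 1.0)


def discount_regret(regret, t, alpha=1.5, beta=0.0):
    """Shrink an accumulated regret; positive and negative totals use different exponents."""
    exponent = alpha if regret > 0 else beta
    return regret * discount_factor(t, exponent)


def strategy_weight(t, gamma=2.0):
    """Weight applied to earlier contributions to the average strategy after pass t."""
    return (t / (t + 1.0)) ** gamma


# Early passes are discounted heavily; later passes are kept almost in full.
for t in (1, 10, 100):
    print(t, round(discount_factor(t, 1.5), 4), round(strategy_weight(t), 4))
```

With these exponents the factor starts at one half on the first pass and climbs toward 1, which is why early mistakes fade quickly while later, better-informed regrets are retained.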

Iterations

One iteration is one full pass: each player gets one update turn. The Iterations control sets the maximum number of passes (default: 250).

You can watch this live in the status bar:

Solving — iter 5/250 · exploit 2.567%

The counter on the left is the iteration; the number on the right is how far the strategy still is from perfect (see next section). The solve stops when it reaches the iteration limit or the accuracy target, whichever comes first. Most spots hit the target well before the limit.
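The stopping rule can be sketched as a simple loop. This is an illustration under stated assumptions: `run_iteration` is a hypothetical callback standing in for one real solver pass, the sketch checks exploitability on every pass (the real solver measures it periodically), and the "at or below target" comparison is a guess at the exact condition.

```python
def run_solve(run_iteration, max_iterations=250, target_exploit_pct=0.3):
    """Run passes until the accuracy target or the iteration limit is reached.

    run_iteration(i) is a hypothetical callback that performs pass i and
    returns the current exploitability as a percentage of the pot.
    """
    exploit_pct = float("inf")
    for i in range(1, max_iterations + 1):
        exploit_pct = run_iteration(i)
        if exploit_pct <= target_exploit_pct:
            return i, exploit_pct  # target reached before the limit
    return max_iterations, exploit_pct  # limit reached first


# Toy stand-in for a real solver: exploitability shrinks as 2/i percent.
iterations, final = run_solve(lambda i: 2.0 / i)
print(f"Stopped after {iterations} iters, exploit {final:.3f}%")
```

Raising the iteration limit in this sketch changes nothing once the target is reachable; lowering the target is what makes the loop run longer.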

Exploitability: how "done" is measured

A solved strategy is never mathematically perfect — it gets arbitrarily close. Exploitability measures how close.

The question it answers: if a perfect opponent knew your full strategy and played the best possible counter-strategy against it, how much would they win per hand? ARTGTO expresses this as a percentage of the pot. To compute it, the solver periodically walks the whole tree and finds the genuinely best response to the current strategy — not a guess, an exact check.
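The same question can be asked of a game small enough to check by hand. The Python sketch below uses rock-paper-scissors, whose equilibrium value is 0, and reports how much a perfect opponent wins per game against a fixed strategy. It is a toy illustration of the best-response check, not ARTGTO's tree walk, and it reports units per game rather than a percentage of the pot.

```python
# Payoffs to "you" (rows) in rock-paper-scissors; the opponent (columns) gets the negative.
PAYOFF = [
    [0, -1, 1],   # you play rock
    [1, 0, -1],   # you play paper
    [-1, 1, 0],   # you play scissors
]


def exploitability(strategy, payoff=PAYOFF):
    """Best-response gain against `strategy`, relative to the game value of 0."""
    n_cols = len(payoff[0])
    # Opponent's expected win for each pure counter-action (rock, paper, scissors).
    counter_values = [
        sum(p * -row[col] for p, row in zip(strategy, payoff))
        for col in range(n_cols)
    ]
    # A perfect opponent picks the single best counter.
    return max(counter_values)


print(exploitability([1 / 3, 1 / 3, 1 / 3]))  # balanced strategy
print(exploitability([0.4, 0.3, 0.3]))        # slightly too much rock
```

The balanced strategy cannot be exploited, while leaning toward rock lets an opponent who always plays paper win about 0.1 units per game. The exploitability figure in the status bar is the same kind of number, computed over the full poker tree.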

Exploitability	Meaning
2.5%	Early in the solve. The strategy still leaks noticeably.
0.30%	The default target. A perfect counter-strategy gains at most 0.3% of the pot per hand.
0.10%	Very tight. Useful for research; rarely changes any practical conclusion.

The default Target Exploit % of 0.3 is the standard convergence threshold for practical study: tight enough that the remaining error is far below what any human (or any real opponent) could exploit, loose enough that solves finish in reasonable time. Pushing lower costs disproportionately more iterations for differences you will not see in the strategy charts.

When the solve finishes, the status bar reports the final figure:

Done — 250 iters in 2:34 · exploit 0.143%

Note

If you use 16-bit compress, accuracy on strongly polarized boards cannot be refined below roughly 0.2% of the pot, no matter how many iterations you add. The compressed numbers simply do not have enough precision left to express finer adjustments. At the default 0.3% target this floor is irrelevant; it only matters if you chase targets below 0.2%. See Tree building for what compression does.
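A rough way to picture this precision floor is the Python sketch below. It assumes a simple scheme in which values are stored as 16-bit integer codes with one shared scale factor. This page does not describe ARTGTO's actual compressed format, so the scheme, names and numbers are assumptions chosen only to show how small adjustments can vanish next to large values.

```python
INT16_MAX = 32767


def compress(values):
    """Store floats as 16-bit integer codes plus one shared scale factor."""
    # The largest magnitude sets the step size for every value in the group.
    scale = max(abs(v) for v in values) / INT16_MAX or 1.0
    return [round(v / scale) for v in values], scale


def decompress(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [code * scale for code in codes]


# A polarized set of values: two large magnitudes next to one tiny adjustment.
codes, scale = compress([1000.0, -1000.0, 0.01])
print(codes)
print(decompress(codes, scale))
```

In this sketch the step size is about 0.03, so the 0.01 adjustment rounds to code 0 and is lost after decompression, while the large values survive almost unchanged.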

HS-DCFR vs. DCFR

DCFR has internal tuning parameters that control how aggressively old regrets are discounted. ARTGTO offers two ways to set them, in Settings under the Solver section, via the Algorithm dropdown:

Option	What it does
`HS-DCFR (tuned — fewer iterations)`	The default. Instead of fixed values, the tuning parameters follow a schedule across the run: gentle early, aggressive late, shaped by your iteration budget. Reaches the same target in 6 to 13% fewer iterations than legacy DCFR on the benchmark board set.
`DCFR (legacy, fixed α/β/γ)`	The classic algorithm with fixed, paper-standard parameter values. Kept for comparison and reproducing older results.

Both reach the same equilibrium — HS-DCFR is not an approximation, it just gets there sooner. There is no accuracy reason to prefer the legacy option; keep the default unless you need to reproduce a solve made with it.

Screenshot: Settings window, Solver section with the Algorithm dropdown open

Determinism

ARTGTO's default solve path is fully deterministic:

Same inputs, same result. The same board, ranges, bet sizing, settings, and Solver cores count produce bit-identical strategies and EVs every time. You can re-run a solve a year later and get the same file.
Changing the thread count can change the path, not the quality. With a different Solver cores setting, the solver adds up numbers in a different order, which can nudge the convergence trajectory. On a spot sitting exactly at the target threshold, this can mean it stops a few iterations earlier or later. The equilibrium it converges to is the same.
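The effect of adding numbers in a different order is easy to reproduce. The Python sketch below is a generic floating-point illustration, not ARTGTO code: the same three values are summed with two different groupings, the way different thread counts might split the work.

```python
values = [0.1, 0.2, 0.3]

# One grouping: (0.1 + 0.2) first, then add 0.3.
order_a = (values[0] + values[1]) + values[2]
# Another grouping: (0.2 + 0.3) first, then add 0.1.
order_b = values[0] + (values[1] + values[2])

print(order_a == order_b)      # False: the results differ in the last bit
print(abs(order_a - order_b))  # a difference on the order of 1e-16
```

Both results are correct to within rounding error. A gap this small can still move a solve that sits right at the target by an iteration or two, which is why the trajectory can shift with the thread count while the equilibrium does not.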

Why this matters: deterministic solves are comparable. If two solutions of the same spot differ, something in the inputs differed — not luck.

What you see during a solve

The full sequence in the status bar, in order:

Status	What is happening
`Idle`	Nothing running.
`Building game tree…`	Laying out every action sequence. See Tree building.
`Building game tree (iso)…`	Same, with suit-equivalent cards grouped to save memory.
`Precomputing showdowns…`	Pre-calculating hand-vs-hand winners on every runout, so iterations run fast.
`Solving — iter 5/250 · exploit 2.567%`	The iterations described above.
`Done — 250 iters in 2:34 · exploit 0.143% · river-bucketed @ 200`	Finished. The `river-bucketed @ N` suffix appears only if the low-memory fallback engaged.
`Failed: [error message]`	The solve could not complete; the message says why.

Next steps

Tree building — what the solver builds before it starts iterating.
Quick start — run a solve and watch these stages live.