Algorithms
ARTGTO ships two solver algorithms. Both converge to a Nash equilibrium; they differ only in how fast they get there.
What CFR does
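Counterfactual Regret Minimization (CFR) is the standard algorithm behind most modern GTO solvers. It works by having the current strategy play the game against itself over many iterations, tracking at every decision point how much it "regrets" not having taken each alternative action. After each iteration it shifts probability toward actions with positive accumulated regret, which drives total regret down over time. For two-player zero-sum games, the math guarantees that the average strategy across iterations (not the strategy from the last iteration) converges to a Nash equilibrium.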
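To make the regret idea concrete, here is a minimal Python sketch of regret matching, the update CFR applies at each decision point, on rock-paper-scissors. That game has a single decision per player, so CFR reduces to plain regret matching there. It is a toy, not ARTGTO code: it uses simultaneous rather than alternating updates, applies no discounting, and the seeded starting regret is an arbitrary choice made only so the run is not trivially uniform.

```python
# Toy regret matching in rock-paper-scissors self-play (simultaneous updates).
# Actions: 0 = rock, 1 = paper, 2 = scissors. PAYOFF[a][b] is the payoff to
# player 0 when player 0 plays a and player 1 plays b; player 1 gets the negative.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
N = 3


def regret_matching(cum_regret):
    """Mix over actions in proportion to positive cumulative regret (uniform if none)."""
    positives = [max(r, 0.0) for r in cum_regret]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    return [1.0 / len(cum_regret)] * len(cum_regret)


def self_play(iterations):
    """Run regret matching for both players and return their average strategies."""
    # Seed player 0 with a bias toward rock; from all-zero regrets both players
    # would play uniform forever and nothing would happen.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        s0 = regret_matching(regrets[0])
        s1 = regret_matching(regrets[1])
        # Value of each pure action against the opponent's current mixed strategy.
        u0 = [sum(PAYOFF[a][b] * s1[b] for b in range(N)) for a in range(N)]
        u1 = [sum(-PAYOFF[a][b] * s0[a] for a in range(N)) for b in range(N)]
        v0 = sum(s0[a] * u0[a] for a in range(N))
        v1 = sum(s1[b] * u1[b] for b in range(N))
        for k in range(N):
            # Regret: how much better action k would have done than the mix played.
            regrets[0][k] += u0[k] - v0
            regrets[1][k] += u1[k] - v1
            strategy_sum[0][k] += s0[k]
            strategy_sum[1][k] += s1[k]
    # The convergence guarantee is about this average, not the last iterate.
    return [[x / iterations for x in row] for row in strategy_sum]


print(self_play(10_000))  # each row should be close to [1/3, 1/3, 1/3]
```

The per-iteration strategies keep cycling (rock, then paper, then scissors), but the averages settle near the uniform equilibrium, which is why solvers report the average strategy.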
ARTGTO uses Discounted CFR (DCFR), a refinement that puts more weight on recent iterations and less on early noisy ones. This converges faster than basic CFR without changing the result.
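A sketch of the three per-iteration discount factors from the published DCFR rules (Brown and Sandholm, 2019), in Python. The default arguments are the legacy values listed under DCFR (legacy) on this page. Exactly where and in what order ARTGTO's engine applies each factor is not documented here, so the function names and structure are assumptions for illustration.

```python
def dcfr_discounts(t, alpha=1.5, beta=0.0, gamma=2.0):
    """Multipliers applied on iteration t (t >= 1) under the DCFR rules.

    Returns (positive_regret_factor, negative_regret_factor, average_factor).
    """
    positive = t ** alpha / (t ** alpha + 1)   # approaches 1, so late positive regret is kept
    negative = t ** beta / (t ** beta + 1)     # beta = 0 gives exactly 1/2 every iteration
    average = (t / (t + 1)) ** gamma           # over a run, iteration t ends up weighted ~ t**gamma
    return positive, negative, average


def discount_regrets(cum_regret, t, alpha=1.5, beta=0.0):
    """Scale accumulated regrets: positives by one factor, negatives by another."""
    positive, negative, _ = dcfr_discounts(t, alpha, beta)
    return [r * positive if r > 0 else r * negative for r in cum_regret]


print(dcfr_discounts(1))    # (0.5, 0.5, 0.25)
print(dcfr_discounts(100))  # positive factor ~0.999, negative factor still 0.5
```

Early iterations are discounted heavily (everything is halved at t = 1), while by t = 100 positive regret is almost fully retained. That asymmetry is what lets DCFR forget its noisy start.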
HS-DCFR (tuned) — the default
HS-DCFR is a Hyperparameter-Scheduled variant of DCFR. Instead of using fixed discount parameters throughout the solve, it adjusts them on a schedule as iterations progress.
In plain terms: the solver starts aggressive (exploring widely), then gradually tightens as it approaches equilibrium. The schedule is shaped to your iteration budget, so it works whether you run 100 iterations or 1000.
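The budget-relative idea can be illustrated with a hypothetical schedule that depends only on progress through the solve, t / budget. Everything in this sketch is an assumption: ARTGTO's actual HS-DCFR schedule is not described on this page, linear interpolation is simply the plainest possible shape, and the endpoint values in the usage lines are placeholders, not the tuned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscountParams:
    alpha: float
    beta: float
    gamma: float


def scheduled_params(t, budget, start, end):
    """Hypothetical schedule: move each parameter linearly from `start` (t = 0)
    to `end` (t = budget). Not ARTGTO's real HS-DCFR schedule."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    frac = min(max(t / budget, 0.0), 1.0)  # progress through the solve, clamped to [0, 1]

    def lerp(a, b):
        return a + (b - a) * frac

    return DiscountParams(lerp(start.alpha, end.alpha),
                          lerp(start.beta, end.beta),
                          lerp(start.gamma, end.gamma))


# Placeholder endpoints, for illustration only.
start = DiscountParams(alpha=1.0, beta=0.0, gamma=1.0)
end = DiscountParams(alpha=2.0, beta=0.0, gamma=3.0)
print(scheduled_params(50, 100, start, end))    # halfway through a 100-iteration solve
print(scheduled_params(500, 1000, start, end))  # same parameters halfway through 1000
```

Because the parameters are a function of t / budget rather than t alone, a 100-iteration solve and a 1000-iteration solve pass through the same sequence of settings, just at different speeds.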
Result: HS-DCFR reaches a given exploitability target in roughly 6-13% fewer iterations than legacy DCFR. On a 250-iteration solve, that is about 15-33 fewer iterations, which shaves real time off every solve. Accuracy is unchanged: a solve stopped at the same exploitability target is equally close to equilibrium whichever algorithm produced it.
This is the default. You do not need to change anything to use it.
DCFR (legacy)
The original fixed-parameter DCFR, which uses constant discount parameters throughout the solve:
- alpha = 1.5 (exponent of the positive-regret discount)
- beta = 0.0 (exponent of the negative-regret discount; with beta = 0 the factor is 1/2, so accumulated negative regret is halved every iteration)
- gamma = 2.0 (strategy averaging weight exponent)
- Warmup = 30 iterations (strategy averaging starts after 30 iterations, skipping the noisy early phase)
- Periodic reset = on (resets the strategy average at powers of 4 to discard stale early data)
These were tuned empirically and work well. DCFR is available for users who want exact backward compatibility with earlier versions of ARTGTO.
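The legacy settings can be written down as a small config object. The field names, the 1-based iteration counter, and the exact meaning of warmup and reset are assumptions made for illustration: this sketch reads "averaging starts after 30 iterations" as "iteration 31 is the first to contribute", and "resets at powers of 4" as "clear the running average on iterations 4, 16, 64, 256, ... once averaging is active".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LegacyDcfrConfig:
    alpha: float = 1.5           # positive-regret discount exponent
    beta: float = 0.0            # negative-regret discount exponent (factor 1/2 per iteration)
    gamma: float = 2.0           # strategy-averaging weight exponent
    warmup: int = 30             # iterations skipped before averaging begins
    periodic_reset: bool = True  # clear the strategy average at powers of 4


def is_power_of_four(t: int) -> bool:
    # 4, 16, 64, 256, ...: exactly one set bit, at an even position (t = 1 excluded).
    return t > 1 and (t & (t - 1)) == 0 and (t.bit_length() - 1) % 2 == 0


def averaging_events(t: int, cfg: LegacyDcfrConfig) -> Tuple[bool, bool]:
    """Return (accumulate, reset_first) for 1-based iteration t, under the assumed semantics."""
    accumulate = t > cfg.warmup
    reset_first = accumulate and cfg.periodic_reset and is_power_of_four(t)
    return accumulate, reset_first


cfg = LegacyDcfrConfig()
print([t for t in range(1, 301) if averaging_events(t, cfg)[1]])  # [64, 256]
```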
Where to switch
Go to Settings → Solver → Algorithm. You will see two options:
- HS-DCFR (tuned — fewer iterations) — the default.
- DCFR (legacy, fixed alpha/beta/gamma) — the original.
The choice applies to the next solve. No restart needed.
Why results are identical in the limit
Both algorithms run the same underlying DCFR engine (step_dcfr_alternating). The only difference is the values of three discount parameters (alpha, beta, gamma) and whether they change over time.
CFR's convergence guarantee carries over to DCFR for suitable discount and averaging parameters, and both algorithms use that discounted, weighted averaging. It says that, in two-player zero-sum games, the exploitability of the average strategy goes to zero as iterations go to infinity. The schedule only affects how fast exploitability falls, not where it ends up. At any finite budget the two results differ very slightly, because they reach a given exploitability at different iteration counts, but both stay within the solver's reported exploitability bound.
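In standard CFR notation (an assumption about how ARTGTO defines these quantities internally), the average strategy at information set I for player i after T iterations is a reach-weighted mean of the per-iteration strategies, and the convergence claim is a statement about its exploitability. Here \pi_i^{\sigma^t}(I) is player i's own probability of reaching I under \sigma^t. Applying DCFR's averaging factor (t/(t+1))^\gamma on every iteration telescopes to a weight proportional to t^\gamma, ignoring warmup and resets:

```latex
\bar{\sigma}^T(a \mid I) = \frac{\sum_{t=1}^{T} w_t \, \pi_i^{\sigma^t}(I) \, \sigma^t(a \mid I)}{\sum_{t=1}^{T} w_t \, \pi_i^{\sigma^t}(I)},
\qquad w_t \propto t^{\gamma}

\lim_{T \to \infty} \operatorname{expl}\left(\bar{\sigma}^T\right) = 0
```

Changing the schedule changes the weights w_t and the regret discounts, and with them the rate at which the exploitability term shrinks; the limit stays at zero.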