Extended Data Table 7 Results of a tournament between different variants of AlphaGo

Evaluating positions using rollouts only (α_rp, α_r), value nets only (α_vp, α_v), or mixing both (α_rvp, α_rv); either using the policy network p_σ (α_rvp, α_vp, α_rp), or no policy network (α_rv, α_v, α_r), that is, instead using the placeholder probabilities from the tree policy p_τ throughout. Each program used 5 s per move on a single machine with 48 CPUs and 8 GPUs. Elo ratings were computed by BayesElo.
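To illustrate how tournament results become Elo ratings, the sketch below fits a plain maximum-likelihood Bradley–Terry model with Hunter's MM algorithm and maps the strengths onto the Elo scale (γ = 10^(R/400)). This is a stand-in, not BayesElo itself: BayesElo also places a prior on ratings and can model draws and first-move advantage, none of which are included here. The function names `fit_elo` and `expected_score` and the `wins` dictionary layout are hypothetical, chosen only for this example.

```python
import math


def expected_score(r_a, r_b):
    """Elo expected score of player a against player b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def fit_elo(wins, players, iters=10000, tol=1e-12):
    """Maximum-likelihood Bradley-Terry fit (MM algorithm), reported on an Elo scale.

    wins[(a, b)] is the number of games player a won against player b.
    Hypothetical helper, not BayesElo: no prior, no draws, no colour advantage.
    The MLE exists only if the win graph is strongly connected; here we just
    check the necessary condition that every player has a win and a loss.
    """
    for p in players:
        w = sum(wins.get((p, q), 0) for q in players if q != p)
        l = sum(wins.get((q, p), 0) for q in players if q != p)
        if w == 0 or l == 0:
            raise ValueError(f"{p} needs at least one win and one loss")

    gamma = {p: 1.0 for p in players}
    for _ in range(iters):
        new = {}
        for i in players:
            w_i = sum(wins.get((i, j), 0) for j in players if j != i)
            denom = 0.0
            for j in players:
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (gamma[i] + gamma[j])
            new[i] = w_i / denom  # MM update: wins / sum of n_ij / (g_i + g_j)
        # Ratings are identified only up to a shift, so fix the geometric mean at 1.
        log_mean = sum(math.log(g) for g in new.values()) / len(new)
        new = {p: g / math.exp(log_mean) for p, g in new.items()}
        delta = max(abs(new[p] - gamma[p]) for p in players)
        gamma = new
        if delta < tol:
            break
    # gamma = 10 ** (R / 400)  =>  R = 400 * log10(gamma)
    return {p: 400 * math.log10(g) for p, g in gamma.items()}
```

For example, if variant A beats variant B in 3 of 4 games, the fitted ratings differ by 400·log10(3) ≈ 191 Elo, and `expected_score` maps that gap back to a 0.75 win rate. Centring the ratings on zero is arbitrary; published tables typically anchor them to a reference player instead.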
