Extended Data Table 7 Results of a tournament between different variants of AlphaGo

From: Mastering the game of Go with deep neural networks and tree search

Variants evaluated positions using rollouts only (αrp, αr), value nets only (αvp, αv), or a mix of both (αrvp, αrv); each either used the policy network pσ (αrvp, αvp, αrp) or no policy network (αrv, αv, αr), that is, instead used the placeholder probabilities from the tree policy pτ throughout. Each program used 5 s per move on a single machine with 48 CPUs and 8 GPUs. Elo ratings were computed by BayesElo.
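To help read the Elo column, the following minimal sketch converts a rating difference into an expected score using the standard logistic Elo formula. It is an illustration only: BayesElo estimates ratings from game results with its own model, which this sketch does not reproduce, and the function name `expected_score` and the ratings used below are hypothetical, not values from the table.

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """Expected score of player A against player B under the
    standard logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


# Hypothetical ratings chosen for illustration; not taken from the table.
gap_200 = expected_score(1600, 1400)
print(f"Expected score with a 200-point advantage: {gap_200:.3f}")
```

Under this model, equal ratings give an expected score of 0.5, and the expected scores of the two players always sum to 1.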