Extended Data Fig. 5: The neural network, featuring symmetrized axial attention layers.
(a) An overview of the neural network, where the input is the current state (tensor and past actions) and the outputs are distributions over the next action to play (given by the policy head) and over the estimated return (given by the value head). (b) A symmetrized axial attention layer, the building block of the neural network torso. Symmetrized axial attention incorporates a symmetrization operation after each axial attention layer to favour exchange of information between rows and columns.
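The layer in panel (b) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the caption: the input is an S x S grid of c-dimensional feature vectors; attention is single-head with identity projections (no learned weights, residual connections or normalization); symmetrization is a convex mix of the grid with its spatial transpose, controlled by a hypothetical scalar `beta`; and symmetrization is applied after each of the row-wise and column-wise passes. All function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head dot-product self-attention over a list of vectors.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections), to keep the sketch free of learned weights.
    """
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in seq:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq))
                    for d in range(dim)])
    return out


def transpose(grid):
    """Swap the two spatial axes of an S x S x c grid."""
    return [list(col) for col in zip(*grid)]


def row_attention(grid):
    """Attend along each row independently."""
    return [self_attention(row) for row in grid]


def column_attention(grid):
    """Attend along each column by transposing, attending, transposing back."""
    return transpose(row_attention(transpose(grid)))


def symmetrize(grid, beta=0.5):
    """Mix the grid with its transpose (assumed form of the symmetrization)."""
    gt = transpose(grid)
    size = len(grid)
    return [[[beta * a + (1.0 - beta) * b
              for a, b in zip(grid[i][j], gt[i][j])]
             for j in range(size)]
            for i in range(size)]


def symmetrized_axial_attention(grid, beta=0.5):
    """Row pass, symmetrize, column pass, symmetrize."""
    grid = symmetrize(row_attention(grid), beta)
    grid = symmetrize(column_attention(grid), beta)
    return grid


# Usage: a 3 x 3 grid with 2 features per cell.
grid = [[[float(i + 2 * j), float(i * j)] for j in range(3)] for i in range(3)]
out = symmetrized_axial_attention(grid, beta=0.5)
```

With `beta=0.5` in this sketch, the output at position (i, j) equals the output at (j, i), which makes the mixing of row and column information explicit. Other values of `beta` retain some asymmetry, and `beta=1.0` turns the symmetrization into a no-op.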