Fig. 2: Overview of AlphaTensor.
From: Discovering faster matrix multiplication algorithms with reinforcement learning
The neural network (bottom box) takes as input a tensor \(\mathscr{S}_t\) and outputs samples \((\mathbf{u},\mathbf{v},\mathbf{w})\) from a distribution over potential next actions to play, together with an estimate of the future returns (for example, of \(-\mathrm{Rank}(\mathscr{S}_t)\)). The network is trained on two data sources: previously played games and synthetic demonstrations. The updated network is sent to the actors (top box), where it is used by the MCTS planner to generate new games.
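To make the caption's data flow concrete, here is a minimal sketch of the TensorGame dynamics it refers to: playing an action \((\mathbf{u},\mathbf{v},\mathbf{w})\) subtracts the rank-one tensor \(\mathbf{u}\otimes\mathbf{v}\otimes\mathbf{w}\) from \(\mathscr{S}_t\), each move incurs a reward of \(-1\) (so the return of a finished game upper-bounds \(-\mathrm{Rank}(\mathscr{S}_0)\)), and a synthetic demonstration is built by sampling factors with a known decomposition. This is an illustrative sketch, not DeepMind's implementation: the function names (`step`, `synthetic_demonstration`) are invented here, the coefficient set \(\{-1,0,1\}\) is a simplification of the paper's \(\{-2,\dots,2\}\), and the network, MCTS planner, and actor/learner machinery are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, u, v, w):
    """Play action (u, v, w): subtract the rank-one tensor u ⊗ v ⊗ w from S_t."""
    return state - np.einsum("i,j,k->ijk", u, v, w)

def synthetic_demonstration(side=4, rank=3, entries=(-1, 0, 1)):
    """Sample a tensor with a known rank-`rank` factorization (hypothetical helper).

    Replaying the sampled factors is a perfect game that ends at the zero
    tensor -- the "synthetic demonstrations" data source in the figure.
    """
    factors = [tuple(rng.choice(entries, size=side) for _ in range(3))
               for _ in range(rank)]
    target = sum(np.einsum("i,j,k->ijk", u, v, w) for u, v, w in factors)
    return target, factors

state, factors = synthetic_demonstration()
ret = 0
for u, v, w in factors:
    state = step(state, u, v, w)   # remove one rank-one term per move
    ret -= 1                       # reward of -1 per move penalizes long games
assert not state.any()             # the sampled factors decompose the tensor exactly
print("return:", ret)              # -3 here: an upper bound on -Rank(S_0)
```

In the full system, these perfect demonstration games and the actors' imperfect self-play games would both feed the training buffer, and the value head's target is the observed return, as noted in the caption.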