Fig. 5: Speed-ups of the AlphaTensor-discovered algorithm.
From: Discovering faster matrix multiplication algorithms with reinforcement learning
a,b, Speed-ups (%) of the AlphaTensor-discovered algorithms tailored for a GPU (a) and a TPU (b), optimized for a matrix multiplication of size 8,192 × 8,192. Speed-ups are measured relative to standard (for example, cuBLAS for the GPU) matrix multiplication on the same hardware. Speed-ups are reported for various matrix sizes (despite optimizing the algorithm only on one matrix size). We also report the speed-up of the Strassen-square algorithm. The median speed-up is reported over 200 runs. The standard deviation over runs is <0.4 percentage points (see Supplementary Information for more details). c, Speed-up of both algorithms (tailored to a GPU and a TPU) benchmarked on both devices.