    Learning Triton One Kernel at a Time: Matrix Multiplication

    By ProfitlyAI | October 14, 2025


    Matrix multiplication is undoubtedly the most common operation carried out by GPUs. It is the fundamental building block of linear algebra and shows up across a wide spectrum of fields such as graphics, physics simulations and scientific computing, while being ubiquitous in machine learning.

    In today's article, we'll break down the conceptual implementation of general matrix-matrix multiplication (GEMM) while introducing several optimisation concepts such as tiling and memory coalescing. Finally, we'll implement GEMM in Triton!

    This article is the second in a series on Triton and GPU kernels. If you're not familiar with Triton or need a refresher on GPU fundamentals, check out the previous article! All of the code showcased in this article is available on GitHub.

    Disclaimer: all of the following figures and animations were made by the author unless stated otherwise.

    Naive GEMM

    Let's start simple: we want to multiply two matrices X and Y with shapes (M,N) and (N,K) respectively. The output matrix Z=X@Y will therefore have shape (M,K).

    This operation involves computing the dot products of all pairs of rows and columns in X and Y respectively. A straightforward NumPy implementation might look something like this:
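    Since the full listing lives in the GitHub repository, here is a minimal sketch of what such a naive implementation might look like (the function name and exact loop structure are illustrative assumptions):

    import numpy as np

    def naive_matmul(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
        M, N = X.shape
        N2, K = Y.shape
        assert N == N2, "inner dimensions must match"
        Z = np.zeros((M, K), dtype=X.dtype)
        for m in range(M):          # for every row of X ...
            for k in range(K):      # ... and every column of Y ...
                Z[m, k] = np.dot(X[m, :], Y[:, k])  # ... compute one dot product
        return Z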

    While easy to write, read and understand, this implementation is highly inefficient in terms of memory access and caching. As mentioned in the first article of this series, a fundamental aspect of GPU optimisation is minimising data transfers.

    However, our current implementation starts by loading a row from X, iteratively loads all K columns of Y, computes their dot products and repeats the process for every row in X. This results in a total of M(K+1) loading operations.

    Naive Matrix Multiplication: purple and blue tiles represent the vectors involved in dot products at each time step, and green cells the computed output values.

    As seen in the animation, the memory access pattern is wasteful, as every column of Y is loaded M times. As an analogy: this is like running to the grocery store (global memory) every time you need a new ingredient for a dish instead of preparing all of the ingredients on your kitchen counter (shared memory). Ideally, we want to minimise the number of times each chunk of data is loaded and maximise its reusability once loaded. This leaves us with two main axes of optimisation:

    1. How can we improve the access pattern to minimise redundant loads?
    2. How much data can we load at once, and where should it be stored on the GPU?

    Tiled GEMM

    As mentioned previously, the naive approach to GEMM results in many redundant loads, which induces unnecessary overhead. Ideally, we'd like to load each segment of data only once and perform all of the operations in which it is used before dropping it from memory.

    An elegant approach to this problem is tiling, which involves dividing large matrices into smaller "tiles" or sub-matrices. Consider two matrices X and Y with shapes (4,6) and (6,4) respectively; X@Y results in a matrix Z with shape (4,4).

    In order to compute the first element of Z, Z[0,0], we need to compute the dot product between the first row of X and the first column of Y: Z[0,0] = dot(X[0, :], Y[:, 0]). We can also break the dot product down into smaller chunks, for instance in groups of three elements: Z[0,0] = dot(X[0,0:3], Y[0:3, 0]) + dot(X[0,3:6], Y[3:6, 0]).

    Alternatively, we can extend this approach to two dimensions and compute a whole (2,2) block of Z at a time: Z[0:2, 0:2] = dot(X[0:2, 0:2], Y[0:2, 0:2]) + dot(X[0:2, 2:4], Y[2:4, 0:2]) + dot(X[0:2, 4:6], Y[4:6, 0:2]). A NumPy sketch of this blocked decomposition follows.
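    To make the accumulation concrete, here is a small NumPy sketch of the same blocked decomposition (a hypothetical helper, assuming every dimension is a multiple of the block size):

    import numpy as np

    def tiled_matmul(X: np.ndarray, Y: np.ndarray, block: int = 2) -> np.ndarray:
        M, N = X.shape
        _, K = Y.shape
        Z = np.zeros((M, K), dtype=X.dtype)
        for m in range(0, M, block):            # output row-blocks
            for k in range(0, K, block):        # output column-blocks
                acc = np.zeros((block, block), dtype=X.dtype)
                for n in range(0, N, block):    # accumulate over the shared dimension
                    acc += X[m:m+block, n:n+block] @ Y[n:n+block, k:k+block]
                Z[m:m+block, k:k+block] = acc
        return Z

    Each (block, block) tile of Z is built from N/block partial products, which is exactly the accumulation shown in the animation below.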

    Here's a visual illustration of tiled matrix multiplication:

    Tiled Matrix Multiplication. The computation is split into several "tiles" of X and Y (highlighted in pale blue and purple), each containing several blocks (dark blue and purple). In each block, we compute dot products (green cells in X and Y). These dot products are accumulated across the blocks of a tile to compute the output values in Z (the accumulation is represented by colours going from orange to green).

    The above animation illustrates how data is reused in tiled GEMM. For each 2×2 block in X and Y, we compute 4 dot products, which results in a (2,2) output matrix in Z. Since each tile contains 3 blocks, we need to accumulate 3 of these matrices to compute the final (2,2) output in Z. This accumulation is represented by coloured cells in Z.

    In the kitchen analogy, this is like fetching ingredients from the store and preparing them on the kitchen counter (i.e. the small shared memory), reusing them several times before going back to the store.

    Importantly, reusing loaded data over several steps allows this approach to drastically reduce the number of load operations. For (2,2) blocks, each X row and Y column is used in two dot products. Therefore, we perform twice as many operations with each block of loaded data, roughly halving the number of load operations! Note that this generalises to larger blocks as well: using a (32,32) block would reduce the number of loads by a factor of around 32.

    Now you're probably wondering: "how large can these blocks be?" To answer this question, let's recall how memory is managed in modern GPUs.

    GPU Memory Hierarchy

    We distinguish four main types of memory in Nvidia GPUs. Here, we take the example of an A100:

    • Registers: The fastest and smallest type of memory on the GPU, residing directly within each Streaming Multiprocessor (SM). On the A100, each SM provides 256 KB of register file space (65,536 × 32-bit registers), distributed among its threads. Each thread gets its own private 32-bit registers for storing temporary variables and intermediate results, avoiding memory traffic altogether. However, register usage per thread directly impacts occupancy, as using too many registers per thread limits how many threads can run concurrently.
    • L1/Shared Memory: On an A100, each SM has 192 KB of SRAM that can be flexibly configured as either a hardware-managed L1 cache or programmer-managed shared memory. For performance-critical kernels like matrix multiplication, we explicitly use this space as shared memory to stage data tiles close to the compute units, bypassing the L1 cache entirely. This gives us fine-grained control over data reuse.
    • L2 cache: This cache is slower than L1 but much larger, with around 40 MB shared across all SMs on the A100. It serves as a global cache for both data and instructions, reducing the number of accesses to high-latency HBM memory. The L2 cache is coherent across SMs, meaning that updates from one SM are visible to others, enabling synchronisation between thread blocks. Its bandwidth can reach several terabytes per second, acting as a buffer between the fast on-chip SRAM and the slower HBM.
    • High Bandwidth Memory (HBM): This is the device memory; it has a capacity of either 40 GB or 80 GB depending on the A100 model. It provides extremely high bandwidth (up to 2 TB/s on the 80 GB variant) but with much higher latency than on-chip caches. HBM is where large tensors, model weights, and datasets reside during execution. Since accessing HBM is expensive, efficient kernels aim to minimise data movement and maximise on-chip data reuse via registers and shared memory.

    As you can see, the memory hierarchy generally trades off capacity against latency. Therefore, maximising performance boils down to loading data from HBM into shared memory efficiently and reusing it as much as possible.

    GPU Memory Hierarchy, from fastest/smallest (top) to slowest/largest (bottom).

    Choosing our block size is critical. We want blocks to be large enough to create lots of parallel work, but small enough that their data fits in the SM's shared memory and registers. A BLOCK_SIZE of 64 is a common starting point because it is a multiple of the warp size (32 threads), guaranteeing full hardware utilisation. A quick back-of-the-envelope check is shown below.
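    As a rough sanity check (the exact footprint depends on how the compiler stages and pads the tiles), two (64, 64) float32 tiles fit comfortably within an SM's shared memory:

    BLOCK_SIZE = 64
    bytes_per_float32 = 4
    # one tile of X and one tile of Y staged per iteration of the accumulation loop
    tile_bytes = 2 * BLOCK_SIZE * BLOCK_SIZE * bytes_per_float32
    print(tile_bytes / 1024)  # 32.0 KB, well under the A100's 192 KB of SRAM per SM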

    Parallel Tiled GEMM

    With these considerations in mind, a natural follow-up to our tiled GEMM is to parallelise the computation of each pair of tiles over several thread blocks, as depicted in the following animation.

    Parallel Tiled Matrix Multiplication. The iteration over tiles is replaced by a parallel operation over several thread blocks.

    Memory Coalescing

    Before writing tiled GEMM in Triton, we need to consider one last detail: memory coalescing, a technique that enables optimal use of global memory bandwidth. Memory coalescing is achieved when consecutive threads in a warp access consecutive memory addresses. Imagine a librarian needing to fetch books for a client: if all the books are side by side on a shelf, they can grab them all at once. In contrast, if the books are lying on different shelves, they'll have to grab them one by one, which takes significantly longer.

    To understand how this applies to our case, note that matrices are stored linearly in memory; in other words, a (2,2) matrix is stored as a sequence of 4 consecutive elements. Frameworks like PyTorch adopt a row-major format, meaning that the elements of a matrix are per-row contiguous in memory. For instance, the elements of our (2,2) matrix would be stored as follows: [(0,0), (0,1), (1,0), (1,1)]. Notice that elements of the same row are contiguous (touching) while elements of the same column have a stride of 1 (separated by one element).

    PyTorch stores matrices in row-major format. Elements of a row are contiguous in memory while elements of a column are strided.

    This implies that we can load rows using coalesced loads, but columns do not fulfil this condition. However, we need to access columns of Y to compute the dot products. In order to maximise performance, a good practice is to transpose Y so that we iterate over its rows rather than its columns.

    However, transposing Y isn't enough to alter its layout in memory. As mentioned previously, PyTorch stores matrices in a flat array. Each matrix dimension is associated with a stride attribute, denoting the jump necessary to go from one element to the next along that dimension. For instance, a (10,10) matrix would have strides=(10,1). Indeed, starting from element [0,0], element [1,0] is 10 memory slots (i.e. one row) away, while element [0,1] is adjacent.

    When transposing a tensor, PyTorch doesn't modify the layout in memory but simply recomputes the strides. In order to make the transpose effective from a memory standpoint, we need to call Y.T.contiguous(). The snippet below illustrates the difference.
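    A small illustrative check (the shape is arbitrary) showing that .T only swaps the strides, while .contiguous() actually rewrites the data in row-major order:

    import torch

    Y = torch.randn(6, 4)
    print(Y.stride())                 # (4, 1): row-major layout
    print(Y.T.stride())               # (1, 4): same memory, only the strides are swapped
    print(Y.T.contiguous().stride())  # (6, 1): the (4, 6) transpose is rewritten in row-major order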

    These are the steps required to load the columns of Y efficiently; however, we'll need to transpose the loaded blocks within the kernel to perform the dot product correctly: z_block = tl.dot(X_block, Y_block.T).

    Illustration of Y, Y.T and Y.T.contiguous() in their block representation and memory layout. The transpose operation changes the behaviour of the matrix but does not modify its memory layout. This is why we need to add .contiguous() to enable coalesced reads on rows.

    Triton Implementation

    From here on, we first describe the kernel without memory coalescing to simplify the logic and pointer arithmetic, before summarising the changes required to make the load operations coalesced on the columns of Y.

    Let's start by focusing on the PyTorch wrapper around the kernel. We need to read M, N, K from the input matrices and compute their strides, since these constants will be useful later in the kernel. Then, we define the BLOCK_SIZE and declare the grid, as sketched below.
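    The full wrapper is available on GitHub; a minimal sketch consistent with the coalesced variant listed further down could look like this (the kernel name block_matmul_kernel, the grid shape and the BLOCK_SIZE value are assumptions):

    import torch
    import triton
    import triton.language as tl

    def block_matmul(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
        M, N = X.shape
        _, K = Y.shape
        Z = torch.empty((M, K), device="cuda", dtype=X.dtype)

        # strides (in elements) along each dimension
        x_stride_m, x_stride_n = X.stride()
        y_stride_n, y_stride_k = Y.stride()
        z_stride_m, z_stride_k = Z.stride()

        BLOCK_SIZE = 64
        # one program per (BLOCK_SIZE, BLOCK_SIZE) block of the output Z
        grid = (triton.cdiv(M, BLOCK_SIZE), triton.cdiv(K, BLOCK_SIZE))

        block_matmul_kernel[grid](
            X, x_stride_m, x_stride_n,
            Y, y_stride_n, y_stride_k,
            Z, z_stride_m, z_stride_k,
            M, N, K,
            BLOCK_SIZE,
        )
        return Z

    Each program in the 2D grid is responsible for one (BLOCK_SIZE, BLOCK_SIZE) block of Z.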

    Now let's dive into the actual kernel code. We are going to make use of Triton's make_block_ptr utility, which simplifies the pointer arithmetic. We create one block pointer per matrix and pass the matrix shape, its strides, and the size of the block as inputs. Additionally, we specify the offsets, i.e. the coordinates of the top-left element of the current block. For X, this corresponds to (m_idx * BLOCK_SIZE, 0), where m_idx is the index of the current block along the M dimension.

    From there, we define z_acc, a zero matrix that will accumulate the partial dot products as we iterate through tiles. We then iterate along the shared dimension N, loading blocks of size (BLOCK_SIZE, BLOCK_SIZE), and accumulate their dot products in z_acc. At each step, we move the block pointers along the shared dimension using .advance. A sketch of the kernel body is shown below.
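    The full kernel is on GitHub; the following sketch (continuing from the wrapper above) is reconstructed from the description and from the coalesced variant shown later, so the kernel name, the order hints and some argument names are assumptions:

    @triton.jit
    def block_matmul_kernel(
        X_ptr, X_m_stride, X_n_stride,
        Y_ptr, Y_n_stride, Y_k_stride,
        Z_ptr, Z_m_stride, Z_k_stride,
        M, N, K,
        BLOCK_SIZE: tl.constexpr,
    ):
        # indices of the current output block along M and K
        m_idx = tl.program_id(axis=0)
        k_idx = tl.program_id(axis=1)

        x_block_ptr = tl.make_block_ptr(
            base=X_ptr,
            shape=(M, N),
            strides=(X_m_stride, X_n_stride),
            offsets=(m_idx * BLOCK_SIZE, 0),
            block_shape=(BLOCK_SIZE, BLOCK_SIZE),
            order=(1, 0),
        )
        y_block_ptr = tl.make_block_ptr(
            base=Y_ptr,
            shape=(N, K),
            strides=(Y_n_stride, Y_k_stride),
            offsets=(0, k_idx * BLOCK_SIZE),
            block_shape=(BLOCK_SIZE, BLOCK_SIZE),
            order=(1, 0),
        )
        z_block_ptr = tl.make_block_ptr(
            base=Z_ptr,
            shape=(M, K),
            strides=(Z_m_stride, Z_k_stride),
            offsets=(m_idx * BLOCK_SIZE, k_idx * BLOCK_SIZE),
            block_shape=(BLOCK_SIZE, BLOCK_SIZE),
            order=(1, 0),
        )

        # accumulator for the partial dot products of this output block
        z_acc = tl.zeros((BLOCK_SIZE, BLOCK_SIZE), dtype=tl.float32)

        for _ in range(0, N, BLOCK_SIZE):
            x = tl.load(x_block_ptr, boundary_check=(0, 1), padding_option="zero")
            y = tl.load(y_block_ptr, boundary_check=(0, 1), padding_option="zero")
            z_acc += tl.dot(x, y)
            # move both block pointers along the shared dimension N
            x_block_ptr = tl.advance(x_block_ptr, offsets=(0, BLOCK_SIZE))
            y_block_ptr = tl.advance(y_block_ptr, offsets=(BLOCK_SIZE, 0))

        tl.store(pointer=z_block_ptr, value=z_acc, boundary_check=(0, 1))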

    You might have noticed that when loading data, we use boundary_check and padding_option instead of mask and other as in the previous article. These arguments are specific to the use of block pointers; they specify which axes to check for out-of-bounds accesses (here (0,1) for both x and y) and how to handle the invalid values. Here we set them to zero so that they are ignored in the dot product.

    We can now take a look at the performance of this kernel using the following function:

    import numpy as np
    import torch
    import triton
    from tqdm import tqdm

    def bench(fn: callable, x: torch.Tensor, y: torch.Tensor, repeat: int):
      flops = []
      med_latency = []

      for _ in tqdm(range(repeat), desc=f"Benchmarking {fn.__name__}"):
        latency_ms = triton.testing.do_bench(
          lambda: fn(x, y),
          quantiles=[0.5],  # get the median latency
          return_mode="all",
          )
        n_flops = 2 * M * N * K  # matmul requires roughly 2*M*N*K operations
        tflops = n_flops / (latency_ms / 1e3) / 1e12

        med_latency.append(latency_ms)
        flops.append(tflops)

      flops = np.array(flops)
      med_latency = np.array(med_latency)
      print(f"Absolute Error: {torch.sum(torch.abs(x @ y - fn(x, y)))}")
      print(f"Median Latency: {med_latency.mean():.4f} ± {med_latency.std():.3f} ms")
      print(f"Throughput: {flops.mean():.4f} ± {flops.std():.3f} TeraFLOPS")

    M = 8192
    N = 6144
    K = 4096

    X = torch.randn((M, N), device="cuda", dtype=torch.float32)
    Y = torch.randn((N, K), device="cuda", dtype=torch.float32)

    bench(block_matmul, X, Y, repeat=10)

    We get the following output (using a T4 GPU on Colab):

    Absolute Error: 0.0 # the kernel outputs the correct result!
    Median Latency: 130.7831 ± 1.794 ms
    Throughput: 3.1533 ± 0.043 TeraFLOPS

    Now let's review the changes required for coalesced loads on Y: we mainly need to flip the shape, strides and offsets when defining the block pointer for Y. Additionally, we update the block pointer to move along the column dimension (previously the row dimension). The full code for this implementation is available on GitHub.

    @triton.jit
    def coalesced_block_matmul_kernel(
        X_ptr, X_m_stride, X_n_stride,
        Y_ptr, Y_k_stride, Y_n_stride,
        Z_ptr, Z_m_stride, Z_k_stride,
        M, N, K,
        BLOCK_SIZE: tl.constexpr,
    ):
        ...
        y_block_ptr = tl.make_block_ptr(
            base=Y_ptr,
            # flip the shape, strides and offsets to match Y.T
            shape=(K, N),
            strides=(Y_k_stride, Y_n_stride),
            offsets=(k_idx * BLOCK_SIZE, 0),
            block_shape=(BLOCK_SIZE, BLOCK_SIZE),
            order=(0, 1),
        )
        ...

        for _ in range(0, N, BLOCK_SIZE):
            ...  # load x and y blocks
            z_acc += tl.dot(x, y.T)  # transpose Y back for the dot product
            x_block_ptr = tl.advance(x_block_ptr, offsets=(0, BLOCK_SIZE))
            # advance the block pointer along the columns of Y.T (i.e. rows of Y)
            y_block_ptr = tl.advance(y_block_ptr, offsets=(0, BLOCK_SIZE))

        tl.store(pointer=z_block_ptr, value=z_acc, boundary_check=(0, 1))

    def coalesced_block_matmul(X, Y):
        Y = Y.T.contiguous()  # Y is now (K, N)
        M, N = X.shape
        K, _ = Y.shape
        Z = torch.empty((M, K), device="cuda")

        x_stride_m, x_stride_n = X.stride()
        y_stride_k, y_stride_n = Y.stride()
        z_stride_m, z_stride_k = Z.stride()

        ...  # define BLOCK_SIZE and grid

        coalesced_block_matmul_kernel[grid](
            X, x_stride_m, x_stride_n,
            Y, y_stride_k, y_stride_n,  # strides passed in the kernel's (K, N) order
            Z, z_stride_m, z_stride_k,
            M, N, K,
            BLOCK_SIZE,
        )

        return Z

    Here are the results of our benchmark for the kernel with coalesced loads on Y:

    Absolute Error: 0.0 # Again, the kernel is correct!
    Median Latency: 261.9420 ± 0.858 ms
    Throughput: 1.5741 ± 0.005 TeraFLOPS

    Surprisingly, the throughput of this second kernel is only half of what we obtained with the first one, despite improving the efficiency of the load operations 🤔

    A quick inspection using Nsight (Nvidia's kernel profiler, more on that in a future article) reveals that the transpose operation within the kernel creates a "traffic jam". Specifically, the transpose causes bank conflicts, leaving threads idle most of the time. Notably, the warp scheduler has no eligible warp to dispatch 87.6% of the time, as warps are waiting for the bank conflicts to resolve. Additionally, the report reads:

    ----------------------- ----------- ------------
    Metric Name             Metric Unit Metric Value
    ----------------------- ----------- ------------
    ...
    DRAM Throughput         %                   8.20
    Compute (SM) Throughput %                  21.14
    ...

    This indicates that the kernel is latency bound (i.e. neither memory nor compute bound; refer to the previous article for more details). In contrast, the first kernel is compute bound (i.e. increasing compute would improve performance), since its compute throughput is high compared to the DRAM throughput:

    ----------------------- ----------- ------------
    Metric Name             Metric Unit Metric Value
    ----------------------- ----------- ------------
    ...
    DRAM Throughput         %                  29.35
    Compute (SM) Throughput %                  74.39
    ...

    Conclusion

    This experiment highlights the importance of profiling and empirical validation. Even well-intentioned optimisations like coalescing memory accesses can introduce new bottlenecks if not evaluated carefully. The first kernel, though simpler, was compute-bound and better matched the hardware's characteristics.

    In the next articles of this series, we'll implement a softmax kernel, paying particular attention to integrating Triton with PyTorch's autograd and to profiling kernels with Nsight.

    Until next time! 👋

    Useful Resources



