SPAA 2022 · 34th ACM Symposium on Parallelism in Algorithms and Architectures, Philadelphia, PA, USA, July 2022

We study matrix multiplication in the low-bandwidth model: There are $n$ computers, and we need to compute the product of two $n \times n$ matrices. Initially computer $i$ knows row $i$ of each input matrix. In one communication round each computer can send and receive one $O(\log n)$-bit message. Eventually computer $i$ has to output row $i$ of the product matrix.
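To make the input and output distribution concrete, the following minimal sketch (not taken from the paper) computes the row of the product that computer $i$ must output and lists which remote rows of the second matrix that row depends on. The dict-of-dicts sparse representation and the names `remote_rows_needed` and `product_row` are assumptions chosen for illustration. The sketch performs no communication and only exposes the data dependencies.

```python
from typing import Dict, Set

SparseRow = Dict[int, int]            # column index -> nonzero value
SparseMatrix = Dict[int, SparseRow]   # row index -> sparse row


def remote_rows_needed(i: int, a: SparseMatrix) -> Set[int]:
    """Rows of B that computer i does not hold but needs for row i of A*B."""
    return {j for j in a.get(i, {}) if j != i}


def product_row(i: int, a: SparseMatrix, b: SparseMatrix) -> SparseRow:
    """Row i of X = A*B, using X[i][k] = sum_j A[i][j] * B[j][k]."""
    out: SparseRow = {}
    for j, a_ij in a.get(i, {}).items():
        for k, b_jk in b.get(j, {}).items():
            out[k] = out.get(k, 0) + a_ij * b_jk
    return out


# Hypothetical 3 x 3 inputs; computer i initially holds a[i] and b[i].
a = {0: {1: 2}, 1: {0: 3, 2: 1}, 2: {2: 4}}
b = {0: {0: 1}, 1: {2: 5}, 2: {1: 7}}
```

In this example computer 1 needs rows 0 and 2 of the second matrix, while computer 2 needs nothing beyond its own row. In the model, each such remote value has to arrive in an $O(\log n)$-bit message, one per round per computer.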

We seek to understand the complexity of this problem in the *uniformly sparse* case: each row and column of each input matrix has at most $d$ nonzeros, and in the product matrix we only need to know the values of at most $d$ elements in each row or column. This is exactly the setting that arises, e.g., when matrix multiplication is applied to triangle detection in graphs of maximum degree $d$. We focus on the *supported* setting: the structure of the matrices is known in advance; only the numerical values of the nonzero elements are unknown.
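The triangle-detection example can be sketched as follows, assuming an undirected simple graph given as adjacency sets (the function name `count_triangles` and this representation are illustrative, not from the paper). With adjacency matrix $A$, the entry $(A^2)_{ik}$ counts the common neighbours of $i$ and $k$, and only the entries where $\{i,k\}$ is an edge are needed. In a graph of maximum degree $d$, that gives at most $d$ nonzeros per row and column of the input and at most $d$ entries of interest per row of the output.

```python
from typing import Dict, Set


def count_triangles(adj: Dict[int, Set[int]]) -> int:
    """Count triangles by summing (A^2)[i][k] over the support of A."""
    total = 0
    for i, nbrs in adj.items():
        for k in nbrs:  # only output entries (i, k) that lie on an edge
            # (A^2)[i][k] = number of common neighbours j of i and k
            total += len(nbrs & adj[k])
    # Each triangle is counted once per ordered edge (i, k): 6 times in total.
    return total // 6


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
four_cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

Here `count_triangles(triangle)` yields 1 and `count_triangles(four_cycle)` yields 0; detection amounts to checking whether the count is positive.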

There is a trivial algorithm that solves the problem in $O(d^2)$ rounds, but for large $d$, better algorithms are known to exist: in the moderately dense regime the problem can be solved in $O(dn^{1/3})$ communication rounds, and for very large $d$, the dominant solution is the fast matrix multiplication algorithm using $O(n^{1.158})$ communication rounds (for matrix multiplication over rings).

In this work we show that it is possible to overcome the quadratic barrier for *all* values of $d$: we present an algorithm that solves the problem in $O(d^{1.907})$ rounds for rings and $O(d^{1.927})$ rounds for semirings, independent of $n$.
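As a rough illustration of how the stated bounds for rings compare, the sketch below evaluates each expression with all constants hidden by the $O(\cdot)$ notation set to 1. That is an assumption made purely for illustration, so the crossover points it suggests are indicative only. The function names and dictionary keys are hypothetical labels, not terminology from the paper.

```python
from typing import Dict


def round_bounds(n: float, d: float) -> Dict[str, float]:
    """Evaluate the round bounds for rings, ignoring hidden constants."""
    return {
        "trivial": d ** 2,            # O(d^2)
        "dense": d * n ** (1 / 3),    # O(d n^{1/3})
        "fast_mm": n ** 1.158,        # O(n^{1.158})
        "new_ring": d ** 1.907,       # O(d^{1.907}), independent of n
    }


def best_bound(n: float, d: float) -> str:
    """Label of the smallest bound for the given n and d."""
    bounds = round_bounds(n, d)
    return min(bounds, key=bounds.get)
```

Under these simplifications and with $n = 10^6$, `best_bound` selects `"new_ring"` for $d = 10$, `"dense"` for $d = 1000$, and `"fast_mm"` for $d = n$. For every $d > 1$ the new bound lies below the trivial $d^2$ bound.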