SPAA 2022 · 34th ACM Symposium on Parallelism in Algorithms and Architectures, Philadelphia, PA, USA, July 2022 · doi:10.1145/3490148.3538575
We study matrix multiplication in the low-bandwidth model: There are $n$ computers, and we need to compute the product of two $n \times n$ matrices. Initially computer $i$ knows row $i$ of each input matrix. In one communication round each computer can send and receive one $O(\log n)$-bit message. Eventually computer $i$ has to output row $i$ of the product matrix.
We seek to understand the complexity of this problem in the uniformly sparse case: each row and column of each input matrix has at most $d$ nonzeros, and in the product matrix we only need to know the values of at most $d$ elements in each row or column. This is exactly the setting that we have, e.g., when we apply matrix multiplication for triangle detection in graphs of maximum degree $d$. We focus on the supported setting: the structure of the matrices is known in advance; only the numerical values of nonzero elements are unknown.
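To make the uniformly sparse, supported setting concrete, the following is a minimal sequential sketch in Python. It does not simulate the low-bandwidth model or any algorithm from this work; it only illustrates which entries are computed and how triangle detection reduces to such a product. The names `sparse_product` and `has_triangle`, and the dictionary-of-nonzeros representation, are assumptions made for illustration.

```python
from collections import defaultdict


def sparse_product(a, b, wanted):
    """Compute only the requested entries of the product a * b.

    a, b   -- dicts mapping (row, col) -> value, holding the nonzeros
    wanted -- set of (row, col) positions of the product that are needed
              (the "supported" structure, assumed known in advance)
    """
    # Index the nonzeros of b by row so that each a[i, j] can be paired
    # with the nonzeros of row j of b.
    b_rows = defaultdict(list)
    for (j, k), value in b.items():
        b_rows[j].append((k, value))

    out = {pos: 0 for pos in wanted}
    for (i, j), a_value in a.items():
        for k, b_value in b_rows[j]:
            if (i, k) in out:
                out[(i, k)] += a_value * b_value
    return out


def has_triangle(edges):
    """Detect a triangle in an undirected simple graph given as edge pairs."""
    adj = {}
    for u, v in edges:
        adj[(u, v)] = 1
        adj[(v, u)] = 1
    # Entry (i, k) of adj * adj counts common neighbours of i and k.
    # A triangle exists iff some edge (i, k) has a common neighbour,
    # so only the positions of existing edges are requested.
    common = sparse_product(adj, adj, set(adj))
    return any(count > 0 for count in common.values())
```

If every row and column has at most $d$ nonzeros, each row of the first matrix takes part in at most $d^2$ elementary products in this sketch, and for a graph of maximum degree $d$ the requested positions are exactly the edges, at most $d$ per row or column.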
There is a trivial algorithm that solves the problem in $O(d^2)$ rounds, but for a large $d$, better algorithms are known to exist; in the moderately dense regime the problem can be solved in $O(dn^{1/3})$ communication rounds, and for very large $d$, the dominant solution is the fast matrix multiplication algorithm using $O(n^{1.158})$ communication rounds (for matrix multiplication over fields and rings supporting fast matrix multiplication).
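As a rough reading of these three bounds, and ignoring the constant factors hidden in the $O(\cdot)$ notation (a simplification, not a statement from the paper), the crossover points follow from comparing exponents:

$$d^2 \ge d\,n^{1/3} \iff d \ge n^{1/3}, \qquad d\,n^{1/3} \ge n^{1.158} \iff d \ge n^{1.158 - 1/3} \approx n^{0.825}.$$

Under this simplification, the $O(d^2)$ bound is the smallest of the three only for $d$ up to roughly $n^{1/3}$, and the $O(n^{1.158})$ bound takes over from roughly $d = n^{0.825}$.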
In this work we show that it is possible to overcome the quadratic barrier for all values of $d$: we present an algorithm that solves the problem in $O(d^{1.907})$ rounds for fields and rings supporting fast matrix multiplication and $O(d^{1.927})$ rounds for semirings, independent of $n$.
Kunal Agrawal and I-Ting Angelina Lee (Eds.): SPAA ’22, Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, July 11–14, 2022, Philadelphia, PA, USA, pages 435–444, ACM Press, New York, 2022
ISBN 978-1-4503-9146-7