Mikhail Pak, Technical University of Munich
SIAM ALA 2018, 7 May 2018
Rational Krylov subspace (simplified):
$$ \mathcal{R}_{\ell}(A, b) = \mathrm{span}\!\left\{ b, \ A^{-1} b, \ A^{-2} b, \ \ldots,\ A^{-\ell + 1} b \right\} $$
with invertible $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^{n}$, $\ell \ll n$
Useful for solving large eigenproblems or for model order reduction of large dynamical systems
Can be computed with the rational Arnoldi algorithm:
$x^{[0]} \leftarrow b$
for $p = 1, \ldots, \ell - 1$, compute $x^{[p]} \leftarrow A^{-1} x^{[p - 1]}$
return $\mathrm{span} \{ x^{[0]}, \ldots, x^{[\ell - 1]} \}$
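The three steps above can be sketched directly (a toy example; the well-conditioned random $A$ and the use of `numpy.linalg.solve` are illustrative assumptions, standing in for whatever solver one would use in practice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, ell = 50, 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # assumed invertible, well-conditioned
b = rng.standard_normal(n)

X = [b]                                  # x^[0] <- b
for p in range(1, ell):
    X.append(np.linalg.solve(A, X[-1]))  # x^[p] <- A^{-1} x^[p-1]

R = np.column_stack(X)                   # columns span R_ell(A, b)
assert np.linalg.matrix_rank(R) == ell   # the ell directions are independent
```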
... that sounds almost too easy!
We need some special sauce:
do some work upfront to make computation of subsequent directions $x^{[p]}$ cheaper
reorthonormalize each direction $x^{[p]}$ immediately after computation
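As a concrete (if standard) instance of both ingredients: factor $A$ once upfront and reuse the factorization for every solve, and orthonormalize each new direction against the previous ones as it is computed. The LU factorization and modified Gram-Schmidt below are illustrative choices, not necessarily the exact scheme used in the talk:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
n, ell = 50, 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # assumed invertible
b = rng.standard_normal(n)

lu_piv = lu_factor(A)            # work upfront: one O(n^3) factorization ...
Q = [b / np.linalg.norm(b)]
for p in range(1, ell):
    v = lu_solve(lu_piv, Q[-1])  # ... so each subsequent solve costs only O(n^2)
    for q in Q:                  # reorthonormalize immediately (modified Gram-Schmidt)
        v -= (q @ v) * q
    Q.append(v / np.linalg.norm(v))

Q = np.column_stack(Q)           # orthonormal basis of R_ell(A, b)
assert np.allclose(Q.T @ Q, np.eye(ell), atol=1e-10)
```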
Without doing work upfront: [figure omitted]
With doing work upfront: [figure omitted]
Recent trends:
larger problems, more data
memory-bound instead of compute-bound
distributed data
As a result, development of memory-access-conscious algorithms, e.g. the randomized SVD (Halko et al., 2011)
Mou et al. 2015: “A distributed algorithm for solving a linear algebraic equation”
Solves $A x = b$ distributedly by a time-variant network of $m$ agents with limited knowledge, i.e. each agent has access only to a subset of rows:
$$A_{i} \in \mathbb{R}^{n_i \times n}, \ A = \mathrm{vertcat}(A_1, \ldots, A_m)$$
$$b_{i} \in \mathbb{R}^{n_i}, \ b = \mathrm{vertcat}(b_1, \ldots, b_m)$$
$$n = \textstyle\sum_{i = 1}^{m} n_i$$
The problem is distributed onto agents:
Update rule for the estimate $x_i(t)$ of the $i$-th agent:
$$ x_{i}(t + 1) = z_{i}(t) - P_{i} \left( z_{i}(t) - \frac{1}{m_{i}(t)} \sum_{j \in \mathcal{N}_{i}(t)} x_{j}(t) \right) $$
Here, $z_{i}(t) \in \mathbb{R}^{n}$ satisfies $A_{i} z_{i}(t) = b_{i}$, $P_{i} \in \mathbb{R}^{n \times n}$ is the orthogonal projector onto the kernel of $A_{i}$, $\mathcal{N}_{i}(t)$ is the set of neighbors of the $i$-th agent, and $m_{i}(t) = | \mathcal{N}_{i}(t) |$
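A minimal simulation of this update rule, under simplifying assumptions not required by the paper: the communication graph is fixed and complete (every agent talks to every agent), and $z_i$ is chosen as the least-norm solution of $A_i z_i = b_i$ via the pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 12, 4                            # 12 unknowns split across 4 agents
A = rng.standard_normal((n, n)) + n * np.eye(n)   # assumed invertible
x_true = rng.standard_normal(n)
b = A @ x_true

rows = np.array_split(np.arange(n), m)  # agent i only knows A[rows[i]], b[rows[i]]
Z, P = [], []
for r in rows:
    Ai_pinv = np.linalg.pinv(A[r])
    Z.append(Ai_pinv @ b[r])              # some z_i with A_i z_i = b_i
    P.append(np.eye(n) - Ai_pinv @ A[r])  # orthogonal projector onto ker(A_i)

x = [rng.standard_normal(n) for _ in range(m)]   # arbitrary initial estimates
for t in range(600):
    avg = sum(x) / m                      # complete graph: everyone is a neighbor
    x = [Z[i] - P[i] @ (Z[i] - avg) for i in range(m)]

# all agents agree on the solution of A x = b
assert all(np.linalg.norm(xi - x_true) < 1e-6 for xi in x)
```

Note that after a single step each estimate satisfies $A_{i} x_{i}(t) = b_{i}$ automatically, since $A_{i} P_{i} = 0$.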
Mou et al.'s main result:
If $A$ is invertible and the network is repeatedly jointly strongly connected, then there exists $0 < \lambda < 1$ for which all $x_{i}(t)$ converge to the solution to $A x = b$ as fast as $\lambda^{t}$ converges to $0$
Linear convergence under some mild assumptions on network connectivity
Some remarks:
no assumptions on matrix sparsity or structure
each agent requires $\Theta(n \cdot m_{i}(t))$ communication at each time step
the most expensive operation is the projection $P_{i}$
convergence is quite slow in practice ($\lambda \approx 1$)
(ノ﹏ヽ)
Problem statement:
Compute $\mathcal{R}_{\ell}(A, b)$ distributedly by agents with limited knowledge as quickly as possible
Naïve solution: Use Mou et al.'s algorithm as the linear equation solver within the rational Arnoldi process
Problem: No possibility for speed-up by doing work upfront
Cannot do work upfront: [figure omitted]
Idea: Use the current, agent-specific estimate for $x^{[p]} = A^{-1} x^{[p - 1]}$ as the right hand side for the subsequent rational Krylov direction
In other words, compute $(p + 1)$-th direction while converging to the $p$-th direction
Iterate all directions at the same time:
Update rule for the $i$-th agent:
$$ x_{i}^{[p]}(t + 1) = z_{i}^{[p]}(t) - P_{i} \left( z_{i}^{[p]}(t) - \frac{1}{m_{i}(t)} \sum_{j \in \mathcal{N}_{i}(t)} x_{j}^{[p]}(t) \right) $$
Here, $p = 1, 2, \ldots, \ell - 1$, $x_{i}^{[0]}(t) \equiv b \ \forall i$, and $z_{i}^{[p]}(t) \in \mathbb{R}^{n}$ satisfies $A_{i} z_{i}^{[p]}(t) = S_{i} x_{i}^{[p - 1]}(t)$
($S_{i} \in \mathbb{R}^{n_i \times n}$ is a “selector” matrix s.t. $b_{i} = S_{i} b$)
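A sketch of this scheme, extending the same toy setup assumed before (fixed complete graph, pseudoinverse-based $z_i^{[p]}$): every agent iterates all $\ell - 1$ directions at once, feeding its own, still-converging estimate of $x^{[p-1]}$ into direction $p$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, ell = 12, 4, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # assumed invertible
b = rng.standard_normal(n)

rows = np.array_split(np.arange(n), m)            # agent i owns rows `rows[i]`
pinv = [np.linalg.pinv(A[r]) for r in rows]
P = [np.eye(n) - pinv[i] @ A[rows[i]] for i in range(m)]

# x[i][p] = agent i's estimate of the p-th direction A^{-p} b;
# x^[0] is the known right-hand side b for every agent
x = [[b] + [rng.standard_normal(n) for _ in range(ell - 1)] for _ in range(m)]
for t in range(800):
    x_new = []
    for i in range(m):
        xi = [b]
        for p in range(1, ell):
            # z_i^[p](t) solves A_i z = S_i x_i^[p-1](t); indexing with
            # rows[i] applies the selector S_i
            z = pinv[i] @ x[i][p - 1][rows[i]]
            avg = sum(x[j][p] for j in range(m)) / m   # complete graph
            xi.append(z - P[i] @ (z - avg))
        x_new.append(xi)
    x = x_new

# every agent's p-th direction approaches A^{-p} b
for p in range(1, ell):
    target = np.linalg.solve(np.linalg.matrix_power(A, p), b)
    assert all(np.linalg.norm(x[i][p] - target) < 1e-6 for i in range(m))
```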
Main result:
If $A$ is invertible and the network is repeatedly jointly strongly connected, then there exists $0 < \lambda < 1$ for which all $x_{i}^{[p]}(t)$ converge to the solution to $A^{p} x = b$ as fast as $t^{p - 1} \lambda^{t}$ converges to $0$ for all $p = 1, \ldots, \ell - 1$
Joint error dynamics of Mou et al.'s algorithm:
$$e(t + 1) = \mathcal{A}(t) \, e(t)$$
Joint error dynamics of the proposed algorithm:
$$ e^{[p]}(t + 1) = \mathcal{A}(t) \, e^{[p]}(t) + \delta^{[p - 1]}(t) + D^{[p]}(t) $$
with $p = 1, 2, \ldots, \ell - 1$
$\delta^{[p - 1]}(t)$ is the disturbance arising because each agent's estimate of $x^{[p - 1]}$ changes over time
$D^{[p]}(t)$ is the disturbance arising because each agent uses its own estimate of $x^{[p - 1]}$
Informal statement of Lemma 1:
Consider an exponentially stable system whose zero-input response decays as $\lambda^t$, $0 < \lambda < 1$. If this system is driven by an input signal that decays as $t^{\kappa} \lambda^t$ for some $\kappa = 0, 1, 2, \ldots$, then its zero-state response decays as $t^{\kappa + 1} \lambda^t$
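The mechanism behind Lemma 1 can be seen from the variation-of-constants sum; a sketch for the time-invariant case with $\| \mathcal{A}^{t} \| \le M \lambda^{t}$ and input $\| u(s) \| \le C s^{\kappa} \lambda^{s}$ (the time-varying case is analogous):

```latex
\left\| \sum_{s = 0}^{t - 1} \mathcal{A}^{t - 1 - s} u(s) \right\|
  \le \sum_{s = 0}^{t - 1} M \lambda^{t - 1 - s} \cdot C s^{\kappa} \lambda^{s}
  = \frac{M C}{\lambda} \, \lambda^{t} \sum_{s = 0}^{t - 1} s^{\kappa}
  \le \frac{M C}{\lambda} \, t^{\kappa + 1} \lambda^{t}
```

The geometric decay absorbs the sum over the input history, at the price of one extra polynomial factor $t$.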
Proof sketch (using mathematical induction)
Base case ($p = 1$): Reduced to Mou et al.
Inductive hypothesis ($p > 1$): Assume that $x_{i}^{[p - 1]}(t)$ converge to the solution to $A^{p - 1} x = b$ as fast as $t^{p - 2} \lambda^{t}$ converges to $0$
Inductive step:
show that $D^{[p]}(t)$ and $\delta^{[p - 1]}(t)$ converge to $0$ as fast as $t^{p - 2} \lambda^{t}$ converges to $0$
apply Lemma 1
$e^{[p]}(t)$ consists of the zero-input response (bounded by $\lambda^{t}$) and the zero-state response (bounded by $t^{p - 1} \lambda^{t}$)
Summary of this contribution:
Propose a distributed algorithm for computing a rational Krylov subspace faster at the price of more communication at each time step: $\Theta(n \cdot \ell \cdot m_{i}(t))$
Provide a bound for the convergence: $t^{p - 1} \lambda^t$
The error dynamics are similar to those of the underlying solver, hence the method benefits from any improvement to Mou et al.'s algorithm
There are some serious open issues that have to be solved before applying this method in practice:
how can we implement reorthonormalization without jeopardizing convergence?
how can we improve the convergence rate $\lambda$?
how can we (further) reduce the required amount of communication?
Numerical results ($n = 1357$, $m = 23$, $\ell = 6$): [convergence plots omitted]
Acknowledgements: Alexander Wischnewski, Julio F. Pérez Cruz, Maria Cruz Varona, Christopher Lerch, Philipp Niermeyer, and Boris Lohmann