A distributed algorithm for computing rational Krylov subspaces

Mikhail Pak, Technical University of Munich

SIAM ALA 2018, 7 May 2018

Outline

Introduction

Rational Krylov subspace (simplified):


$$ \mathcal{R}_{\ell}(A, b) = \mathrm{span}\!\left\{ b, \ A^{-1} b, \ A^{-2} b, \ \ldots,\ A^{-\ell + 1} b \right\} $$


with invertible $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^{n}$, $\ell \ll n$

Useful for solving large eigenproblems or for model order reduction of large dynamical systems

Introduction

Can be computed with the rational Arnoldi algorithm:

  • $x^{[0]} \leftarrow b$

  • for $p = 1, \ldots, \ell - 1$, compute $x^{[p]} \leftarrow A^{-1} x^{[p - 1]}$

  • return $\mathrm{span} \{ x^{[0]}, \ldots, x^{[\ell - 1]} \}$

... that sounds almost too easy!
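The loop above really is that short. As a sanity check, here is a minimal centralized sketch in Python (NumPy assumed; function name is mine, not from the slides):

```python
import numpy as np

def rational_krylov_basis(A, b, ell):
    """Naive sketch of the simplified rational Arnoldi loop:
    start from b and repeatedly apply A^{-1}."""
    x = np.asarray(b, dtype=float)
    directions = [x]
    for _ in range(ell - 1):
        x = np.linalg.solve(A, x)       # x^{[p]} <- A^{-1} x^{[p-1]}
        directions.append(x)
    return np.column_stack(directions)  # columns span R_ell(A, b)
```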

Introduction

We need some special sauce:

  • do some work upfront to make computation of subsequent directions $x^{[p]}$ cheaper

  • reorthonormalize each direction $x^{[p]}$ immediately after computation
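Both ingredients can be sketched as follows (an illustrative NumPy/SciPy version for dense $A$, not the slides' implementation): the upfront work is a single LU factorization of $A$, after which each new direction costs only two triangular solves, and each direction is orthonormalized immediately by modified Gram–Schmidt.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def rational_krylov_basis_fast(A, b, ell):
    """Sketch: factor A once upfront, reuse the factors for every
    direction, and reorthonormalize each direction immediately."""
    lu_piv = lu_factor(A)                   # work upfront: one LU factorization
    V = np.empty((len(b), ell))
    V[:, 0] = b / np.linalg.norm(b)
    for p in range(1, ell):
        w = lu_solve(lu_piv, V[:, p - 1])   # cheap: two triangular solves
        for j in range(p):                  # modified Gram-Schmidt pass
            w -= (V[:, j] @ w) * V[:, j]
        V[:, p] = w / np.linalg.norm(w)
    return V                                # orthonormal basis of R_ell(A, b)
```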

Introduction

Without doing work upfront:


Introduction

With doing work upfront:


Motivation

Recent trends:

  • larger problems, more data

  • memory-bound instead of compute-bound

  • distributed data

As a result, development of memory access-conscious algorithms, e.g. randomized SVD (Halko et al., 2011)

Related work

Mou et al. 2015: “A distributed algorithm for solving a linear algebraic equation”

Solves $A x = b$ distributedly by a time-varying network of $m$ agents with limited knowledge, i.e. each agent has access only to a subset of the rows:

$$A_{i} \in \mathbb{R}^{n_i \times n}, \ A = \mathrm{vertcat}(A_1, \ldots, A_m)$$

$$b_{i} \in \mathbb{R}^{n_i}, \ b = \mathrm{vertcat}(b_1, \ldots, b_m)$$

$$n = \textstyle\sum_{i = 1}^{m} n_i$$

Related work

The problem is distributed onto agents:


Related work

Update rule for the estimate $x_i(t)$ of the $i$-th agent:

$$ x_{i}(t + 1) = z_{i}(t) - P_{i} \left( z_{i}(t) - \frac{1}{m_{i}(t)} \sum_{j \in \mathcal{N}_{i}(t)} x_{j}(t) \right) $$


Here, $z_{i}(t) \in \mathbb{R}^{n}$ satisfies $A_{i} z_{i}(t) = b_{i}$, $P_{i} \in \mathbb{R}^{n \times n}$ is the orthogonal projector onto the kernel of $A_{i}$, $\mathcal{N}_{i}(t)$ is the set of neighbors of the $i$-th agent, and $m_{i}(t) = | \mathcal{N}_{i}(t) |$
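A single agent step of this update rule might look as follows (an illustrative NumPy sketch, not the paper's implementation; the pseudoinverse yields a particular solution $z_i$ and the kernel projector $P_i$ at once):

```python
import numpy as np

def agent_update(A_i, b_i, neighbor_estimates):
    """One step of Mou et al.'s update rule for agent i (sketch).
    neighbor_estimates holds the current x_j(t) of all neighbors."""
    pinv_Ai = np.linalg.pinv(A_i)
    z_i = pinv_Ai @ b_i                          # particular solution of A_i z = b_i
    P_i = np.eye(A_i.shape[1]) - pinv_Ai @ A_i   # orthogonal projector onto ker(A_i)
    avg = np.mean(neighbor_estimates, axis=0)    # neighborhood average
    return z_i - P_i @ (z_i - avg)
```

Note that $A_i P_i = 0$, so each update preserves agent $i$'s own equations: $A_i x_i(t+1) = b_i$.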

Related work

Mou et al.'s main result:

If $A$ is invertible and the network is repeatedly jointly strongly connected, then there exists $0 < \lambda < 1$ for which all $x_{i}(t)$ converge to the solution to $A x = b$ as fast as $\lambda^{t}$ converges to $0$

Linear convergence under some mild assumptions on network connectivity

Related work

Some remarks:

  • no assumptions on matrix sparsity or structure

  • each agent requires $\Theta(n \cdot m_{i}(t))$ communication at each time step

  • the most expensive operation is the projection $P_{i}$

  • convergence is quite slow in practice ($\lambda \approx 1$)

(ノ﹏ヽ)

Proposed algorithm

Problem statement:

Compute $\mathcal{R}_{\ell}(A, b)$ distributedly by agents with limited knowledge as quickly as possible


Naïve solution: Use Mou et al.'s algorithm as the linear equation solver within the rational Arnoldi process

Problem: No possibility for speed-up by doing work upfront

Proposed algorithm

Cannot do work upfront:


Proposed algorithm

Idea: Use the current, agent-specific estimate for $x^{[p]} = A^{-1} x^{[p - 1]}$ as the right hand side for the subsequent rational Krylov direction

In other words, compute $(p + 1)$-th direction while converging to the $p$-th direction

Proposed algorithm

Iterate all directions at the same time:


Proposed algorithm

Update rule for the $i$-th agent:

$$ x_{i}^{[p]}(t + 1) = z_{i}^{[p]}(t) - P_{i} \left( z_{i}^{[p]}(t) - \frac{1}{m_{i}(t)} \sum_{j \in \mathcal{N}_{i}(t)} x_{j}^{[p]}(t) \right) $$


Here, $p = 1, 2, \ldots, \ell - 1$, $x_{i}^{[0]}(t) \equiv b \ \forall i$, and $z_{i}^{[p]}(t) \in \mathbb{R}^{n}$ satisfies $A_{i} z_{i}^{[p]}(t) = S_{i} x_{i}^{[p - 1]}(t)$

($S_{i} \in \mathbb{R}^{n_i \times n}$ is a “selector” matrix s.t. $b_{i} = S_{i} b$)
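One time step of this pipelined rule can be sketched as follows (an illustrative NumPy version with hypothetical names; all $\ell - 1$ directions are updated from the agent's own current estimates):

```python
import numpy as np

def proposed_update(A_i, S_i, X_i, neighbor_Xs):
    """One time step of the proposed pipelined update for agent i (sketch).
    X_i is n x ell; column p holds x_i^{[p]}(t), column 0 is pinned to b.
    neighbor_Xs holds the neighbors' n x ell estimate matrices."""
    pinv_Ai = np.linalg.pinv(A_i)
    P_i = np.eye(A_i.shape[1]) - pinv_Ai @ A_i   # projector onto ker(A_i)
    avg = np.mean(neighbor_Xs, axis=0)           # neighborhood average per column
    X_new = X_i.copy()                           # column 0 stays b
    for p in range(1, X_i.shape[1]):
        # right-hand side uses agent i's own current (p-1)-th estimate
        z_ip = pinv_Ai @ (S_i @ X_i[:, p - 1])   # solves A_i z = S_i x_i^{[p-1]}(t)
        X_new[:, p] = z_ip - P_i @ (z_ip - avg[:, p])
    return X_new
```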

Proposed algorithm

Main result:

If $A$ is invertible and the network is repeatedly jointly strongly connected, then there exists $0 < \lambda < 1$ for which all $x_{i}^{[p]}(t)$ converge to the solution to $A^{p} x = b$ as fast as $t^{p - 1} \lambda^{t}$ converges to $0$ for all $p = 1, \ldots, \ell - 1$

Convergence analysis and proof sketch

Joint error dynamics of Mou et al.'s algorithm:

$$e(t + 1) = \mathcal{A}(t) \, e(t)$$


Joint error dynamics of the proposed algorithm:

$$ e^{[p]}(t + 1) = \mathcal{A}(t) \, e^{[p]}(t) + \delta^{[p - 1]}(t) + D^{[p]}(t) $$

with $p = 1, 2, \ldots, \ell - 1$

Convergence analysis and proof sketch

$$ e^{[p]}(t + 1) = \mathcal{A}(t) \, e^{[p]}(t) + \delta^{[p - 1]}(t) + D^{[p]}(t) $$


$\delta^{[p - 1]}(t)$ is the disturbance term due to the fact that each agent's estimate for $x^{[p - 1]}$ changes in time

$D^{[p]}(t)$ is the disturbance term due to the fact that each agent uses its own estimate for $x^{[p - 1]}$

Convergence analysis and proof sketch

Informal statement of Lemma 1:

Consider an exponentially stable system whose zero-input response decays as $\lambda^t$, $0 < \lambda < 1$. If this system is driven by an input signal that decays as $t^{\kappa} \lambda^t$, $\kappa = 0, 1, 2, \ldots$, then its zero-state response decays as $t^{\kappa + 1} \lambda^t$
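A scalar example (mine, not from the slides) makes Lemma 1 concrete: take $e(t + 1) = \lambda e(t) + t^{\kappa} \lambda^{t}$ with $e(0) = 0$. Then

$$ e(t) = \sum_{s = 0}^{t - 1} \lambda^{t - 1 - s} \, s^{\kappa} \lambda^{s} = \lambda^{t - 1} \sum_{s = 0}^{t - 1} s^{\kappa} = \mathcal{O}\!\left( t^{\kappa + 1} \lambda^{t} \right), $$

since $\sum_{s = 0}^{t - 1} s^{\kappa} = \mathcal{O}(t^{\kappa + 1})$: the extra convolution with the disturbance costs exactly one power of $t$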

Convergence analysis and proof sketch

Proof sketch (using mathematical induction)

Base case ($p = 1$): Reduced to Mou et al.

Inductive hypothesis ($p > 1$): Assume that all $x_{i}^{[p - 1]}(t)$ converge to the solution to $A^{p - 1} x = b$ as fast as $t^{p - 2} \lambda^{t}$ converges to $0$

Convergence analysis and proof sketch

Inductive step:

  • show that $D^{[p]}(t)$ and $\delta^{[p - 1]}(t)$ converge to $0$ as fast as $t^{p - 2} \lambda^{t}$ converges to $0$

  • apply Lemma 1

  • $e^{[p]}(t)$ consists of the zero-input response (bounded by $\lambda^{t}$) and the zero-state response (bounded by $t^{p - 1} \lambda^{t}$)

Open questions and outlook

Summary of this contribution:

Propose a distributed algorithm for computing a rational Krylov subspace faster at the price of more communication at each time step: $\Theta(n \cdot \ell \cdot m_{i}(t))$

Provide a bound for the convergence: $t^{p - 1} \lambda^t$

Error dynamics are similar to those of the underlying solver, so the method benefits from any improvements to Mou et al.'s algorithm

Open questions and outlook

There are some serious open issues that must be resolved before this method can be applied in practice:

  • how can we implement reorthonormalization without jeopardizing convergence?

  • how can we improve the convergence rate $\lambda$?

  • how can we (further) reduce the required amount of communication?

Open questions and outlook

Numerical results ($n = 1357$, $m = 23$, $\ell = 6$):



Thank you


Acknowledgements: Alexander Wischnewski, Julio F. Pérez Cruz, Maria Cruz Varona, Christopher Lerch, Philipp Niermeyer, and Boris Lohmann