SPSA#

What is SPSA?#

Simultaneous Perturbation Stochastic Approximation (SPSA) is an algorithm developed by Spall. It can be used if noisy and unbiased measurements of the gradient \(g(\boldsymbol{\theta}\)) are available. It can also be used if only (noisy) measurements of the loss function \(f(\boldsymbol{\theta})\) are available.

The advantage of SPSA compared to other algorithms is that only two loss measurements are required to generate an update. Therefore, SPSA is scalable.

How does SPSA work?#

Let \(\eta_i \in (0, \infty)\) be a perturbation vector and \(\Delta_i\) be a random vector such that \(\{\Delta_i\}\) is an iid sequence with \(\Delta_i(k)\) and \(1/\Delta_i(k)\) bounded and symmetric around zero. The components \(\Delta_i(k)\) are mutually independent. In practice, often the following binary random variable is used for \(\Delta_i\):

\[\mathbb{P}(\Delta_i(j) = -1) = \frac{1}{2} = \mathbb{P}(\Delta_i(j) = 1),\]

for all \(i\) and \(j\).

The SPSA algorithm is then as follows:

Binder#

If you want to experiment with the SPSA algorithm, you can use the provided Jupyter Notebook. If you want to experiment with the N-dimensional version of the algorithm, then you can use this Jupyter Notebook.

You can also run the notebooks directly in your browser by using Binder; simply click on the following button to open the SPSA notebook:

https://mybinder.org/badge_logo.svg