## Introduction

The Gaussian Process is an incredibly interesting object.

Paraphrasing from Rasmussen & Williams 2006, a Gaussian Process is a stochastic process (a collection of random variables, indexed by some set) such that any finite subset of them has a joint multivariate normal distribution.

Sample paths of GPs can represent functions. They also have many different interpretations, e.g. in the sense of covariances or basis function representations.

A multivariate normal (MVN) is completely characterized by its mean and covariance, and in the case of a GP, we can use the covariance to specify what sort of function space we want to look at. For example, to create a representation of a continuous function, let $$y_d$$ be a vector of points corresponding to domain points $$x_{n,d}$$. To enforce continuity, all we need is:

Covariance between points is usually specified through a kernel. For example, this is the RBF kernel:

$$k(x_i, x_j) = \exp\left(-\frac{(x_i - x_j)^2}{2}\right)$$

The RBF kernel can be multiplied by a variance (which scales the function along the y-axis), and a length-scale can be applied to the domain variables. Functions drawn under this kernel are infinitely differentiable, because the kernel itself is infinitely differentiable.
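
The scaled form of the RBF kernel can be sketched in a few lines of NumPy. This is a minimal illustration for 1-D inputs; the function name `rbf_kernel` and the parameter names `variance` and `lengthscale` are assumptions chosen here, not names from the original code.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """RBF kernel matrix between two sets of 1-D inputs (illustrative helper)."""
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)
    sq_dist = (x1 - x2) ** 2  # pairwise squared distances via broadcasting
    # variance scales the y-axis, lengthscale scales the domain
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

x = np.linspace(-1.0, 1.0, 5)
K = rbf_kernel(x, x, variance=2.0, lengthscale=0.5)
```

The diagonal of `K` equals `variance`, and entries decay with distance at a rate set by `lengthscale`.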

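Knowing a few (“training”) points, one can predict the points between them (i.e. smoothing). If $$x_2$$ is the set of training points and $$x_1$$ the set of points to predict, the conditional distribution of the predicted points is also multivariate normal:

$$x_1 \mid x_2 \sim \mathcal{N}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$$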

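In this case, $$\Sigma_{ij} = k({\bf x_i}, {\bf x_j})$$. Interestingly, because differentiation is a linear operation and linear transformations of an MVN are again MVN, a derivative of a GP is also a GP, assuming that the mean and covariance are differentiable. In that case, writing $$f$$ for the GP, the kernels are:

$$\text{cov}\left(\frac{\partial f(x_i)}{\partial x_i}, f(x_j)\right) = \frac{\partial k(x_i, x_j)}{\partial x_i}, \qquad \text{cov}\left(\frac{\partial f(x_i)}{\partial x_i}, \frac{\partial f(x_j)}{\partial x_j}\right) = \frac{\partial^2 k(x_i, x_j)}{\partial x_i \partial x_j}$$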

# Simulations

## Prior Draw from a Gaussian Process
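
A prior draw can be sketched as follows: build the kernel matrix on a grid, take its Cholesky factor $$L$$, and multiply by standard normal draws $$Z$$. This is a minimal NumPy sketch rather than the original code; the function names, the grid and the jitter value are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """RBF kernel matrix between two sets of 1-D inputs (illustrative helper)."""
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * (x1 - x2) ** 2 / lengthscale**2)

def sample_gp_prior(x, n_draws=3, variance=1.0, lengthscale=1.0,
                    jitter=1e-6, seed=0):
    """Draw sample paths of a zero-mean GP on the grid x as L @ Z."""
    rng = np.random.default_rng(seed)
    # jitter on the diagonal keeps the Cholesky factorization stable
    K = rbf_kernel(x, x, variance, lengthscale) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    Z = rng.standard_normal((len(x), n_draws))
    return L @ Z  # each column is one sample path

x_grid = np.linspace(-2.0, 2.0, 50)
draws = sample_gp_prior(x_grid, n_draws=3, lengthscale=1.0)
```

Each column of `draws` can be plotted against `x_grid` to see one smooth random function.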

## Posterior of a Gaussian Process

We fit a GP with an RBF kernel and length-scale 1.0 to the points: $$\begin{bmatrix} [-0.5, -1] \\ [0, 0.5] \\ [0.5, 0] \end{bmatrix}$$.
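
The fit can be sketched with the standard MVN conditioning formulas. The sketch below assumes that each listed pair is an $$(x, y)$$ point, that the prior mean is zero, and that the observations are noise-free apart from a small jitter; none of these choices is confirmed by the original code, and the function names are illustrative.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """RBF kernel matrix between two sets of 1-D inputs (illustrative helper)."""
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * (x1 - x2) ** 2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, lengthscale=1.0, jitter=1e-8):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    K22 = rbf_kernel(x_train, x_train, lengthscale=lengthscale)
    K22 = K22 + jitter * np.eye(len(x_train))
    K12 = rbf_kernel(x_test, x_train, lengthscale=lengthscale)
    K11 = rbf_kernel(x_test, x_test, lengthscale=lengthscale)
    L = np.linalg.cholesky(K22)
    # alpha = K22^{-1} y, computed through two solves with the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K12 @ alpha
    V = np.linalg.solve(L, K12.T)
    cov = K11 - V.T @ V  # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return mean, cov

x_train = np.array([-0.5, 0.0, 0.5])
y_train = np.array([-1.0, 0.5, 0.0])
x_test = np.linspace(-1.0, 1.0, 41)
post_mean, post_cov = gp_posterior(x_train, y_train, x_test, lengthscale=1.0)
```

Under these assumptions the posterior mean passes through the three training points and the posterior variance shrinks to roughly zero there.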

## Derivatives of the GP

I’ve written some code to calculate the derivative kernels efficiently, but I’m still testing it:

GP Derivative Covariances (1-D)
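
For reference, the 1-D derivative covariances of the RBF kernel have a simple closed form. The sketch below is not the code referred to above; it is a small NumPy illustration with assumed names, where `r` denotes $$x_i - x_j$$.

```python
import numpy as np

def rbf_derivative_kernels(x1, x2, variance=1.0, lengthscale=1.0):
    """Return k, dk/dx1 and d2k/(dx1 dx2) for the 1-D RBF kernel."""
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)
    r = x1 - x2
    k = variance * np.exp(-0.5 * r**2 / lengthscale**2)
    # cov(f'(x1), f(x2)): derivative with respect to the first argument
    dk_dx1 = -(r / lengthscale**2) * k
    # cov(f'(x1), f'(x2)): mixed second derivative
    d2k_dx1dx2 = (1.0 / lengthscale**2 - r**2 / lengthscale**4) * k
    return k, dk_dx1, d2k_dx1dx2

xs = np.linspace(-1.0, 1.0, 6)
k, dk, d2k = rbf_derivative_kernels(xs, xs)
# joint covariance of the stacked vector [f(xs), f'(xs)]
joint = np.block([[k, dk.T], [dk, d2k]])
```

A quick sanity check is to compare `dk` and `d2k` against finite differences of `k`, and to confirm that `joint` is positive semi-definite up to rounding.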

My implementation seems to work, and I’ve used previous versions of it (coupled with TensorFlow’s Adam and Stan’s HMC/L-BFGS) to solve differential equations:

$$\frac{dy}{dx} = \sin(xy), \quad y(0) = 1$$

## Modeling SDE equivalents of GPs

There’s some great literature out there about modeling GPs as solutions of differential equations with a random component. Before I encountered that, the following was a brute-force attempt to model the functions $$\mu(X_t), \sigma(X_t)$$, where $$X_t$$ is a continuous-time stochastic process and $$W_t$$ is the standard Wiener process:

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$$

When $$X_t$$ is a Gaussian Process, equating the Euler-Maruyama representation of the SDE above with the GP expressed as $$LZ$$, where $$L$$ is the Cholesky factor of the covariance matrix, results in the random normal vector $$Z$$ being exactly equal to the random part of the SDE: $$dW_t = \sqrt{\Delta t} Z$$.
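
The relation $$dW_t = \sqrt{\Delta t} Z$$ can be illustrated with a short Euler-Maruyama sketch. The drift and diffusion below (an Ornstein-Uhlenbeck process, which is itself a GP) are assumptions chosen only for illustration, as are the function names. The second function inverts the scheme: given a path and candidate $$\mu, \sigma$$, it returns the standard normal vector implied by them.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, dt, Z):
    """Simulate dX = mu(X) dt + sigma(X) dW using dW = sqrt(dt) * Z."""
    x = np.empty(len(Z) + 1)
    x[0] = x0
    for i, z in enumerate(Z):
        dW = np.sqrt(dt) * z
        x[i + 1] = x[i] + mu(x[i]) * dt + sigma(x[i]) * dW
    return x

def implied_noise(mu, sigma, x, dt):
    """Recover the standard normal vector implied by a path and (mu, sigma).

    Assumes sigma is nonzero along the path.
    """
    dx = np.diff(x)
    return (dx - mu(x[:-1]) * dt) / (sigma(x[:-1]) * np.sqrt(dt))

# Illustrative Ornstein-Uhlenbeck drift and constant diffusion (assumed values)
mu = lambda x: -1.5 * x
sigma = lambda x: 0.5 + 0.0 * x

rng = np.random.default_rng(0)
Z = rng.standard_normal(200)
path = euler_maruyama(mu, sigma, x0=1.0, dt=0.01, Z=Z)
Z_recovered = implied_noise(mu, sigma, path, dt=0.01)
```

With the correct $$\mu, \sigma$$ the recovered vector matches `Z`; with wrong candidates it does not, which is the discrepancy a fitting procedure could minimize.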

Hence modeling the functions $$\mu, \sigma$$ and minimizing the distance between $$Z, dW_t$$ is a way to obtain those functions without solving the SDE. When this is mathematically infeasible, the algorithm fails.

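Simo Särkkä has shown that GPs can be written as state space models, which are models of the form below, where $$u(t)$$ is a white-noise input:

$$\frac{dx(t)}{dt} = A x(t) + B u(t), \qquad y(t) = C x(t)$$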

For the centered Matern GP:

Python code for Matern SSM-GP Simulation
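
As a rough illustration of the state-space idea, the sketch below simulates a Matern GP with smoothness $$\nu = 3/2$$ on a regular time grid. The choice of $$\nu = 3/2$$, the function names and the parameter values are assumptions, and this is not the original simulation code. The state is $$(f, f')$$, the transition matrix is the closed-form matrix exponential of the drift matrix, and the process noise is chosen so that the stationary covariance is preserved.

```python
import numpy as np

def matern32_ssm(variance=1.0, lengthscale=1.0, dt=0.01):
    """Discrete-time transition A, process noise Q and stationary covariance."""
    lam = np.sqrt(3.0) / lengthscale
    # closed form of expm(F * dt) for F = [[0, 1], [-lam^2, -2 lam]]
    A = np.exp(-lam * dt) * np.array([[1.0 + lam * dt, dt],
                                      [-lam**2 * dt, 1.0 - lam * dt]])
    P_inf = np.diag([variance, lam**2 * variance])
    Q = P_inf - A @ P_inf @ A.T  # keeps the state covariance at P_inf
    return A, Q, P_inf

def simulate_matern32(n_steps, variance=1.0, lengthscale=1.0, dt=0.01, seed=0):
    """Simulate one sample path of the Matern-3/2 GP via its state-space form."""
    rng = np.random.default_rng(seed)
    A, Q, P_inf = matern32_ssm(variance, lengthscale, dt)
    L0 = np.linalg.cholesky(P_inf)
    Lq = np.linalg.cholesky(Q + 1e-12 * np.eye(2))
    state = L0 @ rng.standard_normal(2)  # start from the stationary law
    path = np.empty(n_steps)
    for i in range(n_steps):
        path[i] = state[0]  # first state component is the function value
        state = A @ state + Lq @ rng.standard_normal(2)
    return path

path = simulate_matern32(500, variance=1.0, lengthscale=0.5, dt=0.01)
```

Each step costs a fixed amount of work, so simulating `n_steps` points is linear in `n_steps` rather than cubic as with a dense Cholesky factorization.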

I suspect that the RBF GP can’t (easily) be represented this way because it’s infinitely differentiable. The Matern GP isn’t, so a derivative of a high enough order would appear to be random (this is what is set to $$u(t)$$), and integrating that many times over would lead to a nice function. So it’s probably unsurprising that the RBF needs to be written as its infinite series representation and then represented as an SSM.

There’s also interesting literature out there about Gaussian convolutional models, which, in some sense, represent moving-average counterparts of the autoregressive approach above.

## 2020

### Efficient Gaussian Process Computation

I’ll try to give examples of efficient Gaussian process computation here, like the vec trick (Kronecker product trick), efficient Toeplitz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.
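
As a small taste of the vec trick, the identity $$(A \otimes B)\,\text{vec}(X) = \text{vec}(B X A^T)$$ lets a Kronecker-structured matrix-vector product be computed without ever forming the Kronecker product. The sketch below is a minimal NumPy illustration; the function name `kron_matvec` is an assumption, and `vec` is taken to stack columns.

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product."""
    n = A.shape[1]
    q = B.shape[1]
    # un-vec x into a (q, n) matrix, stacking columns (Fortran order)
    X = x.reshape((q, n), order="F")
    # vec(B X A^T), again column-stacked
    return (B @ X @ A.T).reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))
x = rng.standard_normal(15)
fast = kron_matvec(A, B, x)
slow = np.kron(A, B) @ x  # dense reference, only feasible for small sizes
```

For covariance matrices on a grid that factor as a Kronecker product, this avoids storing the full matrix and reduces the cost of each product substantially.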

### Gaussian Process Speech Synthesis (Draft)

Very untidy first working draft of the idea mentioned on the efficient computation page. Here, I fit a spectral mixture to some audio data to build a “generative model” for audio. I’ll implement efficient sampling later, and I’ll replace the arbitrary way this is trained with an LSTM-RNN to go straight from text/spectrograms to waveforms.

# Random Projects

### Gaussian Process Middle C

First of my experiments on audio modelling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle C the way a grand piano would.

Consider: