## Introduction

The Gaussian Process is an incredibly interesting object.

Paraphrasing from Rasmussen & Williams 2006, a Gaussian Process is a stochastic process (a collection of random variables, indexed on some set) where any finite number of them have a multivariate normal distribution.

GPs can represent functions and perhaps function spaces well (by having a covariance interpretation and also a basis function interpretation I guess). By contrast, a neural network can’t fit functions as easily - it needs *a lot more* data.

An MVN is completely characterized by its mean and covariance, and in the case of a GP, we can use the covariance to make magic. For example, to create a continuous function, let \(y_d\) be a vector of points corresponding to domain points \(x_{n,d}\). To enforce continuity, all we need is:

Covariance between points is usually enforced using kernels, so for example, this is the RBF kernel:

This can be scaled (hence scaling the function along the y-axis), and a scale (the length-scale) can be applied to the domain variables (by “normalizing” them), which has the effect of scaling along the x-axis. The function enforced by this kernel is infinitely differentiable.

Knowing a few (“training”) points, one can predict all points between the training points (i.e. *smoothing*) by using the fact that, if \(x_2\) is the set of training points, the conditional distribution of the predicted points is also multivariate normal:

In this case, \(\Sigma_{ij} = k({\bf x_i}, {\bf x_j})\), but interestingly, due to the linearity of the MVN, a derivative of a GP is also a GP assuming that the mean and covariance are differentiable, in which case, the kernels are:

# Simulations

## Prior Draw from a Gaussian Process

## GP Prior Draw Code

## Posterior of a Gaussian Process

We fit a GP with an RBF kernel and lengthscale 1.0 to the points: \( \begin{bmatrix} [-0.5, -1] \ [0, 0.5] \ [0.5, 0] \end{bmatrix} \).

## GP Posterior Code

## Derivatives of the GP

I’ve written a piece to calculate the derivative kernels efficiently, but I’m still testing this:

## GP Derivative Covariances (1-D)

This seems to work and I’ve used (previous versions of it, coupled with Tensorflow’s Adam and Stan’s HMC/l_BFGS) to solve differential equations:

## Modeling SDE equivalents of GPs

There’s some great literature out there about modeling GPs as solutions of differential equations with a random component, but before I encountered that, the following was a brute-force attempt to model the functions \(\mu(X_t), \sigma(X_t)\) where \(X_t\) is a continuous time stochastic process and \(W_t\) is the standard Weiner process:

When \(X_t\) is a Gaussian Process, equating the Euler-Maruyama representation of the SDE above with the GP expressed as \(LZ\) where \(L\) is the Cholesky-decomposition of the covariance matrix, results in the random normal vector \(Z\) being exactly equal to the random part of the SDE: \(dW_t = \sqrt{\Delta t} Z\).

Hence modeling the functions \(\mu, \sigma\) and minimizing the distance between \(Z, dW_t\) is a way to obtain those functions *without* solving the SDE. When this is mathematically infeasible, the algorithm fails.

Simo Särkkä has shown that GPs can be written as state space models, which are models of the form:

For the centered Matern GP:

## Python code for Matern SSM-GP Simulation

I’ve got the suspicion that the RBF GP can’t be represented this way (easily) because it’s infinitely differentiable; the Matern GP isn’t, so a derivative of a high enough order would appear to be random (which is set to \(u(t)\) and integrating that many times over would lead to a nice function. So, it’s probably unsurprising that the RBF needs to be written as its infinite series representation and *then* represented as a SSM.

## 2019

### Random Stuff

## Random Stuff

### Stochastic Bernoulli Probabilities

Consider: