I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.

Fast Toeplitz Matrix-Vector Products and Solving

Circulant matrix-vector products can be fast due to the way circulant matrices can be decomposed using their fourier transforms. Toeplitz matrices can be ‘embedded’ into a circulant matrix and their matrix-vector products can be computed efficiently too. Sample code is shown below.

Main reference:

J. Dongarra, P. Koev, and X. Li, “Matrix-Vector and Matrix-Matrix Multiplications”. In Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors. “Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide”. SIAM, Philadelphia, 2000. Available online.

This can then be used to compute multivariate log-densities quickly. Furthermore, this multiplication can then be used in conjugate gradient solvers to efficiently compute inverse-matrix-vector products in \(O(n \log n)\) time and \(O(n)\) space.

Python Code
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matmul(c, r = None, v = None):
	if v is None: v = np.zeros(c.shape)
	if r is None: r = c.conj()
        if len(v.shape) == 1: v = v.reshape(-1, 1)

	n = c.shape[0]; assert c.shape == r.shape
	embeded_col = np.hstack((c, np.flip(r[1:])))
	padded_v = np.vstack((v, np.zeros((n - 1, v.shape[1]))))
	fft_T = np.fft.fft(embeded_col, axis = 0).reshape(-1, 1)
	fft_v = np.fft.fft(padded_v, axis = 0)
	return np.fft.ifft(fft_T*fft_v, axis = 0).real[:n, :]

I’m trying to contribute some code to SciPy, pull request can be found here.

Toeplitz matrices can be inverted in quadratic time using the Levinson algorithm, that also has the capability of returning a determinant as a side product using no more computation (as I understand).

Some code that calculates this log determinant can be found here:

Marano S, Edwards B, Ferrari G and Fah D (2017), “Fitting Earthquake Spectra: Colored Noise and Incomplete Data”, Bulletin of the Seismological Society of America., January, 2017. Vol. 107(1), pp. 276-291. Implementation available online..

The fast matrix-vector multiplication can be used with a preconditioned conjugate gradient algorithm to implement fast inverse-vector products.

Toeplitz Matrix Cholesky Decomposition

… and also circulant matrix solving in the comments (using scipy and ctypes).

I got the toeplitz_cholesky library from here and compiled it. I’m going to check out toeblitz in the future. I believe that the following algorithm calculates the cholesky decomposition in \(O(n^2)\) time.

C/Python Code
# in bash: clang -shared -fpic toeplitz_cholesky.c -o tc.dylib -O3

import time
import numpy as np
import matplotlib.pyplot as plt

plt.style.use("ggplot"); plt.ion()
# sc.linalg.solve_circulant, append t[(n - 2):0:-1]

dll = np.ctypeslib.load_library("tc", ".")
np_poin = np.ctypeslib.ndpointer

def kernel(n = 100):
	k = np.linspace(0, 5, n)
	k = np.exp(-(k - k[0])**2)
	k[0] += 1e-10
	return k

def toep_chol_prepare(n):
	type_input_1 = np.ctypeslib.ctypes.c_int64
	type_input_2 = np_poin(dtype = np.double, ndim = 1, shape = n)
	type_output = np_poin(dtype = np.double, ndim = 2, shape = (n, n))

	dll.t_cholesky_lower.argtypes = [type_input_1, type_input_2]
	dll.t_cholesky_lower.restype = type_output
	return dll.t_cholesky_lower

def timer(n = 100):
	func_ptr = toep_chol_prepare(n)
	tic = time.time()
	L = func_ptr(n, kernel(n))
	toc = time.time()
	return toc - tic, L

if __name__ == "__main__":
	runtime, L = timer(10000) # half a second!
	x = np.matmul(L.T, np.random.normal(size = 10000))
	input("Press the enter key to quit.")

Efficient Inference of GP Covariances

Sometimes, it is easier to do parameter inference in the frequency domain. One can use SymPy (or another symbolic computation program) to get the theoretical spectrum of a GP (using a Fourier transform of the covariance - note that to get from samples to the PSD, the PSD is defined as the expected value of the series squared due to Wiener-Khinchin) and we use the Whittle likelihood (I believe) to fit the parameters of the process using the spectrum. Similar ideas can be found in William Wilkinson’s papers and Turner and Sahani (2014).

I was writing a Gaussian Process Vocoder that synthesizes speech from mel spectrograms (using an LSTM to get from the mel spectrograms to the spectral kernel’s parameters), but the whole thing looks too similar to Tokuda & Zen (Directly Modelling Speech Waveforms …) - which I discovered after writing a good chuck of the code. The implementation used the spectral kernel to obtain a zero mean GP with the right frequencies and block-stationary treatments of these non-stationary GPs to synthesize audio. Tokuda & Zen used an LSTM-RNN to obtain the cepstral coefficients, while I used it to obtain spectral parameters.

Have a look at the Speech Synthesis page for code.

Sampling Periodic GPs using the State Space Representation

The representation is due to Solin & Sarkka (2014).

Python Code
import numpy as np
from scipy.special import iv

Adapted from matlab code from Arno Solin and Simo Sarkka (2014).
Explicit Link Between Periodic Covariance Functions and
State Space Models. Available at https://users.aalto.fi/~asolin/.

magnSigma2   : Magnitude scale parameter
lengthScale  : Distance scale parameter
period       : Length of repetition period
mlengthScale : Matern lengthScale
N            : Degree of approximation
nu           : Matern smoothness parameter
mN           : Degree of approximation if squared exonential
valid        : If false, uses Bessel functions

df(t)/dt = F f(t) + L w(t)
spectral denisty of w(t) is Qc
y_k = H f(t_k) + r_k, r_k ~ N(0, R)


def reformat(conv_function):
    def wrapper(*args, **kwargs):
        F, L, Qc, H, Pinf = conv_function(*args, **kwargs)
        if len(L.shape) == 1: L = np.diag(L)
        if len(H.shape) == 1: H = H.reshape(1, -1)
        if type(Qc) is not np.ndarray: Qc = np.array(Qc).reshape(1, 1)
        if len(Qc.shape) == 2: Qc = np.diag(Qc)
        return F, L, Qc, H, Pinf
    return wrapper

class StateSpaceModels:

    def Periodic(magnSigma2 = 1, lengthScale = 1, period = 1, N = 6):

        q2 = 2 * magnSigma2 * np.exp(-1/lengthScale**2) *\
            iv(np.arange(0, N + 1), 1/lengthScale**2)
        q2[0] *= 0.5

        w0 = 2*np.pi/period
        F = np.kron(np.diag(np.arange(0, N + 1)), [[0, -w0], [w0, 0]])
        L = np.eye(2 * (N + 1))
        Qc = np.zeros(2 * (N + 1))
        Pinf = np.kron(np.diag(q2), np.eye(2))
        H = np.kron(np.ones((1, N + 1)), [1, 0])

        return F, L, Qc, H, Pinf

    def Cosine(magnSigma2 = 1, period = 1):

        w0 = 2*np.pi*period
        F = np.array([[0, -w0], [w0, 0]])
        L = np.eye(2)
        Qc = np.zeros(2)
        Pinf = np.eye(2) * magnSigma2
        H = np.array([1, 0])

        return F, L, Qc, H, Pinf

    def Matern52(magnSigma2 = 1, lengthScale = 1):

        lambda_ = np.sqrt(5) / lengthScale
        F = np.array([
            [          0,             1,          0],
            [          0,             0,          1],
            [-lambda_**3, -3*lambda_**2, -3*lambda_]])
        L = np.array([[0], [0], [1]])
        Qc = (400*np.sqrt(5)*lengthScale*magnSigma2) / (3 * lengthScale**6)
        H = np.array([1, 0, 0])

        kappa = 5*magnSigma2 / (3*lengthScale**2)
        Pinf = np.array([
        	[magnSigma2, 0    , -kappa],
        	[0         , kappa,      0],
        	[-kappa    , 0    , 25*magnSigma2/lengthScale**4]])

        return F, L, Qc, H, Pinf

    def Quasiperiodic(magnSigma2 = 1, lengthScale = 1, period = 1,
        mlengthScale = 1, N = 6, use_cosine = False):
        # unused args: nu = 3/2, mN = 6

        if use_cosine:
            F1, L1, Qc1, H1, Pinf1 = StateSpaceModels.Cosine(1, period)
            F1, L1, Qc1, H1, Pinf1 = StateSpaceModels.Periodic(1, lengthScale, period, N)

        F2, L2, Qc2, H2, Pinf2 = StateSpaceModels.Matern52(magnSigma2, mlengthScale)

        F = np.kron(F1, np.eye(len(F2))) + np.kron(np.eye(len(F1)), F2)
        L = np.kron(L1, L2)
        Qc = np.kron(Pinf1, Qc2)

        Pinf = np.kron(Pinf1, Pinf2)
        H = np.kron(H1, H2)

        return F, L, Qc, H, Pinf

SSM = import('ssm')$StateSpaceModels

example_ssm = SSM$Periodic()
F  = example_ssm[[1]]
L  = example_ssm[[2]]
Qc = example_ssm[[3]]
H  = example_ssm[[4]]

# naive implementation (I'll code up the exact solution later)
NumericVector sim(arma::mat F,
                  arma::mat L,
                  arma::vec Qc,
                  arma::vec H,
                  double dt = 0.01,
                  size_t n = 10000) {

    NumericVector x(n);
    arma::vec f(F.n_rows);
    arma::vec u(L.n_rows);
    f = rnorm(F.n_rows);

    arma::mat Q_root = sqrt(Qc * dt);

    for (int i = 0; i < n; i++) {
        u = rnorm(L.n_rows);
        f = f + F*f*dt + (L*Q_root)%u;
    	x[i] = arma::dot(H, f);
    return x;
}', depends = 'RcppArmadillo')

Efficient inverse-matrix multiplication

\[L^T / (L / y) = (LL^T)^{-1}y\]
import numpy as np

naive = np.linalg.solve(A, b)

L = np.linalg.cholesky(A)
efficient = np.linalg.solve(L.T, np.linalg.solve(L, b))

Using einsum for vectorizing matrix ops

An old theoretical physics course that I took introduced me to the wonders of the Einstein summation convention.

I had to vectorize some matrix operations recently for multivariate normal liklihood calculations, \(X * X^T\) and \(Y * X\), on a large list of matrices (shape \((n, k, k)\)). Using einsum for this gives a massive performance boost. It’s great.

import numpy as np

X = np.arange(16).reshape(4, 2, 2)
Sa = np.einsum('aij,akj->aik', X, X)

np.all([np.all(Sa[i, ...] == X[i, ...] @ X[i, ...].T) for i in range(4)])

Y = np.arange(4).reshape(2, 2)
Sb = np.einsum('ij,ajk->aik', Y, X)

np.all([np.all(Sb[i, ...] == Y @ X[i, ...]) for i in range(4)])



Efficient Gaussian Process Computation

I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.

4 min read

Gaussian Processes in MGCV

I lay out the canonical GP interpretation of MGCV’s GAM parameters here. Prof. Wood updated the package with stationary GP smooths after a request. I’ve run through the predict.gam source code in a debugger, and mainly, the computation of predictions follows:

~1 min read


I wanted to see how easy it was to do photogrammetry (create 3d models using photos) using PyTorch3D by Facebook AI Research.

1 min read

Dead Code & Syntax Trees

This post was motivated by some R code that I came across (over a thousand lines of it) with a bunch of if-statements that were never called. I wanted an automatic way to get a minimal reproducing example of a test from this file. While reading about how to do this, I came across Dead Code Elimination, which kills unused and unreachable code and variables as an example.

1 min read
Back to Top ↑



I used to do a fair bit of astrophotography in university - it’s harder to find good skies now living in the city. Here are some of my old pictures. I’ve kept making rookie mistakes (too much ISO, not much exposure time, using a slow lens, bad stacking, …), for that I apologize!

1 min read

Probabilistic PCA

I’ve been reading about PPCA, and this post summarizes my understanding of it. I took a lot of this from Pattern Recognition and Machine Learning by Bishop.

1 min read

Modelling with Spotify Data

The main objective of this post was just to write about my typical workflow and views rather than come up with a great model. The structure of this data is also outside my immediate domain so I thought it’d be fun to write up a small diary on making a model with it.

6 min read

Morphing with GPs

The main aim here was to morph space inside a square but such that the transformation preserves some kind of ordering of the points. I wanted to use it to generate some random graphs on a flat surface and introduce spatial deformation to make the graphs more interesting.

2 min read

SEIR Models

I had a go at a few SEIR models, this is a rough diary of the process.

4 min read

Speech Synthesis

The initial aim here was to model speech samples as realizations of a Gaussian process with some appropriate covariance function, by conditioning on the spectrogram. I fit a spectral mixture kernel to segments of audio data and concatenated the segments to obtain the full waveform. Partway into writing efficient sampling code (generating waveforms using the Gaussian process state space representation), I realized that it’s actually quite easy to obtain waveforms if you’ve already got a spectrogram.

5 min read
Back to Top ↑


Gaussian Process Middle C

First of my experiments on audio modeling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle c the way a grand piano would.

1 min read
Back to Top ↑


Back to Top ↑