Random Projects

Inferring Gaussian Process Autocorrelation

It seems reasonable and intuitive to think that the sample autocovariance function of a stationary gaussian process to be a sufficient statistic for its covariance function, and I read that this is indeed true for certain stationary GPs with a rational spectrum. This condition is quite similar (if not the same) for GPs to possess a state space representation.

It’s also interesting that not all GPs are ergodic. GPs are mixing (and hence ergodic, I believe) if the covariance dies off to zero after a point, or if its spectrum is absolutely continuous. Loosely, this means that the distribution of the process can be inferred from just one long sample. GPs with an exponentiated sine squared (ESS) covariance function, for example, wouldn’t be ergodic.

As a consequence, the covariance function of a zero mean GP with an ESS kernel, and similar signals, cannot be inferred from a single sample. This is reasonable, as no matter how long the signal is, there’s no new information in it after a certain point. Intuitively, a sample from a zero mean GP with an ESS kernel might look like . The ESS is a kernel which is periodic, and the correlation of points spaced half a period apart is closest to zero (compared to any other pair of points), but still strictly positive. Another sample from that GP may look like .

Points one and two are closer together within each sample than across samples due to the correlation, but given just one observation of the signal (and with no knowledge of the mean of the process), it would appear that points one and two are negatively correlated.

The image below shows this; the black line is the true autocovariance function of the zero mean GP with an ESS kernel, and the boxplots show the sampling distribution of the unbiased sample autocovariance function based on single samples.

New York Conditional Taxi Dropoff Probabilities

I fit a twenty component mixture of multivariate normals, using scikit-learn, to the four dimensional new york taxi pickup/dropoffs dataset.

The dimensions look like (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon). The aim is to predict the distribution of (dropoff_lat, dropoff_lon) by conditioning on (pickup_lat, pickup_lon).

Fancy ways to do this might include fitting a neural net or some kind of a gp to the conditional density, but here, I literally just fit a 4d mvn to the whole dataset. To condition, we just plug in the pickup position and renormalize (Bayes rule).

Envelope Modelling

Google’s Quick Draw dataset contains multiple observations of quickly drawn envelopes. I fit a 256-component restricted boltzmann machine (heavily overparameterised; not a great model - I know) to the data, which represents a big nasty distribution over the random field that represents an envelope image. Now, starting off with a completely random image, using Gibbs sampling, we can make our way to the typical set of the distribution, which hopefully looks like an envelope. Here’s what the burn in looks like:

Inferring the Extent of Differentiability

Let’s say that we have an observation of a noiseless function but we don’t know how smooth it is. You could probably fit a Matern GP with different smoothness parameters to see which parameter maximises the log marginal likelihood (the matern parameter corresponds to the number of times one can differentiate a sample from the gp).

Below, I’ve simulated a Matern GP with a particular parameter, and fit it using parameters ranging from . The color & label correspond to the parameter while sampling.

Modelling Audio using GPs

I used the S-PAD and the GP-PAD models from Richard Turner’s thesis to make these plots using some random audio data from the internet.

Sample stan code for this.
// S-PAD
data {
	int n;     // len
	real x[n]; // audio
parameters {
	real<lower = 0, upper = 1> l;
	real<lower = 0, upper = 10> s;
	vector<lower = -10, upper = 2>[n] sigma;
model {
	sigma[1] ~ normal(0, s);
	sigma[2:n] ~ normal(l * sigma[1:(n - 1)], s*(1 - l^2)^0.5);
	x ~ normal(0, exp(sigma));

data {
	int n;
	int n_s; // nrow of S22
	real seg_a[n]; // audio sample
	matrix[n, n_s] factor; // mvn conditional distribution shift factor S12 * S22^(-1)
	cholesky_factor_cov[n_s] K22c; // cholesky factor of S22
parameters {
	real<lower = -7.5, upper = 10> mu;
	vector<lower = -7.5, upper = 10>[n_s] sigma;
transformed parameters {
	vector[n] sigma_vec;
	sigma_vec = rep_vector(mu, n) +
		factor*(sigma - rep_vector(mu, n_s));
model {
	sigma ~ multi_normal_cholesky(rep_vector(mu, n_s), K22c);
	seg_a ~ normal(0, log(1 + exp(sigma_vec)));

Modelling my d20 dice

I fit a spline on a sphere representing my d20. The color represents the model probabilities. The distance from the centre represents the proportion of times my dice fell on a particular number during six hundred trials.

Code to help with this (messy).

# I came up with a coordinate system to label the dice
# The params phi and theta are reversed w.r.t. polar convention
# phi is pi - phi w.r.t. the polar convention
vertices <- data.table(
	r = 1, theta = c(0, 0, 2 * pi * (0:9)/10),
	phi = c(pi/2, -pi/2, rep(c(-atan(0.5), atan(0.5)), 5)) + pi/2,
	label = letters[1:12])

dice_map <- list(
	"1" = "ihj", "2" = "dfe", "3" = "kjl", "4" = "beg",
	"5" = "gfh", "6" = "bck", "7" = "ahj", "8" = "cdl",
	"9" = "bki", "10" = "adl", "11" = "bgi", "12" = "adf",
	"13" = "gih", "14" = "bce", "15" = "afh", "16" = "ckl",
	"17" = "ajl", "18" = "egf", "19" = "kij", "20" = "ced")

cart <- function(polar) {
	# theta phi
	return(c(sin(polar[1]) * cos(polar[2]),
		     sin(polar[1]) * sin(polar[2]),

polar <- function(cart) {
	return(c(acos(cart[3]), atan2(cart[2], cart[1])))

my_coords <- function(pol_mine) {
	return(c(pi - pol_mine[2], pol_mine[1]))

arc_length <- function(polar_a, polar_b) {
	return(acos(sum(cart(polar_a) * cart(polar_b))))

query <- function(verts) {
	Reduce(`&`, lapply(verts, function(v) grepl(v, dice_map)))

rd20 <- function(n = 1) replicate(n, {
	z <- rnorm(3); z <- z/sqrt(sum(z^2))

qd20 <- function(polar_coords) {
	dists <- sapply(1:12, function(i) arc_length(
		polar_coords, vertices[i, my_coords(c(theta, phi))]))
	min_three <- sort(dists)[1:3]
	verts <- vertices[which(dists %in% min_three), label]

Changes in Park-Going

… w.r.t. baseline, as a result of the pandemic (as of 23rd Apr 2020). Based on the Google mobility dataset.

Minimal Working Implementation of the Griffin Lim Algorithm

This code is based on the Librosa source on GitHub.

import numpy as np
from scipy.signal import stft, istft
from scipy.io import wavfile as wav

_, audio = wav.read('audio.wav')

stft_of_audio = stft(audio)[2]
st_spectrum = np.square(np.abs(stft_of_audio))

angles = np.empty(st_spectrum.shape, dtype=np.complex64)
angles[:] = np.exp(2j * np.pi * np.random.rand(*st_spectrum.shape)) # angles[:] = 1.0

stft_recon = 0.; momentum = 0.99
for _ in range(2000):
	stft_recon_prv = stft_recon
	signal_recon = istft(st_spectrum * angles)[1]
	stft_recon = stft(signal_recon)[2]
	angles[:] = stft_recon - (momentum / (1 + momentum)) * stft_recon_prv
	angles[:] /= np.abs(angles) + 1e-16

audio_recon = np.array(signal_recon, dtype = np.int16)
audio_recon = np.repeat(audio_recon, 2).reshape(-1, 2)
wav.write("audio.wav", 22050, audio_recon)



I used to do a fair bit of astrophotography in university - it’s harder to find good skies now living in the city. Here are some of my old pictures. I’ve kept making rookie mistakes (too much ISO, not much exposure time, using a slow lens, bad stacking, …), for that I apologize!

1 min read

Probabilistic PCA

I’ve been reading about PPCA, and this post summarizes my understanding of it. I took a lot of this from Pattern Recognition and Machine Learning by Bishop.

1 min read

Modelling with Spotify Data

The main objective of this post was just to write about my typical workflow and views rather than come up with a great model. The structure of this data is also outside my immediate domain so I thought it’d be fun to write up a small diary on making a model with it.

5 min read

Morphing with GPs

The main aim here was to morph space inside a square but such that the transformation preserves some kind of ordering of the points. I wanted to use it to generate some random graphs on a flat surface and introduce spatial deformation to make the graphs more interesting.

2 min read

SEIR Models

I had a go at a few SEIR models, this is a rough diary of the process.

4 min read

Speech Synthesis

The initial aim here was to model speech samples as realizations of a Gaussian process with some appropriate covariance function, by conditioning on the spectrogram. I fit a spectral mixture kernel to segments of audio data and concatenated the segments to obtain the full waveform. Partway into writing efficient sampling code (generating waveforms using the Gaussian process state space representation), I realized that it’s actually quite easy to obtain waveforms if you’ve already got a spectrogram.

5 min read

Efficient Gaussian Process Computation

I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.

4 min read
Back to Top ↑


Gaussian Process Middle C

First of my experiments on audio modeling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle c the way a grand piano would.

1 min read
Back to Top ↑


Back to Top ↑