The first of my experiments on audio modeling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle C the way a grand piano would.

I tried using the Gaussian Process Convolution Model and the Spectral Mixture kernel for this, but neither worked well for me: the former ran into numerical issues with the covariance, and the latter's likelihood has too many modes. However, I did learn about the vec trick and that sympy supports arbitrary-precision arithmetic. For more info, check out my scicomp question here.
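For reference, the vec trick is the identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which lets you multiply by a Kronecker product without ever forming it. A minimal sketch that checks this numerically; the matrix sizes and names are made up for illustration:

```r
set.seed(1)
A <- matrix(rnorm(3 * 4), 3, 4)   # 3 x 4
B <- matrix(rnorm(5 * 2), 5, 2)   # 5 x 2
X <- matrix(rnorm(2 * 4), 2, 4)   # 2 x 4, so that B %*% X %*% t(A) is defined

# Naive: build the 15 x 8 Kronecker product explicitly.
# as.vector() stacks columns, which matches the vec operator.
naive <- as.vector(kronecker(A, B) %*% as.vector(X))

# Vec trick: two small matrix products, no Kronecker product.
fast <- as.vector(B %*% X %*% t(A))

all.equal(naive, fast)
```

The saving matters when A and B are covariance factors of a large grid, since the Kronecker product grows with the product of their sizes.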

Edit: Looking back at this, I should’ve tried to fit the parameters of a non-stationary spectral kernel using the STFT.

Since automating inference for the kernel proved difficult, I decided to write the kernel out by hand. It’s a simple signal, so this was pretty easy.

The signal:

Its sample autocorrelation by time:

Key observations about the signal are:

  1. It’s highly periodic, with a constant frequency throughout.
  2. The autocorrelation decays to zero at large enough lags.
  3. The signal is heteroskedastic, with the variance decreasing over time.
  4. The shape of the autocorrelation function seems to change at some point in time.
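Observations 1 and 3 are easy to check on a stand-in signal without the recording. A rough sketch, assuming a synthetic decaying tone: the signal, window sizes and names below are made up for illustration and are not the piano sample.

```r
set.seed(1)
period <- 167                        # samples per cycle, as in the code further down
t_idx <- 0:19999
# Hypothetical stand-in: a sinusoid with an exponentially decaying envelope plus noise.
x <- exp(-t_idx / 5000) * sin(2 * pi * t_idx / period) +
  rnorm(length(t_idx), sd = 0.01)

# Autocorrelation of a window of `width` samples starting at `start` (1-based).
window_acf <- function(start, width = 1000, lag_max = period) {
  seg <- x[start:(start + width - 1)]
  acf(seg, lag.max = lag_max, plot = FALSE)$acf[, 1, 1]
}

early <- window_acf(1)
late  <- window_acf(15001)

early[period + 1]        # correlation at a lag of one full period
sd(x[1:1000])            # spread early in the signal
sd(x[15001:16000])       # spread late in the signal (smaller)
```

Stacking `window_acf()` over many start positions gives the "autocorrelation by time" picture for this toy signal.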

I designed a kernel with these characteristics. It may’ve been a total accident (haven’t checked yet), but this works! The result:


Code (it’s a bit messy):

R Code
# Sample autocorrelation by time (left commented out; needs middle_c.wav).
# a <- tuneR::readWave("middle_c.wav")
# 
# correl_matrix <- function(i) {
#     audio_segment <- a@left[(i*100):(i*100 + 999)] # 1000-sample window, hop of 100
#     n <- length(audio_segment); lag_max <- 167 # period
#     auto_correl <- acf(audio_segment, lag.max = lag_max, plot = FALSE)
#     auto_correl <- auto_correl$acf[, 1, 1] * n/(n - 0:lag_max) # bias correction
#     return(auto_correl)
# }
#
# correl <- do.call(rbind, lapply(95:880, correl_matrix))

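# Stationary core: periodic in the lag (period 167 samples), with correlation
# decaying over roughly 102 periods. `bef` picks the shape parameters used
# before (TRUE) or after (FALSE) the autocorrelation changes.
covar_core <- function(t, bef = TRUE) {
	p <- if (!bef) c(5, 1.7) else c(2, 0.25)
	t <- abs(t)/167                                  # lag in units of one period
	result <- 2*exp(-p[1]*sin(pi*t)^2)               # periodic term
	result <- result - 1 + p[2]*cos(pi/2 + pi*t)^8   # extra bump at half-period lags
	result <- exp(-t/102) * result                   # decay with lag
	return(result)
}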

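# Marginal standard deviation over time: rises until about t = 3000, then decays toward 1000.
sigma_decay <- function(t) 4*t*exp(-(t/3000)) + 2000*(plogis(t/1000) - 0.5)

# Weight on the "after" core; crosses 0.5 at t = 2500*17 = 42500.
mixing_weight <- function(t) plogis(t/2500 - 17)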

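n <- 5000
t <- seq(0, 90000, length.out = n)

# diff_mat[i, j] = t[j] - t[i]
diff_mat <- matrix(t, n, n, byrow = TRUE) - matrix(t, n, n, byrow = FALSE)
core_bef <- covar_core(diff_mat, bef = TRUE)
core_aft <- covar_core(diff_mat, bef = FALSE)
# weight[i, j] = sqrt(mixing_weight(t[i]) * mixing_weight(t[j]))
weight <- sqrt(mixing_weight(t))
weight <- matrix(weight, n, n, byrow = TRUE) * matrix(weight, n, n, byrow = FALSE)

S <- weight*core_aft + (1 - weight)*core_bef
# Scale by the standard deviations: elementwise form of diag(s) %*% S %*% diag(s)
s <- sigma_decay(t)
S <- S * outer(s, s)
S <- S + diag(n)*1e-5                     # jitter
S_ <- as.matrix(Matrix::nearPD(S)$mat)    # the mixture need not be PSD, so project
L <- chol(S_)                             # upper triangular, t(L) %*% L = S_

# Named gp_sample to avoid masking base::sample
gp_sample <- as.integer(t(L) %*% rnorm(n))
audio <- tuneR::Wave(left = gp_sample, samp.rate = 5000, bit = 16)
tuneR::writeWave(audio, "sample_from_gp.wav")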
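
Written out as a formula, the code above amounts to roughly the following. The symbols are labels I'm introducing here just to read the code, and the small jitter and the `nearPD` projection are left out:

```latex
\begin{aligned}
c_p(\tau) &= e^{-u/102}\left[\,2\exp\!\left(-p_1 \sin^2(\pi u)\right) - 1 + p_2 \sin^8(\pi u)\right],
  \qquad u = \frac{|\tau|}{167},\\
\sigma(t) &= 4t\,e^{-t/3000} + 2000\left(\operatorname{logistic}\!\left(\tfrac{t}{1000}\right) - \tfrac{1}{2}\right),\\
w(t, t') &= \sqrt{m(t)\,m(t')}, \qquad m(t) = \operatorname{logistic}\!\left(\tfrac{t}{2500} - 17\right),\\
K(t, t') &= \sigma(t)\,\sigma(t')\left[\,w(t, t')\,c_{p^{\text{aft}}}(t - t') + \bigl(1 - w(t, t')\bigr)\,c_{p^{\text{bef}}}(t - t')\right],
\end{aligned}
```

with $p^{\text{bef}} = (2,\ 0.25)$ and $p^{\text{aft}} = (5,\ 1.7)$, and using $\cos(\pi/2 + \pi u)^8 = \sin^8(\pi u)$. Each piece lines up with one of the observations: the bracket in $c_p$ gives the periodicity, $e^{-u/102}$ the decay of the autocorrelation, $\sigma$ the heteroskedasticity, and $w$ the switch between the two autocorrelation shapes.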
