First of my experiments on audio modelling using gaussian processes. Here, I construct a GP that, when sampled, plays middle c the way a grand piano would.

I tried using the Spectral Mixture kernel and the Gaussian Process Convolution Model for this, but neither seemed to work well for me (numerical issues with the covariance, likelihood has too many modes respectively). However, I did learn about the vec trick and that sympy handles infinite precision - for more info check out my scicomp question here.

Due to the difficulty in automating the inference for the kernel, I decided to try my hand at writing it out myself. It’s a simple signal, so it was pretty easy.

The signal:

Its sample autocorrelation by time:

Key observations about the signal are:

  1. It’s highly periodic, with the same frequency.
  2. The autocorrelation decays to zero if you go far enough away.
  3. The signal is heteroskedastic, with the variance decreasing over time.
  4. The autocorrelation function seems to change at some point.

I designed a kernel with these charecteristics. It may’ve been a total accident (haven’t checked yet), but this works! The result:

Code (it’s a bit messy):

R Code
# a <- tuneR::readWave("middle_c.wav")
# correl_matrix <- function(i) {
#     audio_segment <- a@left[(i*100):(i*100 + 999)]
#     n <- length(audio_segment); lag_max <- 167 # period
#     auto_correl <- acf(audio_segment, lag.max = lag_max, plot = F)
#     auto_correl <- auto_correl$acf[, 1, 1] * n/(n - 0:lag_max) # bias correction
#     return(auto_correl)
# }

# correl <-, lapply(95:880, correl_matrix))

covar_core <- function(t, bef = T) {
	p <- if(!bef) c(5, 1.7) else c(2, 0.25)
	t <- abs(t)/167
	result <- 2*exp(-p[1]*sin(pi*t)^2)
	result <- result - 1 + p[2]*cos(pi/2 + pi*t)^8
	result <- exp(-t/102) * result

sigma_decay <- function(t) 4*t*exp(-(t/3000)) + 2000*(plogis(t/1000) - 0.5)

mixing_weight <- function(t) plogis(t/2500 - 17)

n <- 5000
t <- seq(0, 90000, length.out = n)

diff_mat <- matrix(t, n, n, T) - matrix(t, n, n, F)
core_bef <- covar_core(diff_mat, T)
core_aft <- covar_core(diff_mat, F)
weight <- sqrt(mixing_weight(t))
weight <- matrix(weight, n, n, T) * matrix(weight, n, n, F)

S <- weight*core_aft + (1 - weight)*core_bef
S <- diag(sigma_decay(t)) %*% S %*% diag(sigma_decay(t))
S <- S + diag(n)*1e-5
S_ <- as.matrix(Matrix::nearPD(S)$mat)
L <- chol(S_)

sample <- as.integer(t(L) %*% rnorm(n))
audio <- tuneR::Wave(left = sample, samp.rate = 5000)
writeWave(audio, "sample_from_gp.wav")


Efficient Gaussian Process Computation

I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.

4 min read

Gaussian Process Speech Synthesis (Draft)

Very untidy first working draft of the idea mentioned on the efficient computation page. Here, I fit a spectral mixture to some audio data to build a “generative model” for audio. I’ll implement efficient sampling later, and I’ll replace the arbitrary way this is trained with an LSTM-RNN to go straight from text/spectrograms to waveforms.

3 min read
Back to Top ↑


Gaussian Process Middle C

First of my experiments on audio modelling using gaussian processes. Here, I construct a GP that, when sampled, plays middle c the way a grand piano would.

1 min read
Back to Top ↑


Back to Top ↑