Very untidy first working draft of the idea mentioned on the efficient computation page. Here, I fit a spectral mixture to some audio data to build a “generative model” for audio. I’ll implement efficient sampling later, and I’ll replace the ad hoc way this is trained with an LSTM-RNN that goes straight from text or spectrograms to waveforms.

The idea, though, is to fit a Gaussian process to waveforms in the frequency domain rather than the time domain. This also showcases the power of SymPy: to get the analytical form of the spectral densities from the covariance functions, I use SymPy to do the Fourier transform. This is original work, although I drew on Richard Turner’s thesis for the amplitude demodulation part, Andrew Gordon Wilson’s papers for the spectral mixture kernel, and the LJSpeech dataset for the speech data. It was a learning exercise and isn’t meant to be groundbreaking; one motivation, however, is to create a simple model for speech that gives reasonable results on modest hardware.

Synthesized audio sample:

Currently, this code takes in an audio file, fits a stationary GP to each 5ms block (so the overall model is piecewise stationary) and synthesizes a waveform from these GPs. Earlier commits of this page may contain interesting bits of code written and discarded during the initial write-up (e.g. tests). My LSTM-RNN code (which doesn’t really work yet) is on GitHub at infprobscix/gpsynth (still a work in progress).

# Code

Imports and inits:

Now, I define a spectrum function (mainly to learn how it works). An estimate of the spectral density of a process is the squared absolute value of its DFT, suitably normalized (the periodogram).
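A minimal sketch of such a spectrum function, assuming a real-valued signal sampled at `fs` Hz. The name `spectrum` and the 1/(fs·N) normalization (which gives power per Hz) are my choices, not necessarily those of the original code:

```python
import numpy as np

def spectrum(x, fs):
    """Periodogram estimate of the spectral density of a real signal.

    Returns the non-negative DFT frequencies (Hz) and |DFT|^2 / (fs * N).
    Non-DC bins are not doubled, so this is the two-sided density
    evaluated at non-negative frequencies.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x)                        # DFT at frequencies 0 .. fs/2
    psd = np.abs(X) ** 2 / (fs * n)           # squared magnitude, scaled to power per Hz
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)    # bin frequencies in Hz
    return freqs, psd


# Usage: a 1 kHz tone sampled at 8 kHz peaks in the 1 kHz bin.
fs = 8000
t = np.arange(800) / fs
freqs, psd = spectrum(np.sin(2 * np.pi * 1000 * t), fs)
```

Using `rfft` rather than `fft` keeps only the non-negative frequencies, which is all a real signal needs.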

Here, I define a function that takes the covariance matrix of a multivariate normal and returns the covariance of the process conditioned on its first and last samples being exactly zero. This is mainly for convenience: later, I synthesize the speech process in stationary blocks, and concatenating independently drawn blocks would make the waveform discontinuous at the block boundaries, hence this conditioning. It also lets the synthesis be parallelized, though it probably introduces artifacts into the spectrum of the full process.
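The conditioning step is the standard Gaussian conditioning formula. A sketch, assuming a zero-mean process and a dense covariance matrix `K`; `pin_endpoints` is a hypothetical name:

```python
import numpy as np

def pin_endpoints(K):
    """Covariance of x ~ N(0, K) conditioned on x[0] = 0 and x[-1] = 0.

    Gaussian conditioning on the index set B = {0, n-1}:
        K_A|B = K_AA - K_AB K_BB^{-1} K_BA,
    with A the interior indices. The conditional mean stays zero because
    the prior mean and the conditioning values are both zero. The pinned
    samples have zero variance, so their rows and columns are zero.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    b = np.array([0, n - 1])                  # conditioned (pinned) indices
    a = np.arange(1, n - 1)                   # free interior indices
    K_ab = K[np.ix_(a, b)]
    K_bb = K[np.ix_(b, b)]
    out = np.zeros_like(K)
    out[np.ix_(a, a)] = K[np.ix_(a, a)] - K_ab @ np.linalg.solve(K_bb, K_ab.T)
    return out


# Usage: an exponential (Ornstein-Uhlenbeck) covariance on a 50-point grid.
t = np.arange(50.0)
K = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)
K_pinned = pin_endpoints(K)
```

Samples drawn with `K_pinned` start and end at zero, so blocks drawn this way join without jumps.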

This function implements an autoregressive scheme: if a GP is too long to be synthesized in one go, it breaks the sequence into blocks and conditions each block’s distribution on the previous block.
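A sketch of block-by-block sampling, assuming a stationary kernel on a unit-spaced grid. Conditioning each block only on the one before it is a block-Markov approximation to the full joint distribution; it is exact for Markov kernels such as the exponential kernel in the usage line, but not for a spectral mixture. `sample_in_blocks` and `exp_kernel` are hypothetical names:

```python
import numpy as np

def exp_kernel(t1, t2, ell=10.0, var=1.0):
    """Exponential (Ornstein-Uhlenbeck) covariance; a stand-in kernel."""
    return var * np.exp(-np.abs(t1[:, None] - t2[None, :]) / ell)

def sample_in_blocks(kernel, n_total, block, rng, jitter=1e-9):
    """Sample a zero-mean stationary GP on t = 0, 1, ..., n_total - 1 one
    block at a time, each block conditioned on the previous block only."""
    t = np.arange(block, dtype=float)
    eye = np.eye(block)
    K_bb = kernel(t, t) + jitter * eye        # within-block covariance (same for every block)
    K_nb = kernel(t + block, t)               # covariance of a block with the previous one
    A = np.linalg.solve(K_bb, K_nb.T).T       # K_nb K_bb^{-1}
    cond_cov = K_bb - A @ K_nb.T              # covariance of a block given the previous one
    L0 = np.linalg.cholesky(K_bb)
    Lc = np.linalg.cholesky(0.5 * (cond_cov + cond_cov.T) + jitter * eye)
    n_blocks = -(-n_total // block)           # ceiling division
    blocks = [L0 @ rng.standard_normal(block)]
    for _ in range(n_blocks - 1):
        mean = A @ blocks[-1]                 # conditional mean given the previous block
        blocks.append(mean + Lc @ rng.standard_normal(block))
    return np.concatenate(blocks)[:n_total]


rng = np.random.default_rng(0)
x = sample_in_blocks(exp_kernel, n_total=20000, block=100, rng=rng)
```

Because the kernel is stationary, `A` and the two Cholesky factors are computed once and reused for every block, so the cost grows linearly with the signal length.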

The main program starts here. First, read the LJSpeech data.
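LJSpeech clips are 16-bit PCM mono WAV files at 22,050 Hz, so SciPy’s `wavfile.read` is enough. The helper below (a hypothetical name) also scales signed integer samples to floats in [-1, 1); the path in the usage comment is only an example:

```python
import numpy as np
from scipy.io import wavfile

def load_wav(path):
    """Read a PCM WAV file; return (sample_rate, mono float64 samples)."""
    fs, x = wavfile.read(path)
    if np.issubdtype(x.dtype, np.signedinteger):
        # signed integer PCM (e.g. int16): scale to [-1, 1)
        x = x.astype(np.float64) / (np.iinfo(x.dtype).max + 1)
    x = np.asarray(x, dtype=np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)                    # mix multichannel audio down to mono
    return fs, x

# Usage (example path; point it at your LJSpeech copy):
# fs, x = load_wav("LJSpeech-1.1/wavs/LJ001-0001.wav")
```

Unsigned 8-bit WAV data is offset around 128 and would need different scaling; it is not handled here.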

Model the audio as the product of an envelope, as in Richard Turner’s thesis, and a stationary part that contains the frequencies. This step isn’t strictly necessary, since the inference procedure can fit the standard deviation of the process in each 5ms block.
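Turner’s probabilistic amplitude demodulation is more involved than fits here. As a much simpler stand-in (a swapped-in technique, not the method from the thesis), the sketch below smooths the magnitude of the analytic signal with a moving average and divides it out to get the carrier. `demodulate` and `smooth_ms` are hypothetical names:

```python
import numpy as np
from scipy.signal import hilbert

def demodulate(x, fs, smooth_ms=5.0, floor=1e-8):
    """Split x into envelope * carrier.

    Envelope: magnitude of the analytic signal, smoothed with a moving
    average of about smooth_ms milliseconds and floored to stay positive.
    Carrier: x divided by the envelope, so envelope * carrier == x."""
    x = np.asarray(x, dtype=float)
    env = np.abs(hilbert(x))                                  # instantaneous amplitude
    win = max(1, int(round(smooth_ms * 1e-3 * fs)))
    env = np.convolve(env, np.ones(win) / win, mode="same")   # moving-average smoothing
    env = np.maximum(env, floor)
    return env, x / env


# Usage: a constant-amplitude 440 Hz tone has an envelope of about 0.5 away from the edges.
fs = 22050
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
env, carrier = demodulate(tone, fs)
```

The moving average falls off near the edges because `mode="same"` effectively pads with zeros; the floor keeps the division safe there.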

Split the audio into 5ms blocks.
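Once the signal is trimmed to a whole number of blocks, splitting is a reshape. A sketch (`split_blocks` is a hypothetical name); at 22,050 Hz a 5ms block is 110.25 samples, so the block length has to be rounded:

```python
import numpy as np

def split_blocks(x, fs, block_ms=5.0):
    """Split x into consecutive, non-overlapping blocks of about block_ms ms.
    Samples left over at the end (less than one block) are dropped."""
    n = int(round(block_ms * 1e-3 * fs))          # samples per block, e.g. 110 at 22050 Hz
    n_blocks = len(x) // n
    return np.asarray(x[: n_blocks * n]).reshape(n_blocks, n)


blocks = split_blocks(np.zeros(22050), fs=22050)  # one second of audio
print(blocks.shape)                               # (200, 110)
```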

Calculate the Fourier transform of the theoretical covariance function symbolically; by the Wiener-Khinchin theorem, this is the spectral density. Also remember that the Fourier transform is a linear operator, so the FT of a sum of covariance functions is the sum of their spectral densities. Here, we fit a 24-component spectral mixture kernel.
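A sketch of the symbolic step for one component of Wilson’s spectral mixture kernel, `k(tau) = w exp(-2 pi^2 v tau^2) cos(2 pi mu tau)`. SymPy’s `fourier_transform` uses the convention `F(s) = ∫ f(tau) exp(-2 pi i s tau) dtau`, so here only the Gaussian factor is transformed symbolically and the cosine is handled with the modulation theorem, which splits the spectrum into two copies centred at ±mu. The helper `mixture_density` is a hypothetical name:

```python
import numpy as np
import sympy as sp

tau, s, mu = sp.symbols("tau s mu", real=True)   # lag, frequency, centre frequency
v, w = sp.symbols("v w", positive=True)          # spectral variance, weight

# Fourier transform of the Gaussian envelope exp(-2 pi^2 v tau^2).
# SymPy's convention: F(s) = Integral(f(tau) * exp(-2*pi*I*s*tau), (tau, -oo, oo))
gauss_ft = sp.simplify(sp.fourier_transform(sp.exp(-2 * sp.pi**2 * v * tau**2), tau, s))

# Modulation theorem: multiplying by cos(2 pi mu tau) moves half the mass to +mu, half to -mu.
S_component = w / 2 * (gauss_ft.subs(s, s - mu) + gauss_ft.subs(s, s + mu))
S_fn = sp.lambdify((s, w, mu, v), S_component, "numpy")

def mixture_density(freqs, ws, mus, vs):
    """Spectral density of a Q-component mixture at freqs. By linearity of
    the Fourier transform it is the sum of the per-component densities."""
    f = np.asarray(freqs, dtype=float)[:, None]                  # (F, 1)
    return S_fn(f, np.asarray(ws)[None, :], np.asarray(mus)[None, :],
                np.asarray(vs)[None, :]).sum(axis=1)            # sum over components

print(gauss_ft)   # a zero-mean normal density in s with variance v
```

With 24 components, `ws`, `mus` and `vs` are simply length-24 arrays; the symbolic work is done once and reused for every block.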

Humans can’t hear all frequencies equally well, and, roughly speaking, we can’t tell apart two similar frequencies that fall within the same band. Some frequencies (0.5-5kHz) are also far more important than others. Interestingly, in the presence of strong frequencies, lower-power frequencies may not be heard (masking); this is a consideration for the future. Here, we initialize the spectral mixture within the bands suggested by the psychoacoustics community, using one of the older ways of defining them.
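A sketch of band-based initialization. The text doesn’t name the band definition, so the Bark critical bands (24 bands, matching the 24 components) are an assumption here, and so are the band-centre and half-bandwidth choices. At LJSpeech’s 22,050 Hz sampling rate the top band lies above the Nyquist frequency, which the optional `nyquist` argument handles:

```python
import numpy as np

# Commonly tabulated Bark critical-band edges in Hz: 25 edges, i.e. 24 bands.
BARK_EDGES_HZ = np.array([20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                          1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                          6400, 7700, 9500, 12000, 15500], dtype=float)

def init_from_bands(edges=BARK_EDGES_HZ, nyquist=None):
    """Centre one mixture component in each band.

    Returns per-component centre frequencies (Hz) and spectral standard
    deviations (Hz), set to half the bandwidth. If nyquist is given, bands
    are clipped to it and bands lying entirely above it are dropped."""
    lo, hi = edges[:-1], edges[1:]
    if nyquist is not None:
        keep = lo < nyquist
        lo, hi = lo[keep], np.minimum(hi[keep], nyquist)
    centres = 0.5 * (lo + hi)
    stds = 0.5 * (hi - lo)
    return centres, stds


centres, stds = init_from_bands()                    # 24 components
centres_lj, _ = init_from_bands(nyquist=22050 / 2)   # 23 components below 11025 Hz
```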

Initialize the variables: `freq_obs` is a vector of observed frequencies, `spec_obs` holds the observed spectra at those frequencies, and `p_ps`, `l_ps` and `s_ps` are the periodicity, lengthscale and scale parameters of the spectral kernel:
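A hypothetical initialization for one block, assuming the kernel is parameterized as `k(tau) = sum_q s_q^2 exp(-tau^2 / (2 l_q^2)) cos(2 pi tau / p_q)`. Under that assumption, component q puts a spectral peak at `1/p_q` Hz with standard deviation `1/(2 pi l_q)` Hz, which gives a direct way to map frequency bands onto `p_ps` and `l_ps`:

```python
import numpy as np

def init_variables(block, fs, band_lo, band_hi):
    """Hypothetical per-block initialization of freq_obs, spec_obs, p_ps, l_ps, s_ps."""
    block = np.asarray(block, dtype=float)
    band_lo = np.asarray(band_lo, dtype=float)
    band_hi = np.asarray(band_hi, dtype=float)
    n = block.size
    freq_obs = np.fft.rfftfreq(n, d=1.0 / fs)                  # observed frequencies (Hz)
    spec_obs = np.abs(np.fft.rfft(block)) ** 2 / (fs * n)      # periodogram at those frequencies
    centres = 0.5 * (band_lo + band_hi)
    p_ps = 1.0 / centres                                       # period of each cosine (s)
    l_ps = 1.0 / (np.pi * (band_hi - band_lo))                 # spectral std = half the bandwidth
    # split the block's power evenly so that k(0) = sum(s_ps**2) = mean(block**2)
    s_ps = np.full(centres.shape, np.sqrt(np.mean(block ** 2) / centres.size))
    return freq_obs, spec_obs, p_ps, l_ps, s_ps
```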

Vectorize loss.
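A sketch of a vectorized loss, assuming the `(p, l, s)` parameterization `k(tau) = sum_q s_q^2 exp(-tau^2/(2 l_q^2)) cos(2 pi tau/p_q)` and a squared error between log spectra; the loss itself is an assumption, not necessarily the one used originally. Broadcasting a `(frequencies, 1)` column against `(1, components)` rows evaluates every component at every frequency without a Python loop:

```python
import numpy as np

def sm_density(freqs, p_ps, l_ps, s_ps):
    """Spectral density of k(tau) = sum_q s_q^2 exp(-tau^2/(2 l_q^2)) cos(2 pi tau/p_q).

    Component q contributes s_q^2 / 2 times a normal density centred at
    +1/p_q and another at -1/p_q, each with std 1/(2 pi l_q) Hz."""
    f = np.asarray(freqs, dtype=float)[:, None]                      # (F, 1)
    mu = 1.0 / np.asarray(p_ps, dtype=float)[None, :]               # (1, Q) centre frequencies
    sd = 1.0 / (2 * np.pi * np.asarray(l_ps, dtype=float))[None, :]  # (1, Q) spectral stds
    w = np.asarray(s_ps, dtype=float)[None, :] ** 2                  # (1, Q) weights

    def normal(m):
        return np.exp(-0.5 * ((f - m) / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

    return (0.5 * w * (normal(mu) + normal(-mu))).sum(axis=1)        # (F,)

def loss(params, freq_obs, spec_obs, eps=1e-12):
    """Mean squared error between log model and log observed spectra.
    params is the concatenation [p_ps, l_ps, s_ps]."""
    p_ps, l_ps, s_ps = np.split(np.asarray(params, dtype=float), 3)
    model = sm_density(freq_obs, p_ps, l_ps, s_ps)
    return np.mean((np.log(model + eps) - np.log(spec_obs + eps)) ** 2)
```

Comparing log spectra weights quiet and loud frequencies more evenly than a plain squared error would; `eps` guards against `log(0)`.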

Optimize.
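One way to run the optimization, sketched with SciPy’s L-BFGS-B. Periods, lengthscales and scales must stay positive, so the sketch optimizes their logarithms; this reparameterization and the name `fit_positive` are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def fit_positive(loss, init, **kwargs):
    """Minimize loss(params) over strictly positive params.

    Optimizes z = log(params) with L-BFGS-B (gradients by finite
    differences), so every trial point exp(z) stays positive.
    Extra keyword arguments go to scipy.optimize.minimize."""
    z0 = np.log(np.asarray(init, dtype=float))
    res = minimize(lambda z: loss(np.exp(z)), z0, method="L-BFGS-B", **kwargs)
    return np.exp(res.x), res


# Usage with a toy loss whose minimum is at (2.0, 0.5):
target = np.array([2.0, 0.5])
params, res = fit_positive(lambda p: np.sum((p - target) ** 2), init=[1.0, 1.0])
```

When fitting block after block, warm-starting each block from the previous block’s solution is a natural choice, since neighbouring 5ms blocks of speech usually have similar spectra.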

Synthesize.
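A sketch of synthesizing one block by drawing from the fitted GP with a Cholesky factor, assuming the same `(p, l, s)` parameterization `k(tau) = sum_q s_q^2 exp(-tau^2/(2 l_q^2)) cos(2 pi tau/p_q)`. `sm_kernel` and `synthesize_block` are hypothetical names:

```python
import numpy as np

def sm_kernel(tau, p_ps, l_ps, s_ps):
    """k(tau) = sum_q s_q^2 exp(-tau^2 / (2 l_q^2)) cos(2 pi tau / p_q)."""
    p_ps, l_ps, s_ps = (np.asarray(a, dtype=float) for a in (p_ps, l_ps, s_ps))
    tau = np.asarray(tau, dtype=float)[..., None]             # add a component axis
    terms = s_ps**2 * np.exp(-0.5 * (tau / l_ps) ** 2) * np.cos(2 * np.pi * tau / p_ps)
    return terms.sum(axis=-1)

def synthesize_block(n, fs, p_ps, l_ps, s_ps, rng, jitter=1e-6):
    """Draw one n-sample block from a zero-mean GP with the kernel above.
    A small relative jitter keeps the Cholesky factorization stable."""
    t = np.arange(n) / fs
    K = sm_kernel(t[:, None] - t[None, :], p_ps, l_ps, s_ps)
    K += jitter * np.trace(K) / n * np.eye(n)
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(n)


# Usage: one 5ms block at 22050 Hz with a single component at 1 kHz.
rng = np.random.default_rng(0)
block = synthesize_block(110, 22050, p_ps=[1e-3], l_ps=[1e-3], s_ps=[1.0], rng=rng)
```

The full waveform is then the concatenation of per-block draws (each with its own fitted parameters), multiplied by the envelope if one was extracted.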
