## Exact Densities for Non-Monotonic Transformations (draft)

This post was inspired by a late night epiphany that it should be possible to treat non-monotonic functions as piecewise monotonic and use the change of variables technique to obtain densities… and lo, you can:

If two random variables \(X\) and \(Y\) are related as \(Y = g(X)\), where \(g\) is a differentiable non-monotonic (but piecewise monotonic) function, and if the density of \(X\) is known to be \(p_X\), then the density of \(Y\) can be expressed as:

… where the sum is over all values \(x_i\) in the preimage set corresponding to \(y\).

References:

### Proof of the Exact Density

From the KMT book, one (handwavy way) to prove the exact result is as follows:

For any given length \(y + \delta y\), there are multiple lengths in the preimage of the function\(g\), here, \(x_1 + \delta x_1, x_2 + \delta x_2, x_3 + \delta x_3\). Note that \(\delta x_2\) is negative.

Noting that \(x = g^{-1}(y)\) and that the second derivative term in the equation above is negative, the solution follows.

## R Code for the Graph

The great thing deal about this method is that one has the density outright - this can then be fed into other MCMC samplers in combination with more complex models (which wouldn’t be the case with plain Monte Carlo).

The preimage set can be a pain to obtain, but one possible way is to get it is by fitting a spline to the function \(g\), breaking it up into monotonic pieces and inverting them. Both the smoothing spline and GPs are suitable here as they are differentiable and derivative values can be obtained explicitly.

### Simulation

The function in this demonstration is:

Obtaining the density of \(Y\) and using MCMC to sample from it gives us a sample from \(Y\) but we could’ve used Monte Carlo simulation to obtain samples from the distribution as well. Comparing the two methods results in this QQ plot:

## MCMC Code

… it works!

## The More Interesting Problem

We can use this methodology to answer a really interesting question: * Is it possible to sample from \(X\) if the distribution of \(Y\) is known instead?* (Yep!)

My first attempt before seeing the result displayed on this page was to run the Metropolis-Hastings by using just a single term in the density of \(X\), which was the density of \(Y\) multiplied its Jacobian at that point and surprisingly, this gives very good results anyway - saving the need to calculate the set of inverse values. Perhaps this is due to some kind of an averaging effect that the MCMC produces.

There’s a(n inelegant but more or less a correct way) of solving this problem. The density of \(Y\) for any \(X\) is known, and so is derivative information, so for every \(x\), we can calculate a \(y = g(x)\) and then invert the function - i.e. calculate the set \( \{ x_i: g(x_i) = y \} \). By doing this, one can obtain a set of linear equations for the density of \(X\) in terms of the density of \(Y\):

The way I’ve modeled this is by treating the CDF corresponding to the density \(p_X\) as an unknown function - a GP, with start and endpoints forced to \(0, 1\) and its derivative (the density, which will also be a GP) forced to be positive. I then use a fine grid on \(X\) and construct a large matrix of linear combinations according to the equation above:

… and set the observed difference between the LHS and RHS to zero.

I’m sure there’s a more elegant way of doing this, but the main takeaway is that this is possible. In more complex models, we now know exactly how to change a likelihood / loss function where dependencies between parameters is known beforehand.

Here’s a simulation; we find a density for \(X\) such that the density of \(Y\) is uniform over its range. The function is the same as the one shown above. Results:

## mat_worker_two.py

## gp_density.R

We can use this method to set informative priors in situations where information is available about a complex transformation of the prior and not the prior itself. Perhaps this method can be used to obtain solutions for Bayesian inverse problems as well.

*If you see any errors in the post or if you’d like to comment, please create an issue on GitHub.*

## 2020

### Efficient Gaussian Process Computation

I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.

### Gaussian Process Speech Synthesis (Draft)

Very untidy first working draft of the idea mentioned on the efficient computation page. Here, I fit a spectral mixture to some audio data to build a “generative model” for audio. I’ll implement efficient sampling later, and I’ll replace the arbitrary way this is trained with an LSTM-RNN to go straight from text/spectrograms to waveforms.

## 2019

### Random Projects

# Random Projects

### Gaussian Process Middle C

First of my experiments on audio modelling using gaussian processes. Here, I construct a GP that, when sampled, plays middle c the way a grand piano would.

### An Ising-Like Model

## … using Stan & HMC

### Sparse Gaussian Process Examples

## A Minimal Working Example

### Random Stuff

## Random Stuff

### Stochastic Bernoulli Probabilities

Consider: