Exact Densities for Non-Monotonic Transformations (draft)
This post was inspired by a late night thought that it should be possible to treat non-monotonic functions as piecewise monotonic and use the change of variables technique to obtain densities… and lo, you can:
If two random variables and are related as , where is a differentiable non-monotonic (but piecewise monotonic) function, and if the density of is known to be , then the density of can be expressed as:
… where the sum is over all values in the preimage set corresponding to .
From the KMT book, one (handwavy way) to prove the exact result is as follows:
For any given length , there are multiple lengths in the preimage of the function, here, . Note that is negative.
Noting that and that the second derivative term in the equation above is negative, the solution follows.
R Code for the Graph
The great thing deal about this method is that one has the density outright - this can then be fed into other MCMC samplers in combination with more complex models (which wouldn’t be the case with plain Monte Carlo).
The preimage set can be a pain to obtain, but one possible way is to get it is by fitting a spline to the function , breaking it up into monotonic pieces and inverting them. Both the smoothing spline and GPs are suitable here as they are differentiable and derivative values can be obtained explicitly.
The function in this demonstration is:
Obtaining the density of and using MCMC to sample from it gives us a sample from but we could’ve used Monte Carlo simulation to obtain samples from the distribution as well. Comparing the two methods results in this QQ plot:
… it works!
The More Interesting Problem
We can use this methodology to answer a really interesting question: Is it possible to sample from if the distribution of is known instead? (Yep!)
My first attempt before seeing the result displayed on this page was to run the Metropolis-Hastings by using just a single term in the density of , which was the density of multiplied its Jacobian at that point and surprisingly, this gives very good results anyway - saving the need to calculate the set of inverse values. Perhaps this is due to some kind of an averaging effect that the MCMC produces.
There’s a(n inelegant but more or less a correct way) of solving this problem. The density of for any is known, and so is derivative information, so for every , we can calculate a and then invert the function - i.e. calculate the set . By doing this, one can obtain a set of linear equations for the density of in terms of the density of :
The way I’ve modeled this is by treating the CDF corresponding to the density as an unknown function - a GP, with start and endpoints forced to and its derivative (the density, which will also be a GP) forced to be positive. I then use a fine grid on and construct a large matrix of linear combinations according to the equation above:
… and set the observed difference between the LHS and RHS to zero.
I’m sure there’s a more elegant way of doing this, but the main takeaway is that this is possible. In more complex models, we now know exactly how to change a likelihood / loss function where dependencies between parameters is known beforehand.
Here’s a simulation; we find a density for such that the density of is uniform over its range. The function is the same as the one shown above. Results:
We can use this method to set informative priors in situations where information is available about a complex transformation of the prior and not the prior itself. Perhaps this method can be used to obtain solutions for Bayesian inverse problems as well.
The initial aim here was to model speech samples as realizations of a Gaussian process with some appropriate covariance function, by conditioning on the spectrogram. I fit a spectral mixture kernel to segments of audio data and concatenated the segments to obtain the full waveform. Partway into writing efficient sampling code (generating waveforms using the Gaussian process state space representation), I realized that it’s actually quite easy to obtain waveforms if you’ve already got a spectrogram.
I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.