Many models, particularly those that generate multi-modal posteriors are unidentifiable. Unidentifiability happens when different sets of parameters can produce models whose observations look the same.
Weakly informative priors are better than non-informative ones as they constrain the space enough so that an algorithm searching for the posterior will find it quickly. Strong priors can add an incredible amount of information to a well specified model.
The bootstrap is related to the idea of sampling from an ECDF - transformations of these samples are good estimators for transformations of corresponding random variables. The fact that these are better than normal approximations can be proved using Hermite expansions.
The Jacobian is extremely important in the context of probabilistic transformations.
One can produce multimodal distributions using monotonic transformations of r.v.s
\(ln \Gamma \) has a left skew whereas \(ln lgN \) is symmetric.
Multivariate normal graphs are factorized; i.e. the precision matrix (containing partial autocovariances) can be sparse. One may sometimes also work out the direction of dependance between nodes on such graphs.
One way to prove the distribution of SDEs is to solve the Fokker-Planck equation.
The Fourier transform of a covariance function yeilds the spectral density.
Fourier smoothing can be used to approximate seasonality (i.e. pick high powered frequencies on a spectrogram and fit corresponding periodic curves to the seasonality series). This can get tricky because the spectrogram usually generates additional peaks at integer multiples of the frequency of any component signal.
Inference about the number of experiments of a binomial distribution is very hard.
Change of measure: the idea is to model the expected probabilistic behavior of a system based on the behavior of another.
Polynomial interpolation (i.e. Lagrange polynomials) can have messy oscillation in higher degrees. This is Runge’s phenomenon and is a similar occurrence to the Gibbs phenomenon in Fourier series on steps.
Independence of cumulants is characteristic of the Normal, and in general, cumulants are not independent.
When probing non-monotonic transformations of random variables, one can use either Monte Carlo simulation or methods in uncertainty quantification (which break up the function into orthogonal components and use this form to calculate moments of an expected target distribution).
Non-Gaussian stochastic processes can tend to normality in certain scenarios (sometimes due to the CLT).
Maxent modeling - if just particular facts are known about a scenario, a joint distribution can be found with maximum entropy that satisfies the given constraints (this is apparently a calculus of variations problem). The exponential family is an example of a maxent family of distributions with differing constraints. Moreover, the exponential family is characterized by sufficient statistics with non-increasing lengths w.r.t. the data, hence the “really random”-ness.
Treating GPs as latent variables acting on an observable is amazing. One can even solve differential equations using such an approach, particularly when working with good samplers or optimizers, e.g. HMC with Stan and Adam in Tensorflow.
It’s probably impossible to represent monotonic functions and positive functions using a basis function representation (on the euclidean space) because the space of monotonoic and/or positive functions isn’t closed under linear transformations, so it cannot be a subspace endowed with a basis! However, there can exist bases (e.g. using the Aitchison geometry) on simplexes, etc. - which aren’t euclidean spaces and the operators defined in these spaces are quite weird.
Gaussian process on spheres are easy to simulate (e.g. by using arc lengths instead of the usual distance metric in the kernel).
Stan code for simple radial basis function kernel interpolation of the form:
It’s fast, but can only handle one dimensional vectors and the choice of knots is manual for now.
A while ago, I was experimenting with principal and independent component analyses. The ICA is amazing, I do not have the original garbled files and I don’t remember where I got them from, but I managed to “unmix” the streams easily:
I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.
Minimal Working Examples
Very untidy first working draft of the idea mentioned on the efficient computation page. Here, I fit a spectral mixture to some audio data to build a “generative model” for audio. I’ll implement efficient sampling later, and I’ll replace the arbitrary way this is trained with an LSTM-RNN to go straight from text/spectrograms to waveforms.
First of my experiments on audio modelling using gaussian processes. Here, I construct a GP that, when sampled, plays middle c the way a grand piano would.
… using Stan & HMC