Random Stuff

Many models, particularly those that generate multi-modal posteriors are unidentifiable. Unidentifiability happens when different sets of parameters can produce models whose observations look the same.

Differencing in autoregressive models in discrete time \(\Leftrightarrow\) derivatives in continuous time.

Weakly informative priors are better than non-informative ones as they constrain the space enough so that an algorithm searching for the posterior will find it quickly. Strong priors can add an incredible amount of information to a well specified model.

The bootstrap constructs samples from an ECDF - transformations of these samples are very good estimators on practice. The fact that these are better than normal approximations can be proved using Hermite expansions.

The Jacobian are extremely important in the context of probabilistic transformations.

$$ if \;\; {\bf g}: \underline Y \rightarrow \underline X \;\; st \;\; {\bf g}^{-1}: \underline X \rightarrow \underline Y \;\; then \;\; f_{\underline Y} = f_{\underline X} ({\bf g}^{-1}) \left| \frac{\partial x_i}{\partial y_j} \right| $$

\(ln \Gamma \) has a left skew whereas \(ln lgN \) is symmetric

Multivariate normal graphs are factorized; i.e. the precision matrix can be sparse, particularly (I suspect) in GPs with a monotonically decreasing covariance w.r.t. radius.

One way to prove the distribution of SDEs is to solve the Fokker-Planck equation.

The Fourier transform of a covariance function yeilds the spectral density.

Fourier smoothing can be used to approximate seasonality (i.e. pick high powered frequencies on a spectrogram and fit the corresponding Fourier series to the seasonality series). This can get tricky because the spectrogram usually generates additional peaks at integer multiples of the frequency of any component signal.

Inference about the number of experiments of a binomial distribution is ridiculously hard and the MLE is usually useless.

Change of measure: the idea is to model the expected probabilistic behavior of a system based on the behavior of another.

Polynomial interpolation (i.e. Lagrange polynomials) can have messy oscillation in higher degrees. This is Runge’s phenomenon and is a similar occurrence to the Gibbs phenomenon in Fourier series on steps.

Independence of cumulants is characteristic of the Normal, and in general, cumulants are not independent.

When probing non-monotonic transformations of random variables, one can use either Monte Carlo simulation or methods in uncertainty quantification (which break up the function into orthogonal components and calculate moments of an expected target distribution).

Non-Gaussian stochastic processes can tend to normality in certain scenarios (sometimes due to the CLT).

Maxent modeling - if just particular facts are known about a scenario, a joint distribution can be found with maximum entropy that satisfies the given constraints. The exponential family is an example of a family of distributions with differing constraints. Moreover, the exponential family is characterized by sufficient statistics with non-increasing lengths w.r.t. the data, hence the “really random”-ness.

Treating GPs as latent variables acting on an observable is amazing. One can even solve differential equations using such an approach, particularly when working with good samplers or optimizers, e.g. HMC with Stan and Adam in Tensorflow.

It’s probably impossible to represent monotonic functions and positive functions using a basis function representation (on the euclidean space) because the space of monotonoic and/or positive functions isn’t closed under linear transformations, so it cannot be a subspace endowed with a basis! (an awesome coworker shared this thought) However, there can exist bases (e.g. using the Aitchison geometry) on simplexes, etc. - which aren’t euclidean spaces.

Gaussian process on spheres are easy to simulate (e.g. by using arc lengths instead of the usual distance metric in the kernel).

Stan code for simple radial basis function kernel interpolation of the form:

    matrix dist_maker(vector x,
                      row_vector knots,
                      real lscale) {
        int num_data = num_elements(x);
        int num_knot = num_elements(knots);
        matrix[num_data, num_knot] distances;
        return rep_matrix(x/lscale, num_knot) -
               rep_matrix(knots/lscale, num_data);
    vector s(vector x,
             row_vector knots,
             vector params,
             real inter,
             real lscale) {
        return inter + exp(-square(dist_maker(x, knots, lscale))) * params;

It’s really fast, but can only handle one dimensional vectors and the choice of knots is manual for now.

A while ago, I was experimenting with principal and independent component analyses. The ICA is amazing, I do not have the original garbled files and I don’t remember where I got them from, but I managed to “unmix” the streams easily:

ICA Code
 mix1 <- readWave("./mixedX.wav")
 mix2 <- readWave("./mixedY.wav")
 wave1 <- mix1@left
 wave2 <- mix2@left
 ica <- fastICA(data.frame(x = wave1, y = wave2), 2, method = "R", maxit = 250, tol = 1e-50, verbose = TRUE)
 writeWave(Wave(ica$S[,1], samp.rate = 32000, bit = 16, pcm = TRUE), "./seperatedX.wav")
 writeWave(Wave(ica$S[,2], samp.rate = 32000, bit = 16, pcm = TRUE), "./seperatedY.wav")

Separated Sample:

I’ve also been experimenting with inference of Bayesian Networks - if the graph is multivariate normal, it’s possible to work out conditional independencies by doing a bunch of hypothesis tests to figure out zero partial autocorrelations, because these determine the conditional independencies on the graph.


Back to Top ↑


Back to Top ↑