Random Stuff

For dealing with road/city networks, refer to Geoff Boeing’s blog and his amazing python package OSMnx. Go to Shapely for manipulation of line segments and other objects in python, networkx for networks in python and igraph for networks in R.

Procedural generation of road networks is normally based on tenor fields but simpler algorithms involving randomly growing roads with an equal probability of splitting into a T-junction or branching a road to the side also make for good results. In particular, I love the work of Oleg Dolya on his Village generator and his twitter post with this graphic:

Selenium Webdriver is a framework that can help you automate browser-related activities (e.g. for testing, scraping). You can emulate user behavior, screenshot, etc. First install selenium on a computer and then the language bindings (python bindings are available).

Phase is important in the perception of speech quality. For stationary GPs, I think I read that phase (angle) might just be uniformly distributed on a circle. For some speech data though, you see some weird non-uniform histograms with differing levels of variation by frequency particularly for frequencies where the magnitude is high. Also slight lag-one partial autocorrelations across time.

Many real life networks are fat tailed (in terms of the degree distribution).

Causality can be studied from observational data by the assumption of a graph. For example, in the additive noise model, I believe that we assume a graph that looks like \(X \rightarrow Y\) if \(Y = f(X) + \epsilon\) or vice versa. The model is unidentifiable when f is linear and \(\epsilon\) is gaussian.

Information criteria aren’t a good choice while optimizing hyperparameters of certain nonparametric models as they assume that data is conditionally independent given the parameters.

The error term of a statistical model can be (for example a GP) - the aim here is to account for the unmodelled dynamics of a problem and reduce confounding between those and the core model.

Griffin-Lim doesn’t do so well with whispered audio.

Something I was pointed to today: Regression Dilution. Noise in a covariate causes a bias of the slope towards zero. The covariate is dispersed without a change in y values, which causes the regression line to flatten.

An interesting statement: “neural networks use finitely many highly adaptive basis functions whereas gaussian processes typically use infinitely many fixed basis functions” - paraphrased from Wilson et al. 2015, based on work by MacKay, Neal and others.

Many models, particularly those that generate multi-modal posteriors are unidentifiable. Unidentifiability happens when different sets of parameters can produce models whose observations look the same.

Weakly informative priors are better than non-informative ones as they constrain the space enough so that an algorithm searching for the posterior will find it quickly. Strong priors can add an incredible amount of information to a well specified model.

The bootstrap is related to the idea of sampling from an ECDF - transformations of these samples are good estimators for transformations of corresponding random variables. The fact that these are better than normal approximations can be proved using Hermite expansions.

The Jacobian is extremely important in the context of probabilistic transformations.

\[if \;\; {\bf g}: \underline Y \rightarrow \underline X \;\; st \;\; {\bf g}^{-1}: \underline X \rightarrow \underline Y \;\; then \;\; f_{\underline Y} = f_{\underline X} ({\bf g}^{-1}) \left| \frac{\partial x_i}{\partial y_j} \right|\]

One can produce multimodal distributions using monotonic transformations of r.v.s

\(ln \Gamma\) has a left skew whereas \(ln lgN\) is symmetric.

Multivariate normal graphs are factorized; i.e. the precision matrix (containing partial autocovariances) can be sparse. One may sometimes also work out the direction of dependance between nodes on such graphs.

One way to prove the distribution of SDEs is to solve the Fokker-Planck equation.

The Fourier transform of a covariance function yeilds the spectral density.

Fourier smoothing can be used to approximate seasonality (i.e. pick high powered frequencies on a spectrogram and fit corresponding periodic curves to the seasonality series). This can get tricky because the spectrogram usually generates additional peaks at integer multiples of the frequency of any component signal.

Inference about the number of experiments of a binomial distribution is very hard.

Change of measure: the idea is to model the expected probabilistic behavior of a system based on the behavior of another.

Polynomial interpolation (i.e. Lagrange polynomials) can have messy oscillation in higher degrees. This is Runge’s phenomenon and is a similar occurrence to the Gibbs phenomenon in Fourier series on steps.

Independence of cumulants is characteristic of the Normal, and in general, cumulants are not independent.

When probing non-monotonic transformations of random variables, one can use either Monte Carlo simulation or methods in uncertainty quantification (which break up the function into orthogonal components and use this form to calculate moments of an expected target distribution).

Non-Gaussian stochastic processes can tend to normality in certain scenarios (sometimes due to the CLT).

Maxent modeling - if just particular facts are known about a scenario, a joint distribution can be found with maximum entropy that satisfies the given constraints (this is apparently a calculus of variations problem). The exponential family is an example of a maxent family of distributions with differing constraints. Moreover, the exponential family is characterized by sufficient statistics with non-increasing lengths w.r.t. the data, hence the “really random”-ness.

Treating GPs as latent variables acting on an observable is amazing. One can even solve differential equations using such an approach, particularly when working with good samplers or optimizers, e.g. HMC with Stan and Adam in Tensorflow.

It’s probably impossible to represent monotonic functions and positive functions using a basis function representation (on the euclidean space) because the space of monotonoic and/or positive functions isn’t closed under linear transformations, so it cannot be a subspace endowed with a basis! However, there can exist bases (e.g. using the Aitchison geometry) on simplexes, etc. - which aren’t euclidean spaces and the operators defined in these spaces are quite weird.

Gaussian process on spheres are easy to simulate (e.g. by using arc lengths instead of the usual distance metric in the kernel).

Stan code for simple radial basis function kernel interpolation of the form:

\[f(x) = \sum_i \alpha_i e^{-(x - x_i)^2}\]
    matrix dist_maker(vector x,
                      row_vector knots,
                      real lscale) {
        int num_data = num_elements(x);
        int num_knot = num_elements(knots);
        matrix[num_data, num_knot] distances;
        return rep_matrix(x/lscale, num_knot) -
               rep_matrix(knots/lscale, num_data);
    vector s(vector x,
             row_vector knots,
             vector params,
             real inter,
             real lscale) {
        return inter + exp(-square(dist_maker(x, knots, lscale))) * params;

It’s fast, but can only handle one dimensional vectors and the choice of knots is manual for now.

A while ago, I was experimenting with principal and independent component analyses. The ICA is amazing, I do not have the original garbled files and I don’t remember where I got them from, but I managed to “unmix” the streams easily:

ICA Code
 mix1 <- readWave("./mixedX.wav")
 mix2 <- readWave("./mixedY.wav")
 wave1 <- mix1@left
 wave2 <- mix2@left
 ica <- fastICA(data.frame(x = wave1, y = wave2), 2, method = "R", maxit = 250, tol = 1e-50, verbose = TRUE)
 writeWave(Wave(ica$S[,1], samp.rate = 32000, bit = 16, pcm = TRUE), "./seperatedX.wav")
 writeWave(Wave(ica$S[,2], samp.rate = 32000, bit = 16, pcm = TRUE), "./seperatedY.wav")

Separated Sample:

Cauchy processes can have spikes between ordinary-looking segments (as opposed to GPs which are (smoother?); you can’t have jumps with some probability all along the process, as they get integrated/smoothened out).

Insight and simple models go a long way.

I guess that many philosophies of probability share the same idea at the big-picture level. Subjectivist probabilities that humans conceive of might be related to mechanistic probabilities through human knowledge of the underlying mechanisms giving rise to the probabilities. I’m probably talking nonsense. For practical usage though, it doesn’t matter as anything that follows the Kolmogorov axioms can utilize the probability framework.

Figuring out periodicities in GPs can be difficult due to local minima in the likelihood.

To perform efficient MCMC for a r.v. which is a deterministic transform of other r.v.s such that you capture the tail of a distribution well, look up subset simulation.

GPs with sums of covariance functions can be ‘disaggregated’ pretty easily - just write out the joint covariance of Z, Y where Z = X + Y.


Efficient Gaussian Process Computation

I’ll try to give examples of efficient gaussian process computation here, like the vec trick (Kronecker product trick), efficient toeliptz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.

4 min read

Gaussian Processes in MGCV

I lay out the canonical GP interpretation of MGCV’s GAM parameters here. Prof. Wood updated the package with stationary GP smooths after a request. I’ve run through the predict.gam source code in a debugger, and mainly, the computation of predictions follows:

~1 min read


I wanted to see how easy it was to do photogrammetry (create 3d models using photos) using PyTorch3D by Facebook AI Research.

1 min read

Dead Code & Syntax Trees

This post was motivated by some R code that I came across (over a thousand lines of it) with a bunch of if-statements that were never called. I wanted an automatic way to get a minimal reproducing example of a test from this file. While reading about how to do this, I came across Dead Code Elimination, which kills unused and unreachable code and variables as an example.

~1 min read
Back to Top ↑



I used to do a fair bit of astrophotography in university - it’s harder to find good skies now living in the city. Here are some of my old pictures. I’ve kept making rookie mistakes (too much ISO, not much exposure time, using a slow lens, bad stacking, …), for that I apologize!

1 min read

Probabilistic PCA

I’ve been reading about PPCA, and this post summarizes my understanding of it. I took a lot of this from Pattern Recognition and Machine Learning by Bishop.

1 min read

Modelling with Spotify Data

The main objective of this post was just to write about my typical workflow and views rather than come up with a great model. The structure of this data is also outside my immediate domain so I thought it’d be fun to write up a small diary on making a model with it.

5 min read

Morphing with GPs

The main aim here was to morph space inside a square but such that the transformation preserves some kind of ordering of the points. I wanted to use it to generate some random graphs on a flat surface and introduce spatial deformation to make the graphs more interesting.

2 min read

SEIR Models

I had a go at a few SEIR models, this is a rough diary of the process.

4 min read

Speech Synthesis

The initial aim here was to model speech samples as realizations of a Gaussian process with some appropriate covariance function, by conditioning on the spectrogram. I fit a spectral mixture kernel to segments of audio data and concatenated the segments to obtain the full waveform. Partway into writing efficient sampling code (generating waveforms using the Gaussian process state space representation), I realized that it’s actually quite easy to obtain waveforms if you’ve already got a spectrogram.

5 min read
Back to Top ↑


Gaussian Process Middle C

First of my experiments on audio modeling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle c the way a grand piano would.

1 min read
Back to Top ↑


Back to Top ↑