Probability & Sigma Algebras

This was my totally non-rigorous take on the definition of probabilities when I was a student.

Probability today is a (mathematically) well-defined object, after a lot of struggle in the late 19th and early 20th century. Later, subjective (precursor-to-Bayesian) accounts of probability came up with axioms that lead to the same modern, accepted definition. That definition, due to Kolmogorov, is roughly:

If \(\Omega\) is a set of events and \(\Sigma\) is a bunch of interesting statements we can pose about the events (formally, each statement is a subset of \(\Omega\)), then \(\mathcal P\), a probability measure, gives us a degree of plausibility for each interesting statement we may pose.

\(\mathcal P\) obeys the following rules:

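  1. \(\mathcal P(\text{statement}) \geq 0\): a probability is never negative.
  2. \(\mathcal P(\Omega)\) is 1 (so the probability that something - anything - from \(\Omega\) happens is 1).
  3. If two events, \(A\) and \(B\), can’t both happen, then \(\mathcal P(A \text{ or } B \text{ happening}) = \mathcal P(A) + \mathcal P(B)\). (Kolmogorov’s version asks for this over countably many mutually exclusive events, not just two.)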
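The three rules can be checked by brute force on a small finite example. This is only a sketch: the fair die, the uniform `prob` function and the helper `all_events` are illustrative assumptions, not anything from the definition itself.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

def all_events(outcomes):
    """Every subset of a finite outcome set (the power set)."""
    items = sorted(outcomes)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events(omega)

# Rule 1: probabilities are never negative.
rule_1 = all(prob(a) >= 0 for a in events)

# Rule 2: something from Omega always happens.
rule_2 = prob(omega) == 1

# Rule 3: probabilities add up for events that cannot both happen.
rule_3 = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if not (a & b)  # only pairs with an empty intersection
)

print(rule_1, rule_2, rule_3)
```

Exact fractions are used instead of floats so that the additivity check in rule 3 is not disturbed by rounding.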

Notice that the probability measure acts on the set of interesting statements. This set of statements needs to be defined on a set of events \(\Omega\) and must obey these rules:

  1. Nothing happening (\(\emptyset = \Omega^c\)) and anything happening are both in the set of statements.
  2. If “event \(A\) happening” is in the set of statements, then “event \(A\) not happening” should be as well. This is because, if \(\mathcal P(A) = a\), the set \(A\)’s complement needs to exist for us to define \(\mathcal P(A^c) = 1 - a\).
  3. If two events \(A, B\) are in the set of statements, then, for consistency, we must be able to ask what the probability of \(A\) or \(B\) happening is. So, the union \(A \cup B\) must also be in the set of statements. (The formal definition asks for unions of countably many events, not just two.)

Such a set of statements is called a \(\sigma\)-algebra. The probability measure is defined on the set of events together with the statements we can pose about the events. It doesn’t know how to handle any statement outside the set of statements, so the probability of such a statement is simply undefined.
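For a finite \(\Omega\), the rules for the set of statements can be checked mechanically. The sketch below assumes a finite event space (where pairwise unions are enough), and the function name `is_sigma_algebra` and both example collections are made up for illustration.

```python
def is_sigma_algebra(omega, statements):
    """Check the three rules for a collection of subsets of a finite omega."""
    omega = frozenset(omega)
    statements = {frozenset(s) for s in statements}

    # Rule 1: "nothing happens" and "anything happens" are both present.
    if frozenset() not in statements or omega not in statements:
        return False

    # Rule 2: closed under complements.
    if any((omega - a) not in statements for a in statements):
        return False

    # Rule 3: closed under unions (pairwise suffices when omega is finite).
    return all((a | b) in statements for a in statements for b in statements)

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]  # can only ask "is it in {1, 2}?"
broken = [set(), {1}, {2}, omega]        # missing complements and unions

print(is_sigma_algebra(omega, coarse))  # True
print(is_sigma_algebra(omega, broken))  # False
```

The `coarse` collection shows that a \(\sigma\)-algebra does not have to contain every subset: it only has to be closed under the questions we want to ask.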

Random Variables

For convenience, we usually work with functions called random variables \(X:\Omega \rightarrow \mathbb R\), which are defined on a pair - the set of events and the set of statements - called a measurable space. Sometimes, we replace \(\mathbb R\) with some other set and call the function a “random element” instead. A typical use is relabelling outcomes as numbers, e.g. \(\{\text{head}, \text{tail}\} \rightarrow \{0, 1\}\).
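As a rough illustration of the coin example, a random variable is just a function on outcomes, and a probability statement about its value is answered by looking at which outcomes produce that value. The fair coin, the choice of head as 1 and tail as 0, and the names `p_outcome` and `prob_X_in` are assumptions made for this sketch.

```python
from fractions import Fraction

# Hypothetical fair coin: a probability for each outcome in Omega.
p_outcome = {"head": Fraction(1, 2), "tail": Fraction(1, 2)}

def X(outcome):
    """A random variable: relabel outcomes as numbers (head -> 1, tail -> 0)."""
    return 1 if outcome == "head" else 0

def prob_X_in(values):
    """P(X in values), computed by pulling the question back to Omega."""
    preimage = [w for w in p_outcome if X(w) in values]
    return sum((p_outcome[w] for w in preimage), Fraction(0))

print(prob_X_in({1}))     # 1/2
print(prob_X_in({0, 1}))  # 1
print(prob_X_in({2}))     # 0
```

Asking about a value that \(X\) never takes, such as 2, pulls back to the empty set of outcomes, which has probability 0.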

Random variables are usually the basis for defining “probability/model laws”, which are a set of statements about the assumptions in a model.


A density is a function whose integral over a particular set, taken with respect to a base measure (which ultimately ties in with how you measure things irl), gives the probability of that set.

The base measure is interesting, because it can inform how we measure things, e.g. we can count sets of zero Lebesgue measure using Dirac deltas or counting measures (common if our event space is discrete), we can use lengths of intervals (where the Lebesgue measure comes into play - when we have a real event space) or perhaps a mixture of those two circumstances. We may even want to measure the size of sets w.r.t. the Cantor set in a pathological world.
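A small sketch of how the base measure changes what "integrating the density" means: a sum for the counting measure, an ordinary integral for the Lebesgue measure, and a point mass plus an integral for a mixture. The Poisson and normal densities, the atom weight of 0.3 and all function names are illustrative assumptions, and the integral is a crude midpoint rule rather than anything careful.

```python
import math

def prob_counting(pmf, event):
    """Integrate a density against the counting measure: a plain sum."""
    return sum(pmf(k) for k in event)

def prob_lebesgue(pdf, a, b, n=10_000):
    """Integrate a density against the Lebesgue measure on [a, b] (midpoint rule)."""
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width

def poisson_pmf(k, rate=2.0):
    """Discrete event space: Poisson density w.r.t. the counting measure."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def normal_pdf(x):
    """Real event space: standard normal density w.r.t. the Lebesgue measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob_mixed(a, b, atom_weight=0.3):
    """Mixture: a Dirac delta at 0 plus a scaled normal part, on [a, b]."""
    atom = atom_weight if a <= 0.0 <= b else 0.0
    return atom + (1.0 - atom_weight) * prob_lebesgue(normal_pdf, a, b)

p_discrete = prob_counting(poisson_pmf, {0, 1, 2})
p_continuous = prob_lebesgue(normal_pdf, -1.0, 1.0)
p_mixed = prob_mixed(-1.0, 1.0)

print(p_discrete, p_continuous, p_mixed)
```

In the mixed case the single point 0 has Lebesgue measure zero, so only the Dirac delta can give it positive probability.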

Non-Measurable Spaces

Measurable spaces are a big deal because you’d run into major problems trying to define measures on non-measurable sets. Famous (and amazing) examples include the Vitali Set and the Banach-Tarski Paradox. It’s interesting that the construction of these sets always relies on the axiom of choice; most mathematicians are more comfortable accepting the existence of sets which have no measure than abandoning the axiom of choice. Here’s a cool video on that by PBS Infinite Series & Kelsey Houston-Edwards.


Efficient Gaussian Process Computation

I’ll try to give examples of efficient Gaussian process computation here, like the vec trick (Kronecker product trick), efficient Toeplitz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.


Gaussian Processes in MGCV

I lay out the canonical GP interpretation of MGCV’s GAM parameters here. Prof. Wood updated the package with stationary GP smooths after a request. I’ve run through the predict.gam source code in a debugger, and mainly, the computation of predictions follows:



I wanted to see how easy it was to do photogrammetry (create 3d models using photos) using PyTorch3D by Facebook AI Research.


Dead Code & Syntax Trees

This post was motivated by some R code that I came across (over a thousand lines of it) with a bunch of if-statements that were never called. I wanted an automatic way to get a minimal reproducing example of a test from this file. While reading about how to do this, I came across Dead Code Elimination, which, for example, kills unused and unreachable code and variables.




I used to do a fair bit of astrophotography in university - it’s harder to find good skies now that I live in the city. Here are some of my old pictures. I kept making rookie mistakes (too much ISO, not much exposure time, using a slow lens, bad stacking, …); for that I apologize!


Probabilistic PCA

I’ve been reading about PPCA, and this post summarizes my understanding of it. I took a lot of this from Pattern Recognition and Machine Learning by Bishop.


Modelling with Spotify Data

The main objective of this post was just to write about my typical workflow and views rather than come up with a great model. The structure of this data is also outside my immediate domain so I thought it’d be fun to write up a small diary on making a model with it.


Morphing with GPs

The main aim here was to morph space inside a square but such that the transformation preserves some kind of ordering of the points. I wanted to use it to generate some random graphs on a flat surface and introduce spatial deformation to make the graphs more interesting.


SEIR Models

I had a go at a few SEIR models; this is a rough diary of the process.


Speech Synthesis

The initial aim here was to model speech samples as realizations of a Gaussian process with some appropriate covariance function, by conditioning on the spectrogram. I fit a spectral mixture kernel to segments of audio data and concatenated the segments to obtain the full waveform. Partway into writing efficient sampling code (generating waveforms using the Gaussian process state space representation), I realized that it’s actually quite easy to obtain waveforms if you’ve already got a spectrogram.



Gaussian Process Middle C

The first of my experiments on audio modeling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle C the way a grand piano would.


