Probability & Sigma Algebras

This was my totally non-rigorous take on the definition of probabilities when I was a student.

Probability today is a mathematically well-defined object, arrived at after a lot of struggle in the late 19th and early 20th centuries. Later, subjective formulations of probability (precursors to Bayesian probability) came up with axioms that recover the modern accepted definition. Due to Kolmogorov, this is roughly:

If \(\Omega\) is a set of events and \(\Sigma\) is a bunch of interesting statements we can pose about the events, then, \(\mathcal P\), a probability measure, gives us a degree of plausibility for each interesting statement we may pose.

\(\mathcal P\) obeys the following rules:

  1. \(\mathcal P(statement)\) is never negative (it is 0 or more).
  2. \(\mathcal P(\Omega)\) is 1 (so the probability that something - anything - from \(\Omega\) happens is 1).
  3. If two events, \(A,\;B\), can’t both happen, then \(\mathcal P(A\;or\;B\;happening) = \mathcal P(A) + \mathcal P(B)\) (and, strictly, the same holds for any countable collection of mutually exclusive events).
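A minimal sketch of the three axioms, checked on a finite example (a fair six-sided die with a uniform measure - my choice of example, not anything canonical):

```python
from fractions import Fraction

# Toy check of Kolmogorov's axioms on a fair six-sided die.
omega = frozenset(range(1, 7))

def P(event):
    """Uniform probability measure on subsets of omega."""
    return Fraction(len(event), len(omega))

A = frozenset({1, 2})  # "roll a 1 or a 2"
B = frozenset({5, 6})  # "roll a 5 or a 6" (disjoint from A)

assert P(A) >= 0                 # axiom 1: non-negativity
assert P(omega) == 1             # axiom 2: something happens with probability 1
assert P(A | B) == P(A) + P(B)   # axiom 3: additivity for disjoint events
print(P(A | B))  # 2/3
```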

Notice that the probability measure acts on the set of interesting statements. This set of statements needs to be defined on a set of events \(\Omega\) and must obey these rules:

  1. Nothing happening (\(\emptyset = \Omega^c\)) and anything happening are both in the set of statements.
  2. If “event \(A\) happening” is in the set of statements, then “event \(A\) not happening” should be as well. This is because, if \(\mathcal P(A) = a\), the set \(A\)’s complement needs to exist for us to define \(\mathcal P(A^c) = 1 - a\).
  3. If two events \(A, B\) are in the set of statements, then we must be able to ask what the probability of \(A\) or \(B\) happening is - consistency demands it. So the union \(A \cup B\) must also be in the set of statements (and, strictly, so must any countable union).

Such a set of statements is called a \(\sigma\)-algebra. The probability measure is defined on the set of events and the statements we can pose about the events. The probability measure doesn’t know how to handle any statement not defined in the set of statements, so the probability of an undefined statement is meaningless.
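On a finite event space these closure rules can be applied mechanically. The sketch below (my own illustration, not a standard library routine) repeatedly closes a seed collection of subsets under complements and unions until nothing new appears, yielding the generated \(\sigma\)-algebra:

```python
# Close a collection of "interesting statements" (subsets of a finite omega)
# under complement and union, yielding the sigma-algebra they generate.
def generate_sigma_algebra(omega, seeds):
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(s) for s in seeds}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            c = omega - a                 # rule 2: complements must be present
            if c not in sigma:
                sigma.add(c); changed = True
            for b in list(sigma):
                u = a | b                 # rule 3: unions must be present
                if u not in sigma:
                    sigma.add(u); changed = True
    return sigma

sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}])
print(sorted(sorted(s) for s in sigma))
# the smallest sigma-algebra containing {1}: {}, {1}, {2,3,4}, {1,2,3,4}
```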

Random Variables

For convenience, we usually work with functions called random variables \(X:\Omega \rightarrow \mathbb R \) which are defined on a tuple - the set of events and the set of statements - called a measurable space. Sometimes, we replace \(\mathbb R\) with some other set and call the function a “random element” instead. We do this, in part, for convenience, e.g. \( \{\text{heads}, \text{tails}\} \rightarrow \{ 0, 1 \} \).
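Concretely, for the coin example, a random variable is just a function on outcomes, and it pushes the measure on \(\Omega\) forward onto its values (a minimal sketch, with the fair-coin weights as my assumed example):

```python
# A random variable as a plain function from outcomes to reals,
# and the distribution (law) it pushes forward onto its values.
omega = ["heads", "tails"]
prior = {"heads": 0.5, "tails": 0.5}   # a measure on the outcome space

def X(outcome):
    return 0 if outcome == "heads" else 1

law = {}
for w in omega:
    law[X(w)] = law.get(X(w), 0.0) + prior[w]

print(law)  # {0: 0.5, 1: 0.5}
```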

Random variables are usually the basis for defining “probability/model laws”, which are a set of statements about the assumptions in a model.


A density is a function whose integral, with respect to a base measure (which ultimately ties in with how you measure things in real life), produces the probability measure of a particular set.

The base measure is interesting, because it can inform how we measure things, e.g. we can count sets of zero Lebesgue measure using Dirac deltas or counting measures (common if our event space is discrete), we can use lengths of intervals (where the Lebesgue measure comes into play - when we have a real event space), or perhaps a mixture of those two circumstances. We may even want to measure the size of sets w.r.t. the Cantor set in a pathological world.
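Both cases can be sketched side by side: against the Lebesgue base measure, integrating a density over an interval gives that interval’s probability; against a counting base measure, the “integral” of a discrete density (a pmf) is just a sum. A rough numerical illustration, using a simple midpoint Riemann sum of my own as the stand-in integrator:

```python
import math

# Lebesgue base measure: integrate the standard normal density over [-1, 1].
def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, n=100_000):
    """Crude midpoint Riemann sum."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

p_interval = integrate(normal_pdf, -1.0, 1.0)   # ~ 0.6827

# Counting base measure: a discrete density (pmf) is summed, not integrated.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
p_discrete = sum(pmf[k] for k in {1, 2})        # 0.75

print(round(p_interval, 4), p_discrete)
```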

Non-Measurable Spaces

Measurable spaces are a big deal because you’d run into major problems defining measures on non-measurable spaces. Famous (and amazing) examples include the Vitali Set and the Banach-Tarski Paradox. It’s interesting that the construction of these sets always relies on the axiom of choice; most mathematicians are more comfortable accepting the existence of sets which have no measure than abandoning the axiom of choice. Here’s a cool video on that by PBS Infinite Series & Kelsey Houston-Edwards.


Efficient Gaussian Process Computation

I’ll try to give examples of efficient Gaussian process computation here, like the vec trick (Kronecker product trick), efficient Toeplitz and circulant matrix computations, RTS smoothing and Kalman filtering using state-space representations, and so on.
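As a taste of the Toeplitz/circulant idea: a symmetric Toeplitz matrix (e.g. a covariance from a stationary kernel on an evenly spaced grid) can be embedded in a circulant matrix, whose matrix-vector product diagonalises under the FFT, taking the cost from \(O(n^2)\) to \(O(n \log n)\). A small sketch of that trick (my own toy numbers):

```python
import numpy as np

# Fast multiply of a symmetric Toeplitz matrix (first column t) by a vector,
# via circulant embedding + FFT: O(n log n) instead of O(n^2).
def toeplitz_matvec(t, x):
    n = len(t)
    c = np.concatenate([t, t[-2:0:-1]])        # circulant first column, length 2n - 2
    x_pad = np.concatenate([x, np.zeros(n - 2)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad)).real
    return y[:n]                                # top block recovers the Toeplitz product

# check against the dense Toeplitz product
t = np.array([4.0, 2.0, 1.0, 0.5])
x = np.array([1.0, -1.0, 2.0, 0.0])
T = np.array([[t[abs(i - j)] for j in range(4)] for i in range(4)])
print(np.allclose(toeplitz_matvec(t, x), T @ x))  # True
```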


Gaussian Process Speech Synthesis (Draft)

Very untidy first working draft of the idea mentioned on the efficient computation page. Here, I fit a spectral mixture to some audio data to build a “generative model” for audio. I’ll implement efficient sampling later, and I’ll replace the arbitrary way this is trained with an LSTM-RNN to go straight from text/spectrograms to waveforms.



Gaussian Process Middle C

First of my experiments on audio modelling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle C the way a grand piano would.


