Probability & Sigma Algebras
This was my totally non-rigorous take on the definition of probabilities when I was a student.
Probability today is a (mathematically) well-defined object, arrived at after a lot of struggle in the late 19th and early 20th centuries. Subjective forms of probability (precursors to Bayesian probability) later came up with axioms that recover the modern accepted definition. Due to Kolmogorov, this is roughly:
If $\Omega$ is a set of events and $\mathcal{F}$ is a collection of interesting statements we can pose about the events, then $P$, a probability measure, gives us a degree of plausibility for each interesting statement we may pose.
$P$ obeys the following rules:
- $P(A)$ is either 0 or more than 0, i.e. $P(A) \geq 0$ for every statement $A \in \mathcal{F}$.
- $P(\Omega)$ is 1 (so the probability that something - anything - from $\Omega$ happens is 1).
- If two events, $A, B$, can’t both happen ($A \cap B = \emptyset$), then $P(A \cup B) = P(A) + P(B)$.
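These axioms can be checked directly on a toy finite example. A minimal sketch (the fair-die measure and names here are my own choices, not from the original post):

```python
from fractions import Fraction

# Toy probability measure on a finite event set: a fair six-sided die.
omega = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    assert event <= omega
    return Fraction(len(event), len(omega))

evens = frozenset({2, 4, 6})
odds = frozenset({1, 3, 5})

assert P(evens) >= 0                            # non-negativity
assert P(omega) == 1                            # total mass is 1
assert evens & odds == frozenset()              # disjoint events...
assert P(evens | odds) == P(evens) + P(odds)    # ...have additive probabilities
```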
Notice that the probability measure acts on the set of interesting statements $\mathcal{F}$. This set of statements is defined over a set of events $\Omega$ and must obey these rules:
- Nothing happening ($\emptyset$) and anything happening ($\Omega$) are both in the set of statements.
- If “event $A$ happening” is in the set of statements, then “event $A$ not happening” should be as well. This is because, if $P(A) = p$, the set $A$’s complement needs to exist for us to define $P(A^c) = 1 - p$.
- If two events $A, B$ are in the set of statements, then we must be able to ask what the probability of $A$ or $B$ happening is - there is a need for consistency. So, the union $A \cup B$ must also be in the set of statements.
Such a set of statements is called a $\sigma$-algebra. The probability measure is defined on the set of events and the statements we can pose about the events. The probability measure doesn’t know how to handle any statement not defined in the set of statements, so the probability of any undefined statement is meaningless.
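On a finite set of events, the closure rules above can be brute-forced: keep adding complements and unions until nothing changes. A sketch (the function name is mine), generating the smallest $\sigma$-algebra containing a given collection of sets:

```python
from itertools import combinations

def generated_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on a finite omega containing the generator sets:
    repeatedly close under complement and (finite) union until stable."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            if omega - a not in sets:       # close under complements
                sets.add(omega - a)
                changed = True
        for a, b in combinations(list(sets), 2):
            if a | b not in sets:           # close under unions
                sets.add(a | b)
                changed = True
    return sets

F = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
# F contains the empty set, {1}, its complement {2, 3, 4}, and omega itself
```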
For convenience, we usually work with functions called random variables, $X: \Omega \to \mathbb{R}$, which are defined on a tuple - the set of events and the set of statements, $(\Omega, \mathcal{F})$ - called a measurable space. Sometimes, we replace $\mathbb{R}$ with some other set and call the function a “random element” instead. We do this, in part, for convenience, e.g. so that statements like $\{X \leq x\}$ are easy to pose.
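A statement about a random variable is answered by measuring its preimage among the events. A minimal sketch (the die and the parity variable are my own toy choices):

```python
# A random variable is a function X: omega -> R on a measurable space;
# the probability of a statement about X is the measure of its preimage.
omega = [1, 2, 3, 4, 5, 6]            # fair-die outcomes
P = {w: 1 / 6 for w in omega}         # uniform probability measure

def X(w):
    return w % 2                      # random variable: parity of the roll

# P(X = 1) is the probability of the event {w in omega : X(w) = 1}.
p_odd = sum(P[w] for w in omega if X(w) == 1)
```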
Random variables are usually the basis for defining “probability/model laws”, which are a set of statements about the assumptions in a model.
A density is a function whose integral with respect to a base measure (which ultimately ties in with how you measure things in real life) produces the probability of a particular set.
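As a concrete sketch (the standard normal and the midpoint-rule approximation are my own choices): integrating a density against the Lebesgue base measure over a set $[a, b]$ yields the probability of that set.

```python
import math

def normal_pdf(x):
    """Density of a standard normal with respect to the Lebesgue measure."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(a, b, n=10_000):
    """Midpoint-rule approximation of the integral of the density over [a, b]."""
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h) for i in range(n)) * h

p = prob_interval(-1.96, 1.96)   # roughly 0.95, the familiar normal interval
```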
The base measure is interesting, because it can inform how we measure things, e.g. we can count sets of zero Lebesgue measure using Dirac deltas or counting measures (common if our event space is discrete), we can use lengths of intervals (where the Lebesgue measure comes into play - when we have a real event space), or perhaps a mixture of those two circumstances. We may even want to measure the size of sets w.r.t. the Cantor set in a pathological world.
Measurable spaces are a big deal because you’d run into major problems defining measures on non-measurable spaces. Famous (and amazing) examples include the Vitali Set and the Banach-Tarski Paradox. It’s interesting that the construction of these sets always relies on the axiom of choice; most mathematicians are more comfortable accepting the existence of sets which have no measure than abandoning the axiom of choice. Here’s a cool video on that by PBS Infinite Series & Kelsey Houston-Edwards.
I’ve been reading about PPCA, and this post summarizes my understanding of it. I took a lot of this from Pattern Recognition and Machine Learning by Bishop.
The main objective of this post was just to write about my typical workflow and views rather than come up with a great model. The structure of this data is also outside my immediate domain so I thought it’d be fun to write up a small diary on making a model with it.
I used to do a fair bit of astrophotography in university - it’s harder to find good skies now, living in the city. Here are some of my old pictures. I’ve kept making rookie mistakes (too much ISO, too little exposure time, using a slow lens, bad stacking, …); for that, I apologize!
The main aim here was to morph space inside a square, such that the transformation preserves some kind of ordering of the points. I wanted to use it to generate some random graphs on a flat surface and introduce spatial deformation to make the graphs more interesting.
I had a go at a few SEIR models, this is a rough diary of the process.
The initial aim here was to model speech samples as realizations of a Gaussian process with some appropriate covariance function, by conditioning on the spectrogram. I fit a spectral mixture kernel to segments of audio data and concatenated the segments to obtain the full waveform. Partway into writing efficient sampling code (generating waveforms using the Gaussian process state space representation), I realized that it’s actually quite easy to obtain waveforms if you’ve already got a spectrogram.
I’ll try to give examples of efficient Gaussian process computation here, like the vec trick (Kronecker product trick), efficient Toeplitz and circulant matrix computations, RTS smoothing and Kalman filtering using state space representations, and so on.
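As a taste of one of those tricks (a sketch; the helper name is my own): a circulant matrix is diagonalized by the DFT, so a matrix-vector product costs $O(n \log n)$ via the FFT instead of $O(n^2)$.

```python
import numpy as np

def circulant_matvec(c, x):
    """Product C @ x, where C is the circulant matrix whose first column is c.

    C @ x is the circular convolution of c and x, which the FFT diagonalizes.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

rng = np.random.default_rng(0)
c = rng.normal(size=8)
x = rng.normal(size=8)

# Check against the explicit circulant matrix (columns are cyclic shifts of c).
C = np.array([np.roll(c, k) for k in range(len(c))]).T
assert np.allclose(circulant_matvec(c, x), C @ x)
```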
Minimal Working Examples
First of my experiments on audio modeling using Gaussian processes. Here, I construct a GP that, when sampled, plays middle C the way a grand piano would.
… using Stan & HMC