Probability tutorial

Some motivation: I think probability and statistics are often confusing because the ontology of things like “random variable” and “distribution” is unclear/mysterious. Is a random variable really “random”, or is it just a deterministic mathematical object (like everything else in math we like to work with)? Ever since starting to learn Haskell, I’ve been obsessed with asking what type a certain object has.

On notation: by \(A \to B\) we mean the set of functions mapping some set \(A\) to another set \(B\). This is sometimes written \(B^A\), because \(|A\to B| = |B|^{|A|}\). And \(f : A \to B\) is a “type annotation” and means \(f\) is a function mapping \(A\) to \(B\). But combining these, we could also have written \(f \in (A\to B)\), just like we write \(x\in X\) to mean \(x\) is in some set \(X\). Using this notation we can say things like \((A \to B) \subset \mathcal P(A\times B)\), since each function is itself a set of input–output pairs.
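The cardinality fact \(|A\to B| = |B|^{|A|}\) can be checked by brute force for small finite sets. A sketch in Haskell (the function name `functions` is my own, for illustration): a function on a finite set \(A = \{a_1,\dots,a_n\}\) is determined by its list of outputs, so enumerating \(A\to B\) amounts to enumerating length-\(n\) lists over \(B\).

```haskell
import Control.Monad (replicateM)

-- A function f : A -> B on a finite set A = {a1, ..., an} is determined by
-- its list of outputs [f a1, ..., f an], so the set A -> B corresponds to
-- the length-n lists over B, of which there are |B|^n.
functions :: [a] -> [b] -> [[b]]
functions as bs = replicateM (length as) bs
```

For example, `length (functions [1,2] "xyz")` is \(3^2 = 9\).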

The following questions should be converted to multiple choice questions.

When we write \(X = x\), what is the type of \(=\)?

Answer: \((=) : (\Omega \to \mathbf R) \times \mathbf R \to \mathcal P (\Omega)\). Notice that \((=)\) is a binary function that does not return a boolean. This is no ordinary equals sign! By the way, what is the type of an ordinary equals sign?

Answer: \((=): A \times A \to \{\text{True}, \text{False}\}\), where \(A\) can be any set.
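The contrast between the two equals signs can be made concrete in Haskell. A minimal sketch, with my own names and the (arbitrary) choice of a die roll for \(\Omega\), representing events as lists of outcomes:

```haskell
-- A sketch of the probabilistic (=), with Omega the six outcomes of a
-- die roll and events (elements of P(Omega)) represented as lists.
type Omega = Int

omega :: [Omega]
omega = [1 .. 6]

-- The probabilistic (=) : (Omega -> R) x R -> P(Omega).
-- It returns the event {w in Omega | X(w) = x} -- a set, not a Bool.
eqEvent :: (Omega -> Double) -> Double -> [Omega]
eqEvent x v = [w | w <- omega, x w == v]

-- An example random variable: the face value of the die.
faceValue :: Omega -> Double
faceValue = fromIntegral
```

Here `eqEvent faceValue 3` is the event `[3]`, while the ordinary equals sign is just Haskell’s `(==) :: Eq a => a -> a -> Bool`.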

When we write \(\Pr (X = x)\), what is the type of \(=\)?

Answer: \((=) : (\Omega \to \mathbf R) \times \mathbf R \to \mathcal P(\Omega)\). Note that here the result feeds into \(\Pr\), but that doesn’t change the type of \((=)\).

If we define \(f(X,x) = \Pr(X=x)\), taking \(X\) and \(x\) as inputs, what is the type of \(f\)?

Answer: The output is in \(\mathbf R\) so we just have

\[f : (\Omega\to\mathbf R) \times \mathbf R \to \mathbf R\]
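This type can be sketched directly in Haskell. For concreteness I pick a fair die with the uniform measure on \(\Omega = \{1,\dots,6\}\); the names `pr`, `f`, and `indicatorEven` are my own:

```haskell
-- A sketch of f(X, x) = Pr(X = x), assuming the uniform measure on
-- Omega = {1, ..., 6} (a fair die).
type Omega = Int

omega :: [Omega]
omega = [1 .. 6]

-- Pr_Omega : P(Omega) -> R, the uniform measure.
pr :: [Omega] -> Double
pr e = fromIntegral (length e) / fromIntegral (length omega)

-- f : (Omega -> R) x R -> R.
f :: (Omega -> Double) -> Double -> Double
f x v = pr [w | w <- omega, x w == v]

-- An indicator random variable: 1 on even faces, 0 on odd ones.
indicatorEven :: Omega -> Double
indicatorEven w = if even w then 1 else 0
```

With this setup `f indicatorEven 1` is \(\Pr(X=1) = 1/2\): the random variable goes in as data, and a plain real number comes out.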

When we write \(\Pr (X \leq x)\), what is the type of \(\leq\)?

Answer: \((\leq) : (\Omega \to \mathbf R) \times \mathbf{R} \to \mathcal P(\Omega)\). Just like \((=)\) above, it returns the event \(\{\omega\in\Omega : X(\omega) \leq x\}\), not a real number.

What is the type of a Bernoulli distribution? First of all, it takes a real parameter between 0 and 1, so it’s of the form \(\mathrm{Bernoulli} : [0,1] \to A\) for some set \(A\). What is \(A\)?

What is the type of a random variable \(X\) with a Bernoulli distribution taking parameter \(p\) (often written \(X \sim \mathrm{Bernoulli}(p)\))? In particular, what is the sample space (since we already know \(X : \Omega \to \mathbf R\) for some \(\Omega\) just because \(X\) is a random variable)?
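One concrete choice of sample space (a sketch, and by no means the only possible answer): take \(\Omega = \{\text{False}, \text{True}\}\), let \(X\) be the indicator of True, and weight the outcomes by \(p\). All names here are mine:

```haskell
-- One concrete (non-unique) sample space for X ~ Bernoulli(p):
-- Omega = Bool, with Pr({True}) = p and Pr({False}) = 1 - p.
type Omega = Bool

-- X : Omega -> R.
x :: Omega -> Double
x w = if w then 1 else 0

-- Pr_Omega on singleton events, parameterised by p.
prPoint :: Double -> Omega -> Double
prPoint p w = if w then p else 1 - p

-- Pr(X = v), summing Pr({w}) over the event {w | X(w) = v}.
probEq :: Double -> Double -> Double
probEq p v = sum [prPoint p w | w <- [False, True], x w == v]
```

Then `probEq p 1` recovers \(p\) and `probEq p 0` recovers \(1-p\), as expected for \(X \sim \mathrm{Bernoulli}(p)\).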

Some books write \(\Pr(\omega)\) (implying \(\Pr_\Omega : \Omega \to \mathbf R\)) while others write \(\Pr(\{\omega\})\) (implying \(\Pr_\Omega : \mathcal P(\Omega) \to \mathbf R\)).

Another problem is the distinction between the probability density function \(f_X(x)\) vs writing \(\Pr(X=x)\). Wasserman discusses this a bit in All of Statistics, but I haven’t seen other books talk about this.

Some books define expectation like

\[\mathrm E(X) = \sum_{\omega\in\Omega} X(\omega) \Pr(\{\omega\})\]

while others define it like

\[\mathrm E(X) = \sum_{x\in\mathbf R} x\,p(x)\]

i.e. summing over the domain vs summing over the range. How do we show that these two definitions are equivalent?
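One way to show it, at least for a discrete random variable on a countable \(\Omega\) (a sketch): group the outcomes by the value \(X\) assigns to them. Writing \(p(x) = \Pr(X=x)\),

\[\sum_{\omega\in\Omega} X(\omega) \Pr(\{\omega\}) = \sum_{x} \sum_{\omega : X(\omega) = x} x \Pr(\{\omega\}) = \sum_{x} x \Pr(X = x) = \sum_{x} x\,p(x),\]

where the outer sums run over the (countably many) values in the range of \(X\), and the middle step uses additivity of \(\Pr\) over the disjoint events \(\{\omega : X(\omega) = x\}\).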

Some books define expectation as a sum/integral over the values a random variable takes, while others define it as a sum/integral over the sample space, but no book that I have seen connects these two definitions. To me, this feels like a basic pedagogic oversight. Are these two definitions so obviously equivalent that one need not spend even a sentence connecting them?
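The equivalence can at least be checked numerically on a small example. A sketch in Haskell (a fair die with the uniform measure; all names are mine, and I deliberately pick an \(X\) that repeats values so the two sums really do group differently):

```haskell
import Data.List (nub)

-- A fair die, with the uniform measure on Omega = {1, ..., 6}.
type Omega = Int

omega :: [Omega]
omega = [1 .. 6]

prPoint :: Omega -> Double   -- Pr({w})
prPoint _ = 1 / 6

-- A random variable that repeats values: X(w) = w mod 3.
x :: Omega -> Double
x w = fromIntegral (w `mod` 3)

-- E(X) as a sum over the sample space (the domain of X).
eDomain :: Double
eDomain = sum [x w * prPoint w | w <- omega]

-- E(X) as a sum over the values X takes (its range), with p(v) = Pr(X = v).
eRange :: Double
eRange = sum [v * p v | v <- nub (map x omega)]
  where p v = sum [prPoint w | w <- omega, x w == v]
```

Both `eDomain` and `eRange` agree (up to floating-point rounding), despite iterating over different sets.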