Probability tutorial

Some motivation: I think probability and statistics are often confusing because the ontology of things like “random variable” and “distribution” is unclear/mysterious. Is a random variable really “random”, or is it just a deterministic mathematical object (like everything else in math we like to work with)? Ever since starting to learn Haskell, I’ve been obsessed with asking what type a certain object has.

On notation: by \(A \to B\) we mean the set of functions mapping some set \(A\) to another set \(B\). This is sometimes written \(B^A\), because \(|A\to B| = |B|^{|A|}\). The expression \(f : A \to B\) is a “type annotation”: it means \(f\) is a function mapping \(A\) to \(B\). Combining these, we could also have written \(f \in (A\to B)\), just like we write \(x\in X\) to mean \(x\) is in some set \(X\). Using this notation we can say things like \((A \to B) \subset \mathcal P(A\times B)\), since a function from \(A\) to \(B\) is, set-theoretically, a subset of \(A\times B\).
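For instance, this is exactly what a type annotation looks like in Haskell (the function and the types below are arbitrary choices for illustration, not anything from a library):

```haskell
-- In math we would write isPositive : Z -> Bool;
-- in Haskell the same annotation is written with "::".
isPositive :: Int -> Bool
isPositive n = n > 0
```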

The following questions should be converted to multiple choice questions.

When we write \(X = x\), what is the type of \(=\)?

Answer: \((=) : (\Omega \to \mathbf R) \times \mathbf R \to \mathcal P (\Omega)\). Notice that \((=)\) is a binary function that does not return a boolean. This is no ordinary equals sign! By the way, what is the type of an ordinary equals sign?

Answer: \((=): A \times A \to \{\text{True}, \text{False}\}\), where \(A\) can be any set.
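To make the contrast concrete, here is a toy Haskell sketch. The encoding is my own choice, not standard: \(\Omega\) becomes `Int`, and a subset of \(\Omega\) becomes a predicate \(\Omega \to \{\text{True},\text{False}\}\).

```haskell
type Omega = Int                -- a hypothetical finite sample space
type RV    = Omega -> Double    -- a random variable is just a function Ω → ℝ
type Event = Omega -> Bool      -- a subset of Ω, encoded here as a predicate

-- the probabilist's "=": (Ω → ℝ) × ℝ → P(Ω); it builds an event, not a Bool
eqEvent :: RV -> Double -> Event
eqEvent x v = \omega -> x omega == v

-- the ordinary "=": A × A → {True, False}
ordinaryEq :: Eq a => a -> a -> Bool
ordinaryEq a b = a == b
```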

When we write \(\Pr (X = x)\), what is the type of \(=\)?

Answer: \((=) : (\Omega \to \mathbf R) \times \mathbf R \to \mathcal P(\Omega)\). Note that here the result feeds into \(\Pr\), but that doesn’t change the type of \((=)\).

If we define \(f\) by \(f(X,x) = \Pr(X=x)\), taking \(X\) and \(x\) as inputs, what is the type of \(f\)?

Answer: The output is in \(\mathbf R\), so we just have

\[f : (\Omega\to\mathbf R) \times \mathbf R \to \mathbf R\]
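A toy Haskell sketch of \(f\), using the same kind of encoding as above. The sample space, the uniform measure, and all the names are illustrative assumptions, not the only way to set this up.

```haskell
type Omega = Int
type RV    = Omega -> Double
type Event = Omega -> Bool

-- hypothetical sample space: one roll of a fair six-sided die
sampleSpace :: [Omega]
sampleSpace = [1 .. 6]

-- Pr : P(Ω) → ℝ, here a uniform counting measure (an illustrative choice)
pr :: Event -> Double
pr e = fromIntegral (length (filter e sampleSpace))
     / fromIntegral (length sampleSpace)

-- f(X, x) = Pr(X = x), so f : (Ω → ℝ) × ℝ → ℝ
f :: RV -> Double -> Double
f x v = pr (\omega -> x omega == v)
```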

When we write \(\Pr (X \leq x)\), what is the type of \(\leq\)?

Answer: \((\leq) : (\Omega \to \mathbf R) \times \mathbf{R} \to \mathcal P(\Omega)\). Just like \((=)\) above, the result is an event: the subset \(\{\omega \in \Omega : X(\omega) \leq x\}\).
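The analogous sketch in Haskell, with the same assumed encoding of events as predicates:

```haskell
type Omega = Int
type RV    = Omega -> Double
type Event = Omega -> Bool

-- (≤) in Pr(X ≤ x): (Ω → ℝ) × ℝ → P(Ω); the result is the event {ω : X(ω) ≤ x}
leEvent :: RV -> Double -> Event
leEvent x v = \omega -> x omega <= v
```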

What is the type of a Bernoulli distribution? First of all, it takes a real parameter between 0 and 1, so it’s of the form \(\mathrm{Bernoulli} : [0,1] \to A\) for some set \(A\). What is \(A\)?

What is the type of a random variable \(X\) with a Bernoulli distribution taking parameter \(p\) (often written \(X \sim \mathrm{Bernoulli}(p)\))? In particular, what is the sample space (since we already know \(X : \Omega \to \mathbf R\) for some \(\Omega\) just because \(X\) is a random variable)?
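One concrete construction, offered as a hedged sketch rather than the definitive answer: encode the distribution as its probability mass function, and take \(\Omega = \{0,1\}\) with \(X\) the inclusion into \(\mathbf R\). Both choices are conventions, and other encodings (e.g. a measure \(\mathcal P(\{0,1\}) \to [0,1]\), or a larger \(\Omega\)) are equally legitimate.

```haskell
-- Assumption: encode the distribution as its PMF on {0,1},
-- so Bernoulli : [0,1] → ({0,1} → [0,1]). This is one convention among several.
bernoulliPMF :: Double -> (Int -> Double)
bernoulliPMF p k
  | k == 1    = p
  | k == 0    = 1 - p
  | otherwise = 0

-- One common choice of sample space for X ~ Bernoulli(p): Ω = {0,1},
-- with X the inclusion of {0,1} into ℝ.
type Omega = Int          -- restricted, by convention here, to {0,1}
x :: Omega -> Double      -- X : Ω → ℝ
x = fromIntegral
```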

Another source of confusion is notation for the probability measure itself: some books write \(\Pr(\omega)\) (implying \(\Pr_\Omega : \Omega \to \mathbf R\)) while others write \(\Pr(\{\omega\})\) (implying \(\Pr_\Omega : \mathcal P(\Omega) \to \mathbf R\)).
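The two conventions really do have different types. A toy Haskell sketch, where the names and the uniform measure are my own choices:

```haskell
type Omega = Int
type Event = Omega -> Bool

-- hypothetical uniform sample space
sampleSpace :: [Omega]
sampleSpace = [1 .. 6]

-- the "Pr({ω})" convention: Pr : P(Ω) → ℝ
prEvent :: Event -> Double
prEvent e = fromIntegral (length (filter e sampleSpace))
          / fromIntegral (length sampleSpace)

-- the "Pr(ω)" convention: Pr : Ω → ℝ, definable from the other one
prPoint :: Omega -> Double
prPoint omega = prEvent (== omega)
```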

Another problem is the distinction between the probability density function \(f_X(x)\) and the notation \(\Pr(X=x)\). Wasserman discusses this briefly in All of Statistics, but I haven’t seen other books address it.
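One reason the distinction matters: for a continuous random variable the two notations come apart, since

\[\Pr(X = x) = 0 \text{ for every } x, \qquad\text{while}\qquad \Pr(a \leq X \leq b) = \int_a^b f_X(t)\, dt,\]

so \(f_X(x)\) is a density, not a probability, and can even exceed 1.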

Some books define expectation like

\[\mathrm E(X) = \sum_{\omega\in\Omega} X(\omega)\, \Pr(\{\omega\})\]

while others define it like

\[\mathrm E(X) = \sum_{x\in\mathbf R} xp(x)\]

i.e. summing over the domain vs summing over the range. How do we show that these two definitions are equivalent?

Some books define expectation as a sum/integral over the values a random variable takes, while others define it as a sum/integral over the sample space, but no book that I have seen connects these two definitions. The only discussion I have found is https://math.stackexchange.com/questions/1352884/equivalent-definitions-for-expected-value-of-random-variable. To me, this feels like a basic pedagogical oversight in all the books that I’ve seen. Are these two definitions so obviously equivalent that one need not spend even a sentence connecting them?
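For what it’s worth, here is a sketch of the connection in the discrete case (assuming \(\Omega\) is countable and \(p(x) = \Pr(X = x)\)): group the terms of the sum over \(\Omega\) by the value that \(X\) takes,

\[\mathrm E(X) = \sum_{\omega\in\Omega} X(\omega)\,\Pr(\{\omega\}) = \sum_{x} \sum_{\omega : X(\omega) = x} x\,\Pr(\{\omega\}) = \sum_{x} x \sum_{\omega : X(\omega) = x} \Pr(\{\omega\}) = \sum_{x} x\,\Pr(X = x) = \sum_{x} x\, p(x),\]

where \(x\) ranges over the (countable) set of values that \(X\) takes.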