Probability tutorial
Some motivation: I think probability and statistics are often confusing because the ontology of things like “random variable” and “distribution” is unclear/mysterious. Is a random variable really “random”, or is it just a deterministic mathematical object (like everything else in math we like to work with)? Ever since starting to learn Haskell, I’ve been obsessed with asking what type a certain object has.
On notation: by $A \to B$ we mean the set of functions mapping some set $A$ to another set $B$. This is sometimes written $B^A$, because there are $|B|^{|A|}$ such functions. And $f : A \to B$ is a “type annotation” and means $f$ is a function mapping $A$ to $B$. But combining these, we could also have written $f \in (A \to B)$ or $f \in B^A$, just like we write $x \in X$ to mean $x$ is in some set $X$. Using this notation we can say things like $X \in \mathbb{R}^\Omega$, i.e. a random variable $X$ is a function from the sample space $\Omega$ to the reals.
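As a minimal Haskell sketch of this correspondence (a toy example of my own; the names `Coin` and `x` are made up):

```haskell
-- "x :: Coin -> Double" is Haskell's spelling of the type annotation
-- x : Coin -> R, i.e. x is one element of the set of functions R^Coin.

data Coin = Heads | Tails deriving (Show, Eq)

x :: Coin -> Double
x Heads = 1
x Tails = 0

main :: IO ()
main = print (x Heads, x Tails)  -- (1.0,0.0)
```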
The following questions should be converted to multiple choice questions.
When we write $\Pr[X = x]$, what is the type of $=$?
Answer: $(\Omega \to \mathbb{R}) \times \mathbb{R} \to \mathcal{P}(\Omega)$, where $\mathcal{P}(\Omega)$ is the set of subsets (events) of the sample space $\Omega$: the expression $X = x$ denotes the event $\{\omega \in \Omega : X(\omega) = x\}$. Notice that $=$ is a binary function that does not return a boolean. This is no ordinary equals sign! By the way, what is the type of an ordinary equals sign?
Answer: $S \times S \to \{\text{True}, \text{False}\}$, where $S$ can be any set.
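Here is a sketch of that answer in Haskell, on a toy finite sample space (two coin flips); the operator name `===` and the random variable `numHeads` are my own inventions, and events are encoded as lists of outcomes:

```haskell
data Coin = Heads | Tails deriving (Show, Eq)

-- The sample space Omega: outcomes of two coin flips.
omega :: [(Coin, Coin)]
omega = [(a, b) | a <- [Heads, Tails], b <- [Heads, Tails]]

-- A random variable, i.e. a function Omega -> R: the number of heads.
numHeads :: (Coin, Coin) -> Double
numHeads (a, b) = sum [1 | c <- [a, b], c == Heads]

-- The "=" inside Pr[X = x]: it takes a random variable and a real number
-- and returns an event, i.e. a subset of the sample space.
(===) :: ((Coin, Coin) -> Double) -> Double -> [(Coin, Coin)]
rv === v = [w | w <- omega, rv w == v]

-- The ordinary equals sign has type S x S -> Bool; Haskell calls it (==).

main :: IO ()
main = print (numHeads === 1)  -- [(Heads,Tails),(Tails,Heads)]
```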
When we write $\Pr[X = x]$, what is the type of $\Pr$?
Answer: $\mathcal{P}(\Omega) \to [0,1]$. Note that here the result of $=$ (an event) feeds into $\Pr$, but that doesn’t change the type of $\Pr$.
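Continuing the same toy example, $\Pr$ is then just a function from events to $[0,1]$; here it is the uniform (counting) measure on the finite sample space, which is only one possible choice:

```haskell
data Coin = Heads | Tails deriving (Show, Eq)

omega :: [(Coin, Coin)]
omega = [(a, b) | a <- [Heads, Tails], b <- [Heads, Tails]]

-- Pr : P(Omega) -> [0,1], here the uniform (counting) measure.
pr :: [(Coin, Coin)] -> Double
pr event = fromIntegral (length event) / fromIntegral (length omega)

main :: IO ()
main = print (pr [(Heads, Heads), (Heads, Tails)])  -- 0.5
```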
If we take $X$ and $x$ as inputs and $\Pr[X = x]$ as the output, what is the type of the function that produces this output?
Answer: The output is in $[0,1]$, so we just have $(\Omega \to \mathbb{R}) \times \mathbb{R} \to [0,1]$.
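Putting the two pieces together gives the composite function of type $(\Omega \to \mathbb{R}) \times \mathbb{R} \to [0,1]$; a sketch on the same toy sample space (the name `prEq` is mine):

```haskell
data Coin = Heads | Tails deriving (Show, Eq)

omega :: [(Coin, Coin)]
omega = [(a, b) | a <- [Heads, Tails], b <- [Heads, Tails]]

numHeads :: (Coin, Coin) -> Double
numHeads (a, b) = sum [1 | c <- [a, b], c == Heads]

pr :: [(Coin, Coin)] -> Double
pr event = fromIntegral (length event) / fromIntegral (length omega)

-- (X, x) |-> Pr[X = x], of type (Omega -> R) x R -> [0,1].
prEq :: ((Coin, Coin) -> Double) -> Double -> Double
prEq rv v = pr [w | w <- omega, rv w == v]

main :: IO ()
main = print (prEq numHeads 1)  -- 0.5
```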
When we write $X \sim \mathrm{Bernoulli}(p)$, what is the type of $\sim$?
Answer: $(\Omega \to \mathbb{R}) \times D \to \{\text{True}, \text{False}\}$, where $D$ is the set of distributions, which raises the next question.
What is the type of a Bernoulli distribution? First of all, it takes a real parameter $p$ between 0 and 1, so it’s of the form $[0,1] \to S$ for some set $S$. What is $S$?
What is the type of a random variable $X$ with a Bernoulli distribution taking parameter $p$ (often written $X \sim \mathrm{Bernoulli}(p)$)? In particular, what is the sample space $\Omega$ (since we already know $X \in \mathbb{R}^\Omega$ for some $\Omega$ just because $X$ is a random variable)?
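One way to make these types concrete in Haskell, under the convention (my own choice here, only one of several reasonable answers) that a “distribution” means a probability mass function; the names `bernoulli` and `bernoulliRV` are made up:

```haskell
-- Bernoulli : [0,1] -> ({0,1} -> [0,1]), i.e. the parameter p is mapped to
-- a probability mass function on {0,1}. This is only one possible choice
-- for the set S above.
bernoulli :: Double -> (Int -> Double)
bernoulli p k
  | k == 1    = p
  | k == 0    = 1 - p
  | otherwise = 0

-- A random variable with this distribution needs some sample space Omega.
-- The simplest choice is Omega = {0,1} itself with X the inclusion into R,
-- but any Omega whose measure pushes forward to Bernoulli(p) works.
bernoulliRV :: Int -> Double
bernoulliRV = fromIntegral

main :: IO ()
main = print (bernoulli 0.3 1, bernoulli 0.3 0, bernoulliRV 1)  -- (0.3,0.7,1.0)
```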
Some books write (implying ) while others write (implying ).
Another problem is the distinction between the probability density function $f_X(x)$ vs writing $\Pr[X = x]$. Wasserman discusses this a bit in All of Statistics, but I haven’t seen other books talk about this.
Some books define expectation like
$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\,\Pr[\{\omega\}]$$
while others define it like
$$\mathbb{E}[X] = \sum_{x \in X(\Omega)} x\,\Pr[X = x]$$
i.e. summing over the domain vs summing over the range. How do we show the equivalence of these two definitions?
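For the discrete case, one way to connect them (a sketch, not quoted from any of the books) is to partition $\Omega$ by the value of $X$:

$$\sum_{\omega \in \Omega} X(\omega)\,\Pr[\{\omega\}] = \sum_{x \in X(\Omega)} \sum_{\omega : X(\omega) = x} X(\omega)\,\Pr[\{\omega\}] = \sum_{x \in X(\Omega)} x \sum_{\omega : X(\omega) = x} \Pr[\{\omega\}] = \sum_{x \in X(\Omega)} x\,\Pr[X = x],$$

where the last step uses that the event $X = x$ is the disjoint union of the singletons $\{\omega\}$ with $X(\omega) = x$. The general (continuous/measure-theoretic) case is the change-of-variables theorem for the pushforward measure.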
Some books define expectation as a sum/integral over the values a random variable takes, while others define it as a sum/integral over the sample space, but no book that I have seen connects these two definitions. The only thing I have seen here is https://math.stackexchange.com/questions/1352884/equivalent-definitions-for-expected-value-of-random-variable. To me, this feels like a basic pedagogic oversight in all the books that I’ve seen. Are these two definitions so obviously equivalent that one need not spend even a sentence connecting them?