A random variable is neither a variable nor random. It is a function that maps each outcome in the sample space to a real number.

We will assume that the sample space is finite. Thus, given a random variable F on a sample space S, the set of numbers n that F takes as values is finite as well.

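The event that F takes the value n, in symbols (F = n), is the set of outcomes that F maps to n, and its probability is the total probability of those outcomes:

$$(F = n) = \{\, s \in S \mid F(s) = n \,\}, \qquad P(F = n) = \sum_{s \in S,\; F(s) = n} P(s)$$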
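As a small illustration of a random variable as a function on a finite sample space, the sketch below assumes a hypothetical experiment of two fair coin tosses and a random variable that counts heads. The names `sample_space`, `prob`, `F` and `p_equals` are illustrative assumptions, not part of these notes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: all outcomes of two fair coin tosses.
sample_space = list(product("HT", repeat=2))

# Assumption: every outcome is equally likely.
prob = {s: Fraction(1, len(sample_space)) for s in sample_space}


def F(outcome):
    """Random variable: maps an outcome to the number of heads in it."""
    return outcome.count("H")


def p_equals(rv, value):
    """P(rv = value): sum the probabilities of the outcomes mapped to value."""
    return sum((prob[s] for s in sample_space if rv(s) == value), Fraction(0))


# The finite set of values F can take, each with its probability.
distribution = {n: p_equals(F, n) for n in sorted({F(s) for s in sample_space})}
```

Exact fractions are used here so that the probabilities can be checked to sum to exactly 1 without floating-point rounding.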

When defining a probability distribution P for a random variable F, we often do not specify its sample space S but *directly* assign a probability to the event that F takes a certain value.

Thus we specify the probability P(F = r) of the event that F has the value r directly, subject to the basic rules of probability: the probability of each single value of F lies between 0 and 1, and the probabilities of all values of F sum to 1:

$$0 \le P(F = r) \le 1, \qquad \sum_{r} P(F = r) = 1$$

We write

$$P(F_1 = r_1, F_2 = r_2) = P\big((F_1 = r_1) \cap (F_2 = r_2)\big)$$

where “,” is read as “and”, and “and” means the intersection of the two events.

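If P(F_2 = r_2) ≠ 0, then the conditional probability of F_1 = r_1 given F_2 = r_2 is:

$$P(F_1 = r_1 \mid F_2 = r_2) = \frac{P(F_1 = r_1, F_2 = r_2)}{P(F_2 = r_2)}$$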

The multiplication rule is also applicable to random variables:

$$P(F_1 = r_1, F_2 = r_2) = P(F_1 = r_1 \mid F_2 = r_2)\, P(F_2 = r_2)$$
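The conditional probability and the multiplication rule can be checked numerically. The sketch below assumes a hypothetical joint table over two variables (weather and traffic) with made-up numbers; it computes P(F_1 = r_1 | F_2 = r_2) as the joint probability divided by P(F_2 = r_2), and then multiplies back to recover the joint probability.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(F1 = r1, F2 = r2); the numbers are made up.
joint = {
    ("sunny", "light"): Fraction(4, 10),
    ("sunny", "heavy"): Fraction(2, 10),
    ("rainy", "light"): Fraction(1, 10),
    ("rainy", "heavy"): Fraction(3, 10),
}


def p_f2(r2):
    """P(F2 = r2): add up every joint entry whose second value is r2."""
    return sum(p for (_, b), p in joint.items() if b == r2)


def p_f1_given_f2(r1, r2):
    """P(F1 = r1 | F2 = r2), defined only when P(F2 = r2) is non-zero."""
    denominator = p_f2(r2)
    if denominator == 0:
        raise ValueError("conditioning event has probability 0")
    return joint[(r1, r2)] / denominator


# Multiplication rule: P(F1, F2) = P(F1 | F2) * P(F2).
lhs = joint[("sunny", "heavy")]
rhs = p_f1_given_f2("sunny", "heavy") * p_f2("heavy")
```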

We sometimes use symbols other than numbers to represent the values of a random variable, for example P(Weather = sunny).

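The probability distribution for a random variable gives the probabilities of all the possible values of the variable. Assuming the order of the values r_1,…,r_n is fixed, it can be written as a vector:

$$\mathbf{P}(F) = \langle P(F = r_1), \ldots, P(F = r_n) \rangle$$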

Let F_1,…,F_k be random variables. A *joint probability distribution* for them gives the probabilities P(F_1 = r_1,…,F_k = r_k) for every combination of values r_1,…,r_k.

A full joint probability distribution is a joint probability distribution for all relevant random variables F_1,…,F_k of a domain of interest.

Every probability question about a domain can be answered by the full joint probability distribution, because the probability of any event is a sum of probabilities of sample points.
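A minimal sketch of this idea, assuming a hypothetical full joint distribution over two variables with made-up numbers: an event is represented as a predicate on value combinations, and its probability is the sum of the entries where the predicate holds. The names `full_joint` and `p_event` are illustrative.

```python
from fractions import Fraction

# Hypothetical full joint distribution over (Weather, Traffic); numbers are made up.
full_joint = {
    ("sunny", "light"): Fraction(4, 10),
    ("sunny", "heavy"): Fraction(2, 10),
    ("rainy", "light"): Fraction(1, 10),
    ("rainy", "heavy"): Fraction(3, 10),
}


def p_event(holds):
    """Probability of an event: sum over the entries where the predicate holds."""
    return sum(p for point, p in full_joint.items() if holds(point))


# P(Weather = sunny or Traffic = heavy), answered from the table alone.
p_sunny_or_heavy = p_event(lambda pt: pt[0] == "sunny" or pt[1] == "heavy")
```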

Note: the value combinations n_1,…,n_k are often called data points or sample points.

A full joint probability distribution covers *every* relevant random variable of the domain of interest. A joint distribution that is not full covers only some of those variables, so it cannot answer questions about the variables it leaves out.

Given a joint distribution P(F_1,…,F_k), one can compute the *unconditional* or *marginal* probabilities of a random variable F_i by summing out the remaining variables.
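Summing out can be sketched as follows, assuming a hypothetical joint table stored as a dictionary from value tuples to probabilities (the numbers are made up). The helper `marginal` is an illustrative name; it keeps the i-th variable and adds up over all the others.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P(F1, F2); the numbers are made up.
joint = {
    ("sunny", "light"): Fraction(4, 10),
    ("sunny", "heavy"): Fraction(2, 10),
    ("rainy", "light"): Fraction(1, 10),
    ("rainy", "heavy"): Fraction(3, 10),
}


def marginal(joint, i):
    """Marginal distribution of the i-th variable: sum out all the others."""
    result = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for values, p in joint.items():
        result[values[i]] += p
    return dict(result)


p_f1 = marginal(joint, 0)
p_f2 = marginal(joint, 1)
```

The same function works for tuples of any length, since it only looks at position `i` of each key.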

We can also compute *conditional / posterior* distributions from the full joint distribution. We use the *P* notation for conditional distributions.

P(F | G) gives the conditional / posterior distribution of F given G, that is, the probabilities P(F = r | G = s) for all values r and s.

Using this notation, the general version of the multiplication / product rule is:

$$P(F, G) = P(F \mid G)\, P(G)$$

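Probabilistic inference can be characterised as the computation of posterior probabilities for query variables F given observed evidence E_1,…,E_m:

$$\mathbf{P}(F \mid E_1 = e_1, \ldots, E_m = e_m) = \frac{\mathbf{P}(F, E_1 = e_1, \ldots, E_m = e_m)}{P(E_1 = e_1, \ldots, E_m = e_m)}$$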

The denominator can be viewed as a normalisation constant for the distribution *P*, ensuring that it adds up to 1.
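To illustrate the role of the denominator, the sketch below assumes a hypothetical joint table over one query variable and one evidence variable, with made-up numbers. It selects the entries that agree with the observed evidence and divides them by their sum, which is the probability of the evidence; the function name `posterior` is an illustrative assumption.

```python
from fractions import Fraction

# Hypothetical joint distribution P(F, E); the numbers are made up and sum to 1.
joint = {
    ("flu", "fever"): Fraction(1, 10),
    ("flu", "no_fever"): Fraction(1, 20),
    ("healthy", "fever"): Fraction(1, 20),
    ("healthy", "no_fever"): Fraction(4, 5),
}


def posterior(joint, evidence):
    """P(F | E = evidence): keep entries matching the evidence, then normalise."""
    unnormalised = {f: p for (f, e), p in joint.items() if e == evidence}
    denominator = sum(unnormalised.values())  # this is P(E = evidence)
    if denominator == 0:
        raise ValueError("evidence has probability 0")
    return {f: p / denominator for f, p in unnormalised.items()}


p_given_fever = posterior(joint, "fever")
```

The unnormalised entries already have the right proportions; dividing by their sum only rescales them so that the posterior distribution adds up to 1.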