Probability Distributions: Discrete, Continuous, Means, Variances & CDFs

Expert reviewed 21 July 2024 11 minute read


HSC Maths Advanced Syllabus

  • use relative frequencies and histograms obtained from data to estimate probabilities associated with a continuous random variable
  • understand and use the concepts of a probability density function of a continuous random variable


Review of Discrete Distributions

Before we begin this module, we must first review the mean and variance of a distribution, as discussed in the Year 11 course and the previous module.

A discrete distribution describes the probabilities of the possible values of a discrete random variable. A random variable is ‘discrete’ if it has a countable number of distinct values. This means that its values can be listed one by one (for example 1, 2, 3, ...), with gaps between them, rather than filling a continuous interval.

Additionally, a discrete distribution is typically defined by a probability mass function $p(x)=P(X=x)$. This gives the probability that the discrete random variable $X$ takes the value $x$. It is also important to note that $p(x)$ has two key properties:

  • $p(x)$ cannot be negative. That is, $p(x)\geq 0$ for every $x$.
  • The probabilities of all possible values of $X$ sum to $1$. That is, $\sum p(x)=1$.
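The two properties above can be checked mechanically. The sketch below is a minimal illustration, assuming a probability mass function is stored as a Python dict that maps each value $x$ to $p(x)$; the helper name `is_valid_pmf` is hypothetical, and `fractions.Fraction` is used so that $\frac{1}{6}$ is stored exactly rather than as a rounded decimal.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Hypothetical helper: True if every p(x) >= 0 and the p(x) sum to exactly 1.

    `pmf` is assumed to be a dict mapping each value x to its probability p(x).
    """
    all_non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return all_non_negative and sums_to_one

# A fair six-sided die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(is_valid_pmf(die))                                     # True
print(is_valid_pmf({0: Fraction(1, 2), 1: Fraction(1, 3)}))  # False: sums to 5/6
```

Exact fractions are a design choice here: with floating-point numbers, six copies of `1/6` may not add to exactly `1`, which would make the equality test unreliable.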

How to Calculate the Mean and Variance of a Discrete Distribution

Let $X$ be a discrete random variable, and let $p(x)=P(X=x)$.

The formula to determine the expected value (mean) of a discrete distribution is as follows:

$$\mu=E(X)=\sum x\,p(x)$$

where,

  • $x$ represents each possible value of the random variable
  • $p(x)$ represents the probability that $X$ equals that value $x$
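As a worked check of $\mu=\sum x\,p(x)$, the sketch below computes the mean of a fair six-sided die. It assumes the same dict representation of $p(x)$ (value to probability); `expected_value` is a hypothetical helper name used only for illustration.

```python
from fractions import Fraction

def expected_value(pmf):
    """Mean of a discrete distribution: E(X) = sum of x * p(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: p(x) = 1/6 for x = 1, ..., 6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(expected_value(die))  # 7/2, i.e. (1 + 2 + ... + 6) / 6 = 21/6
```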

There are two formulas for the variance of a discrete distribution. Both give the same result, so the choice of which one to use depends on the problem and on personal preference. The formulas are as follows:

$$Var(X)=E\big((X-\mu)^2\big)=\sum(x-\mu)^2p(x)$$

or

$$Var(X)=E(X^2)-\mu^2=\sum x^2p(x)-\mu^2$$

where,

  • $x$ represents each possible value of the random variable
  • $\mu$ is the expected value or mean of $X$
  • $p(x)$ represents the probability that $X$ equals that value $x$
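To see that the two variance formulas agree, the sketch below evaluates both for a fair six-sided die. It is a minimal illustration assuming the dict representation of $p(x)$ used earlier; the function names are hypothetical.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance_definition(pmf):
    """Var(X) = E((X - mu)^2) = sum of (x - mu)^2 * p(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    """Var(X) = E(X^2) - mu^2 = (sum of x^2 * p(x)) - mu^2."""
    mu = expected_value(pmf)
    return sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

# Fair six-sided die: p(x) = 1/6 for x = 1, ..., 6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(variance_definition(die))  # 35/12
print(variance_shortcut(die))    # 35/12, since E(X^2) = 91/6 and mu^2 = 49/4
```

The second formula is often quicker by hand because it avoids subtracting $\mu$ from every value before squaring.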

These formulae, relating to the mean and variance of a discrete distribution, will be important in solving problems in the coming chapters.

The Cumulative Distribution Function

The cumulative distribution function (CDF) $F(x)$ of any numerical probability distribution gives the probability that the random variable $X$ takes a value less than or equal to $x$. In this chapter, we explore the CDF only for discrete distributions, where $F(x)$ is found by adding the probabilities of all values of $X$ up to and including $x$. The CDF is defined by:

$$F(x)=P(X\leq x)$$
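For a discrete distribution, $F(x)=P(X\leq x)$ is a running sum of $p(k)$ over every value $k\leq x$. The sketch below is a minimal illustration, assuming $p(x)$ is stored as a dict of value to probability; `cdf` is a hypothetical helper name.

```python
from fractions import Fraction

def cdf(pmf, x):
    """F(x) = P(X <= x): add p(k) for every value k that is at most x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

# Fair six-sided die: p(k) = 1/6 for k = 1, ..., 6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

for x in range(1, 7):
    print(x, cdf(die, x))  # 1/6, 1/3, 1/2, 2/3, 5/6, 1

print(cdf(die, 3.5))  # 1/2: F(x) is also defined between the possible values
```

Because no value of $X$ lies strictly between 3 and 4, $F(3.5)=F(3)$, which is why the CDF of a discrete variable looks like a staircase.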

Practice Question 1

Suppose $X$ is a discrete random variable representing the roll of a fair six-sided die. Find the value of the cumulative distribution function at $x=3$.

The first step in solving this question is to determine the probability of the die landing on any single face. As each of the six faces is equally likely, this probability is $\frac{1}{6}$.

Now, we need to evaluate:

$$F(3)=P(X\leq 3)$$

Since $X$ can only take the whole-number values 1 to 6, we can write this as a sum of individual probabilities:

$$F(3)=P(X=1)+P(X=2)+P(X=3)=\frac{1}{6}+\frac{1}{6}+\frac{1}{6}=\frac{1}{2}$$

$\therefore F(3)=\frac{1}{2}$, so the probability of rolling a 3 or less is $\frac{1}{2}$.

What are Relative and Cumulative Frequencies?

Relative frequency is the proportion of times a particular value, or set of values, occurs relative to the total number of observations in a dataset. Expressing frequencies as proportions that sum to $1$ shows how the data are distributed across the different values. In simple terms, relative frequencies are estimates of the probabilities of each value.

Cumulative frequency is the sum of the frequencies accumulated up to a certain point in a dataset. It gives a running total, obtained by adding each frequency to the total of all the frequencies before it in a frequency distribution table. When each cumulative frequency is divided by the total frequency, the resulting cumulative relative frequencies are estimates of the cumulative distribution function.

Relative frequency is found by dividing each frequency by the total frequency; similarly, cumulative relative frequency is found by dividing each cumulative frequency by the total frequency. Both are often set out in tables, as shown below, which makes them easier to calculate.

This table is an example of a relative frequency table:

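| Score (x) | Frequency (f) | Relative Frequency |
|-----------|---------------|--------------------|
| 70        | 5             | 0.25               |
| 75        | 3             | 0.15               |
| 80        | 6             | 0.30               |
| 85        | 3             | 0.15               |
| 90        | 3             | 0.15               |
| Total     | 20            | 1                  |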

As we can see, the relative frequency column expresses each frequency as a proportion of the total frequency. For example, the relative frequency in the first row is $0.25$, which is the same as $\frac{5}{20}$.
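The relative frequency column can be reproduced by dividing each frequency by the total. The sketch below uses the scores and frequencies from the table above; storing them in a dict is an assumption made for illustration, and `Fraction` keeps each proportion exact (for example $\frac{5}{20}=\frac{1}{4}$).

```python
from fractions import Fraction

# Frequency table from the example: score -> frequency.
frequencies = {70: 5, 75: 3, 80: 6, 85: 3, 90: 3}

total = sum(frequencies.values())  # 20 observations in total

# Relative frequency = frequency / total frequency.
relative = {score: Fraction(f, total) for score, f in frequencies.items()}

for score, rf in relative.items():
    print(score, float(rf))

print("sum:", sum(relative.values()))  # sum: 1
```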

This table is an example of a cumulative frequency table:

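| Score (x) | Frequency (f) | Cumulative Frequency |
|-----------|---------------|----------------------|
| 70        | 5             | 5                    |
| 75        | 3             | 5 + 3 = 8            |
| 80        | 6             | 8 + 6 = 14           |
| 85        | 3             | 14 + 3 = 17          |
| 90        | 3             | 17 + 3 = 20          |
| Total     | 20            | 20                   |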

As we can see, the cumulative frequency column keeps a running total as each score's frequency is added, so its final entry equals the total frequency of 20.
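The running total in the cumulative frequency column can be sketched with Python's `itertools.accumulate`, using the scores and frequencies from the table above. Dividing by the total then gives cumulative relative frequencies, which serve as estimates of the CDF $F(x)=P(X\leq x)$ for the population the data were drawn from; treating the data this way is an assumption for illustration.

```python
from itertools import accumulate

scores = [70, 75, 80, 85, 90]
frequencies = [5, 3, 6, 3, 3]

# Running total: each frequency is added to the sum of those before it.
cumulative = list(accumulate(frequencies))  # [5, 8, 14, 17, 20]
total = cumulative[-1]                      # the last entry is the total frequency

# Cumulative relative frequency = cumulative frequency / total,
# an estimate of F(x) = P(X <= x) at each score.
for score, cf in zip(scores, cumulative):
    print(score, cf, cf / total)
```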
