Formulas for Correlation and Regression

Expert reviewed • 21 July 2024 • 5 minute read


  • calculate and interpret Pearson’s correlation coefficient ($r$) using technology to quantify the strength of a linear association of a sample
  • model a linear relationship by fitting an appropriate line of best fit to a scatterplot and using it to describe and quantify associations
    • fit a line of best fit to the data by eye and using technology
    • fit a least-squares regression line to the data using technology
    • interpret the intercept and gradient of the fitted line

In the previous module, we discussed calculating correlation and regression using a calculator. Although a calculator is significantly more efficient, we must also know how to calculate correlation and regression from their formulas. This chapter explains how to do this in detail.

The Formula for Correlation

The formula for the correlation (Pearson’s correlation coefficient) of a dataset is as follows:

$$r=\frac{\sum (x-\overline{x})(y-\overline{y})}{\sqrt{\sum(x-\overline{x})^2\sum(y-\overline{y})^2}}$$

where,

  • $r$ is Pearson’s correlation coefficient
  • $x$ and $y$ are scores of two different variables
  • $\overline{x}$ is the average of all $x$ values
  • $\overline{y}$ is the average of all $y$ values
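
To see how the terms of the correlation formula fit together, here is a minimal Python sketch that follows it step by step. The function name `pearson_r` and the five data points are made up for illustration and do not come from the course material.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed directly from the formula."""
    n = len(xs)
    x_bar = sum(xs) / n  # average of all x values
    y_bar = sum(ys) / n  # average of all y values

    # Numerator: sum of (x - x_bar)(y - y_bar)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of sum (x - x_bar)^2 times sum (y - y_bar)^2
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)

    if s_xx == 0 or s_yy == 0:
        # All x values (or all y values) are identical, so r is undefined.
        raise ValueError("r is undefined when one variable does not vary")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical sample data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}")  # r = 0.7746
```

For this data the averages are 3 and 4, the numerator is 6 and the two sums of squares are 10 and 6, so $r = 6/\sqrt{60} \approx 0.7746$.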

The Formula for Regression

As discussed in the previous chapter, the formula for the line of regression (line of best fit) is $y=mx+c$. The slope $m$ and intercept $c$ can be calculated using the following formulas:

$$m=\frac{\sum (x-\overline{x})(y-\overline{y})}{\sum(x-\overline{x})^2}$$

and

$$c=\overline{y}-m\overline{x}$$

where,

  • $m$ is the slope of the regression line
  • $c$ is the intercept of the regression line
  • $x$ and $y$ are scores of two different variables
  • $\overline{x}$ is the average of all $x$ values
  • $\overline{y}$ is the average of all $y$ values
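
The slope and intercept formulas can be sketched in Python in the same way. The function name `least_squares_line` and the data below are made up for illustration. This is one possible way to lay out the calculation, not a prescribed method.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept c of the regression line y = mx + c."""
    n = len(xs)
    x_bar = sum(xs) / n  # average of all x values
    y_bar = sum(ys) / n  # average of all y values

    # Slope: sum of (x - x_bar)(y - y_bar) divided by sum of (x - x_bar)^2
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        # Every x value is the same, so no unique line of best fit exists.
        raise ValueError("slope is undefined when all x values are equal")
    m = s_xy / s_xx

    # Intercept: c = y_bar - m * x_bar
    c = y_bar - m * x_bar
    return m, c


# Hypothetical sample data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, c = least_squares_line(xs, ys)
print(f"y = {m:.2f}x + {c:.2f}")  # y = 0.60x + 2.20
```

Here $m = 6/10 = 0.6$ and $c = 4 - 0.6 \times 3 = 2.2$. Because $c=\overline{y}-m\overline{x}$, the fitted line always passes through the point $(\overline{x}, \overline{y})$, which is $(3, 4)$ for this data.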

Why Do We Use a Calculator for Calculating Correlation and Regression?

Although it is important to know the formulas listed above, applying them by hand in the HSC course is slower and more prone to arithmetic and rounding errors than using a calculator. In exams, it is therefore strongly advised to use a calculator to find correlation and regression, as discussed in the previous chapter.
