1.0 REGRESSION TO THE MEAN

The term regression was introduced by Sir Francis Galton (1822-1911). He noticed that measurements tend to regress
to the mean: an extreme value tends to be followed by one closer to the average. He reached this conclusion after his
classic study of the heights of fathers and their sons.

The phenomenon of regression to the mean may be regarded as one of the basic tendencies of nature: extreme variations
tend to move towards the average value. Thus very tall fathers tend to have sons who are, on average, not as tall. Very
short fathers tend to have sons who are, on average, not as short. Regression to the mean helps to maintain the balance
and equilibrium of biological phenomena because it keeps variation from getting 'out of control'.
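
The father and son example can be illustrated with a small simulation. The sketch below does not use Galton's data.
It assumes, purely for illustration, that heights are normally distributed with mean 175 cm and standard deviation 7 cm,
and that the father-son correlation is 0.5; all three numbers and all names are hypothetical.

```python
import math
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Assumed values, chosen for illustration only (not Galton's data).
MEAN, SD, R = 175.0, 7.0, 0.5
N = 100_000

pairs = []
for _ in range(N):
    z_father = random.gauss(0, 1)
    # The son's standardized height shares a fraction R of the father's deviation.
    z_son = R * z_father + math.sqrt(1 - R ** 2) * random.gauss(0, 1)
    pairs.append((MEAN + SD * z_father, MEAN + SD * z_son))

# Keep only very tall and very short fathers (more than 2 SD from the mean).
tall = [(f, s) for f, s in pairs if f > MEAN + 2 * SD]
short = [(f, s) for f, s in pairs if f < MEAN - 2 * SD]

tall_fathers = mean(f for f, _ in tall)
tall_sons = mean(s for _, s in tall)
short_fathers = mean(f for f, _ in short)
short_sons = mean(s for _, s in short)

print("very tall fathers:", round(tall_fathers, 1), "their sons:", round(tall_sons, 1))
print("very short fathers:", round(short_fathers, 1), "their sons:", round(short_sons, 1))
```

Under these assumptions the sons of very tall fathers are still taller than average, but their mean height lies between
the overall mean and the mean of their fathers. The mirror image holds for the sons of very short fathers.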

A similar phenomenon is seen in social change. Very rich fathers tend to have children who achieve
less and may even squander all their inheritance. The sons of the poor struggle to escape poverty and often end up doing
better than their fathers. If this regression to the mean did not happen, the advantages of wealth would be transmitted
from generation to generation, creating a very unjust society with a few super-rich and many paupers.

2.0 INDEPENDENT AND DEPENDENT VARIABLES

Both correlation and regression address the relation between 2 variables. The scatter-gram is basic to both. In
correlation both x and y are random. In regression x is the independent variable, usually treated as fixed or measured
without error, whereas y is the dependent variable, which is random and whose mean is determined by x. The outcome in
regression is therefore modelled as the mean of y at each value of x. The independent variable can be continuous or
categorical. The dependent variable can be continuous or binary.

3.0 THE SIMPLE LINEAR REGRESSION EQUATION

The mathematical model of simple linear regression is shown in the regression equation (also called the regression
function or regression line): y = a + bx, where ‘y’ is the dependent/response variable, ‘a’ is the intercept, ‘b’
is the slope/regression coefficient, and ‘x’ is the independent/predictor variable. Both ‘a’ and ‘b’
are in a strict sense regression coefficients but the term is usually reserved for ‘b’ only.
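
The text does not say how ‘a’ and ‘b’ are estimated. A minimal sketch, assuming the usual ordinary least squares
method, is given below. The function name fit_line and the data are hypothetical and were invented for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope / regression coefficient
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
print("intercept a =", round(a, 3), "slope b =", round(b, 3))
```

The slope is the sum of cross-products divided by the sum of squares of x, and the intercept is chosen so that the
line passes through the point (mean of x, mean of y).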

4.0 HYPOTHESIS TESTING

The t test can be used to test the significance of the regression coefficient, i.e. to test the null hypothesis that b = 0.
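
The t test on the regression coefficient can be sketched as follows, assuming a least-squares fit and the standard
formulas: t = b / SE(b), with SE(b) = sqrt(MSE / Sxx) and n - 2 degrees of freedom. The data and the function names
are hypothetical. The p-value helper integrates Student's t density numerically only because the Python standard
library has no t distribution; in practice a statistics package would supply it.

```python
import math
from statistics import mean

def slope_t_test(x, y):
    """Return (b, se_b, t, df) for the slope of the least-squares line."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual sum of squares about the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_b = math.sqrt(sse / df / sxx)
    return b, se_b, b / se_b, df

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value from Student's t density, using Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # probability that T lies between 0 and |t|
    return max(0.0, 1 - 2 * central)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b, se_b, t, df = slope_t_test(x, y)
p = two_sided_p(t, df)
print("b =", round(b, 3), "SE =", round(se_b, 4), "t =", round(t, 2), "df =", df)
print("significant at 0.05:", p < 0.05)
```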

5.0 USES OF THE REGRESSION EQUATION

The regression equation is used for 2 main purposes: (a) testing for association between ‘x’ and ‘y’ and
(b) predicting ‘y’ from ‘x’.

The regression coefficient ‘b’ is used to determine if ‘x’ is associated with ‘y’. By doing a t test on the
regression coefficient, we can derive a p-value. If p < 0.05 we conclude that there is a significant association.
If p ≥ 0.05 we conclude that there is no significant association.

Once the regression equation is constructed, we can predict ‘y’ by putting any selected value of ‘x’ in
the equation.
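
Prediction amounts to substituting a value of ‘x’ into y = a + bx. The sketch below uses hypothetical coefficients
and a hypothetical function named predict. As an added precaution that the text does not mention, it also reports
whether the chosen ‘x’ lies inside the range of the observed data, since predictions outside that range are less reliable.

```python
def predict(a, b, x_new, x_observed):
    """Return the predicted y and whether x_new lies inside the observed x range."""
    y_hat = a + b * x_new
    in_range = min(x_observed) <= x_new <= max(x_observed)
    return y_hat, in_range

# Hypothetical coefficients and observed x values, invented for illustration.
a, b = 0.05, 1.99
x_observed = [1, 2, 3, 4, 5]

y_inside, ok_inside = predict(a, b, 3.5, x_observed)
y_outside, ok_outside = predict(a, b, 12, x_observed)

print("x = 3.5:", round(y_inside, 3), "within observed range:", ok_inside)
print("x = 12: ", round(y_outside, 3), "within observed range:", ok_outside)
```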