0703-Multiple Linear Regression

Lecture by Professor Omar Hasan Kasule Sr. for Year 2 Semester 2 PPSD session on Wednesday 15th March 2007


Multivariate analysis determines the relative contribution of different causes to a single event.


Multiple linear regression, a form of multivariate analysis, is used to study the relation between 2 variables while adjusting or controlling for several confounding variables.


The multiple linear regression equation is Y = a + b1X1 + b2X2 + b3X3 + ... where Y is the dependent/response variable, a is the intercept, and b1, b2, b3 are the slope/regression coefficients. The independent/predictor variables are designated as X1, X2, and X3.
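
As an illustration of the equation above, the following minimal sketch estimates a, b1, and b2 by ordinary least squares, solving the normal equations in plain Python. The function names (solve, fit_multiple_regression) and the data are hypothetical and chosen only for this example; the lecture does not prescribe a particular fitting routine.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple_regression(X, y):
    """Return [a, b1, b2, ...] that minimise the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]  # the leading 1 carries the intercept a
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]

a, b1, b2 = fit_multiple_regression(X, y)
print(a, b1, b2)  # close to 1, 2 and 3, up to floating-point rounding
```

Because the example data contain no random error, the fitted coefficients recover the generating values; with real data they would only be estimates.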


A regression equation with up to three variables in total (Y and two independent variables) can be represented graphically using three-dimensional geometry. Equations with more than 3 variables cannot be represented graphically.



Linear regression models can be used to predict but not to study causal relations in a definitive way.


Simple linear regression is used in prediction of Y for given values of X.


Both linear extrapolation and linear interpolation are possible; the latter is more reliable.


Extrapolation is prediction for X values outside the range of the data on which the regression model was built.


Interpolation is prediction within the range of the data.
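
The distinction between the two kinds of prediction can be sketched as follows. The code fits a simple regression line using the usual least-squares formulas and labels each prediction according to whether X falls inside the range of the fitted data. The function names and the data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Return the intercept a and slope b of the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x, xs):
    """Predict Y at x and report whether x lies inside the range of the data."""
    inside = min(xs) <= x <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return a + b * x, kind


# Hypothetical data; X ranges from 1 to 5
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_regression(xs, ys)
print(predict(a, b, 3.5, xs))  # X inside the data range: interpolation
print(predict(a, b, 12, xs))   # X outside the data range: extrapolation
```

The prediction at X = 12 assumes that the straight-line relation continues beyond the observed data, which the data themselves cannot confirm; this is why it is the less reliable of the two.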



Fitting the simple regression model is very straightforward since it has only one independent variable.


Three procedures are used for fitting the multiple regression line: step-up, step-down, and step-wise.


Step-up is forward entry or forward selection and it starts with a minimal model. It involves adding one variable at a time without trying to delete any variable.


In step-down or backward elimination we start with a full model or maximal model consisting of all variables then we delete one variable at a time without trying to add any new variables.
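
The step-up and step-down procedures described above can be sketched as follows. Note the assumption: for brevity this sketch decides whether to add or delete a variable by the change in the coefficient of determination (the hypothetical thresholds min_gain and max_loss), not by the t ratio or p-value that statistical packages normally use. All function names and data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(data, y, cols):
    """Coefficient of determination for y regressed on the named columns."""
    rows = [[1.0] + [data[c][i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(XtX, Xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def step_up(data, y, min_gain=0.01):
    """Forward selection: start empty, add the best variable, never delete."""
    chosen, best = [], 0.0
    while len(chosen) < len(data):
        scores = {c: r_squared(data, y, chosen + [c]) for c in data if c not in chosen}
        c = max(scores, key=scores.get)
        if scores[c] - best < min_gain:  # assumed stopping rule
            break
        chosen.append(c)
        best = scores[c]
    return chosen


def step_down(data, y, max_loss=0.01):
    """Backward elimination: start full, delete the weakest variable, never add."""
    chosen = list(data)
    while len(chosen) > 1:
        full = r_squared(data, y, chosen)
        losses = {c: full - r_squared(data, y, [d for d in chosen if d != c]) for c in chosen}
        c = min(losses, key=losses.get)
        if losses[c] >= max_loss:  # assumed stopping rule
            break
        chosen.remove(c)
    return chosen


# Hypothetical data: y depends on x1 and x2 only; x3 is irrelevant
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [5, 1, 4, 2, 8, 3, 7, 6],
    "x3": [3, 7, 1, 8, 2, 6, 4, 5],
}
y = [1 + 2 * a + 3 * b for a, b in zip(data["x1"], data["x2"])]

print(step_up(data, y))    # selects x1 and x2, leaves x3 out
print(step_down(data, y))  # drops x3, keeps x1 and x2
```

In this example both procedures arrive at the same two variables. With real data they need not agree, which is one reason the combined step-wise procedure is used.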









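Step-wise selection is a combination of step-up and step-down selection. Each candidate variable is first tested on its own, and the one with the largest absolute value of the t ratio is entered into the model first. Further variables are then added one at a time if their contribution is statistically significant as judged by the p-value. After each addition, the variables already in the model are re-examined and any that are no longer significant are deleted.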



Selection of the best model is guided by the coefficient of determination, the p-value associated with the predictor variable, and other more complex methods.


The best model is the one with the highest coefficient of determination, or the one to which adding further variables makes no statistically significant improvement as judged by the p-value.


The maximum value of the coefficient of determination is 1.0 and the minimum value is 0. The best model has the maximum coefficient of determination.
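
The two limits can be illustrated with a minimal sketch that computes the coefficient of determination as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name and all values are hypothetical; the fitted values are supplied by hand and do not come from a real model.

```python
def coefficient_of_determination(observed, fitted):
    """Return 1 - SSE/SST for observed values and model-fitted values."""
    mean_y = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # residual sum of squares
    sst = sum((o - mean_y) ** 2 for o in observed)             # total sum of squares
    return 1 - sse / sst


observed = [2.0, 4.0, 6.0, 8.0]

perfect_fit = [2.0, 4.0, 6.0, 8.0]  # model reproduces every observation
mean_only = [5.0, 5.0, 5.0, 5.0]    # model predicts the mean of Y for every case
in_between = [2.5, 3.5, 6.5, 7.5]   # hypothetical fitted values with small errors

print(coefficient_of_determination(observed, perfect_fit))  # upper limit
print(coefficient_of_determination(observed, mean_only))    # lower limit
print(coefficient_of_determination(observed, in_between))   # between the limits
```

A model that explains all the variation in Y reaches the maximum, while a model that does no better than the mean of Y sits at the minimum.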

© Professor Omar Hasan Kasule, Sr. March 2007