1.0 PRINCIPLES OF MULTIVARIATE ANALYSIS
Multivariate analysis determines the relative contribution of different causes to a single event.
Multiple linear regression, a form of multivariate analysis, is used to study the relation between
2 variables while adjusting or controlling for several confounding variables.
The multiple linear regression equation is Y = a + b1X1 + b2X2 + b3X3 + ... where Y is the
dependent/response variable, a is the intercept, and b1, b2, b3 are the slope/regression coefficients.
The independent/predictor variables are designated as X1, X2, and X3.
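The equation above can be fitted by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the function name fit_multiple_regression and the data are hypothetical and are generated exactly from Y = 1 + 2X1 - 3X2 + 0.5X3 purely for illustration.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return (a, b) for Y = a + b1*X1 + b2*X2 + ... by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept a.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Illustrative (made-up) data: six observations of X1, X2, X3.
X = np.array([[1, 0, 2],
              [2, 1, 0],
              [3, 1, 4],
              [4, 3, 1],
              [5, 2, 2],
              [0, 1, 1]], dtype=float)
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 2]

a, b = fit_multiple_regression(X, y)
print("intercept a:", a)
print("slopes b1, b2, b3:", b)
```

Because the illustrative data contain no noise, the fitted intercept and slopes should recover the values used to generate Y, up to rounding error.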
A regression equation with up to three variables in total (Y and two independent variables) can be
represented graphically using three-dimensional geometry. Equations with more than 3 variables cannot be
represented graphically.
2.0 PREDICTION IN REGRESSION
Linear regression models can be used to predict but not to study causal relations in a definitive way.
Simple linear regression is used in prediction of Y for given values of X.
Both linear extrapolation and linear interpolation are possible; the latter is more reliable.
Extrapolation is prediction for X values outside the range
of the data on which the regression model was built.
Interpolation is prediction within the range of the data.
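The distinction between predicting inside and outside the range of the data can be sketched as follows. This is an illustrative example only, assuming NumPy; the function names fit_simple_regression and predict and the data values are hypothetical.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

def predict(a, b, x_new, x_min, x_max):
    """Predict Y and report whether x_new lies inside the range of the fitted data."""
    kind = "interpolation" if x_min <= x_new <= x_max else "extrapolation"
    return a + b * x_new, kind

# Illustrative (made-up) data.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

a, b = fit_simple_regression(x, y)
for x_new in (3.5, 10.0):
    y_hat, kind = predict(a, b, x_new, x.min(), x.max())
    print(f"X = {x_new}: predicted Y = {y_hat:.3f} ({kind})")
```

Labelling each prediction in this way is one simple means of flagging the less reliable predictions made outside the observed range of X.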
3.0 FITTING REGRESSION MODELS
Fitting the simple regression model is very straightforward since it has only one independent variable.
Three procedures are used for fitting the multiple regression model: step-up, step-down, and step-wise.
Step-up is forward entry or forward selection and it starts with a minimal model. It involves adding
one variable at a time without trying to delete any variable.
In step-down or backward elimination we start with a full model or maximal model consisting of all
variables then we delete one variable at a time without trying to add any new variables.
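Step-wise selection is a combination of step-up and step-down selection. Each variable is first run on its
own to select the one with the largest absolute value of the t ratio. The selected variable is entered
first into the model. Further variables are then added to the model one at a time if their contribution is
statistically significant as judged by the p-value, and a variable already in the model may be deleted if
it no longer contributes significantly.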
4.0 ASSESSING REGRESSION MODELS
Selection of the best model is guided by the coefficient of determination, the p-value associated with
the predictor variable, and other more complex methods.
The best model is the one with the highest coefficient of determination, or the one to which adding
further variables makes no statistically significant improvement.
The coefficient of determination ranges from a minimum of 0 to a maximum of 1.0.
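The coefficient of determination can be computed from the observed and predicted values of Y as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes NumPy; the function name coefficient_of_determination and the data are hypothetical.

```python
import numpy as np

def coefficient_of_determination(y, y_pred):
    """1 - (residual sum of squares) / (total sum of squares about the mean of y)."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative (made-up) observations.
y = np.array([1.0, 2.0, 3.0, 4.0])

perfect = coefficient_of_determination(y, y)                         # predictions equal the data
mean_only = coefficient_of_determination(y, np.full(4, y.mean()))    # predicting the mean every time
print(perfect, mean_only)
```

The two calls illustrate the two ends of the range: predictions that match the data exactly give the maximum, and predicting the mean of Y for every observation gives the minimum. For least-squares models with an intercept, adding a predictor never lowers this value, which is one reason to weigh it together with the p-value of each predictor.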