Synopsis of a lecture given on 27th October 2006 to MPH (Epidemiology) students at the Department of Social and Preventive Medicine, Universiti Malaya by Professor Omar Hasan Kasule, Sr. MB ChB (MUK), MPH (Harvard), DrPH (Harvard)


Data analysis affects practical decisions. It involves constructing hypotheses and testing them. The 2-sided test covers both p1 > p2 and p2 > p1. The 1-sided test covers either p1 > p2 or p2 > p1 but not both. The 2-sided test is generally preferred because it is more conservative. Simple manual inspection of the data can help identify outliers, assess the normality of the data, identify commonsense relationships, and alert the investigator to errors in computer analysis. Data models for continuous data can be straight-line regression, non-linear regression, or trends. Data models for categorical data are the maximum likelihood and the logistic models. Two procedures are employed in analytic epidemiology: tests for association and measures of effect. The test for association is done first. The effect measures are assessed after an association has been found. Effect measures are useless in situations in which tests for association are negative. The common tests for association are the t-test, the F-test, the chi-square test, the linear correlation coefficient, and the linear regression coefficient. The effect measures commonly employed are the Odds Ratio, the Risk Ratio, and the Rate Difference. Measures of trend can discover relationships that are too small to be picked up by association and effect measures.
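The difference between the 1-sided and 2-sided tests can be illustrated with a small sketch. It assumes a large-sample z-test for two proportions, which the lecture does not name explicitly, and the counts and the function name are hypothetical. The 2-sided p-value is twice the 1-sided one, which is why the 2-sided test is the more conservative choice.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return z, the 1-sided p-value (H1: p1 > p2) and the 2-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    # Common proportion under the null hypothesis p1 == p2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    std_normal = NormalDist()
    p_one_sided = 1 - std_normal.cdf(z)               # upper tail only
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))    # both tails
    return z, p_one_sided, p_two_sided


# Hypothetical data: 30 of 100 exposed and 20 of 100 unexposed develop disease
z, p_one, p_two = two_proportion_z_test(30, 100, 20, 100)
print(f"z = {z:.3f}, 1-sided p = {p_one:.4f}, 2-sided p = {p_two:.4f}")
```

With these counts the 1-sided p-value is close to 0.05 while the 2-sided p-value is close to 0.10, so the conclusion at the 5% level can depend on which test was chosen before looking at the data.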



The tests of association for continuous data are the t-test, the F-test, the correlation coefficient, and the regression coefficient. The t-test is used to compare two sample means. Analysis of variance, ANOVA (F-test), is used for more than 2 sample means. 1-way ANOVA involves one factor (explanatory variable). 2-way ANOVA involves 2 factors; designs with more than 2 factors are handled by multi-way (factorial) ANOVA. Multivariate analysis of variance, MANOVA, is used when there is more than one outcome (dependent) variable. Linear regression is used in conjunction with the t-test for data that require modeling. Dummy variables in the regression model can be used to control for confounding factors like age and sex.
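The relation between the t-test and 1-way ANOVA can be sketched in a few lines. This is a minimal illustration that assumes equal variances (the pooled t-test) and uses hypothetical blood pressure readings; the function names are not from the lecture. It computes only the test statistics and degrees of freedom, which would then be compared against t and F tables or passed to statistical software for p-values.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances, with its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def one_way_anova_f(*groups):
    """1-way ANOVA F statistic with between-group and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical systolic blood pressure readings in three groups
group_a = [120, 125, 130, 135, 140]
group_b = [130, 135, 140, 145, 150]
group_c = [140, 145, 150, 155, 160]

t, t_df = pooled_t_statistic(group_a, group_b)          # two means: t-test
f2, _, _ = one_way_anova_f(group_a, group_b)            # same two groups via ANOVA
f3, df_b, df_w = one_way_anova_f(group_a, group_b, group_c)  # three means: ANOVA

print(f"t = {t:.2f} on {t_df} df; F (2 groups) = {f2:.2f}; "
      f"F (3 groups) = {f3:.2f} on ({df_b}, {df_w}) df")
```

For two groups the F statistic equals the square of the pooled t statistic, which is why ANOVA can be seen as the extension of the t-test to more than 2 sample means.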


The common test of association for discrete data is the chi-square test. The chi-square test is used to test the association of 2 or more proportions in contingency tables. Fisher's exact test is used to test proportions for small sample sizes. The Mantel-Haenszel chi-square statistic is used to test for association in stratified 2 x 2 tables. The chi-square statistic is valid when the expected cell counts are large enough; a common rule of thumb is that at least 80% of cells should have an expected count of 5 or more and no cell should have an expected count below 1.0. If the observations are not independent of one another, as in paired or matched studies, the McNemar chi-square test is used instead of the usual Pearson chi-square test. The chi-square approximation works best when the samples are large enough for the cell counts to be approximately Gaussian.
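The Pearson and McNemar chi-square statistics for a 2 x 2 table can be sketched as follows. The counts and function names are hypothetical, no continuity correction is applied, and the p-value formula is specific to 1 degree of freedom. The expected counts are returned so that the cell-count conditions for validity can be inspected before the result is trusted.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    # Expected count = row total * column total / grand total
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]
    stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = erfc(sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value, expected


def mcnemar_chi_square(b, c):
    """McNemar statistic from the two discordant-pair counts of a matched study."""
    stat = (b - c) ** 2 / (b + c)
    return stat, erfc(sqrt(stat / 2))


# Hypothetical unmatched data: rows = exposed/unexposed, columns = diseased/healthy
stat, p_value, expected = chi_square_2x2(30, 70, 20, 80)
smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}, "
      f"smallest expected count = {smallest_expected}")

# Hypothetical matched pairs: 15 pairs discordant one way, 5 the other way
m_stat, m_p = mcnemar_chi_square(15, 5)
print(f"McNemar chi-square = {m_stat:.2f}, p = {m_p:.4f}")
```

Only the discordant pairs enter the McNemar statistic, because concordant pairs carry no information about a difference between the paired proportions.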



The Mantel-Haenszel Odds Ratio is used for 2 proportions in single or stratified 2 x 2 contingency tables. Logistic regression can be used as an alternative to the MH procedure. For paired proportions, a special form of the Mantel-Haenszel OR and a special form of logistic regression called conditional logistic regression are used. Excess disease risk is measured by the Attributable Risk, the Attributable Risk Proportion, and the Population Attributable Risk. Variation of an effect measure across levels of a third variable is called effect modification by epidemiologists and interaction by statisticians. Synergism or antagonism occurs when the interaction between two causative factors leads to an effect that is greater or smaller, respectively, than what is expected on the basis of additivity. Interaction can be conceptualized at 4 levels: statistical, biologic, public health, and decision making. The chi-square test for heterogeneity can be used to test for effect modification/interaction.
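The effect measures named above can be sketched with hypothetical cohort counts. The formulas are the conventional textbook definitions, which the synopsis itself does not spell out, so they are stated here as assumptions: the MH odds ratio is the sum of a*d/n over strata divided by the sum of b*c/n, the Attributable Risk is the risk in the exposed minus the risk in the unexposed, the Attributable Risk Proportion is that difference divided by the risk in the exposed, and the Population Attributable Risk is the risk in the total population minus the risk in the unexposed.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2 x 2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def attributable_measures(risk_exposed, risk_unexposed, risk_total):
    """Return (Attributable Risk, Attributable Risk Proportion, Population AR)."""
    ar = risk_exposed - risk_unexposed
    ar_proportion = ar / risk_exposed
    par = risk_total - risk_unexposed
    return ar, ar_proportion, par


# Hypothetical data stratified by a third variable (for example, age group)
strata = [
    (10, 90, 5, 95),    # younger stratum
    (40, 60, 25, 75),   # older stratum
]
stratum_ors = [odds_ratio(*s) for s in strata]
mh_or = mantel_haenszel_or(strata)
print("stratum-specific ORs:", [round(o, 2) for o in stratum_ors])
print("Mantel-Haenszel OR:", round(mh_or, 2))

# Risks from the pooled table: 50/200 exposed, 30/200 unexposed, 80/400 overall
ar, ar_prop, par = attributable_measures(50 / 200, 30 / 200, 80 / 400)
print(f"AR = {ar:.2f}, AR proportion = {ar_prop:.2f}, PAR = {par:.2f}")
```

When the stratum-specific odds ratios are similar, as in this made-up example, the MH summary is a reasonable single estimate; markedly different stratum-specific values would instead suggest effect modification, for which a summary measure is less meaningful.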

Professor Omar Hasan Kasule Sr. October 2006