Data analysis affects practical decisions. It involves constructing hypotheses and testing them. The 2-sided test covers both
p_{1} > p_{2} and p_{2} > p_{1}. The 1-sided test covers either p_{1} > p_{2} or p_{2} > p_{1}, but not both. The 2-sided
test is generally preferred because it is more conservative.
Simple manual inspection of the data can help identify outliers, assess the normality of the data, identify commonsense
relationships, and alert the investigator to errors in computer analysis. Data models for continuous data can be straight
line regression, non-linear regression, or trends. Data models for categorical data are the maximum likelihood and the logistic
models.
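
The difference between the 1-sided and 2-sided tests can be illustrated with a pooled z-test for two proportions. This is only a minimal sketch: the function name and the counts in the usage lines are hypothetical, and the normal approximation is assumed to hold (reasonably large samples).

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 == p2 (illustrative helper, not from the source).

    Returns (z, one-sided p-value for H1: p1 > p2, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal, P(Z > z)
    upper = 0.5 * math.erfc(z / math.sqrt(2))
    p_one_sided = upper
    # The 2-sided test counts departures in both directions
    p_two_sided = 2 * min(upper, 1 - upper)
    return z, p_one_sided, p_two_sided

# Hypothetical counts: 30/100 events in group 1, 20/100 in group 2
z, p_one, p_two = two_proportion_z_test(30, 100, 20, 100)
print(round(z, 3), round(p_one, 4), round(p_two, 4))
```

When the observed difference lies in the hypothesized direction, the 2-sided p-value is twice the 1-sided one, which is why the 2-sided test is the more conservative choice.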

Two procedures are employed in analytic epidemiology: tests for association and measures of effect. The test for association is
done first. The assessment of the effect measures is done after finding an association. Effect measures are useless in situations
in which tests for association are negative. The common tests for association are the t-test, the F-test, the chi-square test,
the linear correlation coefficient, and the linear regression coefficient. The effect measures commonly employed are the odds
ratio, the risk ratio, and the rate difference. Measures of trend can discover relationships that are too small to be picked up
by association and effect measures.
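
As an illustration of the effect measures, the sketch below computes them from a single 2 x 2 table. The cell labels a, b, c, d and the counts are hypothetical. The data are assumed to be cumulative-incidence counts, so the difference shown is a risk difference; a rate difference proper would need person-time denominators.

```python
def effect_measures(a, b, c, d):
    """Effect measures from a 2 x 2 table (illustrative helper).

    a = exposed cases,   b = exposed non-cases
    c = unexposed cases, d = unexposed non-cases
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        "risk_difference": risk_exposed - risk_unexposed,
    }

# Hypothetical table: 20 of 100 exposed and 10 of 100 unexposed develop disease
print(effect_measures(20, 80, 10, 90))
```

A ratio of 1.0 or a difference of 0 is the null value, meaning no effect of exposure.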

2.0 TESTS OF ASSOCIATION

The tests of association for continuous data are the t-test, the F-test, the correlation coefficient,
and the regression coefficient. The t-test is used for two sample means. Analysis of variance, ANOVA (F-test), is used for
more than 2 sample means. 1-way ANOVA involves one factor (explanatory variable). 2-way ANOVA involves 2 factors. Designs with
more than 2 factors are handled by multi-way (factorial) ANOVA. Multivariate analysis of variance, MANOVA, is used when there
is more than one outcome variable. Linear regression is used in conjunction with the t-test for data that require modeling.
Dummy variables in the regression model can be used to control for confounding factors like age and sex.
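
The relationship between the t-test and 1-way ANOVA can be sketched as follows. The helper names and the data are hypothetical, equal variances are assumed (pooled t-test), and only the test statistics are computed. Turning them into p-values would need the t and F distributions, for example from a statistics package.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (illustrative helper)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def one_way_f(*groups):
    """1-way ANOVA F statistic for two or more groups (illustrative helper)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical measurements in three groups
g1 = [5.1, 4.9, 5.6, 5.8, 6.0]
g2 = [4.1, 4.5, 4.0, 4.8, 4.6]
g3 = [5.0, 5.2, 4.7, 5.5, 5.3]
print(pooled_t(g1, g2))       # two sample means: t-test
print(one_way_f(g1, g2, g3))  # more than two sample means: F-test
```

With exactly two groups the F statistic equals the square of the pooled t statistic, so ANOVA can be read as the extension of the t-test to more than 2 sample means.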

The common test of association for discrete data is the chi-square test. The chi-square test is used
to test the association of 2 or more proportions in contingency tables. The exact test (Fisher's exact test) is used to test
proportions for small sample sizes. The Mantel-Haenszel chi-square statistic is used to test for association in stratified
2 x 2 tables. The chi-square statistic is generally considered valid when at least 80% of cells have an expected count of 5
or more and no cell has an expected count below 1.0. If the observations are not independent of one another, as in paired or
matched studies, the McNemar chi-square test is used instead of the usual Pearson chi-square test. The chi-square
approximation works best when counts are large enough for the sampling distribution to be approximately Gaussian.
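
The Pearson and McNemar statistics can be sketched in a few lines. The helper names and counts below are hypothetical, no continuity correction is applied, and the p-value helper is valid only for 1 degree of freedom (a 2 x 2 table or a McNemar test).

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected

def mcnemar_chi2(b, c):
    """McNemar statistic from the two discordant-pair counts b and c."""
    return (b - c) ** 2 / (b + c)

def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical unpaired 2 x 2 table (rows: exposed, unexposed)
chi2, expected = pearson_chi2([[20, 80], [10, 90]])
print(chi2, expected, p_value_1df(chi2))

# Hypothetical matched-pair study with 15 and 5 discordant pairs
print(mcnemar_chi2(15, 5), p_value_1df(mcnemar_chi2(15, 5)))
```

The expected counts are returned so that they can be inspected before the chi-square approximation is trusted; when they are small, the exact test is the safer choice.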

An epidemiological study should be considered as a sort of measurement with parameters
for validity, precision, and reliability. Validity is a measure of accuracy. Precision measures variation in the estimate.
Reliability is reproducibility. Bias is defined technically as the situation in which the expected value of the parameter
estimate differs from the true parameter, that is, the expected error of the estimate is not zero. Bias may move the effect
parameter away from the null value or toward the null value. In negative bias the parameter
estimate is below the true parameter. In positive bias the parameter estimate is above the true parameter. A study is not
valid if it is biased. Systematic errors lead to bias and therefore invalid parameter estimates. Random errors lead to imprecise
parameter estimates. Internal validity is concerned with the results of each individual study. Internal validity is impaired
by study bias. External validity is generalizability of results. Traditionally results are generalized if the sample is representative
of the population. In practice generalizability is achieved by looking at results of several studies each of which is individually
internally valid. It is therefore not the objective of each individual study to be generalizable because that would require
assembling a representative sample. Precision is a measure of the lack of random error. An effect measure with a narrow
confidence interval is said to be precise. An effect measure with a wide confidence interval is imprecise. Precision is
increased in three ways: increasing the study size, increasing study efficiency, and taking care in the measurement of
variables to decrease mistakes.
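
The link between study size and precision can be illustrated with a confidence interval for the risk ratio. This is a rough sketch using the usual large-sample log-scale approximation for the standard error; the function name and the counts are hypothetical.

```python
import math

def risk_ratio_ci(a, n1, c, n0, z=1.96):
    """Risk ratio with an approximate 95% confidence interval (log method).

    a of n1 exposed and c of n0 unexposed subjects develop the outcome.
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Same underlying risks (20% vs 10%), hypothetical study sizes of 200 and 800
print(risk_ratio_ci(20, 100, 10, 100))
print(risk_ratio_ci(80, 400, 40, 400))
```

Both calls give the same point estimate, but the larger study gives the narrower interval: quadrupling the study size halves the width of the interval on the log scale. Note that a larger study reduces random error only; it does not remove bias.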