Inference on discrete data uses either an approximate method (the chi-square test), suited to large samples, or an exact
method (Fisher's exact test), suited to small samples. Approximate methods are accurate for large samples and inaccurate for
small samples. Nothing prevents exact methods from being used for large samples. Both the chi-square and the exact methods
yield a p-value, which is used to draw conclusions about the null hypothesis.
2.0 THE SIMPLE CHI-SQUARE PROCEDURE
The first steps in the analysis are to check that the data are counts of independent observations, that each observation
falls in exactly one cell of the table, and that the sample size is adequate. The chi-square test does not require normally
distributed data or equal variances, because it is applied to frequencies rather than to measurements. If the sample is too
small, conventionally when expected cell frequencies fall below 5, the chi-square approximation is not valid and an exact
method should be used instead.
The data are laid out in contingency tables and inspected manually before statistical tests are applied. The Pearson
chi-square is computed from the observed and expected frequencies of each cell in the contingency table and is in essence
a measure of how far the observed frequencies deviate from those expected under the null hypothesis. It can be used to test
2 or more proportions. Large contingency tables are better partitioned or collapsed before the chi-square test is applied.
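The computation of the Pearson chi-square can be sketched in a few lines of Python. This is a minimal illustration, not a
reference implementation: the function name pearson_chi_square and the counts are hypothetical, and it assumes every row and
column total is greater than zero. Each expected frequency is (row total x column total) / grand total, and the statistic is
the sum over all cells of (observed - expected)^2 / expected, with (r - 1)(c - 1) degrees of freedom.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom, expected) for an r x c table of counts.

    Illustrative helper (hypothetical name); assumes all margins are positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected frequency of each cell under the null hypothesis of no association.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical counts: rows = exercise / no exercise, columns = healthy / not healthy.
observed = [[20, 30], [30, 20]]
statistic, df, expected = pearson_chi_square(observed)
print(statistic, df, expected)
```

The statistic is then referred to the chi-square distribution with df degrees of freedom to obtain the p-value. Inspecting
the returned expected frequencies is also a quick way to check whether the sample is large enough for the approximation.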
3.0 THE STRATIFIED CHI-SQUARE PROCEDURE
The Mantel-Haenszel chi-square is used to test 2 proportions in stratified data, combining the evidence from the 2 x 2 table
of each stratum into a single test. It is used, for example, to test the relation between exercise and cardiac health when
the data are grouped (stratified) by gender.
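A hedged sketch of the stratified computation follows. The function name mantel_haenszel_chi_square and the counts are
hypothetical. The sketch assumes each stratum is a 2 x 2 table [[a, b], [c, d]] with more than one observation, and uses the
common form of the statistic: the squared difference between the summed observed and expected counts of cell a, divided by
the summed hypergeometric variances. The continuity correction is optional because conventions differ. The p-value is taken
from the chi-square distribution with 1 degree of freedom, which for 1 degree of freedom equals erfc(sqrt(statistic / 2)).

```python
import math


def mantel_haenszel_chi_square(strata, correction=False):
    """Return (statistic, p_value) for a list of 2 x 2 tables, one per stratum.

    Illustrative helper (hypothetical name); each table is [[a, b], [c, d]].
    """
    sum_a = sum_expected = sum_variance = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        # Expected value and variance of cell a given the stratum's margins.
        sum_expected += row1 * col1 / n
        sum_variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))

    deviation = abs(sum_a - sum_expected)
    if correction:
        deviation = max(deviation - 0.5, 0.0)  # optional continuity correction
    statistic = deviation ** 2 / sum_variance

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, stratified by gender.
# Rows = exercise / no exercise, columns = healthy / not healthy.
men = [[10, 5], [5, 10]]
women = [[8, 4], [4, 8]]
mh_statistic, mh_p_value = mantel_haenszel_chi_square([men, women])
print(mh_statistic, mh_p_value)
```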
4.0 THE MATCHED CHI-SQUARE
The McNemar chi-square is used for pair-matched data. An example is testing whether exercise improves cardiac health by
comparing cardiac performance in the same subjects before and after exercise.
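The McNemar statistic depends only on the discordant pairs, that is, the subjects whose outcome changed between the two
measurements. The sketch below is illustrative: the function name mcnemar_chi_square and the counts are hypothetical, b and c
denote the two discordant counts, and the continuity correction is left optional. The statistic is (b - c)^2 / (b + c),
referred to the chi-square distribution with 1 degree of freedom.

```python
import math


def mcnemar_chi_square(b, c, correction=False):
    """Return (statistic, p_value) from the two discordant pair counts b and c.

    Illustrative helper (hypothetical name).
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")

    deviation = abs(b - c)
    if correction:
        deviation = max(deviation - 1, 0)  # optional continuity correction
    statistic = deviation ** 2 / (b + c)

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 15 subjects improved after exercise, 5 worsened.
# Subjects whose status did not change do not enter the statistic.
mcnemar_statistic, mcnemar_p_value = mcnemar_chi_square(15, 5)
print(mcnemar_statistic, mcnemar_p_value)
```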
5.0 EXACT ANALYSIS OF PROPORTIONS
Exact methods are used instead of the chi-square test for small samples, conventionally those with fewer than 20
observations. They can be used in 2 x 2, 2 x k, and r x c contingency tables. They involve direct computation of the p-value
using factorials and probability. The p-value is computed as the probability, under the null hypothesis, of results as
extreme as or more extreme than the observed data.
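For a 2 x 2 table the direct computation can be sketched as follows. This is an illustration under stated assumptions, not a
definitive implementation: the function name fisher_exact_two_sided and the counts are hypothetical, the row and column
totals are treated as fixed, and "as extreme or more extreme" is taken to mean every table with the same totals whose
probability does not exceed that of the observed table, which is one common two-sided convention. The probability of each
table comes from the hypergeometric distribution, computed here with binomial coefficients (math.comb, Python 3.8 or later).

```python
from math import comb


def fisher_exact_two_sided(table):
    """Return the two-sided exact p-value for a 2 x 2 table [[a, b], [c, d]].

    Illustrative helper (hypothetical name).
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def probability(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    p_observed = probability(a)

    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        p
        for p in (probability(x) for x in range(smallest, largest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)


# Hypothetical small sample: rows = exercise / no exercise,
# columns = healthy / not healthy.
fisher_p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(fisher_p_value)
```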