Synopsis of lecture by Professor Omar Hasan Kasule Sr. for MPH candidates at Universiti Malaya on 10th November 2006


Misclassification is inaccurate assignment of exposure or disease status. Random or non-differential misclassification of disease occurs to the same degree in all comparison groups; it generally biases the effect measure towards the null, so that the effect is underestimated. Non-random or differential misclassification is a systematic error that differs between comparison groups and can bias the effect measure in either direction, exaggerating or underestimating it. A positive association may become negative and a negative association may become positive. Misclassification bias is classified as information bias, detection bias, and protopathic bias. Information bias is systematic incorrect measurement of responses due to questionnaire defects, observer errors, respondent errors, instrument errors, diagnostic errors, and exposure mis-specification. Detection bias arises when disease or exposure is sought more vigorously in one comparison group than in the other. Protopathic bias arises when early signs of disease cause a change in behaviour with regard to the risk factor. Misclassification bias can be prevented by using double-blind techniques to decrease observer and respondent bias. Treatment of misclassification bias is by the probabilistic approach or measurement of inter-rater variation.
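The direction of the bias can be illustrated with a small worked example. The sketch below is not part of the lecture: the cohort sizes, case counts, sensitivities, and specificities are hypothetical values chosen for illustration, and the function names are invented for this example. It computes the expected risk ratio when disease status is misclassified equally in both groups (non-differential) and when false positives occur only among the exposed (differential).

```python
def observed_cases(n, true_cases, sensitivity, specificity):
    """Expected number of subjects classified as diseased among n subjects."""
    true_positives = sensitivity * true_cases
    false_positives = (1 - specificity) * (n - true_cases)
    return true_positives + false_positives


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical cohort: 1000 exposed (200 true cases), 1000 unexposed (100 true cases).
N = 1000
TRUE_CASES_EXPOSED, TRUE_CASES_UNEXPOSED = 200, 100

true_rr = risk_ratio(TRUE_CASES_EXPOSED, N, TRUE_CASES_UNEXPOSED, N)

# Non-differential: the same sensitivity (0.8) and specificity (0.9) in both groups.
nondifferential_rr = risk_ratio(
    observed_cases(N, TRUE_CASES_EXPOSED, 0.8, 0.9), N,
    observed_cases(N, TRUE_CASES_UNEXPOSED, 0.8, 0.9), N,
)

# Differential: false positives occur only among the exposed
# (specificity 0.8 in the exposed, 1.0 in the unexposed).
differential_rr = risk_ratio(
    observed_cases(N, TRUE_CASES_EXPOSED, 0.8, 0.8), N,
    observed_cases(N, TRUE_CASES_UNEXPOSED, 0.8, 1.0), N,
)

print(f"True risk ratio:             {true_rr:.2f}")             # 2.00
print(f"Non-differential (observed): {nondifferential_rr:.2f}")  # 1.41
print(f"Differential (observed):     {differential_rr:.2f}")     # 4.00
```

With these assumed values the non-differential error pulls the risk ratio from 2.00 towards the null, while the differential error exaggerates it. Other choices of differential error could equally push the estimate below the true value.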




Selection bias arises when subjects included in the study differ in a systematic way from those not included. It is due to biological factors, disease ascertainment procedures, or data collection procedures. Selection bias due to biological factors includes the Neyman fallacy and susceptibility bias. The Neyman fallacy arises when the risk factor is related to prognosis (survival), thus biasing prevalence studies. Susceptibility bias arises when susceptibility to disease is indirectly related to the risk factor. Selection bias due to disease ascertainment procedures includes publicity, exposure, diagnostic, detection, referral, self-selection, and Berkson biases. One form of self-selection bias is the healthy worker effect, which arises because sick people are not employed or are dismissed. The Berkson fallacy arises when cases are admitted to hospital in differing proportions, such that studies based on the hospital give a wrong picture of disease-exposure relations in the community. Selection bias during data collection is represented by non-response bias and follow-up bias. Prevention of selection bias is by avoiding the causes mentioned above. There is no treatment for selection bias once it has occurred, and there are no easy methods of adjusting for its effect.
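The Berkson fallacy can be made concrete with expected counts. The sketch below is an illustration under stated assumptions, not the lecturer's own example: the community counts and admission probabilities are hypothetical, the exposure is assumed to be a condition that can itself lead to admission, and the three reasons for admission are assumed to act independently. Exposure and disease are unrelated in the community, yet the hospital sample shows an association.

```python
def odds_ratio(a, b, c, d):
    """a: exposed diseased, b: exposed non-diseased,
    c: unexposed diseased, d: unexposed non-diseased."""
    return (a * d) / (b * c)


def admission_probability(exposed, diseased,
                          p_exposure=0.2, p_disease=0.5, p_other=0.05):
    """Probability of hospital admission for at least one reason
    (hypothetical probabilities, assumed to act independently)."""
    p_not_admitted = 1 - p_other
    if exposed:
        p_not_admitted *= 1 - p_exposure
    if diseased:
        p_not_admitted *= 1 - p_disease
    return 1 - p_not_admitted


# Hypothetical community of 10,000 keyed by (exposed, diseased);
# exposure and disease are independent here.
community = {
    (True, True): 200,
    (True, False): 1800,
    (False, True): 800,
    (False, False): 7200,
}

# Expected number admitted to hospital from each cell.
hospital = {
    cell: count * admission_probability(*cell)
    for cell, count in community.items()
}


def table_odds_ratio(table):
    return odds_ratio(table[(True, True)], table[(True, False)],
                      table[(False, True)], table[(False, False)])


community_or = table_odds_ratio(community)
hospital_or = table_odds_ratio(hospital)

print(f"Community odds ratio: {community_or:.2f}")  # 1.00
print(f"Hospital odds ratio:  {hospital_or:.2f}")   # 0.25
```

Under these assumptions a hospital-based study would report a strong negative association that does not exist in the community.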



Confounding is mixing up of effects. Confounding bias arises when the disease-exposure relationship is disturbed by an extraneous factor called the confounding variable. The confounding variable is not actually involved in the exposure-disease relationship. It is, however, predictive of disease and is unequally distributed between exposure groups. Being related both to the disease and the risk factor, the confounding variable could lead to a spurious apparent relation between disease and exposure if it is a factor in the selection of subjects into the study. A confounder must fulfil the following criteria: being related to both disease and exposure, not being part of the causal pathway, being a true risk factor for the disease, being associated with the exposure in the source population, and not being affected by either disease or exposure. Prevention of confounding at the design stage, by eliminating the effect of the confounding factor, can be achieved using four strategies: pair-matching, stratification, randomisation, and restriction. Confounding can be treated at the analysis stage by various adjustment methods, both non-multivariate and multivariate. Non-multivariate treatment of confounding employs standardization and stratified Mantel-Haenszel analysis. Multivariate treatment of confounding employs multivariate adjustment procedures: multiple linear regression, linear discriminant function, and multiple logistic regression. Care must be taken to adjust only for true confounders. Adjusting for non-confounders reduces the precision of the study.
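Stratified Mantel-Haenszel analysis can be sketched in a few lines. The example below is illustrative only: the two strata of a hypothetical confounder contain invented counts, and the function names are not from the lecture. It uses the standard Mantel-Haenszel pooled odds ratio, the sum of a*d/n over strata divided by the sum of b*c/n, and compares it with the crude odds ratio from the collapsed table.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across strata, each given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data: one 2x2 table per level of the confounder.
strata = [
    (80, 20, 40, 10),  # confounder present: mostly exposed, mostly cases
    (10, 40, 20, 80),  # confounder absent: mostly unexposed, mostly controls
]

# Crude table: ignore the confounder and add the strata cell by cell.
crude_table = tuple(sum(cell) for cell in zip(*strata))

crude_or = odds_ratio(*crude_table)
stratum_ors = [odds_ratio(*s) for s in strata]
adjusted_or = mantel_haenszel_odds_ratio(strata)

print(f"Crude odds ratio:           {crude_or:.2f}")     # 2.25
print(f"Stratum-specific ratios:    {stratum_ors}")      # [1.0, 1.0]
print(f"Mantel-Haenszel odds ratio: {adjusted_or:.2f}")  # 1.00
```

In these invented data the crude association (2.25) is entirely due to the confounder: within each stratum, and after pooling, the odds ratio is 1.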



Model mis-specification bias arises when a wrong statistical model is used. For example, applying parametric methods to data that do not satisfy parametric assumptions biases the findings.



Total survey error is the sum of the sampling error and three non-sampling errors (measurement error, non-response error, and coverage error). Sampling errors are easier to estimate than non-sampling errors. Sampling error decreases with increasing sample size. Non-sampling errors may be systematic, such as incomplete coverage of the target population, or they may be non-systematic. Systematic errors cause severe bias. Sampling bias, positive or negative, arises when results from the sample deviate consistently from the true population parameter. The sources of bias in surveys are: an incomplete or inappropriate sampling frame, use of a wrong sampling unit, non-response bias, measurement bias, coverage bias, and sampling bias.
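The relation between sampling error and sample size can be shown with the standard error of a proportion. The sketch below assumes simple random sampling and a hypothetical true proportion of 0.5; both the assumption and the sample sizes are chosen for illustration and do not come from the lecture.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


P = 0.5  # hypothetical true population proportion
SAMPLE_SIZES = (100, 400, 1600)

standard_errors = [standard_error_of_proportion(P, n) for n in SAMPLE_SIZES]

for n, se in zip(SAMPLE_SIZES, standard_errors):
    print(f"n = {n:4d}  standard error = {se:.4f}")
# n =  100  standard error = 0.0500
# n =  400  standard error = 0.0250
# n = 1600  standard error = 0.0125
```

Quadrupling the sample size halves the sampling error. Non-sampling errors are not reduced by a larger sample.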

Professor Omar Hasan Kasule Sr. November 2006