SAMPLE SIZE DETERMINATION
The size of the sample depends on the hypothesis, the budget, the study duration, and the precision required. If the sample is too
small the study will lack sufficient power to answer the study question. A sample bigger than necessary is a waste of resources.
Power is the ability to detect a difference and is determined by the significance level, the magnitude of the difference, and the sample
size. Power = 1 – β = Pr (rejecting H0 when H0 is false) = Pr (true positive). The bigger the sample
size, the more powerful the study. Beyond an optimal sample size, the increase in power does not justify the cost of a larger sample.
There are procedures, formulas, and computer programs for determining sample sizes for different study designs.
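As a minimal sketch of one such procedure, the following computes the approximate sample size per group for a two-sided comparison of two independent proportions, using the normal approximation; the function name, the example proportions, and the default significance level and power are illustrative assumptions, not values from these notes:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two independent
    proportions (normal approximation, two-sided test).
    p1, p2 are hypothetical expected proportions in the two groups."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Detecting a change from 10% to 20% with 80% power at alpha = 0.05:
n_per_group(0.10, 0.20)  # 197 subjects per group
```

Note how raising the required power (say, to 90%) increases the sample size, illustrating the power/cost trade-off described above.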
SOURCES OF SECONDARY DATA
Secondary data is from decennial censuses, vital statistics, routinely collected data, epidemiological studies, and special health surveys.
Census data is reliable. It is wide in scope covering demographic, social, economic, and health information. The census describes
population composition by sex, race/ethnicity, residence, marital status, and socio-economic indicators. Vital events are births, deaths,
marriage and divorce, and some disease conditions. Routinely collected data are cheap but may be unavailable or incomplete. They are obtained from medical facilities, life and health insurance companies, institutions (like prisons, the army, and schools), disease registries,
and administrative records. Observational epidemiological studies are of 3 types: cross-sectional, case-control, and
follow-up/cohort studies. Special surveys cover a larger population than epidemiological studies and may be health, nutritional,
or socio-demographic surveys.
PRIMARY DATA COLLECTION BY QUESTIONNAIRE
Questionnaire design involves content, wording of questions, format and layout. The reliability and validity of the questionnaire as well as practical logistics should
be tested during the pilot study. Informed consent and confidentiality must be respected. A protocol sets out data collection
procedures. Questionnaire administration by face-to-face interview yields the best data but is expensive. Administration
by telephone is cheaper. Administration by mail is very cheap but has a lower response rate. Computer-administered
questionnaires are associated with more honest responses.
Data can be obtained by clinical examination, standardized psychological/psychiatric evaluation,
measurement of environmental or occupational exposure, and assay of biological specimens (endobiotic or xenobiotic) and laboratory
experiments. Pharmacological experiments involve bioassay, quantal dose-effect curves, dose-response curves, and studies of
drug elimination. Physiology experiments involve measurements of parameters of the various body systems. Microbiology experiments
involve bacterial counts, immunoassays, and serological assays. Biochemical experiments involve measurements of concentrations
of various substances. Statistical and graphical techniques are used to display and summarize this data.
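As a small sketch of such summarization, using only the Python standard library (the specimen readings below are hypothetical values invented for illustration):

```python
from statistics import mean, stdev, median

# Hypothetical assay concentrations from six biological specimens
readings = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]

summary = {
    "n": len(readings),
    "mean": round(mean(readings), 2),
    "sd": round(stdev(readings), 2),      # sample standard deviation
    "median": round(median(readings), 2),
}
# summary -> {'n': 6, 'mean': 4.07, 'sd': 0.22, 'median': 4.05}
```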
DATA MANAGEMENT AND DATA ANALYSIS
Pre-coded questionnaires are preferable for data entry. Data is input as text, multiple choice, numeric, date and time, and yes/no responses.
In double-entry techniques, two data entry clerks enter the same data and the computer checks the items on which they
differ. Data in the computer can be checked manually against the original questionnaire. Interactive data entry enables detection
and correction of logical and entry errors immediately. Data replication is a copy management service that involves copying
the data and also managing the copies. Synchronous data replication is instantaneous updating with no latency in data consistency.
In asynchronous data replication the updating is not immediate and consistency is loose.
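The double-entry check described above can be sketched as a simple record comparison; the field names and records here are hypothetical, and a real system would also handle missing fields and key mismatches:

```python
def double_entry_check(entry1, entry2):
    """Report the fields on which two independently keyed copies
    of the same record differ (hypothetical field names)."""
    return {field: (entry1[field], entry2[field])
            for field in entry1
            if entry1[field] != entry2[field]}

clerk_a = {"id": "001", "age": 34, "sex": "F"}
clerk_b = {"id": "001", "age": 43, "sex": "F"}   # digits transposed on entry
discrepancies = double_entry_check(clerk_a, clerk_b)
# discrepancies -> {'age': (34, 43)}
```

Any field appearing in the result is then resolved by checking against the original questionnaire.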
Data editing is
the process of correcting data collection and data entry errors. The data is 'cleaned' using logical, statistical, range,
and consistency checks, which identify and correct errors such as invalid or inconsistent values. All values are kept at the same level of precision (number of decimal places) to make computations consistent
and decrease rounding-off errors. The kappa statistic is used to measure inter-rater agreement. Data
is validated and its consistency is tested. The main data problems are missing data, coding and entry errors, inconsistencies, irregular patterns, digit preference, outliers, rounding-off / significant
figures, questions with multiple valid responses, and record duplication. Data transformation is the process of creating
new derived variables preliminary to analysis and includes arithmetic operations such as division, multiplication, addition,
and subtraction, as well as mathematical transformations such as logarithmic, trigonometric, power, and z-transformations.
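Two of the transformations named above can be sketched as follows; the measurement values are hypothetical, chosen to be right-skewed so the logarithmic transformation has something to correct:

```python
from math import log
from statistics import mean, stdev

# Hypothetical right-skewed measurements
values = [2.0, 4.0, 8.0, 16.0]

# Logarithmic transformation (natural log) to reduce skew
log_values = [log(v) for v in values]

# z-transformation: centre on the sample mean, scale by the standard
# deviation, giving a derived variable with mean 0 and SD 1
m, s = mean(values), stdev(values)
z_values = [(v - m) / s for v in values]
```

After the log transformation the values are evenly spaced; after the z-transformation the derived variable has mean 0 and standard deviation 1 regardless of the original units.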
Data analysis consists of data summarization, estimation and interpretation. Simple
manual inspection of the data is needed before statistical procedures. Preliminary examination consists of looking at tables and graphics. Descriptive statistics are used to detect errors,
assess the normality of the data, and determine the size of cells. Missing values may be imputed or incomplete observations
may be eliminated. Tests for association, effect, or trend involve construction and testing of hypotheses. The tests
for association are the t, chi-square, linear correlation, and logistic regression tests or coefficients. The common effect
measures are the odds ratio, the risk ratio, and the rate difference. Measures of trend can discover relationships that are not picked up
by association and effect measures. The probability, likelihood, and regression models are used in analysis. Analytic procedures
and computer programs vary for continuous and discrete data, for person-time and count data, for simple and stratified analysis,
for univariate, bivariate and multivariate analysis, and for polychotomous outcome variables. Procedures are different for
large samples and small samples.
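The three effect measures named above can be illustrated from a 2x2 table; the exposure/disease counts below are hypothetical numbers invented for this sketch:

```python
# Hypothetical 2x2 table of exposure by disease status:
#              disease   no disease
# exposed        a=30       b=70
# unexposed      c=10       d=90
a, b, c, d = 30, 70, 10, 90

risk_exposed = a / (a + b)                  # 0.30
risk_unexposed = c / (c + d)                # 0.10

odds_ratio = (a * d) / (b * c)              # cross-product ratio
risk_ratio = risk_exposed / risk_unexposed
rate_difference = risk_exposed - risk_unexposed
```

Here the risk ratio is 3.0 (exposed subjects have three times the risk) while the odds ratio is about 3.86; the two measures agree only when the disease is rare.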