0702-Discrete Data Summary

Lecture by Prof Omar Hasan Kasule Sr. for Year 1 Semester 2 PPSD Session on Wednesday 21st February 2007


Data can be summarized using parameters, computed from populations, or statistics, computed from samples.


Discrete data can be qualitative or quantitative. It can be categorized before summarization.


The main types of statistics used are measures of location and measures of spread or variation.


Measures of location indicate accuracy. The commonly used measures of location for discrete data are rates, hazards, ratios, and proportions. Rates and proportions are the most frequently used.


Measures of spread/variation indicate precision. Commonly used measures are the variance and the range (from the minimum to the maximum). The square root of the variance is called the standard deviation (SD). The SD is commonly used in computing 95% confidence limits or may be appended to a measurement in the form +/- SD.
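The measures of spread named above can be illustrated with a minimal sketch. The data are made-up counts of clinic visits per patient, and the variable names are illustrative only. The sample variance (divisor n - 1) is assumed here, since the summary is computed from a sample.

```python
import statistics

# Hypothetical sample: number of clinic visits per patient (made-up values).
visits = [2, 4, 4, 4, 5, 5, 7, 9]

mean_visits = statistics.mean(visits)
sample_variance = statistics.variance(visits)  # sample variance, divisor n - 1
sample_sd = statistics.stdev(visits)           # square root of the sample variance
data_range = (min(visits), max(visits))        # minimum and maximum

print(f"variance = {sample_variance:.2f}, SD = {sample_sd:.2f}")
print(f"range = {data_range[0]} to {data_range[1]}")
print(f"mean +/- SD = {mean_visits} +/- {sample_sd:.2f}")
```

If the data covered a whole population rather than a sample, statistics.pvariance and statistics.pstdev (divisor n) would be the corresponding parameters.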



A rate is the number of events in a given population over a defined time period and has 3 components: a numerator, a denominator, and time. The numerator is included in the denominator.


The incidence rate is a commonly used measure in medicine and public health. The incidence rate of disease is defined as a / {(a + b) t}, where a = number of new cases, b = number free of disease at the start of the time interval, and t = duration of the time of observation.
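As a rough worked example of the formula above, the sketch below computes a / {(a + b) t} exactly as stated in the text. The function name and the counts are hypothetical and chosen only for illustration.

```python
def incidence_rate(new_cases, disease_free_at_start, duration):
    """Incidence rate a / ((a + b) * t), using the symbols of the text.

    new_cases             -> a
    disease_free_at_start -> b
    duration              -> t (here in years)
    """
    denominator = (new_cases + disease_free_at_start) * duration
    if denominator <= 0:
        raise ValueError("population and duration must be positive")
    return new_cases / denominator


# Made-up figures: 20 new cases, 980 disease-free, 2 years of observation.
rate = incidence_rate(new_cases=20, disease_free_at_start=980, duration=2)
print(f"{rate * 1000:.1f} new cases per 1000 persons per year")
```

Rates are usually multiplied by a convenient constant (1000 here) so that they can be read as whole numbers.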


A crude rate is computed based on the whole population. It assumes homogeneity and ignores subgroup differences. It is therefore unweighted and can be misleading and unrepresentative. Inference and population comparisons based on crude rates are not valid.


To take subgroup differences into consideration, rates can be made specific for age, gender, race, and cause. Specific rates are more informative than crude rates, but it is cognitively difficult to internalize, digest, and understand so many rates, or to reach conclusions from them.


Another way of taking care of subgroup differences is to use adjusted rates. An adjusted or standardized rate is a representative summary: a weighted average of specific rates that is free of the deficiencies of both the crude and the specific rates.


Standardization eliminates the ‘confusing’ or ‘confounding’ effects due to subgroups. Standardization can be direct, indirect, or carried out by other more advanced techniques.


Both direct and indirect standardization involve the same principles but use different weights. Direct standardization is used when age-specific rates are available and indirect standardization is used when age-specific rates are not available. Both direct and indirect standardization use a ‘standard population’, which can be a combination of the two or more populations being compared, one of the comparison populations used as a standard for the others, the national population, or the world population.
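The two approaches can be sketched numerically. All figures below are invented, and the three age groups and variable names are illustrative only. The direct calculation weights the study population's age-specific rates by the standard population's age structure. The indirect calculation shown is one common formulation, assumed here rather than taken from the lecture: the standard population's age-specific rates are applied to the study population's age structure to give an expected number of events, which is then compared with the observed number.

```python
# Three hypothetical age groups: young, middle-aged, old.
study_population = [5000, 3000, 2000]     # persons per age group in the study population
standard_population = [4000, 4000, 2000]  # persons per age group in the standard

# Direct standardization: age-specific rates of the study population are known.
study_rates = [0.001, 0.005, 0.020]
study_events = sum(r * n for r, n in zip(study_rates, study_population))
crude_rate = study_events / sum(study_population)
direct_adjusted_rate = (
    sum(r * w for r, w in zip(study_rates, standard_population))
    / sum(standard_population)
)

# Indirect standardization (assumed formulation): only the total number of
# events in the study population is known, so the rates of the standard
# population are applied to the study population's age structure.
standard_rates = [0.002, 0.004, 0.015]
observed_events = 60
expected_events = sum(r * n for r, n in zip(standard_rates, study_population))
observed_to_expected = observed_events / expected_events

print(f"crude rate           = {crude_rate * 1000:.1f} per 1000")
print(f"directly adjusted    = {direct_adjusted_rate * 1000:.1f} per 1000")
print(f"observed / expected  = {observed_to_expected:.2f}")
```

The crude and directly adjusted rates differ only because the study and standard populations have different age structures, which is the effect that standardization is meant to remove.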



A hazard is defined as the number of events at time t among those who survive until time t. Hazard can also be defined as relative hazard with respect to a specific risk factor. At a specific point in time, relative hazard expresses the hazard among the exposed compared to the hazard among the non-exposed.
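A small sketch of these two definitions follows, assuming the hazard at time t is computed as events at t divided by the number surviving until t, and the relative hazard as the ratio of the hazard among the exposed to that among the non-exposed. The function names and counts are hypothetical.

```python
def hazard(events_at_t, survivors_until_t):
    """Events at time t among those who survived until time t."""
    if survivors_until_t <= 0:
        raise ValueError("number of survivors must be positive")
    return events_at_t / survivors_until_t


def relative_hazard(hazard_exposed, hazard_unexposed):
    """Hazard among the exposed compared to the hazard among the non-exposed."""
    if hazard_unexposed <= 0:
        raise ValueError("hazard among the non-exposed must be positive")
    return hazard_exposed / hazard_unexposed


# Made-up counts at one point in time.
h_exposed = hazard(events_at_t=6, survivors_until_t=200)
h_unexposed = hazard(events_at_t=3, survivors_until_t=300)
rh = relative_hazard(h_exposed, h_unexposed)

print(f"hazard (exposed) = {h_exposed:.3f}, hazard (non-exposed) = {h_unexposed:.3f}")
print(f"relative hazard  = {rh:.1f}")
```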



A ratio is generally defined as a : b, where a = number of cases of a disease and b = number without disease. Examples of ratios are the proportional mortality ratio, the maternal mortality ratio, and the fetal death ratio. The proportional mortality ratio is the number of deaths in a year due to a specific disease divided by the total number of deaths in that year. This ratio is useful in occupational studies because it provides information on the relative importance of a specific cause of death. The maternal mortality ratio is the total number of maternal deaths divided by the total number of live births. The fetal death ratio is the ratio of fetal deaths to live births.
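The three ratios defined above reduce to simple divisions. The sketch below uses invented annual counts. The multiplier of 100,000 for the maternal mortality ratio is a common reporting convention assumed here, not something stated in the lecture.

```python
# Made-up annual counts for one hypothetical region.
deaths_from_disease = 150
total_deaths = 3000
maternal_deaths = 12
fetal_deaths = 200
live_births = 40000

# Deaths from a specific disease divided by all deaths in the same year.
proportional_mortality_ratio = deaths_from_disease / total_deaths

# Maternal deaths divided by live births.
maternal_mortality_ratio = maternal_deaths / live_births

# Fetal deaths divided by live births.
fetal_death_ratio = fetal_deaths / live_births

print(f"proportional mortality ratio = {proportional_mortality_ratio:.1%}")
print(f"maternal mortality ratio     = {maternal_mortality_ratio * 100000:.0f} per 100,000 live births")
print(f"fetal death ratio            = {fetal_death_ratio * 1000:.0f} per 1000 live births")
```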



A proportion is the number of events expressed as a fraction of the total population at risk, without a time dimension. The formula of a proportion is a/(a+b), and the numerator is part of the denominator. The proportion most commonly used in medicine is the prevalence of disease. The term ‘prevalence rate’ is a common misnomer since prevalence is a proportion and not a rate. Prevalence describes a still/stationary picture of disease. Like rates, proportions can be crude, specific, or standardized.
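The difference between a proportion a/(a+b) and a ratio a : b can be seen with the same pair of counts. The figures and the function name below are hypothetical.

```python
def prevalence(with_disease, without_disease):
    """Prevalence as a proportion a / (a + b); there is no time dimension."""
    total = with_disease + without_disease
    if total <= 0:
        raise ValueError("total population must be positive")
    return with_disease / total


# Made-up survey: 50 people with the disease (a), 950 without it (b).
a, b = 50, 950
prevalence_proportion = prevalence(a, b)  # numerator is part of the denominator
disease_ratio = a / b                     # ratio a : b, numerator not in denominator

print(f"prevalence (proportion) = {prevalence_proportion:.3f}")
print(f"ratio a : b             = {disease_ratio:.3f}")
```

A proportion always lies between 0 and 1, whereas a ratio a : b can exceed 1 when cases outnumber non-cases.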

© Professor Omar Hasan Kasule, Sr. February 2007