0703- Continuous Data Summary 1: Measures of Central Tendency

Lecture by Professor Omar Hasan Kasule Sr. at a Year 1 Semester 2 PPSD session on Wednesday 15th March 2007


Biological phenomena vary around the average. The average represents what is normal by being the point of equilibrium. The average is a representative summary of the data using one value. Three averages are commonly used: the mean, the mode, and the median.


There are 3 types of means: the arithmetic mean, the geometric mean, and the harmonic mean. The most popular is the arithmetic mean. The arithmetic mean is considered the most useful measure of central tendency in data analysis. The geometric and harmonic means are not usually used in public health. The median is gaining popularity. It is the basis of some non-parametric tests as will be discussed later. The mode has very little public health importance.
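The three types of mean can be compared on a small data set. The sketch below uses Python's standard `statistics` module; the values are made up for illustration and are not from the lecture.

```python
import statistics

# Made-up illustrative values (not lecture data).
values = [2.0, 4.0, 8.0]

# Arithmetic mean: sum of the values divided by their count.
arithmetic = statistics.mean(values)

# Geometric mean: n-th root of the product of the values.
geometric = statistics.geometric_mean(values)

# Harmonic mean: count divided by the sum of the reciprocals.
harmonic = statistics.harmonic_mean(values)

print(arithmetic, geometric, harmonic)
```

For positive values that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean is the largest.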



The arithmetic mean is the sum of the observations' values divided by the total number of observations and reflects the impact of all observations. The robust arithmetic mean (also called the trimmed mean) is the mean of the remaining observations when a fixed percentage of the smallest and largest observations is eliminated. The mid-range is the arithmetic mean of the values of the smallest and the largest observations. The weighted arithmetic mean is used when there is a need to place extra emphasis on some values by using different weights. The indexed arithmetic mean is stated with reference to an index mean. The consumer price index (CPI) is an example of an indexed mean.
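The variants of the arithmetic mean described above can be sketched in a few lines of Python. The helper names `trimmed_mean`, `mid_range`, and `weighted_mean` are hypothetical, the data are made up, and the sketch assumes the same proportion is trimmed from each end of the ordered data.

```python
import statistics


def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the observations from EACH end
    (an assumption; the text does not say how the percentage is split)."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # observations dropped per tail
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)


def mid_range(data):
    """Arithmetic mean of the smallest and the largest observation."""
    return (min(data) + max(data)) / 2


def weighted_mean(data, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(data, weights)) / sum(weights)


# Made-up data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

plain = statistics.mean(data)        # all ten observations
trimmed = trimmed_mean(data, 0.10)   # drops 1 and 100
mid = mid_range(data)                # uses only 1 and 100
weighted = weighted_mean([70, 80, 90], [1, 1, 2])  # last value counts twice

print(plain, trimmed, mid, weighted)
```

The single extreme value pulls the plain mean and the mid-range upward, while the trimmed mean stays close to the bulk of the data.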


The arithmetic mean has 2 statistical advantages: it is the best single summary statistic, and it has a rigorous mathematical definition. Its disadvantage is that it is affected by extreme values.



The mode is the value of the most frequent observation. It is rarely used in science. It is intuitive, easy to compute, and is the only average suitable for nominal data. It is useless for small samples because it is unstable due to sampling fluctuation. It cannot be manipulated mathematically. It is not a unique average: one data set can have more than 1 mode.
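Two properties of the mode are easy to show with Python's standard `statistics` module: it works on nominal data, and it need not be unique. The blood-group labels and numbers below are made up for illustration.

```python
import statistics

# Nominal data: only the mode is a meaningful average here.
blood_groups = ["O", "A", "O", "B", "A", "AB", "O"]
single_mode = statistics.mode(blood_groups)

# A data set with two equally frequent values has two modes.
bimodal = [1, 2, 2, 3, 4, 4, 5]
modes = statistics.multimode(bimodal)

print(single_mode, modes)
```

`statistics.multimode` returns every most-frequent value, which makes the non-uniqueness explicit.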



The median is the value of the middle observation in a series ordered by magnitude. It is intuitive and is best used for erratically spaced or heavily skewed data. The median can be computed even if the extreme values are unknown in open-ended distributions. It is less stable under sampling fluctuation than the arithmetic mean.
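The point about open-ended distributions can be illustrated with a small sketch. The survival times are hypothetical, and representing the unknown largest value as infinity is an assumption made only so that it sorts last.

```python
import math
import statistics

# Hypothetical survival times in months. The last subject was still alive
# at the end of follow-up, so the true value is only known to exceed the
# others; math.inf is a stand-in that keeps it at the top of the ordering.
survival = [3, 7, 12, 20, math.inf]

# The median needs only the middle of the ordered series, so the unknown
# extreme value does not prevent its calculation.
median_survival = statistics.median(survival)

print(median_survival)
```

The arithmetic mean cannot be obtained from the same data because the sum requires the exact value of every observation.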



Mean = mode = median for symmetrical data.


Mean > median for right skewed data.


Mean < median for left skewed data.


As an empirical rule for moderately skewed data with a single mode, mode - median ≈ 2(median - mean).
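The ordering of the averages and the relation above can be checked numerically. The right skewed data set below is made up and was deliberately chosen so that the relation holds exactly; with real data only rough agreement should be expected.

```python
import statistics

# Made-up right skewed data (long tail toward large values), constructed
# so that the empirical relation holds exactly.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 6, 7]

mean = statistics.mean(right_skewed)
median = statistics.median(right_skewed)
mode = statistics.mode(right_skewed)

# Two sides of the empirical relation between the three averages.
lhs = mode - median
rhs = 2 * (median - mean)

print(mean, median, mode, lhs, rhs)
```

For this data set mean > median > mode, as expected for right skewed data.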


The mean is best used to summarize symmetrical data. The median is used to summarize skewed data. For some data sets it is best to show all the 3 types of averages.

© Professor Omar Hasan Kasule, Sr. March 2007