5.1.6 Measures of Central Tendency and Variation
In this topic we will learn how to:
- understand and use different measures of central tendency (mean, mode and median) and variation (range, interquartile range, standard deviation)
- calculate and use the mean and standard deviation of a set of data (including grouped data), either from the data itself or from given totals $\sum x$ and $\sum x^2$
The mode is the most frequent score. In some cases data may be bimodal, meaning that it has two modes. When there are more than two modes, the mode tells us little, and another measure of central tendency should be used to interpret the data.
For grouped data, we cannot identify the single most frequent value, because the individual values are not listed. Instead, we find the modal class. The modal class is the most frequent class.
The median is the middle score when the data is arranged in rank order. For ungrouped data, to find the median, use the formula,
$$Q_2 = \frac{n + 1}{2}$$
Where $Q_2$ represents the median position and $n$ represents the sample size.
For grouped data, to find the median, use the formula,
$$Q_2 = \frac{n}{2}$$
Where $Q_2$ represents the median position and $n$ represents the sample size.
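As a rough illustration of the mode and of the median position rule for ungrouped data, here is a short Python sketch. The data set `scores` is made up for this example, and the position rule assumed is the usual $(n + 1) / 2$.

```python
from statistics import median, multimode

# Illustrative scores (not taken from the text); this set is bimodal.
scores = [3, 5, 5, 7, 8, 8, 9]

# multimode returns every value tied for the highest frequency.
modes = multimode(scores)

# Median position for ungrouped data: (n + 1) / 2, counted from 1.
ranked = sorted(scores)
n = len(ranked)
position = (n + 1) / 2

print(modes)                       # [5, 8]
print(position)                    # 4.0
print(ranked[int(position) - 1])   # 7, the 4th value in rank order
print(median(scores))              # 7, the library agrees
```

When the position is not a whole number (even sample size), the median is the average of the two values on either side of that position, which is what `statistics.median` does.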
The mean represents the average of all the values in the data set. For ungrouped data, it is calculated using the formula,
$$\bar{x} = \frac{\sum x}{n}$$
Where $\bar{x}$ represents the mean, $x$ represents the data, $n$ represents the sample size.
Note: The symbol $\bar{x}$ is read as ‘x bar’.
For grouped data, to find the mean, use the formula,
$$\bar{x} = \frac{\sum xf}{\sum f}$$
Where $\bar{x}$ represents the mean, $x$ represents the mid-interval value or midpoint of a class, $f$ represents the frequency of a class.
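The two mean calculations can be sketched in Python as follows. The function names and the sample values are illustrative assumptions, not part of the syllabus; the grouped version treats every value in a class as if it sat at the class midpoint, so it only gives an estimate.

```python
def mean(values):
    """Mean of ungrouped data: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def grouped_mean(midpoints, frequencies):
    """Estimated mean of grouped data: sum of (midpoint * frequency) over total frequency."""
    total_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    total_f = sum(frequencies)
    return total_xf / total_f


# Made-up data for illustration only.
print(mean([2, 4, 6, 8]))                     # 5.0
print(grouped_mean([5, 15, 25], [2, 5, 3]))   # 16.0
```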
Range is the difference between the largest number and the smallest number in a data set. It helps us to see the spread of the data.
The upper quartile, also known as the 75th percentile, is the value below which 75% of the data lies when arranged in rank order. For ungrouped data, its position is calculated using the formula,
$$Q_3 = \frac{3}{4}(n + 1)$$
Where $Q_3$ represents the position of the upper quartile and $n$ represents the sample size.
For grouped data, use the formula,
$$Q_3 = \frac{3}{4}n$$
Where $Q_3$ represents the position of the upper quartile and $n$ represents the sample size.
The lower quartile, also known as the 25th percentile, is the value below which 25% of the data lies when arranged in rank order. For ungrouped data, its position is calculated using the formula,
$$Q_1 = \frac{1}{4}(n + 1)$$
Where $Q_1$ represents the position of the lower quartile and $n$ represents the sample size.
For grouped data, use the formula,
$$Q_1 = \frac{1}{4}n$$
Where $Q_1$ represents the position of the lower quartile and $n$ represents the sample size.
Note: The formulae above for the lower and upper quartiles of ungrouped data only work when the sample size is odd. If the sample size is even, take the lower quartile as the median of the bottom half of the data set and the upper quartile as the median of the upper half of the data set. See example 1.
The interquartile range measures the spread of the middle 50% of the data, so it is not affected by outliers (extreme values). This often makes it a more reliable measure of spread than the range. To find the interquartile range we use the formula,
$$IQR = Q_3 - Q_1$$
Where $IQR$ represents the interquartile range, $Q_3$ represents the upper quartile, $Q_1$ represents the lower quartile.
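The quartile rules for ungrouped data can be sketched in Python using the "median of each half" method. The function name `quartiles` and the sample data are assumptions made for this illustration. For an odd sample size the middle value is left out of both halves, which gives the same result as the position rules $(n + 1)/4$ and $3(n + 1)/4$.

```python
from statistics import median


def quartiles(data):
    """Return (lower quartile, median, upper quartile) using medians of the two halves."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower_half = ranked[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper_half = ranked[half + 1:] if n % 2 else ranked[half:]
    return median(lower_half), median(ranked), median(upper_half)


# Made-up data with an odd sample size (n = 7).
lower_q, med, upper_q = quartiles([4, 7, 9, 12, 15, 18, 21])
print(lower_q, med, upper_q, upper_q - lower_q)   # 7 12 18 11
```

Here the position rules give the 2nd and 6th values (7 and 18), matching the output, and the last number printed is the interquartile range.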
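Standard deviation measures how spread out the data is around the mean. Roughly, it is the typical distance of a data point from the mean.
For ungrouped data, it is calculated using the formula,
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$
Where $\sigma$ represents standard deviation, $x$ represents the data, $n$ represents the sample size, $\bar{x}$ represents the mean.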
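For grouped data, use the formula,
$$\sigma = \sqrt{\frac{\sum x^2 f}{n} - \bar{x}^2}$$
Where $\sigma$ represents standard deviation, $x$ represents the mid-interval value or midpoint of a class, $f$ represents the frequency of a class, $n$ represents the sample size (the total frequency, $\sum f$), $\bar{x}$ represents the mean.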
Variance measures how far the values in the data set are spread from the mean: it is the mean of the squared distances from the mean. Variance is the square of the standard deviation, hence the formula for ungrouped data is,
$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$$
Where $\sigma^2$ represents variance, $x$ represents the data, $n$ represents the sample size, $\bar{x}$ represents the mean.
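For grouped data, use the formula,
$$\sigma^2 = \frac{\sum x^2 f}{n} - \bar{x}^2$$
Where $\sigma^2$ represents variance, $x$ represents the mid-interval value or midpoint of a class, $f$ represents the frequency of a class, $n$ represents the sample size (the total frequency, $\sum f$), $\bar{x}$ represents the mean.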
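A minimal Python sketch of variance and standard deviation, assuming the "mean of the squares minus the square of the mean" form and division by $n$ (the population form, not the sample form that divides by $n - 1$). The function names and data are illustrative only.

```python
from math import sqrt


def variance(values):
    """Variance of ungrouped data: (sum of x^2) / n minus the mean squared."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


def grouped_variance(midpoints, frequencies):
    """Estimated variance of grouped data, using class midpoints as the x values."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))
    return sum_x2f / n - mean ** 2


# Made-up data for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(values))         # 4.0
print(sqrt(variance(values)))   # 2.0 (standard deviation)

print(grouped_variance([10, 20, 30], [1, 2, 1]))                    # 50.0
print(round(sqrt(grouped_variance([10, 20, 30], [1, 2, 1])), 3))    # 7.071
```

The standard deviation is simply the square root of the variance, so only the variance functions are defined here.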
Let’s look at some past paper questions.
1. Twelve tourists were asked to estimate the height, in metres, of a new building. Their estimates are as follows. (9709/62/O/N/19 number 1)
50 | 45 | 62 | 30 | 40 | 55 | 110 | 38 | 52 | 60 | 55 | 40 |
Find the median and interquartile range for the data.
The first step is to arrange the data in rank order,
30 | 38 | 40 | 40 | 45 | 50 | 52 | 55 | 55 | 60 | 62 | 110 |
The median is the middle score. Since the sample size is an even number, there are two middle numbers (the 6th and 7th values), so we find the average of those two numbers,
$$\text{Median} = \frac{50 + 52}{2} = 51$$
To find the lower quartile consider the first half of the dataset,
30 | 38 | 40 | 40 | 45 | 50 |
The lower quartile is the median of the first half of the dataset. Since there are two middle numbers, we find the average of those two numbers,
$$\text{Lower quartile} = \frac{40 + 40}{2} = 40$$
To find the upper quartile consider the second half of the dataset,
52 | 55 | 55 | 60 | 62 | 110 |
The upper quartile is the median of the second half of the dataset. Since there are two middle numbers, we find the average of those two numbers,
$$\text{Upper quartile} = \frac{55 + 60}{2} = 57.5$$
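Substitute into the formula for interquartile range,
$$\text{Interquartile range} = 57.5 - 40 = 17.5$$
Therefore, the final answer is,
$$\text{Median} = 51 \text{ m}, \quad \text{Interquartile range} = 17.5 \text{ m}$$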
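As a quick check of the working above, the same steps can be run in Python. This is only a sketch that mirrors the median-of-halves method; the variable names are chosen for this illustration.

```python
from statistics import median

# The twelve estimates from the question, in metres.
estimates = [50, 45, 62, 30, 40, 55, 110, 38, 52, 60, 55, 40]

ranked = sorted(estimates)
half = len(ranked) // 2            # 6, since the sample size is even

med = median(ranked)               # average of the 6th and 7th values
lower_q = median(ranked[:half])    # median of the bottom six values
upper_q = median(ranked[half:])    # median of the top six values
iqr = upper_q - lower_q

print(med, lower_q, upper_q, iqr)  # 51.0 40.0 57.5 17.5
```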
2. The mean and standard deviation of values of are and respectively. (9709/62/O/N/19 number 1)
(a) Find the values of $\sum x$ and $\sum x^2$.
Let’s write out all the information we have been given in the stem of the question,
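To find $\sum x$ let’s use the formula for ungrouped mean,
$$\bar{x} = \frac{\sum x}{n}$$
Let’s make $\sum x$ the subject of the formula,
$$\sum x = n\bar{x}$$
Substitute the values of $n$ and $\bar{x}$,
To find $\sum x^2$ let’s use the formula for standard deviation,
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$
Square both sides, to get rid of the square root sign,
$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$$
Make $\sum x^2$ the subject of the formula,
$$\sum x^2 = n\left(\sigma^2 + \bar{x}^2\right)$$
Substitute the values of $n$, $\sigma$ and $\bar{x}$,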
Therefore, the final answer is,
Another values of are such that their sum is and the sum of their squares is .
(b) Find the mean and standard deviation of all these values of .
Let’s write out all the information we have now,
Using the information above we can find the new values of $n$, $\sum x$ and $\sum x^2$ by adding the totals of the two sets of values,
Now let’s find the new values of mean and standard deviation. Let’s start with the mean,
Substitute into the formula,
Now let’s find the standard deviation,
Substitute into the formula,
Therefore, the final answer is,
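The method used in this question (recover the totals from a mean and standard deviation, add the totals of the two sets, then recompute) can be sketched in Python. All numbers below are hypothetical values chosen for illustration; they are not the values from the exam question, and the function names are assumptions.

```python
from math import sqrt


def totals_from_summary(n, mean, sd):
    """Recover (sum of x, sum of x^2) from the sample size, mean and standard deviation."""
    sum_x = n * mean
    sum_x2 = n * (sd ** 2 + mean ** 2)
    return sum_x, sum_x2


def summary_from_totals(n, sum_x, sum_x2):
    """Compute (mean, standard deviation) from the sample size and the two totals."""
    mean = sum_x / n
    sd = sqrt(sum_x2 / n - mean ** 2)
    return mean, sd


# Hypothetical first set: 10 values with mean 5 and standard deviation 2.
sum_x_a, sum_x2_a = totals_from_summary(10, 5, 2)
print(sum_x_a, sum_x2_a)          # 50 290

# Hypothetical second set: 5 values with sum 40 and sum of squares 350.
n_all = 10 + 5
sum_x_all = sum_x_a + 40
sum_x2_all = sum_x2_a + 350

mean_all, sd_all = summary_from_totals(n_all, sum_x_all, sum_x2_all)
print(mean_all, round(sd_all, 3))  # 6.0 2.582
```

Note that the means and standard deviations of the two sets are never averaged directly; only the totals $n$, $\sum x$ and $\sum x^2$ are added.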
3. The distances, $x$ metres, travelled to school by 140 children were recorded. The results are summarised in the table below. (9709/52/O/N/21 number 7)
Mid Interval | 100 | 250 | 400 | 700 | 1 050 | 1 400 |
Frequency | 16 | 30 | 42 | 34 | 12 | 6 |
Calculate estimates of the mean and standard deviation of the distances.
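To find the mean, we use the formula for grouped mean,
$$\bar{x} = \frac{\sum xf}{\sum f}$$
Let’s substitute into the formula,
$$\bar{x} = \frac{100(16) + 250(30) + 400(42) + 700(34) + 1050(12) + 1400(6)}{16 + 30 + 42 + 34 + 12 + 6} = \frac{70\,700}{140} = 505$$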
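Now let’s find the standard deviation, considering that our data is grouped,
$$\sigma = \sqrt{\frac{\sum x^2 f}{n} - \bar{x}^2}, \quad n = \sum f = 140$$
Let’s first find $\sum x^2 f$,
$$\sum x^2 f = 100^2(16) + 250^2(30) + 400^2(42) + 700^2(34) + 1050^2(12) + 1400^2(6) = 50\,405\,000$$
Substitute into the formula for grouped standard deviation,
$$\sigma = \sqrt{\frac{50\,405\,000}{140} - 505^2} \approx 324$$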
Therefore, the final answer is,
$$\bar{x} = 505 \text{ m}, \quad \sigma \approx 324 \text{ m (3 s.f.)}$$
Note: If you use the statistical mode on your calculator, you have to show the values of $\sum f$, $\sum xf$ and $\sum x^2 f$ before you show the final answers for mean and standard deviation.
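The totals for this question can be checked with a short Python sketch. It uses the midpoints and frequencies from the table and assumes the grouped formulas that divide by the total frequency; the variable names are chosen for this illustration.

```python
from math import sqrt

# Midpoints and frequencies from the table in the question.
midpoints = [100, 250, 400, 700, 1050, 1400]
frequencies = [16, 30, 42, 34, 12, 6]

sum_f = sum(frequencies)
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))

mean = sum_xf / sum_f
sd = sqrt(sum_x2f / sum_f - mean ** 2)

print(sum_f, sum_xf, sum_x2f)   # 140 70700 50405000
print(mean, round(sd))          # 505.0 324
```

These are estimates, because each distance is treated as if it were equal to the midpoint of its class.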