5.1.6 Measures of Central Tendency and Variation
In this topic we will learn how to:
- understand and use different measures of central tendency (mean, mode and median) and variation (range, interquartile range, standard deviation)
- calculate and use mean and standard deviation of a set of data (including grouped data) either from the data itself or from given totals \Sigma x and \Sigma x^{2}
\textbf{\large\textcolor{gray}{Measures of Central Tendency}}
\textbf{\textcolor{gray}{Mode}}
The mode is the most frequent value in a data set. Data may be bimodal, meaning that it has two modes. When there are more than two modes, the mode does not give us any useful information and another measure of central tendency must be used to interpret the data.
For grouped data, we cannot find the mode because the individual values are not known. Instead, we find the modal class. The modal class is the most frequent class.
\textbf{\textcolor{gray}{Median}}
The median is the middle value when the data is arranged in rank order. For ungrouped data, to find the median, use the formula,
q_{2} = \frac{n + 1}{2}
Where q_{2} represents the median position and n represents the sample size.
For grouped data, to find the median, use the formula,
q_{2} = \frac{1}{2}n
Where q_{2} represents the median position and n represents the sample size.
\textbf{\textcolor{gray}{Mean}}
The mean is the average of all the values in the data set. For ungrouped data, it is calculated using the formula,
\overline{x} = \frac{\Sigma x}{n}
Where \overline{x} represents the mean, x represents the data, n represents the sample size.
Note: The symbol \overline{x} is read as ‘x bar’.
For grouped data, to find the mean, use the formula,
\overline{x} = \frac{\Sigma xf}{\Sigma f}
Where \overline{x} represents the mean, x represents the mid-interval value (midpoint) of a class, f represents the frequency of a class.
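The three measures above can be checked with a short Python sketch. The data values, and the helper names modes and grouped_mean, are hypothetical and chosen only for illustration; they do not come from any past paper.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def grouped_mean(midpoints, frequencies):
    """Estimate the mean of grouped data as sum(x * f) / sum(f)."""
    total = sum(x * f for x, f in zip(midpoints, frequencies))
    return total / sum(frequencies)


# Hypothetical ungrouped scores, chosen only for illustration.
scores = [3, 5, 5, 6, 8, 8, 9]
print(modes(scores))    # two values share the highest frequency: bimodal
print(median(scores))   # the value at position (n + 1) / 2 = 4
print(mean(scores))     # the sum of the values divided by n

# Hypothetical grouped data: class midpoints and their frequencies.
print(grouped_mean([5, 15, 25], [2, 5, 3]))
```

Returning a list from modes makes bimodal data visible straight away, since the list then has two entries.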
\textbf{\large\textcolor{gray}{Variation}}
\textbf{\textcolor{gray}{Range}}
The range is the difference between the largest value and the smallest value in a data set. It helps us to see the spread of the data.
\textbf{\textcolor{gray}{Upper Quartile}}
The upper quartile, also known as the 75th percentile, is the value below which 75\% of the data lies when arranged in rank order. For ungrouped data, its position is calculated using the formula,
q_{3} = \frac{3}{4}(n + 1)
Where q_{3} represents the position of the upper quartile and n represents the sample size.
For grouped data, use the formula,
q_{3} = \frac{3}{4}n
Where q_{3} represents the position of the upper quartile and n represents the sample size.
\textbf{\textcolor{gray}{Lower Quartile}}
The lower quartile, also known as the 25th percentile, is the value below which 25\% of the data lies when arranged in rank order. For ungrouped data, its position is calculated using the formula,
q_{1} = \frac{1}{4}(n + 1)
Where q_{1} represents the position of the lower quartile and n represents the sample size.
For grouped data, use the formula,
q_{1} = \frac{1}{4}n
Where q_{1} represents the position of the lower quartile and n represents the sample size.
Note: The formulae above for the lower and upper quartiles of ungrouped data only work when the sample size is odd. If the sample size is even, take the lower quartile as the median of the lower half of the data set and the upper quartile as the median of the upper half of the data set. See example number 1.
\textbf{\textcolor{gray}{Interquartile Range}}
The interquartile range represents the spread of the middle 50\% of the data, so it is not affected by outliers (extreme values). When outliers are present, this gives a more representative measure of spread than the range. To find the interquartile range we use the formula,
IQR = q_{3} - q_{1}
Where IQR represents interquartile range, q_{3} represents the upper quartile, q_{1} represents the lower quartile.
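The median-of-halves method from the note above can be sketched in Python. The function name quartiles is hypothetical, and the handling of odd sample sizes (leaving the median out of both halves) is an assumption of this sketch. The data is the set of twelve estimates from example number 1.

```python
from statistics import median


def quartiles(values):
    """Return (q1, q2, q3) using the median-of-halves method."""
    data = sorted(values)              # arrange in rank order first
    n = len(data)
    half = n // 2
    lower = data[:half]                # bottom half of the data
    # Assumption: for odd n the median itself belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


estimates = [50, 45, 62, 30, 40, 55, 110, 38, 52, 60, 55, 40]
q1, q2, q3 = quartiles(estimates)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Note that the extreme estimate of 110 has no effect on the interquartile range here, whereas it would increase the range considerably.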
\textbf{\textcolor{gray}{Standard Deviation}}
Standard deviation measures the typical distance of the data points from the mean.
For ungrouped data, it is calculated using the formula,
\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}
Where \sigma represents standard deviation, x represents the data, n represents the sample size, \overline{x} represents the mean.
For grouped data, use the formula,
\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}
Where \sigma represents standard deviation, x represents the mid-interval value (midpoint) of a class, f represents the frequency of a class, \Sigma f is the total frequency (the sample size n), \overline{x} represents the mean.
\textbf{\textcolor{gray}{Variance}}
Variance is the mean of the squared distances of the data points from the mean. It is the square of the standard deviation, hence the formula for ungrouped data is,
\sigma^{2} = \frac{\Sigma x^{2}}{n} - \overline{x}^{2}
Where \sigma^{2} represents variance, x represents the data, n represents the sample size, \overline{x} represents the mean.
For grouped data, use the formula,
\sigma^{2} = \frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}
Where \sigma^{2} represents variance, x represents the mid-interval value (midpoint) of a class, f represents the frequency of a class, \Sigma f is the total frequency (the sample size n), \overline{x} represents the mean.
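When only the totals \Sigma x and \Sigma x^{2} are given, the ungrouped formulae can be applied directly. The sketch below is one way to do this in Python; the function name mean_sd_from_totals is hypothetical, and the totals are the ones that appear in example number 2 below.

```python
from math import sqrt


def mean_sd_from_totals(n, sum_x, sum_x2):
    """Return (mean, standard deviation) from n, sum of x and sum of x squared."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2   # variance = (sum of x^2) / n - mean^2
    return mean, sqrt(variance)


# Totals for the first 20 values in example number 2.
mean_20, sd_20 = mean_sd_from_totals(20, 1200, 72320)
print(mean_20, sd_20)

# Adding the totals of the other 10 values gives all 30 values.
mean_30, sd_30 = mean_sd_from_totals(30, 1200 + 550, 72320 + 40500)
print(round(mean_30, 1), round(sd_30, 1))
```

Because \Sigma x and \Sigma x^{2} are plain sums, the totals of two groups can simply be added before the formulae are applied to the combined data.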
Let’s look at some past paper questions.
1. Twelve tourists were asked to estimate the height, in metres, of a new building. Their estimates are as follows. (9709/62/O/N/19 number 1)
50 | 45 | 62 | 30 | 40 | 55 | 110 | 38 | 52 | 60 | 55 | 40 |
Find the median and interquartile range for the data.
The first step is to arrange the data in rank order,
30 | 38 | 40 | 40 | 45 | 50 | 52 | 55 | 55 | 60 | 62 | 110 |
The median is the middle value. Since the sample size (12) is an even number, there are two middle numbers, so we find the average of those two numbers,
q_{2} = \frac{\textcolor{red}{50} + \textcolor{red}{52}}{2}
q_{2} = 51
To find the lower quartile, consider the first half of the dataset,
30 | 38 | 40 | 40 | 45 | 50 | 52 | 55 | 55 | 60 | 62 | 110 |
The lower quartile is the median of the first half of the dataset. Since there are two middle numbers, we find the average of those two numbers,
q_{1} = \frac{\textcolor{red}{40} + \textcolor{red}{40}}{2}
q_{1} = 40
To find the upper quartile, consider the second half of the dataset,
30 | 38 | 40 | 40 | 45 | 50 | 52 | 55 | 55 | 60 | 62 | 110 |
The upper quartile is the median of the second half of the dataset. Since there are two middle numbers, we find the average of those two numbers,
q_{3} = \frac{\textcolor{red}{55} + \textcolor{red}{60}}{2}
q_{3} = 57.5
Substitute into the formula for interquartile range,
IQR = q_{3} - q_{1}
IQR = 57.5 - 40
IQR = 17.5
Therefore, the final answer is,
q_{2} = 51\ \ \ \ \ \ \ \ IQR = 17.5
2. The mean and standard deviation of 20 values of x are 60 and 4 respectively. (9709/62/O/N/19 number 1)
(a) Find the values of \Sigma x and \Sigma x^{2}.
Let’s write out all the information we have been given in the stem of the question,
\overline{x} = \textcolor{#2192ff}{60} \ \ \ \ \ \ \sigma = \textcolor{#0f0}{4} \ \ \ \ n = \textcolor{red}{20}
To find \Sigma x let’s use the formula for ungrouped mean,
\overline{x} = \frac{\Sigma x}{n}
Let’s make \Sigma x the subject of the formula,
\Sigma x = \overline{x} \times n
Substitute the values of \overline{x} and n,
\Sigma x = \textcolor{#2192ff}{60} \times \textcolor{red}{20}
\Sigma x = 1\ 200
To find \Sigma x^{2} let’s use the formula for \sigma,
\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}
Square both sides, to get rid of the square root sign,
\sigma^{2} = \frac{\Sigma x^{2}}{n} - \overline{x}^{2}
Make \Sigma x^{2} the subject of the formula,
\frac{\Sigma x^{2}}{n} = \sigma^{2} + \overline{x}^{2}
\Sigma x^{2} = n\left(\sigma^{2} + \overline{x}^{2}\right)
Substitute the values of n, \sigma and \overline{x},
\Sigma x^{2} = \textcolor{red}{20}\left(\textcolor{#0f0}{4}^{2} + \textcolor{#2192ff}{60}^{2}\right)
\Sigma x^{2} = 72\ 320
Therefore, the final answer is,
\Sigma x = 1\ 200 \ \ \ \ \Sigma x^{2} = 72\ 320
Another 10 values of x are such that their sum is 550 and the sum of their squares is 40\ 500.
(b) Find the mean and standard deviation of all these 30 values of x.
Let’s write out all the information we have now,
n = 30 \ \ \ \ \ \ x_{1} + x_{2} + \dots + x_{20} = 1\ 200 \ \ \ \ x_{21} + x_{22} + \dots + x_{30} = 550
x_{1}^{2} + x_{2}^{2} + \dots + x_{20}^{2} = 72\ 320\ \ \ \ x_{21}^{2} + x_{22}^{2} + \dots + x_{30}^{2} = 40\ 500
Using the information above we can find the new values of \Sigma x and \Sigma x^{2},
\Sigma x = \left(x_{1} + x_{2} + \dots + x_{20}\right) + \left(x_{21} + x_{22} + \dots + x_{30}\right)
\Sigma x = 1\ 200 + 550
\Sigma x = 1\ 750
\Sigma x^{2} = \left(x_{1}^{2} + x_{2}^{2} + \dots + x_{20}^{2}\right) + \left(x_{21}^{2} + x_{22}^{2} + \dots + x_{30}^{2}\right)
\Sigma x^{2} = 72\ 320 + 40\ 500
\Sigma x^{2} = 112\ 820
Now let’s find the new values of mean and standard deviation. Let’s start with the mean,
\overline{x} = \frac{\Sigma x}{n}
Substitute into the formula,
\overline{x} = \frac{1\ 750}{30}
\overline{x} = \frac{175}{3}
\overline{x} = 58.3
Now let’s find the standard deviation,
\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}
Substitute into the formula,
\sigma = \sqrt{\frac{112\ 820}{30} - \left(\frac{175}{3}\right)^{2}}
\sigma = 18.9179515
\sigma = 18.9
Therefore, the final answer is,
\overline{x} = 58.3 \ \ \ \ \sigma = 18.9
3. The distances, x metres, travelled to school by 140 children were recorded. The results are summarised in the table below. (9709/52/O/N/21 number 7)
Mid Interval | 100 | 250 | 400 | 700 | 1 050 | 1 400 |
Frequency | 16 | 30 | 42 | 34 | 12 | 6 |
Calculate estimates of the mean and standard deviation of the distances.
To find the mean, we use the formula for grouped mean,
\overline{x} = \frac{\Sigma xf}{\Sigma f}
Let’s substitute into the formula,
\overline{x} = \frac{100(16) + 250(30) + 400(42) + 700(34) + 1\ 050(12) + 1\ 400(6)}{140}
\overline{x} = \frac{70\ 700}{140}
\overline{x} = 505
Now let’s find the standard deviation, considering that our data is grouped,
\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}
Let’s first find \Sigma x^{2}f,
\Sigma x^{2}f = 100^{2}(16) + 250^{2}(30) + 400^{2}(42) + 700^{2}(34) + 1\ 050^{2}(12) + 1\ 400^{2}(6)
\Sigma x^{2}f = 50\ 405\ 000
Substitute into the formula for grouped standard deviation,
\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}
\sigma = \sqrt{\frac{50\ 405\ 000}{140} - (505)^{2}}
\sigma = 324
Therefore, the final answer is,
\overline{x} = 505 \ \ \ \ \sigma = 324
Note: If you use the statistics mode on your calculator, you must still show the values of \Sigma xf, \Sigma f, \Sigma x^{2}f and \overline{x} before you give the final answers for mean and standard deviation.
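The working for this question can be checked with a short Python sketch that returns the intermediate totals as well as the final answers. The function name grouped_mean_sd is hypothetical; the midpoints and frequencies are taken from the table above.

```python
from math import sqrt


def grouped_mean_sd(midpoints, frequencies):
    """Return (sum f, sum xf, sum x^2 f, mean, sd) for grouped data."""
    pairs = list(zip(midpoints, frequencies))
    sum_f = sum(frequencies)
    sum_xf = sum(x * f for x, f in pairs)
    sum_x2f = sum(x * x * f for x, f in pairs)
    mean = sum_xf / sum_f
    sd = sqrt(sum_x2f / sum_f - mean ** 2)   # grouped standard deviation
    return sum_f, sum_xf, sum_x2f, mean, sd


midpoints = [100, 250, 400, 700, 1050, 1400]
frequencies = [16, 30, 42, 34, 12, 6]
sum_f, sum_xf, sum_x2f, mean, sd = grouped_mean_sd(midpoints, frequencies)
print(sum_f, sum_xf, sum_x2f)   # the totals that should appear in the working
print(mean, round(sd))          # estimates of the mean and standard deviation
```

Returning the totals alongside the answers mirrors what the working has to show: \Sigma f, \Sigma xf and \Sigma x^{2}f first, then the mean and standard deviation.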