5.1.6 Measures of Central Tendency and Variation

In this topic we will learn how to:

understand and use different measures of central tendency (mean, mode and median) and variation (range, interquartile range, standard deviation)
calculate and use mean and standard deviation of a set of data (including grouped data) either from the data itself or from given totals $\Sigma x$ and $\Sigma x^{2}$

$\textbf{\large\textcolor{gray}{Measures of Central Tendency}}$ $\textbf{\textcolor{gray}{Mode}}$ The mode is the most frequent score. Some data sets are bimodal, meaning that they have two modes. When there are more than two modes, the mode tells us little about the data and another measure of central tendency should be used to interpret it.

For grouped data, the individual values are not known, so we cannot find the mode. Instead, we find the modal class, which is the class with the highest frequency. $\textbf{\textcolor{gray}{Median}}$ The median is the middle score when the data is arranged in rank order. For ungrouped data, the position of the median is given by the formula,
$q_{2} = \frac{n + 1}{2}$ Where $q_{2}$ represents the median position and $n$ represents the sample size.

For grouped data, to find the median, use the formula,
$q_{2} = \frac{1}{2}n$ Where $q_{2}$ represents the median position and $n$ represents the sample size.
$\textbf{\textcolor{gray}{Mean}}$ The mean represents the average of all the values in the data set. For ungrouped data, it is calculated using the formula,
$\overline{x} = \frac{\Sigma x}{n}$ Where $\overline{x}$ represents the mean, $x$ represents the data, $n$ represents the sample size.

Note: The symbol $\overline{x}$ is read as ‘ $x$ bar’.

For grouped data, to find the mean, use the formula,
$\overline{x} = \frac{\Sigma xf}{\Sigma f}$ Where $\overline{x}$ represents the mean, $x$ represents the mid interval (midpoint) of a class, $f$ represents the frequency of a class.
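The three measures above can be sketched in Python using the standard `statistics` module. This is a minimal illustration, not part of the syllabus: the data values and the helper name `grouped_mean` are made up for the example.

```python
from statistics import mean, median, multimode

# Hypothetical ungrouped scores, for illustration only
scores = [3, 5, 5, 6, 7, 8, 8, 9, 10]

modes = multimode(scores)   # every value tied for the highest frequency
middle = median(scores)     # value at position (n + 1) / 2 when n is odd
average = mean(scores)      # sum of x divided by n


def grouped_mean(midpoints, frequencies):
    """Estimate the mean of grouped data as sum(x * f) / sum(f)."""
    total_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    return total_xf / sum(frequencies)


# Hypothetical classes with midpoints 5, 15, 25 and frequencies 2, 5, 3
estimate = grouped_mean([5, 15, 25], [2, 5, 3])

print(modes, middle, average, estimate)
```

Here `multimode` returns two values, so this hypothetical data set is bimodal. The grouped result is only an estimate, because each class is represented by its midpoint.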
$\textbf{\large\textcolor{gray}{Variation}}$ $\textbf{\textcolor{gray}{Range}}$ Range is the difference between the largest number and the smallest number in a data set. It helps us to see the spread of the data.
$\textbf{\textcolor{gray}{Upper Quartile}}$ The upper quartile, also known as the $75$th percentile, is the value below which $75\%$ of the data lies when arranged in rank order. For ungrouped data, its position is found using the formula,
$q_{3} = \frac{3}{4}(n + 1)$ Where $q_{3}$ represents the position of the upper quartile and $n$ represents the sample size.

For grouped data, use the formula,
$q_{3} = \frac{3}{4}n$ Where $q_{3}$ represents the position of the upper quartile and $n$ represents the sample size.
$\textbf{\textcolor{gray}{Lower Quartile}}$ The lower quartile, also known as the $25$th percentile, is the value below which $25\%$ of the data lies when arranged in rank order. For ungrouped data, its position is found using the formula,
$q_{1} = \frac{1}{4}(n + 1)$ Where $q_{1}$ represents the position of the lower quartile and $n$ represents the sample size.

For grouped data, use the formula,
$q_{1} = \frac{1}{4}n$ Where $q_{1}$ represents the position of the lower quartile and $n$ represents the sample size.

Note: The formulae above for the lower and upper quartiles of ungrouped data only work when the sample size is odd. If the sample size is even, take the lower quartile to be the median of the bottom half of the data set and the upper quartile to be the median of the top half of the data set. See example number $1$.

$\textbf{\textcolor{gray}{Interquartile Range}}$ The interquartile range measures the spread of the middle $50\%$ of the data, so it is not affected by outliers (extreme values). This often makes it a more representative measure of spread than the range. To find the interquartile range we use the formula,
$IQR = q_{3} - q_{1}$ Where $IQR$ represents interquartile range, $q_{3}$ represents the upper quartile, $q_{1}$ represents the lower quartile.
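The median-of-halves method described in the note above can be sketched in Python as follows. The function name `quartiles_by_halves` is hypothetical, and the twelve values are assumed for illustration: they are chosen so that the results agree with the quartiles in example 1, but they should not be read as the question's original data.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (q1, q2, q3) using the median-of-halves method.

    For an odd sample size the middle value is left out of both halves;
    that convention is an assumption of this sketch.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


# Assumed values for illustration (sample size 12, an even number)
estimates = [50, 45, 62, 30, 40, 55, 110, 38, 52, 60, 55, 40]
q1, q2, q3 = quartiles_by_halves(estimates)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Note that the extreme value $110$ has no effect on the interquartile range here, whereas it would dominate the range.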
$\textbf{\textcolor{gray}{Standard Deviation}}$ Standard deviation measures the typical distance of the data points from the mean. More precisely, it is the square root of the mean squared distance from the mean.

For ungrouped data, it is calculated using the formula,
$\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}$ Where $\sigma$ represents standard deviation, $x$ represents the data, $n$ represents the sample size, $\overline{x}$ represents the mean.
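
To see why this formula measures spread about the mean, it helps to note that the expression under the square root is a rearrangement of the mean squared deviation from the mean. This is a standard identity, included here as a supporting step, and it uses $\Sigma x = n\overline{x}$:
$\frac{\Sigma (x - \overline{x})^{2}}{n} = \frac{\Sigma x^{2} - 2\overline{x}\Sigma x + n\overline{x}^{2}}{n}$
$= \frac{\Sigma x^{2}}{n} - 2\overline{x}^{2} + \overline{x}^{2}$
$= \frac{\Sigma x^{2}}{n} - \overline{x}^{2}$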

For grouped data, use the formula,
$\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}$ Where $\sigma$ represents standard deviation, $x$ represents the mid interval (midpoint) of a class, $f$ represents the frequency of a class, $\overline{x}$ represents the mean.
$\textbf{\textcolor{gray}{Variance}}$ Variance is the mean of the squared distances of the data points from the mean, so it also measures how spread out the data is. Variance is the square of standard deviation, hence the formula for ungrouped data is,
$\sigma^{2} = \frac{\Sigma x^{2}}{n} - \overline{x}^{2}$ Where $\sigma^{2}$ represents variance, $x$ represents the data, $n$ represents the sample size, $\overline{x}$ represents the mean.

For grouped data, use the formula,
$\sigma^{2} = \frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}$ Where $\sigma^{2}$ represents variance, $x$ represents the mid interval (midpoint) of a class, $f$ represents the frequency of a class, $\overline{x}$ represents the mean.
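
The ungrouped formulae only need three totals: $n$, $\Sigma x$ and $\Sigma x^{2}$. The sketch below shows one way to compute the mean and standard deviation from those totals in Python. The function names and the data values are hypothetical, chosen only for illustration.

```python
from math import sqrt


def totals_from_data(data):
    """Return (n, sum of x, sum of x squared) for a list of raw values."""
    return len(data), sum(data), sum(x * x for x in data)


def mean_and_sd_from_totals(n, sum_x, sum_x2):
    """Return (mean, standard deviation) using sd = sqrt(sum_x2 / n - mean^2)."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2
    return mean, sqrt(variance)


# Hypothetical raw data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
n, sum_x, sum_x2 = totals_from_data(data)
mean, sd = mean_and_sd_from_totals(n, sum_x, sum_x2)
print(n, sum_x, sum_x2, mean, sd)
```

Because $\Sigma x$ and $\Sigma x^{2}$ are plain sums, the totals for two separate groups can be added together (along with their sample sizes) before calling `mean_and_sd_from_totals` to get the statistics of the combined data.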

Let’s look at some past paper questions.

1. Twelve tourists were asked to estimate the height, in metres, of a new building. Their estimates are as follows. (9709/62/O/N/19 number 1)

110

Find the median and interquartile range for the data.

The first step is to arrange the data in rank order,

110

The median is the middle score. Since the sample size $(12)$ is an even number, there are two middle numbers, so we find the average of those two numbers,
$q_{2} = \frac{\textcolor{red}{50} + \textcolor{red}{52}}{2}$ $q_{2} = 51$ To find the lower quartile consider the first half of the dataset,

110

The lower quartile is the median of the first half of the dataset. Since there are two middle numbers, we find the average of those two numbers,
$q_{1} = \frac{\textcolor{red}{40} + \textcolor{red}{40}}{2}$ $q_{1} = 40$ To find the upper quartile consider the second half of the dataset,

110

The upper quartile is the median of the second half of the dataset. Since there are two middle numbers, we find the average of those two numbers,
$q_{3} = \frac{\textcolor{red}{55} + \textcolor{red}{60}}{2}$ $q_{3} = 57.5$ Substitute into the formula for interquartile range,
$IQR = q_{3} - q_{1}$ $IQR = 57.5 - 40$ $IQR = 17.5$ Therefore, the final answer is,
$q_{2} = 51\ \ \ \ \ \ \ \ IQR = 17.5$

2. The mean and standard deviation of $20$ values of $x$ are $60$ and $4$ respectively. (9709/62/O/N/19 number 1)

(a) Find the values of $\Sigma x$ and $\Sigma x^{2}$ .

Let’s write out all the information we have been given in the stem of the question,
$\overline{x} = \textcolor{#2192ff}{60} \ \ \ \ \ \ \sigma = \textcolor{#0f0}{4} \ \ \ \ n = \textcolor{red}{20}$ To find $\Sigma x$ let’s use the formula for ungrouped mean,
$\overline{x} = \frac{\Sigma x}{n}$ Let’s make $\Sigma x$ the subject of the formula,
$\Sigma x = \overline{x} \times n$ Substitute the values of $\overline{x}$ and $n$ ,
$\Sigma x = \textcolor{#2192ff}{60} \times \textcolor{red}{20}$ $\Sigma x = 1\ 200$ To find $\Sigma x^{2}$ let’s use the formula for $\sigma$ ,
$\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}$ Square both sides, to get rid of the square root sign,
$\sigma^{2} = \frac{\Sigma x^{2}}{n} - \overline{x}^{2}$ Make $\Sigma x^{2}$ the subject of the formula,
$\frac{\Sigma x^{2}}{n} = \sigma^{2} + \overline{x}^{2}$ $\Sigma x^{2} = n\left(\sigma^{2} + \overline{x}^{2}\right)$ Substitute the values of $n$ , $\sigma$ and $\overline{x}$ ,
$\Sigma x^{2} = \textcolor{red}{20}\left(\textcolor{#0f0}{4}^{2} + \textcolor{#2192ff}{60}^{2}\right)$ $\Sigma x^{2} = 72\ 320$ Therefore, the final answer is,
$\Sigma x = 1\ 200 \ \ \ \ \Sigma x^{2} = 72\ 320$

Another $10$ values of $x$ are such that their sum is $550$ and the sum of their squares is $40\ 500$.
(b) Find the mean and standard deviation of all these $30$ values of $x$ .

Let’s write out all the information we have now,
$n = 30 \ \ \ \ \ \ x_{1} + x_{2} ... + x_{20} = 1\ 200 \ \ \ \ x_{21} + x_{22} ... + x_{30} = 550$ $x_{1}^{2} + x_{2}^{2} ... + x_{20}^{2} = 72\ 320\ \ \ \ x_{21}^{2} + x_{22}^{2} ... + x_{30}^{2} = 40\ 500$ Using the information above we can find the new values of $\Sigma x$ and $\Sigma x^{2}$ ,
$\Sigma x = \left(x_{1} + x_{2} ... + x_{20}\right) + \left(x_{21} + x_{22} ... + x_{30}\right)$ $\Sigma x = 1\ 200 + 550$ $\Sigma x = 1\ 750$
$\Sigma x^{2} = \left(x_{1}^{2} + x_{2}^{2} ... + x_{20}^{2}\right) + \left(x_{21}^{2} + x_{22}^{2} ... + x_{30}^{2}\right)$ $\Sigma x^{2} = 72\ 320 + 40\ 500$ $\Sigma x^{2} = 112\ 820$ Now let’s find the new values of mean and standard deviation. Let’s start with the mean,
$\overline{x} = \frac{\Sigma x}{n}$ Substitute into the formula,
$\overline{x} = \frac{1\ 750}{30}$ $\overline{x} = \frac{175}{3}$ $\overline{x} = 58.3$ Now let’s find the standard deviation,
$\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}$ Substitute into the formula,
$\sigma = \sqrt{\frac{112\ 820}{30} - \left(\frac{175}{3}\right)^{2}}$ $\sigma = 18.9179515$ $\sigma = 18.9$ Therefore, the final answer is,
$\overline{x} = 58.3 \ \ \ \ \sigma = 18.9$

3. The distances, $x$ metres, travelled to school by $140$ children were recorded. The results are summarised in the table below. (9709/52/O/N/21 number 7)

Mid Interval	100	250	400	700	1 050	1 400
Frequency	16	30	42	34	12	6

Calculate estimates of the mean and standard deviation of the distances.

To find the mean, we use the formula for grouped mean,
$\overline{x} = \frac{\Sigma xf}{\Sigma f}$ Let’s substitute into the formula,

$\overline{x} = \frac{100(16) + 250(30) + 400(42) + 700(34) + 1\ 050(12) + 1\ 400(6)}{140}$

$\overline{x} = \frac{70\ 700}{140}$ $\overline{x} = 505$ Now let’s find the standard deviation, considering that our data is grouped,
$\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}$
Let’s first find $\Sigma x^{2}f$ , $\Sigma x^{2}f = 100^{2}(16) + 250^{2}(30) + 400^{2}(42) + 700^{2}(34) + 1\ 050^{2}(12) + 1\ 400^{2}(6)$ $\Sigma x^{2}f = 50\ 405\ 000$ Substitute into the formula for grouped standard deviation,
$\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}$ $\sigma = \sqrt{\frac{50\ 405\ 000}{140} - (505)^{2}}$ $\sigma = 324$ Therefore, the final answer is,
$\overline{x} = 505 \ \ \ \ \sigma = 324$

Note: If you use the statistical mode on your calculator, you must still show the values of $\Sigma xf$, $\Sigma f$, $\Sigma x^{2}f$ and $\overline{x}$ before stating the final answers for the mean and standard deviation.
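
The grouped estimates in example 3 can be checked with a short Python script. This is a minimal sketch using the mid intervals and frequencies from the table in the question. The function name `grouped_mean_and_sd` is hypothetical.

```python
from math import sqrt


def grouped_mean_and_sd(midpoints, frequencies):
    """Return (sum f, sum xf, sum x^2 f, mean, sd) for grouped data."""
    total_f = sum(frequencies)
    total_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    total_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))
    mean = total_xf / total_f
    sd = sqrt(total_x2f / total_f - mean ** 2)
    return total_f, total_xf, total_x2f, mean, sd


# Mid intervals and frequencies from the table in example 3
midpoints = [100, 250, 400, 700, 1050, 1400]
frequencies = [16, 30, 42, 34, 12, 6]

total_f, total_xf, total_x2f, mean, sd = grouped_mean_and_sd(midpoints, frequencies)
print(total_f, total_xf, total_x2f, mean, round(sd))
```

The function returns the intermediate totals as well as the final answers, since those are the values that need to be shown in the working.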