5.1.6 Measures of Central Tendency and Variation

In this topic we will learn how to:

  • understand and use different measures of central tendency (mean, mode and median) and variation (range, interquartile range, standard deviation)
  • calculate and use mean and standard deviation of a set of data (including grouped data) either from the data itself or from given totals $\Sigma x$ and $\Sigma x^{2}$

Measures of Central Tendency

Mode

The mode is the most frequent score. In some cases data may be bimodal, meaning that it has two modes. Where there are more than two modes, the mode does not give us any useful information and another measure of central tendency must be used to interpret the data.

For grouped data, we cannot find the mode because the individual values are not known. Instead, we find the modal class, which is the most frequent class.

Median

The median is the middle score when the data is arranged in rank order. For ungrouped data, to find the median, use the formula,
$$q_{2} = \frac{n + 1}{2}$$
where $q_{2}$ represents the median position and $n$ represents the sample size.

For grouped data, to find the median, use the formula,
$$q_{2} = \frac{1}{2}n$$
where $q_{2}$ represents the median position and $n$ represents the sample size.
Mean

The mean represents the average of all the values in the data set. For ungrouped data, it is calculated using the formula,
$$\overline{x} = \frac{\Sigma x}{n}$$
where $\overline{x}$ represents the mean, $x$ represents the data, and $n$ represents the sample size.

Note: The symbol $\overline{x}$ is read as ‘$x$ bar’.

For grouped data, to find the mean, use the formula,
$$\overline{x} = \frac{\Sigma xf}{\Sigma f}$$
where $\overline{x}$ represents the mean, $x$ represents the mid-interval value (midpoint) of a class, and $f$ represents the frequency of a class.
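
The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample `scores` are made up for this example and do not come from the syllabus.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

def median(data):
    """Middle value of the ranked data (average of the two middle values if n is even)."""
    ranked = sorted(data)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def mean(data):
    """Sum of the values divided by the sample size."""
    return sum(data) / len(data)

def grouped_mean(midpoints, freqs):
    """Estimate of the mean from class midpoints x and frequencies f."""
    return sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

scores = [3, 5, 5, 7, 8, 8, 9]   # hypothetical sample, bimodal
print(modes(scores))             # [5, 8]
print(median(scores))            # 7
print(mean(scores))              # 45 / 7
```

Returning a list from `modes` makes the bimodal case visible directly, rather than silently picking one of the two values.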
Variation

Range

The range is the difference between the largest and the smallest value in a data set. It helps us to see the spread of the data.
Upper Quartile

The upper quartile, also known as the $75$th percentile, is the value below which $75\%$ of the data lies when arranged in rank order. For ungrouped data, its position is calculated using the formula,
$$q_{3} = \frac{3}{4}(n + 1)$$
where $q_{3}$ represents the position of the upper quartile and $n$ represents the sample size.

For grouped data, use the formula,
$$q_{3} = \frac{3}{4}n$$
where $q_{3}$ represents the position of the upper quartile and $n$ represents the sample size.
Lower Quartile

The lower quartile, also known as the $25$th percentile, is the value below which $25\%$ of the data lies when arranged in rank order. For ungrouped data, its position is calculated using the formula,
$$q_{1} = \frac{1}{4}(n + 1)$$
where $q_{1}$ represents the position of the lower quartile and $n$ represents the sample size.

For grouped data, use the formula,
$$q_{1} = \frac{1}{4}n$$
where $q_{1}$ represents the position of the lower quartile and $n$ represents the sample size.

Note: The formulae above for the lower and upper quartile of ungrouped data only work when the sample size is odd. If the sample size is even, take the lower quartile as the median of the bottom half of the data set and the upper quartile as the median of the top half. See example 1.

Interquartile Range

The interquartile range measures the spread of the middle $50\%$ of the data, so it is not affected by outliers (extreme values). This often gives a more representative picture of the spread than the range. To find the interquartile range we use the formula,
$$IQR = q_{3} - q_{1}$$
where $IQR$ represents the interquartile range, $q_{3}$ represents the upper quartile, and $q_{1}$ represents the lower quartile.
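
As a rough sketch, the median-of-halves method from the note above can be written in Python as follows. The function names are illustrative. For an odd sample size this sketch assumes the median itself is left out of both halves, which is one common convention. The data are the twelve estimates from the first past paper question further down.

```python
def median_of_ranked(ranked):
    """Median of a list that is already in rank order."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def quartiles(data):
    """Return (q1, q2, q3) using the median-of-halves method."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]        # bottom half (median excluded when n is odd)
    upper = ranked[n - half:]    # top half (median excluded when n is odd)
    return median_of_ranked(lower), median_of_ranked(ranked), median_of_ranked(upper)

heights = [50, 45, 62, 30, 40, 55, 110, 38, 52, 60, 55, 40]
q1, q2, q3 = quartiles(heights)
print(q1, q2, q3, q3 - q1)   # 40.0 51.0 57.5 17.5
```

Notice that the extreme estimate of 110 has no effect on the interquartile range, whereas it would dominate the range.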
Standard Deviation

Standard deviation measures how far, on average, the data points lie from the mean. More precisely, it is the square root of the mean squared distance from the mean.

For ungrouped data, it is calculated using the formula,
$$\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}$$
where $\sigma$ represents the standard deviation, $x$ represents the data, $n$ represents the sample size, and $\overline{x}$ represents the mean.
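
It may help to see where this formula comes from. The short derivation below assumes that $\sigma^{2}$ is defined as the mean of the squared distances from the mean, and uses $\Sigma x = n\overline{x}$:

```latex
\begin{aligned}
\sigma^{2} &= \frac{\Sigma (x - \overline{x})^{2}}{n} \\
           &= \frac{\Sigma x^{2}}{n} - 2\overline{x}\,\frac{\Sigma x}{n} + \frac{n\overline{x}^{2}}{n} \\
           &= \frac{\Sigma x^{2}}{n} - 2\overline{x}^{2} + \overline{x}^{2} \\
           &= \frac{\Sigma x^{2}}{n} - \overline{x}^{2}
\end{aligned}
```

Taking the square root of both sides gives the formula for $\sigma$ in terms of the totals $\Sigma x^{2}$ and $\Sigma x$.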

For grouped data, use the formula,
$$\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}$$
where $\sigma$ represents the standard deviation, $x$ represents the mid-interval value (midpoint) of a class, $f$ represents the frequency of a class, $\Sigma f = n$ is the sample size, and $\overline{x}$ represents the mean.
Variance

Variance measures how spread out the data is about the mean: it is the mean of the squared distances of the data points from the mean. Variance is the square of standard deviation, hence the formula for ungrouped data is,
$$\sigma^{2} = \frac{\Sigma x^{2}}{n} - \overline{x}^{2}$$
where $\sigma^{2}$ represents the variance, $x$ represents the data, $n$ represents the sample size, and $\overline{x}$ represents the mean.

For grouped data, use the formula,
$$\sigma^{2} = \frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}$$
where $\sigma^{2}$ represents the variance, $x$ represents the mid-interval value (midpoint) of a class, $f$ represents the frequency of a class, $\Sigma f = n$ is the sample size, and $\overline{x}$ represents the mean.
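
The formulas for standard deviation and variance only need three totals, so they are easy to sketch in Python. The function names and the small frequency table below are hypothetical, chosen so that the arithmetic comes out in whole numbers.

```python
from math import sqrt

def grouped_totals(midpoints, freqs):
    """Return (sum of f, sum of x*f, sum of x^2*f) for class midpoints x and frequencies f."""
    n = sum(freqs)
    sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))
    return n, sum_xf, sum_x2f

def mean_sd_from_totals(n, sum_x, sum_x2):
    """Mean and standard deviation from n and the totals of x and x^2."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2
    return mean, sqrt(variance)

midpoints = [5, 15, 25]   # hypothetical class midpoints
freqs = [2, 5, 3]         # hypothetical frequencies
n, sum_xf, sum_x2f = grouped_totals(midpoints, freqs)
mean, sd = mean_sd_from_totals(n, sum_xf, sum_x2f)
print(n, sum_xf, sum_x2f)   # 10 160 3050
print(mean, sd)             # 16.0 7.0
```

For ungrouped data the same `mean_sd_from_totals` function applies, with $\Sigma x$ and $\Sigma x^{2}$ passed in place of the grouped totals.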

Let’s look at some past paper questions.

1. Twelve tourists were asked to estimate the height, in metres, of a new building. Their estimates are as follows. (9709/62/O/N/19 number 1)

50  45  62  30  40  55  110  38  52  60  55  40

Find the median and interquartile range for the data.

The first step is to arrange the data in rank order,

30  38  40  40  45  50  52  55  55  60  62  110

The median is the middle score. Since the sample size ($12$) is an even number, there are two middle numbers, so we have to find the average of those two numbers,
$$q_{2} = \frac{\textcolor{red}{50} + \textcolor{red}{52}}{2}$$
$$q_{2} = 51$$
To find the lower quartile, consider the first half of the data set,

30  38  40  40  45  50  52  55  55  60  62  110

The lower quartile is the median of the first half of the data set. Since there are two middle numbers, we have to find the average of those two numbers,
$$q_{1} = \frac{\textcolor{red}{40} + \textcolor{red}{40}}{2}$$
$$q_{1} = 40$$
To find the upper quartile, consider the second half of the data set,

30  38  40  40  45  50  52  55  55  60  62  110

The upper quartile is the median of the second half of the data set. Since there are two middle numbers, we have to find the average of those two numbers,
$$q_{3} = \frac{\textcolor{red}{55} + \textcolor{red}{60}}{2}$$
$$q_{3} = 57.5$$
Substitute into the formula for interquartile range,
$$IQR = q_{3} - q_{1}$$
$$IQR = 57.5 - 40$$
$$IQR = 17.5$$
Therefore, the final answer is,
$$q_{2} = 51 \qquad IQR = 17.5$$

2. The mean and standard deviation of $20$ values of $x$ are $60$ and $4$ respectively. (9709/62/O/N/19 number 1)

(a) Find the values of $\Sigma x$ and $\Sigma x^{2}$.

Let’s write out all the information we have been given in the stem of the question,
$$\overline{x} = \textcolor{#2192ff}{60} \qquad \sigma = \textcolor{#0f0}{4} \qquad n = \textcolor{red}{20}$$
To find $\Sigma x$, let’s use the formula for the ungrouped mean,
$$\overline{x} = \frac{\Sigma x}{n}$$
Let’s make $\Sigma x$ the subject of the formula,
$$\Sigma x = \overline{x} \times n$$
Substitute the values of $\overline{x}$ and $n$,
$$\Sigma x = \textcolor{#2192ff}{60} \times \textcolor{red}{20}$$
$$\Sigma x = 1\ 200$$
To find $\Sigma x^{2}$, let’s use the formula for $\sigma$,
$$\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}$$
Square both sides to get rid of the square root sign,
$$\sigma^{2} = \frac{\Sigma x^{2}}{n} - \overline{x}^{2}$$
Make $\Sigma x^{2}$ the subject of the formula,
$$\frac{\Sigma x^{2}}{n} = \sigma^{2} + \overline{x}^{2}$$
$$\Sigma x^{2} = n\left(\sigma^{2} + \overline{x}^{2}\right)$$
Substitute the values of $n$, $\sigma$ and $\overline{x}$,
$$\Sigma x^{2} = \textcolor{red}{20}\left(\textcolor{#0f0}{4}^{2} + \textcolor{#2192ff}{60}^{2}\right)$$
$$\Sigma x^{2} = 72\ 320$$
Therefore, the final answer is,
$$\Sigma x = 1\ 200 \qquad \Sigma x^{2} = 72\ 320$$

Another $10$ values of $x$ are such that their sum is $550$ and the sum of their squares is $40\ 500$.
(b) Find the mean and standard deviation of all these $30$ values of $x$.

Let’s write out all the information we have now,
$$n = 30 \qquad x_{1} + x_{2} + \dots + x_{20} = 1\ 200 \qquad x_{21} + x_{22} + \dots + x_{30} = 550$$
$$x_{1}^{2} + x_{2}^{2} + \dots + x_{20}^{2} = 72\ 320 \qquad x_{21}^{2} + x_{22}^{2} + \dots + x_{30}^{2} = 40\ 500$$
Using the information above we can find the new values of $\Sigma x$ and $\Sigma x^{2}$,
$$\Sigma x = \left(x_{1} + x_{2} + \dots + x_{20}\right) + \left(x_{21} + x_{22} + \dots + x_{30}\right)$$
$$\Sigma x = 1\ 200 + 550$$
$$\Sigma x = 1\ 750$$
$$\Sigma x^{2} = \left(x_{1}^{2} + x_{2}^{2} + \dots + x_{20}^{2}\right) + \left(x_{21}^{2} + x_{22}^{2} + \dots + x_{30}^{2}\right)$$
$$\Sigma x^{2} = 72\ 320 + 40\ 500$$
$$\Sigma x^{2} = 112\ 820$$
Now let’s find the new values of the mean and standard deviation. Let’s start with the mean,
$$\overline{x} = \frac{\Sigma x}{n}$$
Substitute into the formula,
$$\overline{x} = \frac{1\ 750}{30}$$
$$\overline{x} = \frac{175}{3}$$
$$\overline{x} = 58.3$$
Now let’s find the standard deviation,
$$\sigma = \sqrt{\frac{\Sigma x^{2}}{n} - \overline{x}^{2}}$$
Substitute into the formula,
$$\sigma = \sqrt{\frac{112\ 820}{30} - \left(\frac{175}{3}\right)^{2}}$$
$$\sigma = 18.9179515$$
$$\sigma = 18.9$$
Therefore, the final answer is,
$$\overline{x} = 58.3 \qquad \sigma = 18.9$$

3. The distances, $x$ metres, travelled to school by $140$ children were recorded. The results are summarised in the table below. (9709/52/O/N/21 number 7)

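$$\begin{array}{|l|c|c|c|c|c|c|}
\hline
\text{Mid Interval} & 100 & 250 & 400 & 700 & 1\ 050 & 1\ 400 \\
\hline
\text{Frequency} & 16 & 30 & 42 & 34 & 12 & 6 \\
\hline
\end{array}$$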

Calculate estimates of the mean and standard deviation of the distances.

To find the mean, we use the formula for the grouped mean,
$$\overline{x} = \frac{\Sigma xf}{\Sigma f}$$
Let’s substitute into the formula,

$$\overline{x} = \frac{100(16) + 250(30) + 400(42) + 700(34) + 1\ 050(12) + 1\ 400(6)}{140}$$

$$\overline{x} = \frac{70\ 700}{140}$$
$$\overline{x} = 505$$
Now let’s find the standard deviation, considering that our data is grouped,
$$\sigma = \sqrt{\frac{\Sigma x^{2}f}{\Sigma f} - \overline{x}^{2}}$$
Let’s first find $\Sigma x^{2}f$,
$$\Sigma x^{2}f = 100^{2}(16) + 250^{2}(30) + 400^{2}(42) + 700^{2}(34) + 1\ 050^{2}(12) + 1\ 400^{2}(6)$$
$$\Sigma x^{2}f = 50\ 405\ 000$$
Substitute into the formula for grouped standard deviation,
$$\sigma = \sqrt{\frac{50\ 405\ 000}{140} - (505)^{2}}$$
$$\sigma = 324$$
Therefore, the final answer is,
$$\overline{x} = 505 \qquad \sigma = 324$$

Note: If you use the statistical mode on your calculator, you have to show the values of $\Sigma xf$, $\Sigma f$, $\Sigma x^{2}f$ and $\overline{x}$ before you show the final answers for mean and standard deviation.
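
Returning to question 2, the idea of recovering totals from a mean and standard deviation and then pooling them with a second set of totals can be sketched in Python. The function names are illustrative only, and this is a way of checking the arithmetic rather than a substitute for the written working.

```python
from math import sqrt

def totals_from_summary(n, mean, sd):
    """Recover (sum of x, sum of x^2) from n, the mean and the standard deviation."""
    return n * mean, n * (sd ** 2 + mean ** 2)

def pooled_mean_sd(groups):
    """Mean and standard deviation of several groups, each given as (n, sum_x, sum_x2)."""
    n = sum(group[0] for group in groups)
    sum_x = sum(group[1] for group in groups)
    sum_x2 = sum(group[2] for group in groups)
    mean = sum_x / n
    return mean, sqrt(sum_x2 / n - mean ** 2)

# Part (a): the first 20 values have mean 60 and standard deviation 4.
sum_x, sum_x2 = totals_from_summary(20, 60, 4)
print(sum_x, sum_x2)                   # 1200 72320

# Part (b): pool with the other 10 values (sum 550, sum of squares 40 500).
mean, sd = pooled_mean_sd([(20, sum_x, sum_x2), (10, 550, 40500)])
print(round(mean, 1), round(sd, 1))    # 58.3 18.9
```

The key point is that means and standard deviations cannot be added directly, but the totals $\Sigma x$ and $\Sigma x^{2}$ can.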