5.1.4 Histograms
In this topic we will learn how to:
- draw and interpret histograms
A histogram is used to represent grouped continuous data. However, it does not show all the data points. It consists of bars of different widths joined together. There are no gaps between the bars. This is the difference between a bar chart and a histogram. Data for a histogram is usually displayed in the form of a class (a range of values), with its respective frequency.
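We use the classes and their frequencies to calculate the two quantities needed to draw a histogram:
- class width, the difference between the upper and lower bounds of a class
- frequency density, the frequency divided by the class width, which gives the height of each bar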
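As a minimal sketch of these two calculations, the Python below computes class widths and frequency densities. The classes and frequencies are hypothetical values chosen only for illustration, and the function names are not from any library.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 12), (10, 30, 30), (30, 40, 8)]


def class_width(lower, upper):
    """Class width = upper bound - lower bound."""
    return upper - lower


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width (the height of the bar)."""
    return frequency / class_width(lower, upper)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo}-{hi}: width {class_width(lo, hi)}, frequency {f}, density {d}")
```

Note that the middle class has the largest frequency but is also the widest, so its bar is not as tall as the frequency alone would suggest.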
The modal class is the class with the highest frequency density, i.e. the tallest bar on the histogram.
To calculate the median, we use the formula,
$$\text{position of } Q_2 = \frac{n}{2}$$
The formula above gives us the position of the median, which is denoted by $Q_2$, and $n$ represents the sample size. We can use that number to find the class which contains the median.
Note: Since the data is grouped and the individual values are not shown, we cannot find the exact value of the median, but we can find the median class.
To calculate the lower quartile, we use the formula,
$$\text{position of } Q_1 = \frac{n}{4}$$
Where $Q_1$ represents the lower quartile and $n$ represents the sample size. This gives us the position of the lower quartile. Again, since the data is grouped, we can only find the class that contains the lower quartile. However, we can use that class to find the maximum and minimum values of the lower quartile, given by the upper and lower bounds of the class, respectively.
To calculate the upper quartile, we use the formula,
$$\text{position of } Q_3 = \frac{3n}{4}$$
Where $Q_3$ represents the upper quartile and $n$ represents the sample size. This gives us the position of the upper quartile. Again, since the data is grouped, we can only find the class that contains the upper quartile. However, we can use that class to find the maximum and minimum values of the upper quartile, given by the upper and lower bounds of the class, respectively.
To calculate the interquartile range, we use the formula,
$$IQR = Q_3 - Q_1$$
Where $IQR$ represents the interquartile range, $Q_3$ represents the upper quartile and $Q_1$ represents the lower quartile.
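The search for the quartile classes can be sketched in Python as follows. This is only an illustration: it assumes the quartile positions are taken as one quarter and three quarters of the total frequency, the data is hypothetical, and `class_containing` and `greatest_iqr` are names made up for this example.

```python
from itertools import accumulate


def class_containing(position, classes):
    """Return the (lower, upper) bounds of the class holding a given position.

    `classes` is a list of (lower, upper, frequency) in increasing order.
    """
    cumulative = accumulate(f for _, _, f in classes)
    for (lower, upper, _), running_total in zip(classes, cumulative):
        if position <= running_total:
            return lower, upper
    raise ValueError("position exceeds the total frequency")


def greatest_iqr(classes):
    """Largest possible IQR: top of the Q3 class minus bottom of the Q1 class."""
    n = sum(f for _, _, f in classes)
    q1_lower, _ = class_containing(n / 4, classes)      # assumed position of Q1
    _, q3_upper = class_containing(3 * n / 4, classes)  # assumed position of Q3
    return q3_upper - q1_lower


# Hypothetical data: (lower bound, upper bound, frequency).
classes = [(0, 5, 6), (5, 15, 22), (15, 20, 12)]
print(greatest_iqr(classes))  # 15
```

Here the total frequency is 40, so the lower quartile (position 10) falls in the class 5 to 15 and the upper quartile (position 30) falls in the class 15 to 20.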
To calculate the mean when data is displayed in the form of a histogram, we first need to find the mid-interval. This is the middle value of each class, i.e. the midpoint. We use the formula,
$$\bar{x} = \frac{\sum xf}{\sum f}$$
Where $\bar{x}$ represents the mean, $x$ represents the mid-interval and $f$ represents the frequency.
To calculate the variance, we use the formula,
$$\sigma^2 = \frac{\sum x^2 f}{\sum f} - \bar{x}^2$$
Where $\sigma^2$ represents the variance, $x$ represents the mid-interval, $f$ represents the frequency and $\bar{x}$ represents the mean.
Standard deviation is the square root of variance. Therefore, the formula for standard deviation is,
$$\sigma = \sqrt{\frac{\sum x^2 f}{\sum f} - \bar{x}^2}$$
Where $\sigma$ represents the standard deviation, $x$ represents the mid-interval, $f$ represents the frequency and $\bar{x}$ represents the mean.
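The estimates of the mean, variance and standard deviation can be sketched in a few lines of Python. The mid-intervals and frequencies below are hypothetical, and `grouped_stats` is an illustrative name rather than a library function.

```python
from math import sqrt


def grouped_stats(midpoints, frequencies):
    """Estimate mean, variance and standard deviation from grouped data."""
    total = sum(frequencies)                                           # sum of f
    sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))        # sum of x*f
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))   # sum of x^2*f
    mean = sum_xf / total
    variance = sum_x2f / total - mean ** 2
    return mean, variance, sqrt(variance)


# Hypothetical classes 0-10, 10-20 and 20-30, so the mid-intervals are 5, 15, 25.
midpoints = [5, 15, 25]
frequencies = [2, 6, 2]

mean, variance, sd = grouped_stats(midpoints, frequencies)
print(mean, variance, round(sd, 3))
```

These are estimates only, because every value in a class is treated as if it were equal to the mid-interval.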
Let’s look at some past paper questions.
1. The times taken by players to solve a computer puzzle are summarised in the following table. (9709/51/M/J/21 number 5)
Time (t seconds) | 0 ≤ t ≤ 10 | 10 ≤ t ≤ 20 | 20 ≤ t ≤ 40 | 40 ≤ t ≤ 60 | 60 ≤ t ≤ 100 |
Number of players | 16 | 54 | 78 | 32 | 20 |
(a) Draw a histogram to represent this information.
To be able to draw a histogram, we need to first find the class width and the frequency density,
Class Width | 10 | 10 | 20 | 20 | 40 |
Frequency Density | 1.6 | 5.4 | 3.9 | 1.6 | 0.5 |
Plot the classes on the $x$-axis, ensuring that each bar has the corresponding class width. Then plot the frequency density on the $y$-axis. Label the $x$-axis with the class name ‘Time ($t$ seconds)’. Label the $y$-axis with ‘frequency density’.
(b) Calculate an estimate for the mean time taken by these players.
Time (t seconds) | 0 ≤ t ≤ 10 | 10 ≤ t ≤ 20 | 20 ≤ t ≤ 40 | 40 ≤ t ≤ 60 | 60 ≤ t ≤ 100 |
Number of players | 16 | 54 | 78 | 32 | 20 |
To find the mean, we need to first find the mid intervals,
Mid Interval | 5 | 15 | 30 | 50 | 80 |
Frequency | 16 | 54 | 78 | 32 | 20 |
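The formula for calculating mean is,
$$\bar{x} = \frac{\sum xf}{\sum f}$$
Substitute into the formula,
$$\bar{x} = \frac{5(16) + 15(54) + 30(78) + 50(32) + 80(20)}{16 + 54 + 78 + 32 + 20} = \frac{6430}{200}$$
Therefore, the final answer is,
$$\bar{x} = 32.15 \text{ seconds}$$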
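The arithmetic for the mean can be checked with a short Python sketch. It uses the class bounds and frequencies from the table in the question; the variable names are chosen just for this example.

```python
# Class bounds and frequencies from the table in the question.
bounds = [(0, 10), (10, 20), (20, 40), (40, 60), (60, 100)]
frequencies = [16, 54, 78, 32, 20]

# Mid-interval of each class.
midpoints = [(lo + hi) / 2 for lo, hi in bounds]

sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
n = sum(frequencies)
mean = sum_xf / n

print(midpoints)        # [5.0, 15.0, 30.0, 50.0, 80.0]
print(sum_xf, n, mean)  # 6430.0 200 32.15
```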
(c) Find the greatest possible value of the interquartile range of these times.
Time (t seconds) | 0 ≤ t ≤ 10 | 10 ≤ t ≤ 20 | 20 ≤ t ≤ 40 | 40 ≤ t ≤ 60 | 60 ≤ t ≤ 100 |
Number of players | 16 | 54 | 78 | 32 | 20 |
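The formula for interquartile range is,
$$IQR = Q_3 - Q_1$$
To find the greatest possible value of the interquartile range, we need to find the maximum value of the upper quartile and the minimum value of the lower quartile. The total frequency is $n = 200$, so the position of the upper quartile is,
$$\frac{3n}{4} = \frac{3(200)}{4} = 150$$
When we add up the frequencies, the cumulative totals are $16, 70, 148, 180, 200$, and we notice that the 150th value lies in the class,
$$40 \le t \le 60$$
Therefore, the maximum value in that class is $60$, so the maximum value of the upper quartile,
$$Q_3 = 60$$
Let’s find the minimum value of the lower quartile,
$$\frac{n}{4} = \frac{200}{4} = 50$$
When we add up the frequencies, we notice that the 50th value lies in the class,
$$10 \le t \le 20$$
Therefore, the minimum value in that class is $10$, so the minimum value of the lower quartile,
$$Q_1 = 10$$
Therefore, the greatest possible value of the interquartile range is,
$$IQR = 60 - 10$$
Therefore, the final answer is,
$$IQR = 50 \text{ seconds}$$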
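The step of adding up the frequencies to locate each quartile class can be sketched in Python with cumulative totals. This assumes the quartile positions are one quarter and three quarters of the total frequency; the data comes from the table in the question.

```python
from bisect import bisect_left
from itertools import accumulate

# Class bounds and frequencies from the table in the question.
bounds = [(0, 10), (10, 20), (20, 40), (40, 60), (60, 100)]
frequencies = [16, 54, 78, 32, 20]

cumulative = list(accumulate(frequencies))
n = cumulative[-1]
print(cumulative)  # [16, 70, 148, 180, 200]

# The first class whose cumulative frequency reaches the position holds the quartile.
q1_class = bounds[bisect_left(cumulative, n / 4)]      # assumed position of Q1
q3_class = bounds[bisect_left(cumulative, 3 * n / 4)]  # assumed position of Q3
print(q1_class, q3_class)  # (10, 20) (40, 60)

# Greatest IQR: largest possible Q3 minus smallest possible Q1.
greatest_iqr = q3_class[1] - q1_class[0]
print(greatest_iqr)  # 50
```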
2. The numbers of chocolate bars sold per day in a cinema over a period of 100 days are summarised in the following table. (9709/51/M/J/20 number 7)
Number of chocolate bars sold | 1 – 10 | 11 – 15 | 16 – 30 | 31 – 50 | 51 – 60 |
Number of days | 18 | 24 | 30 | 20 | 8 |
(a) Draw a histogram to represent this information.
You’ll notice that there are gaps between our classes. If we were to draw a histogram with these classes, we would have gaps between our bars, and it would cease to be a histogram. To fix this, we apply a continuity correction. For example, if the data is treated as continuous, the number $10$ represents any number that lies between $9.5$ and $10.5$.
To apply this to our classes, subtract $0.5$ from the lower bounds and add $0.5$ to the upper bounds, so that the classes represent the whole range of values,
Number of chocolate bars sold | 0.5 – 10.5 | 10.5 – 15.5 | 15.5 – 30.5 | 30.5 – 50.5 | 50.5 – 60.5 |
Number of days | 18 | 24 | 30 | 20 | 8 |
Now let’s use the classes after continuity correction to find the class width and frequency density,
Class Width | 10 | 5 | 15 | 20 | 10 |
Frequency Density | 1.8 | 4.8 | 2.0 | 1.0 | 0.8 |
Plot the classes on the $x$-axis, ensuring that each bar has the corresponding class width. Then plot the frequency density on the $y$-axis. Label the $x$-axis with the class name ‘Number of chocolate bars sold’. Label the $y$-axis with ‘frequency density’.
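The continuity correction and the resulting class widths and frequency densities can be checked with a short Python sketch. The class limits and frequencies come from the table in the question; `continuity_correct` is an illustrative helper, and the half unit of 0.5 assumes the data is recorded in whole numbers.

```python
# Class limits as printed in the question, and the number of days in each class.
limits = [(1, 10), (11, 15), (16, 30), (31, 50), (51, 60)]
frequencies = [18, 24, 30, 20, 8]


def continuity_correct(lower, upper, half_unit=0.5):
    """Widen a class by half a unit at each end so neighbouring classes meet."""
    return lower - half_unit, upper + half_unit


boundaries = [continuity_correct(lo, hi) for lo, hi in limits]

# After the correction, each class ends exactly where the next one starts.
no_gaps = all(
    boundaries[i][1] == boundaries[i + 1][0] for i in range(len(boundaries) - 1)
)

widths = [hi - lo for lo, hi in boundaries]
densities = [f / w for f, w in zip(frequencies, widths)]

print(boundaries)
print(no_gaps)    # True
print(widths)     # [10.0, 5.0, 15.0, 20.0, 10.0]
print(densities)
```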
(b) What is the greatest possible value of the interquartile range of the data?
Number of chocolate bars sold | 1 – 10 | 11 – 15 | 16 – 30 | 31 – 50 | 51 – 60 |
Number of days | 18 | 24 | 30 | 20 | 8 |
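The formula for interquartile range is,
$$IQR = Q_3 - Q_1$$
To find the greatest possible value of the interquartile range, we need to find the maximum value of the upper quartile and the minimum value of the lower quartile. The total frequency is $n = 100$, so the position of the upper quartile is,
$$\frac{3n}{4} = \frac{3(100)}{4} = 75$$
When we add up the frequencies, the cumulative totals are $18, 42, 72, 92, 100$, and we notice that the 75th value lies in the class,
$$31 - 50$$
Therefore, the maximum value in that class is $50$, so the maximum value of the upper quartile,
$$Q_3 = 50$$
Let’s find the minimum value of the lower quartile,
$$\frac{n}{4} = \frac{100}{4} = 25$$
When we add up the frequencies, we notice that the 25th value lies in the class,
$$11 - 15$$
Therefore, the minimum value in that class is $11$, so the minimum value of the lower quartile,
$$Q_1 = 11$$
Therefore, the greatest possible value of the interquartile range is,
$$IQR = 50 - 11 = 39$$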
(c) Calculate estimates of the mean and standard deviation of the number of chocolate bars sold.
Number of chocolate bars sold | 1 – 10 | 11 – 15 | 16 – 30 | 31 – 50 | 51 – 60 |
Number of days | 18 | 24 | 30 | 20 | 8 |
To find the mean, we need to first find the mid interval,
Mid Interval | 5.5 | 13 | 23 | 40.5 | 55.5 |
Number of days | 18 | 24 | 30 | 20 | 8 |
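The formula for calculating mean is,
$$\bar{x} = \frac{\sum xf}{\sum f}$$
Substitute into the formula,
$$\bar{x} = \frac{5.5(18) + 13(24) + 23(30) + 40.5(20) + 55.5(8)}{18 + 24 + 30 + 20 + 8} = \frac{2355}{100}$$
Therefore, the mean is,
$$\bar{x} = 23.55$$
The formula for standard deviation is,
$$\sigma = \sqrt{\frac{\sum x^2 f}{\sum f} - \bar{x}^2}$$
Let’s start by finding $\sum x^2 f$,
$$\sum x^2 f = 5.5^2(18) + 13^2(24) + 23^2(30) + 40.5^2(20) + 55.5^2(8) = 77917.5$$
Let’s substitute into the formula,
$$\sigma = \sqrt{\frac{77917.5}{100} - 23.55^2} = \sqrt{224.5725}$$
Therefore, the final answer is,
$$\bar{x} = 23.55, \quad \sigma \approx 15.0$$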
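The mean and standard deviation estimates can be verified with the Python sketch below. It takes the class limits and frequencies from the table in the question and uses the mid-interval of each class as its representative value; the variable names are chosen just for this example.

```python
from math import sqrt

# Class limits and number of days from the table in the question.
limits = [(1, 10), (11, 15), (16, 30), (31, 50), (51, 60)]
frequencies = [18, 24, 30, 20, 8]

# Mid-interval of each class.
midpoints = [(lo + hi) / 2 for lo, hi in limits]

n = sum(frequencies)
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))

mean = sum_xf / n
variance = sum_x2f / n - mean ** 2
sd = sqrt(variance)

print(midpoints)        # [5.5, 13.0, 23.0, 40.5, 55.5]
print(sum_xf, sum_x2f)  # 2355.0 77917.5
print(round(mean, 2), round(sd, 1))  # 23.55 15.0
```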