5.1.1 Advantages and Disadvantages of Representation of Data

In this topic we will learn how to:

  • select a suitable way of presenting raw data, and discuss the advantages and/or disadvantages that particular representations may have

We will learn how to represent data using stem-and-leaf diagrams, box-and-whisker plots, histograms and cumulative frequency graphs. We will also learn to calculate measures of central tendency and variation, and we will explore the advantages and disadvantages of each representation and each statistical measure.

\textbf{\large\textcolor{gray}{Stem-and-Leaf Diagram}}

\textbf{\textcolor{gray}{Advantages}}

  • It shows all of the original data
  • It shows the shape of the distribution, i.e. its skew
  • The mode, median and quartiles can be found from the diagram
  • It is useful for comparing two sets of data

\textbf{\textcolor{gray}{Disadvantages}}

  • It is not suitable for large amounts of data

\textbf{\large\textcolor{gray}{Box-and-Whisker Plot}}

\textbf{\textcolor{gray}{Advantages}}

  • It is easy to see whether the distribution is symmetrical or whether there is a tail to the left or right
  • It can be used to investigate extreme values (outliers)
  • It is easy to see the range and interquartile range
  • You can compare two or more sets of data by drawing on the same diagram

\textbf{\textcolor{gray}{Disadvantages}}

  • It does not show frequencies
  • It shows only a few particular values of the data (minimum, lower quartile, median, upper quartile and maximum), not the individual values

\textbf{\large\textcolor{gray}{Histogram}}

\textbf{\textcolor{gray}{Advantages}}

  • It can represent groups of different widths
  • It shows whether the distribution is symmetrical or skewed
  • The mean and standard deviation can be estimated from the histogram

\textbf{\textcolor{gray}{Disadvantages}}

  • The visual impact can be altered by using different scales
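As a sketch of why a histogram can handle groups of different widths, the Python below uses hypothetical grouped data (the classes and frequencies are invented for illustration). Each bar's height is the frequency density, frequency ÷ class width, so the bar areas are proportional to the frequencies. The mean and standard deviation are then estimated by assuming every value lies at its class midpoint, which is why they are only estimates.

```python
import math

# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
# Note the unequal class widths (10, 10, 20, 40).
classes = [(0, 10, 4), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

# Bar height = frequency density = frequency / class width,
# so the AREA of each bar (not its height) is proportional to the frequency.
densities = [f / (upper - lower) for lower, upper, f in classes]

# Estimate the mean and standard deviation by treating every value
# in a class as if it were at the class midpoint.
total = sum(f for _, _, f in classes)
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
est_mean = sum(f * x for (_, _, f), x in zip(classes, midpoints)) / total
est_var = sum(f * x * x for (_, _, f), x in zip(classes, midpoints)) / total - est_mean ** 2
est_sd = math.sqrt(est_var)

print(densities)                   # [0.4, 1.2, 0.8, 0.2]
print(est_mean, round(est_sd, 2))  # 29.0 17.58
```

Drawing the bars with frequency as the height instead would make the wide classes look far more important than they are, which is one way the visual impact can mislead.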

\textbf{\large\textcolor{gray}{Cumulative Frequency Graph}}

\textbf{\textcolor{gray}{Advantages}}

  • The median and quartiles can be estimated from the graph
  • Sets of data can be compared by drawing graphs on the same diagram

\textbf{\textcolor{gray}{Disadvantages}}

  • The visual impact can be altered by using different scales
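A minimal sketch of estimating the median and quartiles from a cumulative frequency graph, assuming hypothetical data and assuming the plotted points are joined by straight lines, so that reading off the graph is the same as linear interpolation. For grouped data the median is read at n/2 and the quartiles at n/4 and 3n/4; a hand-drawn smooth curve would give slightly different values.

```python
import bisect

# Hypothetical data: upper class boundaries and cumulative frequencies,
# i.e. the points (x, cf) that would be plotted on the graph.
boundaries = [0, 10, 20, 40, 80]
cum_freq = [0, 4, 16, 32, 40]

def read_off(target_cf):
    """Estimate the x-value at which the cumulative frequency reaches
    target_cf (0 <= target_cf <= total), joining the plotted points with
    straight lines (linear interpolation)."""
    i = bisect.bisect_left(cum_freq, target_cf)  # first point with cf >= target
    if cum_freq[i] == target_cf:
        return float(boundaries[i])
    x0, x1 = boundaries[i - 1], boundaries[i]
    c0, c1 = cum_freq[i - 1], cum_freq[i]
    return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)

n = cum_freq[-1]
median = read_off(n / 2)
lower_q = read_off(n / 4)
upper_q = read_off(3 * n / 4)
print(median, lower_q, upper_q, upper_q - lower_q)  # 25.0 15.0 37.5 22.5
```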

\textbf{\Large\textcolor{gray}{Measures of Central Tendency}}

\textbf{\large\textcolor{gray}{Mean}}

\textbf{\textcolor{gray}{Advantages}}

  • It is calculated using all the data so it represents all the items
  • It is calculated using a mathematical formula so calculators can be programmed to find it
  • It is extremely useful for further analysis

\textbf{\textcolor{gray}{Disadvantages}}

  • It can be unduly affected by one or two extreme values

\textbf{\large\textcolor{gray}{Mode}}

\textbf{\textcolor{gray}{Advantages}}

  • It is useful when the most popular category is required, e.g. clothes or shoe sizes

\textbf{\textcolor{gray}{Disadvantages}}

  • It is not very useful for small data sets, or when there are more than two modes
  • There may not be a mode
  • It may not be representative, e.g. it could be the lowest value
  • Modal class depends on the grouping of the data
  • It is not useful for further analysis
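The disadvantages of the mode can be seen with Python's `statistics.multimode` (Python 3.8+), which returns every value that shares the highest frequency, in the order first seen. The data sets below are hypothetical. When every value occurs exactly once, `multimode` returns all of them, which in practice means there is no useful mode.

```python
from statistics import multimode

# Hypothetical shoe sizes sold in one day: a single clear mode.
shoe_sizes = [6, 7, 7, 8, 8, 8, 9, 10]
# Hypothetical small data sets showing the disadvantages listed above.
no_repeats = [3, 5, 8, 12]            # every value occurs once: no useful mode
two_modes = [2, 2, 4, 6, 6, 9]        # two values tie for most frequent
lowest_is_mode = [1, 1, 7, 8, 9, 10]  # the mode is the smallest value

for data in (shoe_sizes, no_repeats, two_modes, lowest_is_mode):
    print(data, "->", multimode(data))
# [8], then all four values, then [2, 6], then [1]
```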

\textbf{\large\textcolor{gray}{Median}}

\textbf{\textcolor{gray}{Advantages}}

  • It is not affected by extreme values
  • It can be found as soon as the middle value is known

\textbf{\textcolor{gray}{Disadvantages}}

  • It does not use the whole data set
  • It is not useful for further analysis

\textbf{\Large\textcolor{gray}{Variation}}

\textbf{\large\textcolor{gray}{Range}}

\textbf{\textcolor{gray}{Advantages}}

  • It is easy to calculate
  • It represents the complete spread of the data

\textbf{\textcolor{gray}{Disadvantages}}

  • It is affected by extreme values

\textbf{\large\textcolor{gray}{Interquartile Range}}

\textbf{\textcolor{gray}{Advantages}}

  • It is not unduly influenced by extreme values
  • It can be used to investigate extreme values

\textbf{\textcolor{gray}{Disadvantages}}

  • It depends only on two particular values of the ranked data (the lower and upper quartiles)
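To illustrate why the interquartile range resists extreme values while the range does not, here is a sketch with hypothetical data. It assumes one common quartile convention (the medians of the lower and upper halves of the ranked data) and the commonly used outlier rule: a value more than 1.5 × IQR below the lower quartile or above the upper quartile. Other quartile conventions give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Lower and upper quartiles as the medians of the lower and upper halves
    of the ranked data (middle value excluded when n is odd). This is one
    common convention, assumed here for illustration."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def summary(data):
    """Return (range, IQR, outliers) using the 1.5 * IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    return max(data) - min(data), iqr, outliers

# Hypothetical data, then the same data with the largest value replaced by 60.
typical = [12, 14, 15, 15, 16, 18, 19, 21]
with_extreme = [12, 14, 15, 15, 16, 18, 19, 60]

print(summary(typical))       # (9, 4.0, [])
print(summary(with_extreme))  # (48, 4.0, [60])
```

The range jumps from 9 to 48 while the IQR stays at 4.0, and the same quartiles flag 60 as an outlier.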

\textbf{\large\textcolor{gray}{Standard Deviation}}

\textbf{\textcolor{gray}{Advantages}}

  • It is calculated using all the data and so represents every item
  • It is calculated using a mathematical formula so calculators can be programmed to find it
  • It is very useful for further analysis
  • It is useful in comparing two sets of data, for example by showing which is more consistent

\textbf{\textcolor{gray}{Disadvantages}}

  • It can be unduly affected by one or two extreme values
  • For a single set of data its value is difficult to interpret
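As a sketch of how the standard deviation shows which data set is more consistent, the example below uses hypothetical scores for two players with the same mean. It uses `statistics.pstdev`, which divides by n (`statistics.stdev` would divide by n − 1 instead). The final line shows how much a single extreme value increases the standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical scores of two players over five games (both have mean 20).
player_a = [18, 19, 20, 21, 22]
player_b = [10, 15, 20, 25, 30]

# pstdev computes sqrt(sum((x - mean)^2) / n).
for name, scores in (("A", player_a), ("B", player_b)):
    print(name, mean(scores), round(pstdev(scores), 3))
# A: 20 1.414, B: 20 7.071, so player A's scores are more consistent.

# Replacing 22 by the extreme value 50 increases the standard deviation sharply.
print(round(pstdev([18, 19, 20, 21, 50]), 3))
```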

Let’s look at some past paper questions.

1. Twenty children were asked to estimate the height of a particular tree. Their estimates, in metres, were as follows. (9709/53/M/J/22 number 2)

4.1, 4.2, 4.4, 4.5, 4.6, 4.8, 5.0, 5.2, 5.3, 5.4, 5.5, 5.8, 6.0, 6.2, 6.3, 6.4, 6.6, 6.8, 6.9, 19.4

It is given that the mean is 6.17 and the median is 5.45. Give a reason why the median is likely to be more suitable than the mean as a measure of the central tendency for this information.

The value 19.4 is an extreme value (an outlier): it is far larger than all of the other estimates, which lie between 4.1 and 6.9. This single value inflates the mean, whereas the median is not affected by it, so the median better represents a typical estimate.

2. Twelve tourists were asked to estimate the height, in metres, of a new building. Their estimates were as follows. (9709/62/O/N/19 number 1)

50, 45, 62, 30, 40, 55, 110, 38, 52, 60, 55, 40

Give a disadvantage of using the mean as a measure of central tendency in this case.

The mean will be unduly affected by the extreme value, 110.
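A similar check for question 2, reading the twelve estimates as 50, 45, 62, 30, 40, 55, 110, 38, 52, 60, 55 and 40 m: removing 110 lowers the mean by about 5 m but moves the median by only 1 m.

```python
from statistics import mean, median

# Estimates (in metres) from question 2.
estimates = [50, 45, 62, 30, 40, 55, 110, 38, 52, 60, 55, 40]

print(round(mean(estimates), 2), median(estimates))  # 53.08 51.0

# Remove the extreme value 110 and recompute.
others = [x for x in estimates if x != 110]
print(round(mean(others), 2), median(others))        # 47.91 50
```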

3. The heights, in cm, of the 11 basketball players in each of two clubs, the Amazons and the Giants, are shown below. (9709/52/M/J/21 number 7)

Amazons: 205, 198, 181, 182, 190, 215, 201, 178, 202, 196, 184

Giants: 175, 182, 184, 187, 189, 192, 193, 195, 195, 195, 204

State an advantage of using a stem-and-leaf diagram compared to a box-and-whisker plot to illustrate this information.

The stem-and-leaf diagram includes all the data, whereas the box-and-whisker plot does not.
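A short sketch that builds a stem-and-leaf diagram for each club, using the first two digits as the stem (e.g. 18) and the units digit as the leaf. It assumes whole-number heights. Unlike a box-and-whisker plot, every original height can be read back from the rows.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf diagram for non-negative integers:
    stem = value // 10, leaf = units digit, leaves in ascending order."""
    rows = defaultdict(list)
    for x in sorted(data):
        rows[x // 10].append(x % 10)
    stems = range(min(rows), max(rows) + 1)  # keep any empty stems in between
    return [f"{s} | {' '.join(str(leaf) for leaf in rows[s])}".rstrip() for s in stems]

amazons = [205, 198, 181, 182, 190, 215, 201, 178, 202, 196, 184]
giants = [175, 182, 184, 187, 189, 192, 193, 195, 195, 195, 204]

for club, heights in (("Amazons", amazons), ("Giants", giants)):
    print(club, "(key: 18 | 2 means 182 cm)")
    print("\n".join(stem_and_leaf(heights)))
```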