Developing box plots while navigating the maze of data representations.
Duncan, Bruce ; Fitzallen, Noleine
The learning sequence described in this article was developed to
provide students with a demonstration of the development of box plots
from authentic data as an illustration of the advantages gained from
using multiple forms of data representation. The sequence follows an
authentic process that starts with a problem to which data
representations provide the solution. The advantage of using box plots
is that they allow clear and efficient comparison of related data sets.
In this case, students are given a maze on paper and timed while they
complete it. This produces the first set of data. They then attempt the
maze again, expecting that their time to do this will decrease. The need
to compare these two data sets arises from the question, "Did the
group improve their maze times on their second attempt?"
Background
The use of graphs in the mathematics classroom is first introduced
in the Australian Curriculum: Mathematics in Year 2, when students are
expected to create and interpret picture graphs. Column graphs are
introduced in Year 3, dot plots in Year 5, stem-and-leaf displays in
Year 7, histograms in Year 9, and box plots and scatter plots in Year 10
(Australian Curriculum, Assessment and Reporting Authority [ACARA],
2013). The intention is that the introduction of different graphical
representations be developmental and cumulative.
The thinking and reasoning required to interpret the different
graphical representations increases as students progress through the
compulsory years of schooling. Once a new graph type is introduced, the
expectation is that it will be used in future years, even though it may
not be named explicitly in the content descriptions of the curriculum
for the following years. There is, however, very little attention given
to the need to assist students to make connections among the various
graphical representations in the curriculum.
The connections among the "range of graph types" and
"multiple representations" are not acknowledged in the
curriculum until Year 10 when students are expected to "Compare
shapes of box plots to corresponding histograms and dot plots"
(ACARA, 2013, p. 71). Year 10 graphing activities often include
generating box plots from histograms. To be able to do this confidently
it would be beneficial to provide students with the opportunity to
establish an understanding of the relationship between different
graphical representations earlier on in the curriculum and in contexts
in which the purpose for representing data in different ways is made
clear. It is appropriate to do so because younger students have
demonstrated the ability to create and interpret scatter plots and box
plots long before they are formally introduced in the curriculum (e.g.,
Cobb, McClain & Gravemeijer, 2003; Fitzallen, 2012; Ozgun &
Edwards, 2013).
The benefits of using box plots and scatter plots in classrooms
prior to Year 10 are that students have the time to develop exploratory
data analysis strategies and fundamental intuitions about working with
data before focusing on the formal statistical interpretation of data
using correlation coefficients for scatter plots and quartiles for box
plots. Likewise, providing students with the opportunity to develop an
understanding of the relationship between different graphical
representations before Year 10 is beneficial.
Box plots
A box plot summarises a data set: it locates the median, displays the
spread and skewness of the data, and identifies outliers, but it does
not display the overall distribution of the data (Friel, Curcio
& Bright, 2001). The box spans the interquartile range
(IQR), which represents the middle 50% of the data. The IQR extends from
the first quartile to the third quartile. Figure 1 shows that the IQR is
divided by the median. The range of the left hand side of the IQR is
smaller than the range of the right hand side of the IQR. This poses
problems for some students because they find it difficult to understand
that although there is the same number of data points represented in
each section of a box plot, the size of each section is dependent on the
density or spread of the data (Bakker, Biehler & Konold, 2005). This
means that the data in the left hand side of the IQR in Figure 1 are
closer together than in the right hand side of the IQR. Attached to the
box on the left hand side and extending from the first quartile to the
minimum value of the data set is a whisker, which represents the lower
25% of the data set. Another whisker is attached to the right hand side
of the box. That whisker represents the upper 25% of the data set and
extends from the third quartile to the maximum value. The whiskers obey
the same principles of density and distribution as the box.
[FIGURE 1 OMITTED]
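The five values that define a box plot can be computed directly. The following is a minimal Python sketch, not part of the original learning sequence. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data (other conventions exist and can give slightly different values), and the sample values are hypothetical, invented only to show a box whose left-hand side is narrower than its right-hand side.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the ordered data, leaving out the middle value when the count is
    odd. Needs at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + (n % 2):]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data, invented for illustration only
sample = [2, 3, 4, 5, 6, 8, 11, 15, 20]
minimum, q1, med, q3, maximum = five_number_summary(sample)

iqr = q3 - q1            # width of the whole box
left_width = med - q1    # width of the box to the left of the median
right_width = q3 - med   # width of the box to the right of the median
print(minimum, q1, med, q3, maximum, iqr, left_width, right_width)
```

For this sample the left-hand part of the box is 2.5 units wide and the right-hand part is 7.0 units wide, although each part covers the same share of the data: the values to the left of the median are simply closer together.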
Although individual box plots are useful, box plots were developed
for comparing multiple data sets (Tukey, 1977). Direct comparison of
several data sets or subsets of data can be conducted efficiently by
analysing the box plots displayed in parallel, as can be seen in Figure
2, which displays the data for the body weight of students from Years 1,
3, 5, and 7.
TinkerPlots: Dynamic Data Exploration (Konold & Miller, 2011)
is a statistical software program that students can use easily to
generate box plots (Watson, Fitzallen, Wilson & Creed, 2008).
Another useful technology tool for generating box plots is the CAS
calculator, such as the TI-Nspire (Ozgun & Edwards, 2013). The
advantage of using the displays generated by these two options or
similar technology innovations is that the data points can be displayed
in conjunction with the box plot representation (Figure 3). Such
displays allow students to see the direct connections among the
distribution of the data and the corresponding parts of the box plot,
thereby making links between the two graphical representations (Watson
et al., 2008). This enhances the opportunity for students to develop
understanding of the purpose of each type of graphical representation as
they interpret and make sense of the displays.
[FIGURE 2 OMITTED]
Stem-and-leaf displays
The stem-and-leaf display is an alternative to tallying values into
frequency distributions. It organises a batch of numbers graphically and
directs attention to various features of the data. It displays a
distribution of a variable with the digits themselves making up the
leaves of the display. The interval widths are displayed on a contracted
number line, which makes up the stem of the display. Usually displayed
vertically, it resembles a horizontal stacked dot plot (Figure 4). The
development of stem-and-leaf displays should be understood as a way of
representing the characteristics of the data set, while maintaining the
identity of each datum. Groups are conserved and frequencies are clearly
represented in the stem-and-leaf display, which can be seen as a
sophisticated variation of the stacked dot plot.
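The construction described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes non-negative whole-number data with the tens as the stem and the units as the leaf, the function name stem_and_leaf is hypothetical, and the values are invented for the example. Empty stems are kept so that the stem continues to act as a number line.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display (tens | units)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # tens digit(s), units digit
        stems[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest, including empty ones,
    # so that gaps in the data remain visible on the scale.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical data, invented for illustration only
for line in stem_and_leaf([47, 52, 55, 58, 61, 61, 64, 83]):
    print(line)
```

Each datum remains identifiable (the leaf 7 on stem 4 is the value 47), while the length of each row shows the frequency within that interval.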
[FIGURE 3 OMITTED]
[FIGURE 4 OMITTED]
[FIGURE 5 OMITTED]
Stacked dot plots, such as the one in Figure 3, provide a representation
of frequency distribution that can be easily described. Because each
datum is represented in relation to the others, although not explicitly,
the characteristics of the data set are revealed. The distribution of two
data sets can also be compared when displayed as a back-to-back
stem-and-leaf display. This is demonstrated in Figure 5, which displays
students' pulse rates before and after undertaking some exercise.
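A back-to-back display can be sketched in the same way. The Python below is a hedged illustration, not the method used to produce Figure 5: the function names are hypothetical, the data are assumed to be non-negative whole numbers, and the pulse rates are invented for the example. Leaves on the left are written in reverse so that, on both sides, they increase away from the shared stem.

```python
def leaves_by_stem(values):
    """Group the units digits (leaves) of whole numbers by their tens (stems)."""
    stems = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems.setdefault(stem, []).append(leaf)
    return stems

def back_to_back(left_values, right_values):
    """Return the lines of a back-to-back stem-and-leaf display."""
    left = leaves_by_stem(left_values)
    right = leaves_by_stem(right_values)
    all_stems = set(left) | set(right)
    rows = []
    for stem in range(min(all_stems), max(all_stems) + 1):
        # Reverse the left leaves so they increase away from the stem
        left_text = " ".join(str(leaf) for leaf in reversed(left.get(stem, [])))
        right_text = " ".join(str(leaf) for leaf in right.get(stem, []))
        rows.append((left_text, stem, right_text))
    # Right-align the left leaves so that every stem lines up in one column
    width = max(len(left_text) for left_text, _, _ in rows)
    return [f"{l:>{width}} | {stem} | {r}".rstrip() for l, stem, r in rows]

# Hypothetical pulse rates (beats per minute), invented for illustration only
before = [62, 68, 71, 74, 75, 80]
after = [78, 84, 85, 91, 96, 99]
for line in back_to_back(before, after):
    print(line)
```

Because both data sets share one stem, a shift in the distribution shows up as leaves moving from the upper rows on one side to the lower rows on the other.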
Generating box plots from stem-and-leaf displays: The maze
investigation
The Maze Investigation is an activity that provides students with
the opportunity to answer the question: "Do people complete mazes
faster the second time around?" To answer the question, two data sets
are needed to see whether completion times improve when the maze is
completed a second time. The activity is run twice, with students
recording the time taken for Trial 1 and Trial 2. The maze used to
collect the data presented in this article can be downloaded from
www.printablemazes.net (Figure 6).
[FIGURE 6 OMITTED]
Social-constructivist pedagogy (Simon, 1995) holds that the
potential to develop students' understanding is increased when they
themselves are required to determine the method by which the problem
should be solved. For this activity, carefully scaffolded discussion can
guide students from the raw data, through the process of analysis and
representation, to the final representation that allows effective
comparison between data sets. As each representation is developed, the
discussion identifies the advantages gained by each progressive
representation as a response to the question "Do people complete
mazes faster the second time around?" is formulated. At the same
time, the corresponding disadvantages that come from simplifying the
representation should also be made explicit. The following activity
sequence outlines the activity process and teaching opportunities that
arise. The data for the worked example were generated by a class of
adult learners.
Activity sequence (each step gives a description and the teaching
opportunities that arise)
1. Posing the problem and identifying the question to be answered.
Description: "Do people complete mazes faster the second time around?"
Teaching opportunities: The process of data collection and
representation is shown to have an authentic purpose. Medium Mazes Set
5: Run-of-the-Mill (www.printablemazes.net)
2. The event.
Description: Every student receives a copy of the maze, face down, and
is instructed to turn the paper over and attempt the maze when the
teacher says, "Go." The teacher starts a stopwatch on a data projector
that all students can see and they attempt to complete the maze by
drawing a path from start to finish without crossing any lines. When
students finish they record the time on the stopwatch as the duration
of their attempts.
Teaching opportunities: The teacher may need to establish an upper
limit for the duration of this task, by which time some students may
not have finished. Stopwatch
(www.online-stopwatch.com/large-stopwatch/)
3. Raw data.
Description: The time taken for each student to complete the maze is
collected on a board at the front of the classroom. Initially, these
data are collected in a random order to produce a list.
Teaching opportunities: The need to organise data can be made clear by
first collecting data from students in a random order, such as "around
the room."
4. Ordering data.
Description: Students are asked to consider, "How can we make these
data easier to read?" and "How can we describe this set of results?"
Teaching opportunities: The advantage in ordering data can be made
clear to students by scaffolding discussion about organising the data.
5. Grouping.
Description: [...] be grouped and then group the data according to a
strategy selected by the class, which becomes the stem of the
stem-and-leaf display.
Teaching opportunities: [...] from a possibly continuous range of
measurements and that it therefore makes sense to speak of the
frequency of outcomes within specified intervals (grouped data) rather
than the frequency of occurrence of particular measurements.
6. Stem-and-leaf displays.
Description: An appropriate scale is determined by discussion and drawn
on the board and the data are recorded.
Teaching opportunities: Now the purpose of organising the data can be
made clear through discussions that attempt to describe the data set by
asking questions such as "What can we say about the data?" The data are
analysed, organised, and represented in different ways to identify the
range, any skewed distribution, and central tendency. The focus now
shifts from students identifying their individual information to
looking more broadly at the data from the whole group.
7. The second event.
Description: The maze activity (step 2) is repeated with the same maze
and times recorded.
Teaching opportunities: Discussion should elicit the expectation that
durations to complete the maze the second time around may become
shorter. This comparison can be discussed informally after the data
have been collected but before the data are organised so that the data
are seen to confirm an explanation.
9. Organising the second set of data.
Description: Students organise data from Trial 2 into a back-to-back
stem-and-leaf display with the data from Trial 1.
Teaching opportunities: This process is a repetition of the process
undertaken on the first data set. The opportunity exists, therefore, to
allow students to carry out this process with greater independence from
the teacher. In the example the data show a very dramatic improvement
in times, one that would be obvious from the raw data. A more
challenging maze or a younger group of students may produce data that
are less markedly different.
10. Comparing data sets: Representations with a shared scale in a
back-to-back stem-and-leaf display.
Description: "[...] does this representation help us answer our
question? Are the second times faster? Why do you say that?"
Teaching opportunities: [...] comparing data sets on a common scale.
Once again the discussion should be guided by the purpose, so a good
guiding question here is, "How can we compare your maze completion time
from Trial 1 with the completion time in Trial 2?" Discussion includes
the comparison of the characteristics of each data set: range, skew,
central tendency.
11. Medians and quartiles.
Description: Students discuss "What is the middle score?" or "What
score divides this group in half?"
Teaching opportunities: Establishment of these features pre-empts the
box plots but the discussion must focus students' understanding on
these terms as characteristics of the population, not the range. Once
students understand that the median is determined by considering the
number of scores in order, rather than the value of each score, the
concept of quartiles, dividing the population into four equal sized
groups, follows as a natural progression.
12. Box plots.
Description: Students identify the five points on the stem-and-leaf
display (minimum, first quartile, median, third quartile, maximum) and
mark against the same scale to create the box plot.
Teaching opportunities: Box plots can be seen as simplified
stem-and-leaf displays. Although the detail of each datum is lost, the
simplification of this representation allows the data set to occupy
less space and, therefore, makes box plots appropriate for the purpose
of comparison.
13. Answering the question.
Description: "Do people complete mazes faster the second time around?"
Attention can then be given to thinking about the informal inferences
that can be made from the data, asking "Do you think another group of
students would get the same result? Can we claim that students always
complete Trial 2 quicker than Trial 1?"
Teaching opportunities: Comparison of the two box plots shows that the
interquartile ranges do not overlap; therefore, the claim can be made
that the people in the group were faster the second time round. Note
that the first quartile and the median in Trial 2 fall at the same
point on the vertical scale. That results in an unconventional looking
box (interquartile range). Anomalies such as this arise when using real
life data and present the opportunity to discuss why the representation
looks different to what was expected.
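The comparison made in steps 11 to 13 can be sketched in Python. This is an illustration under stated assumptions, not the authors' analysis: the maze times are hypothetical (they are not the class data reported in the article), the function names are invented for the example, and the quartiles come from the standard library's statistics.quantiles with the "inclusive" method, which is only one of several quartile conventions. The Trial 2 times were chosen so that the first quartile and the median coincide, to mirror the anomaly noted in step 13.

```python
from statistics import quantiles

def five_points(times):
    """Return the five points that define a box plot for a list of times."""
    q1, med, q3 = quantiles(times, n=4, method="inclusive")
    return {"min": min(times), "q1": q1, "median": med,
            "q3": q3, "max": max(times)}

def boxes_overlap(a, b):
    """True if the interquartile ranges (the boxes) of a and b overlap."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]

# Hypothetical maze completion times in seconds, invented for illustration
trial_1 = [38, 45, 52, 60, 67, 75, 90, 110, 140]
trial_2 = [12, 14, 15, 15, 15, 19, 24, 30, 41]

summary_1 = five_points(trial_1)
summary_2 = five_points(trial_2)

print("Trial 1:", summary_1)
print("Trial 2:", summary_2)
print("Boxes overlap:", boxes_overlap(summary_1, summary_2))
```

With these invented times the boxes do not overlap, which supports the informal claim that this group was faster the second time around. It says nothing about whether another group would produce the same result, which is the question of informal inference raised in step 13.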
Conclusion
Using a problem as the context for developing data representations
makes the process authentic. Maintaining students'
involvement in that process by asking questions such as "How can we
make this clearer?" illustrates not only the construction of the
graphical representations but also the application of the properties of
those representations. However, data collected from real life situations
do not always result in a perfect example of the graphical
representation developed. Although more challenging for teachers, it is
worthwhile for students to explore such data sets to develop the skills
needed to think flexibly when interpreting graphs. Although
using contrived data sets that behave in a particular way may result in
graphical representations that are simpler to explain, collecting data
generated from an activity contributes to the authenticity of the
learning experience.
References
Australian Curriculum, Assessment and Reporting Authority (ACARA).
(2013). The Australian Curriculum: Mathematics, Version 5.1, Monday,
August 5, 2013. Retrieved from
http://www.australiancurriculum.edu.au/Mathematics/Curriculum/F-10
Bakker, A., Biehler, R. & Konold, C. (2005). Should young
students learn about box plots? In G. Burrill & M. Camden (Eds),
Curricular development in statistics education: International
Association for Statistical Education (IASE) Roundtable (pp. 163-173).
The Netherlands: International Statistical Institute.
Cobb, P., McClain, K., & Gravemeijer, K. (2003). Learning about
statistical covariation. Cognition and Instruction, 21(1), 1-78.
Fitzallen, N. (2012). Students reasoning about covariation. In J.
Dindyal, L. P. Cheng & S. F. Ng (Eds), Mathematics education:
Expanding horizons (Proceedings of the 35th annual conference of the
Mathematics Education Research Group of Australasia, Singapore). Sydney:
MERGA.
Friel, S. N., Curcio, F. R. & Bright, G. W. (2001). Making
sense of graphs: Critical factors influencing comprehension and
instructional implications. Journal for Research in Mathematics
Education, 32, 124-158.
Konold, C. & Miller, C. (2012). TinkerPlots: Dynamic data
exploration, Version 2 [software]. Emeryville, CA: Key Curriculum Press.
Ozgun, S. A. & Edwards, T. G. (2013). Interpreting box plots
with multiple linked representations. Mathematics Teaching in the Middle
School, 18(8), 508-511.
Simon, M. A. (1995). Reconstructing mathematics pedagogy from a
constructivist perspective. Journal for Research in Mathematics
Education, 26(2), 114-145.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA:
Addison-Wesley Publishing Company.
Watson, J. M., Fitzallen, N. E., Wilson, K. G. & Creed, J. F.
(2008). The representational value of hats. Mathematics Teaching in the
Middle School, 14(1), 4-10.
Bruce Duncan
University of Tasmania
Noleine Fitzallen
University of Tasmania