Box 1. Sample instructor guide for a figure used in the FotD
activity. The guide shown here is for the cumulative
population figure seen in Figure 1A, B (source: https://
Instructor Guide: Living Population Compared to
• x-axis: Population in billions.
• y-axis: Year.
• Green line labeled “Cumulative births” shows births since
• Yellow line labeled “Living population” shows currently
• Green line increases with a higher slope than yellow line.
• Both lines become exponential at ~1900.
• Green line increases the whole time.
• Yellow line starts level, then increases.
Possible Questions to Ask
• What does the author of this figure want to communicate?
• Is the author effective in communicating this?
• Why the break in the graph before year 1 A.D.?
Break is to show scale of years better.
• Why does the green line increase so much during that time?
Green line increases because many years are compressed
into that small space.
• Why don’t the yellow line and the green line appear to
increase at the same rate?
• Could this graph be represented differently? (Bars/pie?
Scatterplot? No shading?)
We used two simple assessments to (1) gauge the impact of the
treatment and control activities on students’ figure creation abilities,
and (2) solicit students’ perceptions of the control and treatment
To measure the impact of the FotD activities on students’ figure
creation abilities, students completed a pre- and post-FotD activity
figure-drawing task that read as follows:
Create a figure that best represents the following:
• The Fictus Fish lives in the ocean at a depth between 4 and 10
ft; a single individual has an average of 15 offspring every year.
• The GelCap Jellyfish lives in the ocean at a depth between 8
and 15 ft; a single individual has an average of 300 offspring every year.
• The Mountain Whale lives in the ocean at a depth between 15 and
50 ft; a single individual has an average of 2 offspring every year.
• The 7-Point Starfish lives in the ocean at a depth between 2 and 30 ft;
a single individual has an average of 1000 offspring every year.
This task was given to students before and after the six-week figure-set
implementation. Students did not receive feedback on their pre-FotD
assessment figures. Thus, their decision to make an identical
or new version of their drawing for the post-assessment was independent
of whether they thought their first figure was correct or
incorrect. This question required that students represent three variables
(a fictitious marine species, its reproductive output, and the
depth at which it can be found) on a single figure. Student figures
were scored in one of seven categories (Table 1). The rubric incorporated
the “completeness” of a figure in addition to correctness.
For example, a figure that was technically correct but omitted one
of the variables was not scored as correct. After designing the
rubric, two members of the FotD research team (C.K.K. and
P.J.T.W.) independently applied the rubric to a set of 40 student
figures and provided identical scores in 38 of 40 cases (95%). The
rest of the scoring of student figures was done by C.K.K. and P.J.T.W.
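The inter-rater check described above is a simple percent agreement: the share of figures that both scorers placed in the same rubric category. A minimal Python sketch follows. The function name and the score lists are hypothetical illustrations, not the study's data; the lists are constructed only so that 38 of 40 scores match.

```python
def percent_agreement(scores_a, scores_b):
    """Fraction of figures on which two raters assigned the same category."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same set of figures")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical rubric scores for 40 figures (not the study's data).
rater_1 = [1, 2, 3, 4, 5] * 8
rater_2 = list(rater_1)
rater_2[0] = 2   # first disagreement
rater_2[7] = 4   # second disagreement

agreement = percent_agreement(rater_1, rater_2)
print(f"{agreement:.0%}")  # 38 of 40 identical scores
```

Percent agreement does not correct for matches expected by chance; a chance-corrected statistic would be a separate design choice that the text does not report.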
After scoring the pre- and post-FotD figures, we conducted statistical analyses to determine whether the gains from pre- to post-FotD were different for the control vs. treatment FotD activities.
Prior to the analysis, we removed students with a score of 6, 7, or
8 in the pre- or post-project drawings (n = 27). These students did
not follow the instructions of the assignment, and it was not possible
to logically conclude that a score change represented an improvement or a decline in figure-making abilities. Conversely, scores of
5, 4, 3, 2, and 1 represent a hierarchical evaluation of student figures
from the poorest (5) to the best (1) representation of the data. Following these two filtering decisions, we were left with 82 students
in the control group and 81 students in the treatment group.
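The exclusion rule above can be sketched as a filter that keeps only students whose pre- and post-FotD scores both fall on the ordinal 1 to 5 scale. This is an illustrative Python sketch; the record layout, field names, and example students are assumptions, not the study's data.

```python
# Scores 1-5 form the ordinal scale (1 = best, 5 = poorest representation).
RANKABLE_SCORES = {1, 2, 3, 4, 5}


def keep_rankable(records):
    """Keep students whose pre and post scores are both on the 1-5 scale."""
    return [
        r for r in records
        if r["pre"] in RANKABLE_SCORES and r["post"] in RANKABLE_SCORES
    ]


# Hypothetical student records (field names are assumptions).
students = [
    {"id": "s01", "group": "control", "pre": 4, "post": 2},
    {"id": "s02", "group": "treatment", "pre": 7, "post": 3},  # dropped: pre off-scale
    {"id": "s03", "group": "treatment", "pre": 5, "post": 1},
    {"id": "s04", "group": "control", "pre": 3, "post": 6},    # dropped: post off-scale
]

kept = keep_rankable(students)
print([r["id"] for r in kept])
```

A student is dropped if either score is off-scale, because a change to or from such a score cannot be read as an improvement or a decline.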
We first ran a paired Wilcoxon signed-rank (WSR) test to determine whether there were differences from pre- to post-FotD within
each group (i.e., a WSR test for each of the control and treatment
datasets). A WSR test is analogous to a paired t-test but is appropriate for
paired, ranked nonparametric data. Because students were nested
in class sections, and students in the same class are more likely to
experience the same instructional environment, it is important to
account for the non-independence of students in the same class section (Paterson & Goldstein, 1991; Kreft & de Leeuw, 2002; see also
Eddy et al., 2014). We therefore addressed this nested or hierarchical structure of the data in our WSR tests by incorporating a random
effect that accounts for the variation between classes within the same
treatment group (package “coin” in R; Hothorn et al., 2008). The
WSR tests for each group provided information about changes from
pre- to post-FotD in the treatment group and in the control group
but did not indicate whether the change from pre- to post-FotD differed between the two groups. To test for differences
between groups, we used a nonparametric bootstrap of the WSR test
(n = 10,000) for each group (R Development Core Team, 2011;
Canty & Ripley, 2017). This provided a mean effect size for each
group (i.e., a representation of the average difference between pre- and post-FotD scores) along with standard error estimates.
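To make the two steps concrete, the following Python sketch computes the signed-rank sums for one group's paired scores and then bootstraps the mean pre-to-post change with its standard error. It is a simplified illustration under stated assumptions, not the R analysis described above. It omits the class-section adjustment, it bootstraps the mean paired difference rather than the test itself, and the scores are hypothetical. Differences are taken as pre minus post, so a positive value is an improvement (lower scores are better).

```python
import random
import statistics


def signed_rank_sums(pre, post):
    """Return (W_plus, W_minus); zero differences dropped, ties get mean ranks."""
    diffs = sorted((a - b for a, b in zip(pre, post) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        mean_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mean_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


def bootstrap_mean_change(pre, post, n_resamples=10_000, seed=0):
    """Resample students with replacement; return (mean change, standard error)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(pre, post)]
    means = [
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_resamples)
    ]
    return statistics.fmean(means), statistics.stdev(means)


# Hypothetical scores for eight students in one group (1 = best, 5 = poorest).
pre_scores = [5, 4, 4, 3, 5, 2, 3, 4]
post_scores = [3, 4, 2, 3, 4, 1, 4, 2]

w_plus, w_minus = signed_rank_sums(pre_scores, post_scores)
mean_change, std_error = bootstrap_mean_change(pre_scores, post_scores)
print(w_plus, w_minus)  # 19.0 2.0
print(round(mean_change, 2), round(std_error, 2))
```

Running the bootstrap once per group gives a mean change and standard error for each, which can then be compared between control and treatment. Resampling whole class sections instead of individual students would be one way to respect the nesting, but that is an assumption beyond what this sketch implements.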
For the second assessment, students responded to two open-ended questions:
1. List some of the positive aspects of the Figure of the Day
2. List some of the negative aspects of the Figure of the Day
Students’ answers were given anonymously, but each respondent
included their lab section, which allowed their responses to be
sorted into treatment and control groups. Responses were coded
into categories of either positive or negative aspects, with grounded