Can tracking improve learning? Evidence from Kenya.
Duflo, Esther; Dupas, Pascaline; Kremer, Michael
Tracking students into different classrooms according to their
prior academic performance is controversial among both scholars and
policymakers. If teachers find it easier to teach a homogeneous group of
students, tracking could enhance school effectiveness and raise test
scores of both low- and high-ability students. But if students benefit
from learning with higher-achieving peers, tracking could disadvantage
lower-achieving students, thereby exacerbating inequality.
Debates over tracking reached their high point in the United States
in the 1990s. An influential report published in 1998 by the Thomas B.
Fordham Foundation argued that the available research did not support
the contention that tracking doomed impoverished students to inferior
schooling, nor did it support universal adoption of the practice. Over
the last decade, patterns in grouping students have changed markedly in
the U.S.; high school students are no longer placed in rigidly defined
general-education or noncollege tracks but have the flexibility to move
between course levels for different subjects. These changes may have
assuaged some critics, but the broader debate over tracking remains
unsettled.
The central challenge in measuring the effect of tracking on
performance is that schools that track students may be different in many
respects from schools that do not. For example, they may attract a
different pool of students and possibly a different pool of teachers.
The ideal situation to assess the impact of tracking on test scores of
different groups of students would be one in which students were
assigned to tracking or nontracking schools randomly, and the
performance of students could be compared across school types.
We shed light on these issues using data from Kenya. In 2005, each
of 140 primary schools in western Kenya received funds from the
nongovernmental organization International Child Support (ICS) Africa to
hire an extra teacher. One hundred twenty-one of these schools had a
single 1st-grade class and used the new teacher to split the students
into two classes. In 61 randomly selected schools, students were
assigned to classes based on prior achievement as measured by test
scores. In the remaining 60 schools, students were randomly assigned to
one of the two classes, without regard to their prior academic
performance.
The results showed that all students benefited from tracking,
including those who started out with low, average, and high achievement.
At the tracking schools, the test scores of students who started out in
the middle of their class do not seem to be affected by which section
(top or bottom) the students were later assigned to. In other words, any
negative effects of being with lower-achieving peers were more than
offset in tracked settings by the benefit of the teacher being able to
better tailor instruction to students' needs.
Primary Education in Kenya
The Kenyan education system includes eight years of primary school
and four years of secondary school. Like many other developing
countries, Kenya has recently made rapid progress toward the goal of
universal primary education. After the elimination of school fees in
2003, primary school enrollment rose nearly 30 percent, from 5.9 million
in 2002 to 7.6 million in 2005. This is typical of what is happening in
sub-Saharan Africa overall, where the number of new entrants to primary
school increased by more than 30 percent between 1999 and 2004.
This progress creates its own new challenges, however.
Pupil-teacher ratios have grown dramatically, particularly in lower
grades. In our sample of schools in western Kenya, the median 1st-grade
class in 2005 (after the introduction of free primary education, but
before the class-size-reduction program we study here) had 74 students
and the average class size was 83. These classes are heterogeneous in a
number of ways: Students differ vastly in age, school readiness, and
support at home. Many of the new students are first-generation learners
and have not attended preschools, which are neither free nor compulsory
in Kenya. These challenges are not unique to Kenya; they confront many
developing countries where school enrollment has risen sharply in recent
years. Understanding the roles of tracking and peer effects in this type
of environment is thus critically important.
Our results are most likely to be directly applicable to settings
where classes are large, the student population is heterogeneous, and
few additional resources are available to teachers. It is unclear
whether similar results would be obtained in different contexts, such as
developed countries, where smaller class sizes may allow more tailored
instruction even without tracking, and extra resources, such as remedial
education, computer-assisted learning, and special education programs,
may already provide tools to help teachers deal with different types of
students.
Design of the Experiment
This study takes advantage of a class-size-reduction program and
evaluation that involved primary schools in Bungoma and Butere-Mumias
in Western Province, Kenya. Of 210 primary schools in these districts,
140 schools were randomly selected to participate in the Extra-Teacher
Program. With funding from the World Bank, ICS Africa provided each of
the 140 selected schools with funds to hire an additional 1st-grade
teacher on a contractual basis starting in May 2005, the beginning of
the second term of that school year. Most of the schools (121) had only one
1st-grade class, which was split into two classes when the new teacher
was hired. The 19 schools that already had two or more 1st-grade classes
added another class.
It is important to note that the incentives facing the newly hired
teachers differed from those facing civil-service teachers already
working in program schools. The new teachers had clear incentives to
work hard to increase their chances of having their short-term contracts
renewed and of eventually being hired as civil-service teachers--a
desirable outcome in a society where government jobs are highly valued.
In contrast, the difficulty of firing civil-service teachers implies
that they had weak extrinsic incentives and may have been more sensitive
to factors affecting their intrinsic motivation.
Average class size was reduced from 84 to 46 students in the 140
schools that received funds for a new teacher. The program continued for
18 months, which included the last two terms of 2005 and the entire 2006
school year, and the same cohort of students remained enrolled in the
program.
From the 121 schools that had originally only one 1st-grade class,
60 schools were randomly selected to assign students to one of the two
classes by chance. We call these schools the "nontracking
schools." In the remaining 61 schools (the "tracking
schools"), the children were divided into two sections according to
their scores on exams administered by the school during the first term
of the 2005 school year. The 50 percent of the class with the lowest
exam scores were assigned to one section (the "bottom class")
and the rest were assigned to the other (the "top class").
After students were assigned to classes, the contract teacher and
the civil-service teacher were also randomly assigned to classes. In the
second year of the program, all children not repeating the grade
remained assigned to the same group of peers and the same teacher.
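The two assignment rules can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the schools actually followed: the function names, student identifiers, and scores are hypothetical, and the handling of ties and odd class sizes is an assumption.

```python
import random


def assign_tracked(scores):
    """Split one class at the median baseline score.

    `scores` maps a (hypothetical) student id to a baseline exam score.
    The lower-scoring half goes to the "bottom" section, the rest to "top".
    Ties and odd class sizes are resolved by sort order, which is an
    assumption of this sketch.
    """
    ranked = sorted(scores, key=lambda sid: scores[sid])
    half = len(ranked) // 2
    return {sid: ("bottom" if i < half else "top") for i, sid in enumerate(ranked)}


def assign_random(scores, rng):
    """Split one class into two sections by chance, ignoring scores."""
    ids = list(scores)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {sid: ("A" if i < half else "B") for i, sid in enumerate(ids)}


# Made-up baseline scores for a six-student class (illustration only).
baseline = {"s1": 12, "s2": 35, "s3": 20, "s4": 41, "s5": 8, "s6": 27}

tracked = assign_tracked(baseline)                      # tracking school
untracked = assign_random(baseline, random.Random(0))   # nontracking school
```

In the tracked split, the three lowest scorers (s5, s1, s3) land in the bottom section; in the random split, section membership is unrelated to baseline score.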
Data
Our initial sample consists of approximately 10,000 students
enrolled in 1st grade in March 2005 in one of the 121 primary schools
participating in the study. The outcome of interest is student academic
achievement, as measured by scores on a standardized math and language
test first administered in all schools 18 months after the start of the
program. Trained proctors administered the test, which was then graded
blindly by data processors. In each school, 60 students (30 per class)
were drawn from the initial sample to participate in the tests. If a
class had more than 30 students, students were randomly sampled.
The test was designed by a cognitive psychologist to measure a
range of skills students may master by the end of 2nd grade. One part of
the test was written and the other part oral, administered one-on-one.
Students answered math and literacy questions ranging from counting and
identifying letters to subtracting three-digit numbers and reading and
understanding sentences.
To limit attrition from the experiment, proctors were instructed to
go to the homes of sampled students who had dropped out or were absent
on the day of the test and to bring them to school for the test. It was
not always possible to find the child, and the resulting
attrition rate on the test was 18 percent. There was, however, no
difference between tracking and nontracking schools in overall attrition
rates. In total, we have postintervention test-score data for 5,796
students.
In addition, each school received unannounced visits several times
during the course of the study. During these visits enumerators checked,
upon arrival, whether teachers were present in school and whether they
were in class and teaching, and then took a roll call of the students.
To measure whether the effects of the program persisted, the
children who had been sampled for the first postintervention test were
tested again in November 2007, one year after the program ended. During
the 2007 school year, these students were overwhelmingly enrolled in
grades for which their school had a single class, so tracking was no
longer an option. Most of these students had reached 3rd grade by that
time, but those repeating an earlier grade were also tested. The
attrition rate for this portion of the experiment was 22 percent.
Neither the proportion nor the characteristics of children who could not
be tested differed between the tracking and nontracking schools.
The Impact of Tracking
We estimate the impact of tracking on student achievement by
comparing the postintervention (18 months after the experiment began)
test scores of students in the tracking and nontracking schools. Taking
the average of students' scores on math and literacy exams, we find
that students in tracking schools scored 0.14 standard deviations higher
than students in nontracking schools overall. When we adjust the
comparison to take into account minor differences in student
characteristics across the two groups of schools, the effect increases
to 0.18 standard deviations. There was no significant difference between
the impact of the program on math and literacy scores when we examined
the subjects separately.
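As a rough illustration of how such a comparison is expressed in standard-deviation units, the sketch below standardizes scores against the nontracking group and takes the difference in means. The scores are invented for the example, and standardizing against the comparison group is an assumption. The sketch also leaves out the adjustment for student characteristics and any school-level clustering that a full analysis would need.

```python
from statistics import mean, pstdev


def effect_in_sd(treated, comparison):
    """Difference in mean scores, in comparison-group standard deviations."""
    mu = mean(comparison)
    sigma = pstdev(comparison)
    z_treated = [(x - mu) / sigma for x in treated]
    # After standardizing, the comparison group has mean 0 by construction,
    # so the treated group's mean z-score is the effect size.
    return mean(z_treated)


# Invented raw scores, for illustration only (not data from the study).
nontracking_scores = [10, 12, 14, 16, 18]
tracking_scores = [11, 13, 15, 17, 19]

effect = effect_in_sd(tracking_scores, nontracking_scores)
print(f"effect = {effect:.2f} standard deviations")
```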
How large were these effects? A typical student with a literacy
score one standard deviation above that of the average student could
correctly spell 5.5 of 10 words included on the exam, while the average
student could spell only two. Similarly, students with a math score one
standard deviation above the average were able to perform single-digit
multiplications, whereas those at the mean could not. The average effect
of tracking was roughly one-fifth the size of these performance
differences.
These gains persisted beyond the duration of the program (see
Figure 1). When the program ended, most students had reached 3rd grade,
and all but five schools had only one 3rd-grade class. The remaining
students had repeated and were in 2nd grade where, once again, most
schools had only one large class, since after the program ended they did
not have funds for additional teachers. Even so, the test scores of
students in tracking schools remained 0.16 standard deviations higher
than those of students in nontracking schools overall (and 0.18 standard
deviations higher with control variables). The persistence of the
benefits of tracking is striking, as many evaluations find that the
test-score effects of successful interventions fade over time. It seems
that tracking helped students master core skills in 1st and 2nd grade
that in turn helped improve their learning later on.
Tracking Gains (Figure 1)
The benefits of tracking persisted even one year after the intervention
ended and students returned to regular classrooms. Among students
assigned to civil-service teachers, the gains from tracking were
statistically significant only for students who had been assigned to
the top class.
Effect of Tracking One Year After the Intervention Ended
(effect on test scores, in standard deviations)
Group                                      Effect
Overall                                    0.18 *
Bottom class with contract teacher         0.18 *
Top class with contract teacher            0.25 *
Bottom class with civil-service teacher    0.09
Top class with civil-service teacher       0.20 *
Note: * indicates that the effect is statistically significant at the
10 percent level. All effects are measured relative to students in
nontracking schools.
SOURCE: Authors' calculations
Note: Table made from bar graph.
We also examine whether the effect of tracking differs between
initially high-scoring students (who are grouped with other strong
students in tracking schools) and initially low-scoring students (who
are grouped with other low-scoring students in tracking schools). We
find that both groups of students benefited from tracking, and by
approximately the same amount. A year after the intervention ended, the
effect persisted for both the top and bottom classes.
Tracking increases test scores for students taught by contract
teachers. In fact, students initially scoring low who were assigned to
contract teachers benefited even more from tracking than students who
initially scored high. But students who initially scored low showed only
a small and statistically insignificant benefit if assigned to a
civil-service teacher. In contrast, tracking substantially increased
scores for students who initially scored high and were assigned to a
civil-service teacher. Below we discuss other evidence that tracking led
civil-service teachers to increase effort when they were assigned to
high-scoring students but not when assigned to low-scoring students.
Changes in Peer Achievement
Data from the tracking schools allow us to estimate the effect of
being taught with a higher-achieving vs. lower-achieving peer group by
comparing students with baseline test scores in the middle of the
distribution. Because of the way tracking was done (splitting the grade
into two classes at the median baseline test score), the two students
closest to the median within each school were assigned to classes where
the average prior achievement of their classmates was very different.
By comparing pairs of students right around the cutoff, we can
estimate the effect of being the lowest-achieving child in the class
compared to being the highest-achieving student in the class. We find
that, despite the large gap in average peer achievement (1.6 standard
deviations in baseline test scores) between the top and bottom classes,
the students just below the cutoff have postintervention test scores
similar to students just above the cutoff. Moreover, when we compare
students around the cutoff at the tracking schools with students of
similar ability at the nontracking schools, we find that students at the
tracking schools score higher at the end of the intervention than the
comparable students in the nontracking schools. These results imply that
being the best student in a class of relatively weak students and being
the worst student in a class of relatively strong students are both
better than being the middle student in a heterogeneous class. This
evidence suggests that students benefit from homogeneity because the
teacher does not need to spend time addressing the needs of students
performing at widely varying levels.
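The comparison around the cutoff can be sketched as follows. For each tracking school, the sketch picks the highest-baseline student in the bottom class and the lowest-baseline student in the top class, then averages the difference in their endline scores across schools. The record layout and all numbers are hypothetical, and a single-pair comparison is a simplification of a full regression-discontinuity analysis.

```python
from statistics import mean


def median_pair_gap(students):
    """Endline gap (top minus bottom) for the two students nearest the cutoff.

    `students` is a list of dicts with hypothetical keys:
    "section" ("top" or "bottom"), "baseline", and "endline".
    """
    bottom = [s for s in students if s["section"] == "bottom"]
    top = [s for s in students if s["section"] == "top"]
    just_below = max(bottom, key=lambda s: s["baseline"])
    just_above = min(top, key=lambda s: s["baseline"])
    return just_above["endline"] - just_below["endline"]


# Two invented schools (illustration only, not data from the study).
schools = {
    "school_a": [
        {"section": "bottom", "baseline": 10, "endline": 30},
        {"section": "bottom", "baseline": 19, "endline": 50},
        {"section": "top", "baseline": 21, "endline": 48},
        {"section": "top", "baseline": 30, "endline": 70},
    ],
    "school_b": [
        {"section": "bottom", "baseline": 5, "endline": 25},
        {"section": "bottom", "baseline": 14, "endline": 40},
        {"section": "top", "baseline": 15, "endline": 42},
        {"section": "top", "baseline": 28, "endline": 65},
    ],
}

# Average gap across schools; a value near zero means the students on
# either side of the cutoff ended up with similar scores.
average_gap = mean(median_pair_gap(s) for s in schools.values())
```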
Learning from Peers vs. Learning from Teachers
We took a separate
look at students in schools where students were not tracked but instead
assigned to classes randomly. The random assignment of students and
teachers within these schools made it possible to see whether and how
peer achievement affected the performance of individual students when
education took place in an untracked setting. We found that it did. If
peer achievement was higher--0.10 standard deviations higher, to be
exact--students learned 0.04 standard deviations more than they would
have otherwise.
These results, taken together with those reported earlier, indicate
that peer influence depends on whether or not classes are tracked. In
untracked classes, where there is considerable heterogeneity of
performance, students learn less if their peers are lower performing. At
least in this particular setting, however, the homogeneous classes that
are created by tracking seem to allow the teacher to deliver instruction
at a level that reaches all students, thus offsetting the effect of
having lower-performing peers. Interestingly, combining the direct
effect of peer achievement with the fact that the median children in
each school did not suffer from being assigned to the bottom track
suggests that teachers focus their attention not on the median student
in the class, but on students considerably above the median.
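A back-of-the-envelope calculation shows why this comparison is informative. Suppose the peer effect estimated in nontracking schools were linear and were the only force at work; both are assumptions of this sketch, not claims from the study. The student just above the cutoff would then be expected to outscore the student just below it by a wide margin:

```python
# Figures quoted in the text, all in standard deviations (SD).
peer_gain = 0.10   # increase in average peer achievement
own_gain = 0.04    # associated increase in a student's own score
track_gap = 1.6    # gap in average peer achievement between the two tracks

# Assumption of this sketch: the peer effect is linear and is the only
# thing that differs between the top and bottom classes.
slope = own_gain / peer_gain
predicted_gap = slope * track_gap

print(f"predicted gap at the cutoff: {predicted_gap:.2f} SD")
```

Under those assumptions the predicted gap is about 0.64 standard deviations. The students around the cutoff instead ended up with similar scores, which is consistent with better-targeted instruction offsetting the peer effect.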
Why Did Tracking Work?
Two additional pieces of evidence shed light on the question of why
tracking had such clear benefits. First, we look at teacher presence and
effort. Do teachers in tracking schools spend more time in class and
teaching? Then, we examine
whether the test-score gains in tracking schools were concentrated among
simpler or more complex tasks and whether this varied by students'
initial achievement levels. Our results suggest that students in tracked
classes benefited from more-focused teaching and perhaps
also from greater teacher effort.
Teacher absence is a major problem in Kenya, as in many developing
countries. Only 59 percent of teachers were in class and teaching during
unannounced visits to a comparable sample of schools that did not
receive an additional teacher. Overall, teachers in tracking schools
were 9.6 percentage points more likely to be found in school and
teaching during random spot checks than their counterparts in
nontracking schools, who were present and teaching only about half of
the time. There were, however, large differences across teachers. The
contract teachers were much more likely to be found in school and
teaching (74 percent versus 45 percent for the civil-service teachers),
and their absence rate was unaffected by tracking (see Figure 2). The
civil-service teachers were 10 percentage points more likely to be in
school and teaching in tracking schools than in nontracking schools
when they were assigned to the top class. This difference is
statistically significant and amounts to a 25 percent increase in
teaching time. However, the difference between tracking and nontracking
school types was smaller and statistically insignificant for
civil-service teachers assigned to the bottom classes.
Teacher Commitment (Figure 2)
Contract teachers in both tracking and nontracking schools were much
more likely to be present and teaching than their civil-service
counterparts, but tracking increased attendance and effort by a
statistically significant amount for civil-service teachers assigned to
the top class.
Share of Teachers Present and Teaching in Tracking and Nontracking
Schools (percent)
Columns: No tracking | Tracking, assigned to bottom class | Tracking, assigned to top class
Civil-service teacher 45 55 *
Contract teacher 75 78
Note: * indicates that the difference between levels in tracking and
nontracking schools is statistically significant at the 10 percent
level.
SOURCE: Authors' calculations
Note: Table made from bar graph.
These results suggest that teachers may be more motivated to teach
a group of students with high initial scores than a group with low
initial scores or a heterogeneous group. Recall that students assigned
to the top class with a civil-service teacher benefited more from
tracking than those assigned to the bottom class with a civil-service
teacher. Increased teacher effort may help explain this pattern.
Another hypothesis consistent with both the tracking results and
the effects from random peer assignment is that tracking by initial
achievement improves student learning because it allows teachers to
focus instruction. Teaching a more homogeneous group of students might
allow teachers to adjust the material covered and the pace of
instruction to students' needs. For example, a teacher might begin
with more basic material and instruct at a slower pace, providing more
repetition and reinforcement, when students are initially less prepared.
With a group of initially higher-achieving students, the teacher can
increase the complexity of the tasks and pupils can learn at a faster
pace. With a heterogeneous group, the teacher may be compelled to cover both
simple and advanced material, spending less time on each, which would
hurt all students.
One way to examine this is to see whether children with different
initial achievement levels gained from tracking differentially in terms
of the difficulty of the material that they learned. While the results
for language are mixed, the estimates for math suggest that, although
the total effect of tracking on children in the bottom class is
significantly positive for all levels of difficulty, these children
gained from tracking more than other students on the easier questions
and less on the more-difficult questions. Conversely, students assigned
to the top class benefited less on the easier questions, and more on the
more-difficult questions. In fact, they did not significantly benefit
from tracking for the easier questions, but they did significantly
benefit from it for the more-difficult questions. These results suggest
that tracking helped by giving teachers the opportunity to focus on the
competencies that children were not mastering.
Conclusion
A central challenge of education systems in developing
countries--the context for which our results are most relevant--is that
students in the same grades and classrooms are extremely diverse. Our
results show that grouping students by preparedness or prior achievement
and focusing the teaching material at the most appropriate level could
potentially have large positive effects with little or no additional
resource cost. One could also target more resources to the weaker group,
further helping them to catch up with their more-advanced counterparts.
It is often suggested that there is a trade-off between the value of
targeting resources to weaker students, and the costs imposed on them by
separating them from stronger students. We find no evidence for such a
trade-off in this context.
Our results may also have implications for debates over school
choice and voucher systems. A common criticism of such programs is that
they may hurt some students if they lead to increased sorting of
students by initial achievement and if all students benefit from having
peers with higher initial achievement. If tracking is indeed beneficial,
this is less of a concern.
Esther Duflo is professor of economics at the Massachusetts
Institute of Technology. Pascaline Dupas is assistant professor of
economics at the University of California, Los Angeles. Michael Kremer is
professor of economics at Harvard University.