Can Research Design Explain Variation in Head Start Research Results?

Hilary M. Shager
University of Wisconsin-Madison
Holly A. Schindler
University of Washington, Seattle
Katherine A. Magnuson
University of Wisconsin-Madison
Greg J. Duncan
UC Irvine
Hirokazu Yoshikawa
Harvard University
Cassandra M. D. Hart
UC Davis

The recognition that school-entry academic skills of poor children lag well behind those of their more advantaged peers has focused attention on early childhood education (ECE) as a potential vehicle for remediating early achievement gaps. The proliferation of high quality evaluations of ECE programs can yield important information about differences in the effectiveness of particular program models, but only if we understand the context of ECE research and are confident that divergent findings reflect meaningful differences in program effectiveness rather than technical differences in study design.

In our article, “Can Research Design Explain Variation in Head Start Research Results? A Meta-Analysis of Cognitive and Achievement Outcomes,” we use meta-analysis, a method of quantitative research synthesis, to estimate the short-term cognitive and achievement impacts of Head Start, a federally funded early education program for low-income children. In particular, we are interested in whether research design plays a role in predicting these impacts. Our study sample includes 28 rigorous Head Start evaluations conducted between 1965 and 2007. To combine findings across studies, estimates were transformed into a common metric, an effect size, which expresses differences between treatment and control groups as a fraction of the standard deviation of the given outcome.
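The effect-size metric described above can be illustrated with a small sketch. This is a generic standardized mean difference (Cohen's d, using a pooled standard deviation); the specific transformation applied in the article may differ, and the scores below are hypothetical.

```python
import statistics

def effect_size(treatment, control):
    """Standardized mean difference: the treatment-control gap
    expressed as a fraction of the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance, treatment group
    var_c = statistics.variance(control)    # sample variance, control group
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Hypothetical test scores for illustration only
treat = [102, 98, 110, 105, 99, 107]
ctrl = [95, 101, 97, 92, 100, 96]
d = effect_size(treat, ctrl)
```

An effect size of 0.27, for example, means the average treated child scored about a quarter of a standard deviation above the average control child on the given outcome.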

Overall, we find a statistically significant average effect size of 0.27, suggesting that Head Start is effective in improving children’s short-term cognitive and achievement outcomes, and that the magnitude of Head Start’s impacts is similar to that of other ECE programs. Specifically, this effect size is smaller than the short-term cognitive effect sizes found in evaluations of more intensive ECE programs (e.g., Perry Preschool), but within the range of the overall average effect sizes on cognitive outcomes found in meta-analyses including a wider set of ECE programs.
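Combining effect sizes across studies can be sketched as an inverse-variance weighted average, a standard fixed-effect meta-analytic estimator in which more precise studies receive more weight. This is a generic illustration with made-up numbers; the article's actual weighting and model (e.g., random effects) may differ.

```python
def pooled_effect(effect_sizes, variances):
    """Fixed-effect meta-analytic average: weight each study's
    effect size by the inverse of its sampling variance."""
    weights = [1.0 / v for v in variances]
    total = sum(w * es for w, es in zip(weights, effect_sizes))
    return total / sum(weights)

# Hypothetical study estimates: a large precise study and a small noisy one
estimates = [0.30, 0.15]
variances = [0.01, 0.09]  # smaller variance -> larger weight
average = pooled_effect(estimates, variances)
```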

We also find that several research design factors significantly predict differences in effect sizes. The largest difference relates to what kind of alternative to Head Start the control group experiences.  In particular, studies with an “active” control group—one in which children experienced other forms of center-based education and care—produce smaller effect sizes than studies in which the control group is “passive” (i.e., received no alternative ECE).

Today, almost 70% of 4-year-olds and 40% of 3-year-olds attend some form of ECE; thus, an active control group is likely to be the norm. As a result, Head Start evaluations in communities where many of the control group children have access to other ECE programs are likely to produce substantially smaller effect sizes than those in communities where few other ECE programs are available. Such a pattern of small or even null effect sizes does not indicate that Head Start is ineffective, but follows from the fact that an array of other public and private ECE programs is both accessible and effective in improving low-income children’s cognitive and achievement outcomes. It also suggests that it is misguided to compare the effectiveness of other program models to Head Start if the activity level of the control group is likely to differ.

Consistent with previous research, we also find that skills more closely aligned with ECE curricula, such as early reading, early math, and letter recognition, appear to be more sensitive to Head Start attendance than a broader set of cognitive skills, such as IQ, vocabulary, and attention, which are less sensitive to classroom instruction. This finding has important implications for designers and evaluators of ECE programs; namely, that expectations for effects on omnibus measures such as vocabulary or IQ should be tempered. Our finding that less reliable dependent measures yield larger effect sizes also argues for considering the quality of the measures when interpreting program evaluation results.

The full study is Shager, H. M., Schindler, H. A., Magnuson, K. A., Duncan, G. J., Yoshikawa, H., & Hart, C. M. D. (2013). Can Research Design Explain Variation in Head Start Research Results? A Meta-Analysis of Cognitive and Achievement Outcomes. Educational Evaluation and Policy Analysis, 35(1), 76–95.


Suggested citation: Duncan, G. J., Hart, C. M. D., Magnuson, K. A., Schindler, H. A., Shager, H. M., & Yoshikawa, H. (2013, May). Can Research Design Explain Variation in Head Start Research Results? [Commentary]. Policy Analysis for California Education.