How Teacher Evaluation Methods Matter for Accountability
Policymakers are revolutionizing teacher evaluation by placing greater focus on student test scores and classroom observations of practice and by increasing the stakes attached to evaluations. The federal Race to the Top program requires participating states and school districts to measure and reward teachers and school leaders based on their contributions to student achievement, or “value-added.” States such as Florida and Ohio have legislation requiring that value-added and other student performance measures make up roughly 50 percent of the teacher evaluation. The other 50 percent consists of teacher evaluations typically conducted by building-level administrators. These evaluations are the basis for high-stakes decisions about promotion, tenure, dismissal, and compensation for both teachers and principals. Notably, the Bill and Melinda Gates Foundation has invested $45 million in the Measures of Effective Teaching (MET) project, which informs this ongoing national experimentation by measuring teacher effectiveness in multiple ways, including student evaluations of teachers, student classroom work, and evaluations of classroom practice using multiple rubrics.
Yet, despite the increasing use of these different types of evaluation, little is known about the relationship between teachers’ value-added scores and principals’ evaluations of the same teachers. To understand this relationship, Douglas Harris, Kyle Ingle, and Stacey Rutledge undertook a mixed-methods analysis of data from a mid-sized Florida school district. Thirty principals were asked to rate 294 teachers on their overall effectiveness and on specific teacher characteristics identified in the broader research (e.g., caring, enthusiasm, subject matter knowledge) and to describe the teachers in their own words. The principals’ ratings were then compared to the same teachers’ value-added scores. Harris and colleagues found that teachers with very good value-added scores were more likely to get very good ratings from principals, but that overall the principals’ ratings and the value-added scores were only weakly correlated. Deeper analysis of principals’ open-ended interview responses revealed that some principals gave low ratings to high value-added teachers because they believed the teachers exerted too little effort, especially in pursuing professional development opportunities. Several principals described teachers with high value-added scores but low principal ratings as “lone wolves” who worked in isolation and contributed little to the broader school community, focusing narrowly on the students in their classrooms. Other teachers had high principal ratings but low value-added scores. In these cases, principals looked beyond test scores to the contributions that the teachers made outside their classrooms, and they took into account the life challenges that some of these teachers faced (e.g., aging parents, young children).
The findings suggest that the choice of evaluation tools within accountability systems could influence not only which teachers are rewarded (e.g., through tenure) in the short term, but also the qualities and activities of the teaching profession in the long term. If high stakes are attached more to principal evaluations, then the work of teachers is likely to shift toward visible effort and social interaction with colleagues, whereas if they are attached more to value-added measures, then teachers will probably focus more on classroom activities.
The full study appears in Douglas N. Harris, W. Kyle Ingle, and Stacey A. Rutledge, “How Teacher Evaluation Methods Matter for Accountability: A Comparative Analysis of Teacher Ratings by Principals and Teacher Value-added Measures,” American Educational Research Journal, 51(1), 73-112.