Commentary

Measuring Relative Performance

Author: David N. Plank
Stanford Graduate School of Education

Brad Olsen’s thoughtful PACE commentary on the forthcoming NCTQ/US News rankings of teacher education programs echoes some more general concerns about holding teachers and schools accountable. These concerns have come to the fore most recently in discussions of “value-added” assessment of individual teachers, but they are equally familiar from the decade-long debate over how to measure schools’ performance under NCLB.

In every case, the publication of information on relative performance—of teachers, schools, school districts, colleges, graduate schools, or teacher education programs—meets with resistance, not only from the targets of accountability but often also from researchers, on the grounds that the measures (AYP, API, teacher value-added, prestige rankings) on which performance is assessed are insufficient to the task.

Instruments have not been validated. Scores are unstable and unreliable. Performance is measured too narrowly. The system is subject to gaming. Successful teachers, schools, and programs will be subjected to undeserved stigma and shame.

These concerns are warranted.

AYP is a thoroughly inadequate measure of school performance, and teachers’ value-added scores provide a very limited amount of useful information about the quality of teaching.

The rankings of American colleges that US News and World Report publishes are notoriously subject to gaming. Many of the teachers, schools, and programs that fall toward the bottom of the published scale are in fact doing a perfectly acceptable job.

All of these concerns apply with full force to the NCTQ/US News rankings of teacher education programs, as Brad’s post makes clear. Acknowledging this fact, though, simply brings us face to face with a harder question: how much information do we believe is necessary to make consequential judgments about educational performance?

Or, to put the question another way, is some information about relative performance always better than no information?

Critics of school rankings and other measures of relative performance implicitly argue that imperfect information does more harm than no information at all. On this view, our current measures of performance are flawed, and we should therefore withhold judgment about how teachers, schools, and programs are doing until we have developed measures that are generally recognized as accurate, reliable, and fair.

At first glance this seems reasonable, but the wait for perfect information is likely to be a very long one, if the goal is attainable at all. Scholars are committed to standards of evidence that satisfy the criteria of scientific rigor; the data that come to hand for policymaking rarely if ever meet that standard.

AYP is far from perfect, but it has shone a light on critical performance issues in schools.

Teachers’ value-added scores do not tell us everything we want to know about the quality of their teaching, but they provide valuable information about teachers’ performance that would not otherwise be available. Rankings of teacher education programs will not provide a full or fully accurate account of each program’s effectiveness, but they will give us more information about the relative performance of these programs than we have now.

All of these measures have flaws and limitations, and it’s important to point them out. More and better measures would allow us to make more accurate judgments and wiser decisions. But until these new measures are developed (and Brad provides some useful advice about what they should look like for teacher education programs), we’re better off with the information that flawed measures produce than with no information at all.

Suggested citation: Plank, D. N. (2011, March). Measuring relative performance [Commentary]. Policy Analysis for California Education. https://edpolicyinca.org/newsroom/measuring-relative-performance