As shown in Table 1, students in the viewing condition had a higher mean score on the 12-item written classroom observation test (7.74 correct, SD = 1.64) than those in the coding condition (6.64, SD = 1.75) or the test-only control condition (6.48, SD = 1.18).
Regardless, and assuming that Barnum's original misinterpretation was
correct, I think Katharine Strunk's statement is likely more representative of how researchers on this topic, taken as a whole, read the research: «I think the research suggests that we need multiple measures — test
scores [depending on the extent to which evidence supports low- and, more importantly, high-stakes use],
observations, and others — to rigorously and fairly evaluate teachers.»