Yet even as they surfaced possible misconceptions, the new teaching strategies these data teams came up with focused on helping students perform better
on those particular test items, rather than on improving instruction (Horn, Kane, & Wilson, 2015).
To take an example, imagine that a particular subgroup of students does more poorly than expected (based on their performance on other questions testing the same math skill) on a math item that uses the word "foyer," while other groups of students do just as well as expected.
This suggests an alternative criterion by which to judge changes in student performance: achievement gains on test items that measure particular skills or understandings may be meaningful even if the student's overall test score does not fully generalize to other exams.
Surely Michelle Rhee must know that if children are drilled on a particular test, that test cannot be used to measure what they have learned, except perhaps the test items themselves.
The technical explanation, in part, is that test designers try to build questions that avoid Differential Item Functioning (DIF): items on which students from different groups (commonly gender or ethnicity) with the same underlying achievement level have a different probability of giving a certain response.
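To make the DIF idea concrete, here is a minimal sketch using the Mantel-Haenszel common odds ratio, one standard statistical screen for DIF. The data are hypothetical and simulated (the groups, penalty size, and ability strata are assumptions for illustration, not drawn from any real test): examinees are matched on an ability stratum, and we ask whether matched examinees from the two groups still differ in their chance of answering one item correctly.

```python
import random

random.seed(0)

def simulate(n, dif_penalty):
    # Each simulated examinee gets an ability stratum 0..4 (a stand-in for
    # total test score). P(correct) rises with ability; group "B" is
    # penalized on this one item only when the item exhibits DIF.
    data = []
    for _ in range(n):
        group = random.choice("AB")
        stratum = random.randint(0, 4)
        p = 0.3 + 0.12 * stratum - (dif_penalty if group == "B" else 0.0)
        data.append((group, stratum, random.random() < p))
    return data

def mantel_haenszel_or(data):
    # Common odds ratio across ability strata: near 1.0 means matched
    # examinees perform alike (no DIF); far from 1.0 flags the item.
    num = den = 0.0
    for s in range(5):
        a = sum(1 for g, st, ok in data if g == "A" and st == s and ok)      # A correct
        b = sum(1 for g, st, ok in data if g == "A" and st == s and not ok)  # A wrong
        c = sum(1 for g, st, ok in data if g == "B" and st == s and ok)      # B correct
        d = sum(1 for g, st, ok in data if g == "B" and st == s and not ok)  # B wrong
        t = a + b + c + d
        if t:
            num += a * d / t
            den += b * c / t
    return num / den

fair_item = mantel_haenszel_or(simulate(4000, dif_penalty=0.0))
dif_item = mantel_haenszel_or(simulate(4000, dif_penalty=0.25))
print(round(fair_item, 2))  # close to 1.0
print(round(dif_item, 2))   # well above 1.0: group B disadvantaged at equal ability
```

The key design point mirrors the definition above: group differences alone are not DIF; the comparison is made within ability strata, so only a gap between *equally able* students from different groups inflates the odds ratio.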
The example suggested that if students do not know the meaning of a particular word in a test item, they could be taught to replace it with an "X" and focus instead on grasping the logic of the question's phrasing, which would give them a better chance of selecting the correct answer.