Better Data From Better Measurements Using Computerized Adaptive Testing
Abstract
The process of constructing a fixed-length conventional test frequently focuses on maximizing internal consistency reliability by selecting test items that are of average difficulty and high discrimination (a “peaked” test). The effect of constructing such a test, when viewed from the perspective of item response theory, is test scores that are precise for examinees whose trait levels are near the point at which the test is peaked; as examinee trait levels deviate from the mean, the precision of their scores decreases substantially. Results of a small simulation study demonstrate that when peaked tests are “off target” for examinees, their scores are biased and have spuriously high standard deviations, reflecting substantial amounts of error. These errors can reduce the correlations of these kinds of scores with other variables and adversely affect the results of standard statistical tests. By contrast, scores from adaptive tests are essentially unbiased and have standard deviations that are much closer to true values. Basic concepts of adaptive testing are introduced and fully adaptive computerized tests (CATs) based on IRT are described. Several examples of response records from CATs are discussed to illustrate how CATs function. Some operational issues, including item exposure, content balancing, and enemy items, are also briefly discussed. It is concluded that because a CAT constructs a unique test for each examinee, scores from CATs will be more precise and should provide better data for social science research and applications.
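To make the mechanics described above concrete, the following is a minimal illustrative sketch (not code from the article) of one CAT administration under a two-parameter logistic (2PL) IRT model: at each step the item with maximum Fisher information at the provisional ability estimate is administered, and the estimate is updated by expected a posteriori (EAP) scoring. The item bank, item parameters, test length, and the simulated examinee are all hypothetical assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical item bank: discrimination (a) and difficulty (b) parameters.
a = rng.uniform(0.8, 2.0, size=50)
b = rng.normal(0.0, 1.0, size=50)

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of each item at ability theta."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def eap_estimate(responses, administered, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori ability estimate with a standard normal prior."""
    prior = np.exp(-0.5 * grid**2)
    like = np.ones_like(grid)
    for item, u in zip(administered, responses):
        p = p_correct(grid, a[item], b[item])
        like *= p if u == 1 else (1.0 - p)
    post = prior * like
    return float(np.sum(grid * post) / np.sum(post))

# Simulate one examinee whose true trait level is far from the bank's center,
# i.e., where a peaked conventional test would be "off target".
true_theta = 1.5
theta_hat = 0.0                      # provisional estimate starts at the prior mean
administered, responses = [], []

for _ in range(15):                  # fixed-length CAT of 15 items (assumed length)
    info = item_information(theta_hat, a, b)
    info[administered] = -np.inf     # do not re-administer items
    item = int(np.argmax(info))      # maximum-information item selection
    u = int(rng.random() < p_correct(true_theta, a[item], b[item]))
    administered.append(item)
    responses.append(u)
    theta_hat = eap_estimate(responses, administered)

print(f"true theta = {true_theta:.2f}, CAT estimate = {theta_hat:.2f}")

Because each item is chosen to be maximally informative at the examinee's current estimate, the administered test tracks the examinee's trait level rather than the bank's average difficulty, which is the source of the precision advantage the abstract describes.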
DOI: 10.2458/azu_jmmss_v2i1_weiss
Full Text: PDF
DOI: http://dx.doi.org/10.2458/v2i1.12351