Thursday, March 06, 2008

Prichard Committee touting flawed critique of SB 1

Georgetown College's Center for Advanced Study of Assessment (CASA), it turns out, is not so expert after all. In a recent report being touted by the Prichard Committee, Skip Kifer, Ben Oldham and Tom Guskey criticized Senate Bill 1, which would replace the CATS test with tests that are actually objective, reliable, and useful. But according to another testing expert, several of their criticisms get basic facts about the CATS test wrong, calling the report's credibility into question.

George Cunningham, an emeritus professor from the University of Louisville, the author of numerous books on educational testing and a nationally recognized measurement expert, points out that the CASA report made fundamental errors in describing the CATS tests and what SB1 would do.

Is CATS a "standards-based" test?
The CASA report asserted that the CATS test is "criterion-referenced" or "standards-based," and that SB1 would replace it with a "norm-referenced" test:
The new legislation, while not requiring an off-the-shelf set of tests, appears to favor such an approach by requiring norm-referenced tests for individual students rather than the criterion-referenced or standards-based ones which historically the Commonwealth has used to measure school outcomes. (p. 7)

"The authors are confused," says Cunningham, "or perhaps just dated in their use of measurement terminology." "The criticism of SB1 tests that they will be norm-referenced is nonsensical because the current test, CATS, is also norm-referenced."

Oops. It might be a good idea, folks, before we start defending the CATS test, to know what kind of test it is.

He points out that to say that CATS is somehow "standards-based" is misleading; it is standards-based only in the same sense that all tests are standards-based:
The term “criterion-referenced” has lost its meaning. At one time it referred to the process of reporting results on an objective-by-objective basis and it was closely associated with mastery learning. Outside of special education, it would be difficult to find examples of this sort of criterion-referenced testing. Certainly, neither KIRIS nor CATS was ever criterion-referenced in this sense. Because the term apparently focus-groups well, a more modern usage of the term has emerged.
Ouch.

A "criterion-referenced" test is one that sets forth certain objective criteria and the score depends upon how a student meets those criteria. If a student, say, gets 6 out of 10 questions right, and 60 percent is a D on a predetermined grading scale, then the student gets a "D". A "norm-referenced" test is like test graded on a curve. If a student gets the same 6 out of 10, but the average in the class is a 6 out of 10, then the student gets a "C".

Cunningham's point is that neither KIRIS (the state's test before 1998), nor CATS (its successor since 1998), nor the tests proposed by SB1, is "criterion-referenced". They're all norm-referenced. Of course, Bob Sexton and the Prichard Committee have been spreading this disinformation for years, despite the fact that it has been pointed out publicly a number of times. In fact, I pointed it out in an opinion piece in the Herald Leader after the CATS test was first implemented.

Can multiple-choice tests measure complex knowledge and skills?
The CASA report repeats the completely unfounded assertion that multiple-choice tests have some problem measuring advanced knowledge and skills:
The major strength of multiple-choice items in an assessment is that they are efficient. That is, in a relatively short amount of time, it is possible to get information about an array of knowledge and skills. Their strength is not in measuring complex skills and knowledge.
Wrong again, Cunningham points out. "High quality, reliable and valid, off-the-shelf, standardized achievement tests are available to assess reading and math," he says, "...These available tests also do a good job of assessing high level thinking skills." In fact, Cunningham apparently considers the error bad enough to call CASA's credentials into question:
It is a little surprising to read a statement like this written by members of an organization that claims to focus on the advanced study of assessment. A more nuanced discussion about test type and high level thinking might be expected...It is axiomatic in educational measurement, that high level thinking is measured well by multiple-choice items. The authors should know this.
That's about as strong as academic takedowns get. Once again, multiple-choice tests can accurately and reliably measure high-level thinking skills. In fact, it's done all the time. Merely repeating a discredited view that they can't doesn't make it true.

I should point out here that I have questions concerning how well writing skills can be assessed using any system of measurement. Only another competent writer can assess competent writing. But that is not what is at issue here.

Are multiple choice tests less reliable for assessing schools?
The CASA report argues that the CATS test is a better measure of school performance than the more objective tests proposed by SB 1:
SB 1 changes the fundamental purpose of the assessment from emphasizing school outcomes to measuring individual student achievements. This, of course, has consequences. The most important one is whether the new emphasis and assessment is a better measure of what Kentucky wants its schools to do ... The assessment envisaged by SB 1 would take, by design, a substantially narrower sample of the domain of desirable outcomes. (p. 8)
Well, not so fast. "There is no reason that test scores cannot be valid for both individual students and schools," says Cunningham. "Actually, the validity of school scores is dependent on the validity of individual student scores."
Kifer, Oldham, and Guskey acknowledge that matrix sampling renders individual student scores unusable, but they claim that it makes the school scores better: they assert that the SB 1 test sacrifices the validity of school scores in order to get individual scores. While it is true that matrix sampling across multiple forms makes it possible to include more open-ended items, a multiple-choice format allows even more items to be included, more than enough to compensate for the broader coverage matrix sampling provides.
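To see the trade-off being argued over, here is a rough, purely illustrative sketch of matrix sampling: each student takes only a random slice of the item pool, so the school collectively covers the whole domain even though no two students took comparable tests. The pool size, form size, and student count are invented numbers, not the actual CATS or SB 1 designs.

```python
import random

# Toy model of the coverage trade-off. Item counts and the number of
# students are invented for illustration; nothing here reflects the
# actual design of CATS, KIRIS, or the SB 1 tests.

DOMAIN = list(range(120))   # full pool of items spanning the domain
FORM_SIZE = 40              # items any one student has time to answer

def matrix_sampled_coverage(n_students):
    """Each student gets a different random 40-item form. Individual
    scores aren't comparable (different students saw different items),
    but pooled across the school, coverage of the domain is broad."""
    covered = set()
    for _ in range(n_students):
        covered.update(random.sample(DOMAIN, FORM_SIZE))
    return len(covered)

def common_form_coverage():
    """Every student takes the same 40 items: individual scores are
    directly comparable, but the school is judged on just those 40."""
    return FORM_SIZE

print(matrix_sampled_coverage(25))  # typically close to all 120 items
print(common_form_coverage())       # always 40
```

Cunningham's counterpoint is that because multiple-choice items take little testing time, a single common form can simply contain more items, narrowing the coverage gap while keeping individual scores comparable.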
One wonders whether the Prichard Committee had a role in getting this self-serving report produced in the first place, or whether they were simply drawn to its misinformation after the fact and saw another opportunity to serve up disinformation. We do know that Helen Mountjoy, the Governor's education secretary, requested the report, and that Mountjoy has long been a blind apologist for the state's flawed testing system. She has worked hand in glove with the Prichard Committee to oppose attempts to address the flaws in the tests. In any case, one wonders why there are those who still consider these people reliable sources of information.
