I should actually extend the title to ask why John White wants to rid the state of teachers who specialize in teaching math, science or social studies to kids who are either doing very well or doing very poorly. Why would I ask that? Because the Very Awful Mess (VAM, sometimes incorrectly referred to as the Value-Added Model) is built in such a way that teachers of those subjects, when their students’ expected scores are near the top of the range (500) or the bottom (100), are much less likely to get a benign rating from VAM than English teachers or teachers with middle-of-the-range students. Why? Let me try to explain in a simple way, so that maybe even John White could understand.

The VAM scheme calculates an expected score for students subjected to LEAP (4th and 8th grades) and iLEAP (3rd, 5th, 6th and 7th grades) tests and the End-of-Course tests in Algebra and Geometry. For third graders, expected scores are calculated only on English Language Arts (ELA) and Math. For the other grades, 4 through 8, there are four tests evaluated: ELA, Math, Science and Social Studies. For purposes of this post, I will ignore Algebra and Geometry, since the scoring details of those tests seem to be state secrets of John White’s office. Apparently, so are the possible scores for 2013 administrations of the LEAP and iLEAP tests. However, I have the technical reports containing the conversion tables used to convert raw scores on the 2012 tests to scaled scores between 100 and 500. It’s reasonable to assume the test-scoring scheme didn’t change enough in one year to correct the problem I am about to describe.

Consider the Social Studies portion of the iLEAP test given to 7th grade students. There were 40 points available in the raw score (mostly if not entirely from multiple-choice questions each counting one point). If a student gets none right, his scaled score is 100 (the lowest available). If he gets one, two, three, four, five, six, seven or eight points, the scaled score remains 100. However, getting a ninth answer correct jumps the scaled score to 161. Now, let us assume that the student’s expected score from the VAM (Very Awful Mess) is 130. The “expected score” is calculated from the mysterious black box that is the Very Awful Mess and allegedly contains inputs regarding previous test scores and student characteristics, stirred together in a cauldron in the Claiborne Building and reported to the teachers of the state (after the current tests have already been taken) as though it means anything beyond an attempt to fire a random chunk of teachers by calling them ineffective.

Back to the example: This student’s contribution to the teacher’s average “teacher effect” will be either -30 (100 minus 130, considered ineffective) or 31 or more (161 minus 130, considered highly effective). The same situation arises at the high end of the scale; on this same test a student with an expected score of 450 would generate a teacher effect of 50 (if the kid gets all the answers right and earns a scaled score of 500) or -22 at best (if the student misses at least one question and gets a score of 428 or lower). Now consider an “average” kid in terms of social-studies test-taking proficiency. If the student has a calculated expected score of 300, she can get very close by getting 19 questions correct (scaled score of 297, teacher effect of -3) or 20 correct (scaled score of 302, teacher effect of 2). In the middle of the range, a student’s raw score would have to swing by many questions to move the teacher effect as much as a single question does for a student expected to score very high or very low.
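The arithmetic above can be sketched in a few lines of Python. The partial conversion table below uses only the raw-to-scale points quoted in this post for the 7th-grade iLEAP Social Studies test; the full table is in the linked technical reports.

```python
# Partial raw-to-scale conversion table for 7th-grade iLEAP Social
# Studies (40 raw points, scaled 100-500), using only the values
# quoted in the post above.
raw_to_scale = {8: 100, 9: 161, 19: 297, 20: 302, 39: 428, 40: 500}

def teacher_effect(raw_score, expected_score):
    """One student's contribution: actual scaled score minus expected score."""
    return raw_to_scale[raw_score] - expected_score

# Low end (expected 130): one raw point swings the effect by 61 scale points.
print(teacher_effect(8, 130), teacher_effect(9, 130))    # -30 31

# High end (expected 450): one raw point swings it by 72 scale points.
print(teacher_effect(39, 450), teacher_effect(40, 450))  # -22 50

# Middle (expected 300): one raw point moves it by only 5 scale points.
print(teacher_effect(19, 300), teacher_effect(20, 300))  # -3 2
```

The same one-question difference that barely moves a middle-of-the-range student's contribution can flip an extreme-score student's teacher from "ineffective" to "highly effective."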

The reason English doesn’t have the issue to the same degree is that the 400-point range from 100 to 500 is divided into many more pieces. There are more questions on the English exams, and they can be scored in half-point increments. There’s still volatility at the high and low ends, but it is not as pronounced as with the other subjects.
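A back-of-envelope comparison shows why finer-grained scoring dampens the volatility: the more raw-score steps there are, the fewer scale points each step can move on average. The ELA point total below is an assumption for illustration only; the actual counts are in the technical reports.

```python
# Average scale points per raw-score step, comparing a 40-point
# whole-point test to a hypothetical ELA test with 60 raw points
# scorable in half-point increments (an assumed figure, for
# illustration only).
scale_range = 500 - 100

social_studies_steps = 40       # whole-point increments
ela_steps = 60 * 2              # half-point increments double the step count

print(scale_range / social_studies_steps)  # 10.0 scale points per step
print(round(scale_range / ela_steps, 1))   # 3.3 scale points per step
```

Averages hide the jumps at the extremes, of course, but they show why a test with more, smaller increments leaves less riding on any single answer.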

So, if you were considered “highly effective” by the Very Awful Mess, be wary; it might be your turn to be unlucky next year if this problem isn’t fixed. And if you were considered “ineffective,” look closely at whether your students’ expected scores were even possible.

To see the data and some graphs, go to https://sites.google.com/site/drjamescfinney/home/files and select either the .xlsx or .pdf version of raw_to_scale_conversions_final. I also put copies of the LEAP and iLEAP technical reports there, in case they can’t be obtained from the state.

I’d love to be more specific about how many teachers got screwed by this particular characteristic of the Very Awful Mess; unfortunately, the Department of Education needs to be further educated on public records law in Louisiana. In the meantime, I offer this as yet one more reason to hate the Very Awful Mess of VAM.
