If you are in New York City and you push the button to cross the street, you have done nothing to hasten your journey. According to The New York Times, almost all of those buttons were disconnected in the 1980s; it was more cost-effective to just leave them in place.
The article, written in 2004, describes a hospital worker named Réna who was told that the buttons don’t work. Her response? “The sign says push, so I push. I think it works.”
I’m sure sometimes it does, Réna. But then again, sometimes it doesn’t. Maybe it works best when pressing is coupled with harsh language and earnest prayers.
I’m not picking on Réna. Even knowing that most of the buttons in Albany, NY, where I’m from, have been disconnected, I’m still going to push them. Not only am I going to push them, I’m going to continue to believe that I am doing something by pushing them. The illusion of control is comforting.
I have found a similar illusion of control in standardized testing, specifically on the ELA Regents exam. I have heard teachers claim that “they get good results,” and administrators boast that “our department’s exam results beat the so-and-so department’s every June.” I think we’re pushing the crosswalk button and claiming we made the light turn green. It’s comforting, but it’s not true.
I explained the math that supports my claim in a previous post. This time, I’m going to use visuals. Below is a scatterplot of Albany High School’s June 2024 ELA Regents results, plotted against the twelve 11th-grade teachers to whom each student was assigned.
The results of the linear regression suggest that teacher assignment has a statistically significant effect on exam results, and that the model explains a good portion of the variance. Here is a scatterplot of the same teachers and their students’ predicted June 2024 ELA Regents results:
Here, we can clearly see that Teacher 12 is the best at preparing students for the exam, Teachers 8-11 are pretty good, and Teachers 1-7 need more training, as the model predicts a better-than-even chance that their students are going to fail this exam (score below 65).
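For the curious, here is a minimal sketch of the kind of regression I’m describing, treating teacher assignment as a categorical predictor. The dataframe and scores below are invented placeholders, not the actual Albany data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented placeholder data: one row per student, with an assigned
# teacher and a June 2024 ELA Regents score (65 is passing).
df = pd.DataFrame({
    "teacher":   ["T1", "T1", "T4", "T4", "T10", "T10", "T12", "T12"],
    "ela_score": [48,   55,   51,   57,   74,    77,    88,    92],
})

# Regress score on teacher assignment as a categorical predictor.
model = smf.ols("ela_score ~ C(teacher)", data=df).fit()

print(model.f_pvalue)   # overall significance of "teacher"
print(model.rsquared)   # portion of variance the model explains

# Predicted (mean) score for each teacher's students.
teachers = pd.DataFrame({"teacher": sorted(df["teacher"].unique())})
print(model.predict(teachers))
```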
So it would seem that teachers do have a statistically significant influence on student exam performance, and that this influence can be used to predict student performance with some accuracy.
Except…
Teacher 4 and Teacher 10 (highlighted) are co-teachers in the exact same classroom. The model predicts student results will be 53 and 78, respectively. The students in this classroom are blended together and the teachers teach together; the students are assigned different teachers only because of each teacher’s certification (Special Ed and English, respectively). In fact, “teacher assignment” is often just a proxy for “student placement”.
The teacher highlighted in green is the Advanced Placement (AP) teacher. The teachers highlighted in blue teach English Language Learners, Special Ed, or Alternative Placement students, and the teachers highlighted in yellow teach students who aren’t AP, ELL, Special Ed, or in alternative placement (i.e., mainstream students).
You can’t compare the AP teacher with other AP teachers, because we only have one. If you compare the teachers in yellow, the results are not statistically significant (students perform the same regardless of the teacher, and there is no predictive power). If you compare the teachers in blue, the results are statistically significant; however, the predictive power is quite low. There is more to be said about the students in blue, but it would drag us from the main point, which is:
I would argue that we aren’t comparing teachers at all; we are observing students who have been grouped by academic ability, performing exactly as you’d expect given that ability.
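To make those within-group comparisons concrete, here is a sketch of how each placement group could be tested separately. The file name and column names are hypothetical stand-ins for my actual data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file: one row per student, with a placement label
# ("ap", "ell_sped_alt", "mainstream"), a teacher, and an ELA score.
df = pd.read_csv("regents_june_2024.csv")

for placement, group in df.groupby("placement"):
    # A one-teacher group (AP) gives us nothing to compare against.
    if group["teacher"].nunique() < 2:
        continue
    fit = smf.ols("ela_score ~ C(teacher)", data=group).fit()
    print(placement, round(fit.f_pvalue, 3), round(fit.rsquared, 3))
```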
We can test this by taking the teachers out of the equation and attempting to predict scores based on nothing but student performance on previous exams: any combination of the Algebra, Global, and/or US History Regents. I chose the US History Regents for this graph simply because it gave me the most data points, but I’ve run the regression with every combination, and the results are very similar.
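Here is the same idea as a sketch: one model using assigned teacher, one using the prior US History Regents score, with their fits printed side by side. As before, the file and column names are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file: assigned teacher, prior US History Regents score,
# and June 2024 ELA Regents score for each student.
df = pd.read_csv("regents_june_2024.csv")

teacher_model = smf.ols("ela_score ~ C(teacher)", data=df).fit()
prior_model = smf.ols("ela_score ~ us_history_score", data=df).fit()

print("assigned teacher  R^2:", round(teacher_model.rsquared, 3))
print("prior exam score  R^2:", round(prior_model.rsquared, 3))
```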
This model (using the US History Regents to predict the ELA Regents) is a better fit for predicting performance than “assigned teacher” (or, more accurately, “observed academic ability”). Here you can get a better idea of the accuracy of each model, side by side:
The students who were highlighted in blue are more difficult to build prediction models for because they have more confounding factors. I’m reminded of the “Anna Karenina Principle”: All happy families are alike; each unhappy family is unhappy in its own way. These students’ lower scores are each low in their own way, shaped by disparate factors that aren’t always simple to capture. If we want to compare teacher impact with general academic ability, without having to contend with confounds like not knowing English, it helps to remove those students from the set.
I don’t believe these exams are sensitive to instruction. I think they reflect a more general academic ability that tends to be resistant to teacher instruction.
Perhaps this is not always true. I would love to investigate further with data from other school districts, but I don’t have access to that data. As it stands, from what I’ve discovered here, Regents exams are a good tool for separating students by general academic ability, but they are not a great reflection of the quality of teacher instruction. We push the button because we believe it changes the traffic light, but those buttons aren’t connected to anything.
A brief epilogue:
I can imagine some administrators looking at this and thinking, “AP students are clearly held to a higher standard. If we make all students AP students, then the rigor will increase, and test scores will go up.”
Please don’t do that. That’s not how reality works.
I can imagine some teachers (likely most teachers, to be honest) who will (1) get defensive (“Oh, so you don’t think teachers can educate! You think teachers are worthless! Diane Ravitch told me about you!” etc. etc.) and (2) morph into Réna, saying, “Well, I still think it works.”
As to #1, this is not what I’m suggesting. Teachers can, of course, educate students. As a homeschool father, I literally spend thousands of dollars a month on teachers, tutors, and coaches for my sons. If I didn’t think they had the capacity to improve my kids, I would put that money towards a Mediterranean cruise. What I’m saying is that test designers engineer standardized exams to produce a certain result: not everyone can pass, or else the test loses its discriminatory value (if everybody is proficient, then nobody is proficient), and they are very good at accomplishing this.
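If “discriminatory value” sounds abstract, one classic item statistic makes it concrete: the point-biserial correlation between getting a question right and the test-taker’s total score. Items that don’t separate stronger from weaker test-takers get dropped or rewritten. A toy example with invented numbers:

```python
import numpy as np

# Invented data: 8 test-takers, sorted by total score.
# 1 = answered this particular item correctly.
item_correct = np.array([0, 0, 0, 1, 0, 1, 1, 1])
total_score = np.array([42, 51, 55, 60, 63, 71, 80, 88])

# Pearson correlation of a binary and a continuous variable is the
# point-biserial correlation: near 0 means the item barely separates
# high scorers from low scorers.
r_pb = np.corrcoef(item_correct, total_score)[0, 1]
print(round(r_pb, 2))  # roughly 0.8 here: a strongly discriminating item
```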
As to #2, bring receipts. I’ve brought mine.
Photo by Erik Mclean: https://www.pexels.com/photo/sign-with-pedestrian-crosswalk-call-button-against-road-7447902/