The Case Against Regents Exams
The case for decoupling Regents exams as a graduation requirement in NY
Back in June, Kathleen Moore with the Times Union wrote an article titled An advocacy bloc studied Regents exams - and gave them an ‘F’: Research that shows tests don’t improve outcomes or increase learning could fuel move to dump them, which cited a three-page “research brief” from the Coalition for Multiple Pathways to a Diploma. The coalition of “more than 50 advocates and nonprofit groups” claims that research suggests the exams, as a graduation requirement, are ineffective at raising student achievement.
While I agree that Regents exams ought to be dropped as a graduation requirement, I’m concerned that some of the arguments put forth in the brief are terminally bad; it has a real “throw everything at the wall and see what sticks” feel to it, which makes me think that there is some other underlying reason, unknown to me, why this group would like to dump Regents exams. Maybe I’m being too cynical, but I doubt it.
The main thrust of my argument is that public secondary schools in NY lack opportunity for educational and experiential diversity. Every student must complete 22 credits in order to graduate, of which 18.5 are scripted. Five Regents exams must be passed, three of which are totally scripted, the other two more-or-less scripted. I’m proud of Albany School District, where I work, for offering so many opportunities for students outside of the traditional curriculum - however, very few students are able to actually take advantage of these opportunities, and for those who do, the opportunities make up a tiny portion of their overall high school experience. Eighty-four percent of every NY high schooler’s experience has been predetermined.
It was through that lens that I read the actual brief, several times, as carefully as I could. I also read some of the research that was linked to the brief. I want to support this coalition, whoever they are, in first “decoupling” Regents exams from graduation requirements, and then, hopefully, relaxing or eliminating credit requirements. I was, however, underwhelmed. If we don’t get these arguments right, we risk rebuilding on a foundation of sand; whatever replaces Regents exams could easily be even worse. Here are the main flaws in the coalition’s brief:
“...exit exams increase high school dropout rates, particularly for students of color…”
For one, it’s not true, unless you define “students of color” as “some groups of black students and, to a lesser degree, Hispanic students.” If we are using “white students” as some kind of baseline for every group's expected educational achievement, there are multiple groups of non-white students who are performing above that baseline. For another, even if it were true, it’s not necessarily an argument for ending the exams. The fact that groups have differences at all implies that all sorts of outcomes will be different, including educational outcomes. I would say that if there were differences in graduation rates between racial and other groups (there are), then that’s a flag alerting us to investigate further, but once the investigation is complete, the conclusion could just as easily be to not change a thing about graduation requirements. For what it’s worth, I spilled quite a bit of ink describing how girls do better than boys in school (here, here, here, and here). It’s a much more reliable predictor of academic success than race - should the conclusion be that we need to rework all of school until academic outcomes are equal between the two sexes? Maybe. Maybe not. It’s a flag and should be investigated. Final point on this: I don’t think “blacks can’t pass tests” is quite the argument the Coalition thinks it is.
“Standardized tests are an unreliable gauge of graduation-readiness.”
Well, if passing standardized tests is a requirement for graduation, then the correlation between being “ready for graduation” and “passing exit exams” is 1.0. “Graduation-readiness” is somewhat arbitrary. For example, consider the criteria for kindergarten graduation in 1979 vs. 2012, excerpted and adapted from The Coddling of the American Mind, pages 186-187:
1979:
-Can your child tell, in such a way that his speech is understood by a school crossing guard or policeman, where he lives?
-Can he draw and color and stay within the lines of the design being colored?
-Can he stand on one foot with eyes closed for five to ten seconds?
-Can he ride a small two-wheeled bicycle without helper wheels?
-Can he travel alone in the neighborhood (four to eight blocks) to store, school, playground, or to a friend’s home?
-Can he be away from you all day without being upset?
-Can he repeat an eight- to ten-word sentence, if you say it once, as “The boy ran all the way home from the store”?
-Can he count eight to ten pennies correctly?
2012:
-Identify and write numbers to 100
-Count by 10’s to 100, by 2’s to 20, by 5’s to 100
-Interpret and fill in data on a graph
-Read all kindergarten-level sight words
-Be able to read books with five to ten words per page
-Form complete sentences on paper using phonetic spelling (i.e., journal and story writing)
“Exit exam policies assume that the tests in question are an accurate measure of student learning, yet there is evidence that factors outside of any student’s control—like the weather on the day of the exam—can significantly affect their performance. An analysis of the June Regents exam scores of nearly one million New York City students between 1999 and 2011 found that, for the average student, having to take a Regents exam on a day when it is 90°F outside reduces the chances of passing that subject by roughly 10%, relative to taking the exam on a 75°F day.”
I mean - yes, interesting, but if this means anything at all, it means that administrators ought to be more mindful about which classrooms they seat students in on test day, assuming some parts of the school building are cooler than others. Or maybe be more strategic in scheduling students for June Regents exams. “If the temperature is fifteen degrees different, then there is data from 12 years ago that suggests a 10% dip in pass rates” does not feel like a strong argument for decoupling Regents exams from graduation requirements. It sounds like an interesting factoid that administrators ought to keep in the back of their minds during Regents week.
“GPA is more strongly correlated with success in college than is performance on the SAT or ACT.”
The study does not reference majors in college - it’s a claim about a correlation between the average 4-year graduation rate across all majors and success on the SAT or ACT vs. GPA. I suspect that success on the SAT or ACT - or the Regents exams, for that matter - would be more positively correlated with majors in the hard sciences, and less correlated with majors in the grievance studies. This is because one of those things requires an above-average capacity for abstract thought, and one of those things requires an unscratchable itch to smash the patriarchy. Aside from GPA and entrance exams, the study suggests that “grit” is predictive of 4-year college graduation rates - but “grit” is inseparable from the big five personality trait “conscientiousness”, which is essentially a student’s inclination towards organization and hard work, which isn’t particularly mind-blowing. Also, the SAT/ACT are entrance exams for colleges, whereas the Regents exams are exit exams for high school. Entrance exams are optional and typically taken by those who are going on to the colleges that require them, whereas the Regents exams are taken by everybody who wishes to graduate high school (essentially everybody). The purpose of the exams is different, and the context in which they are taken is different. But most importantly, “success in college” should not be the presupposed endpoint of high school. The Coalition writes, “Supporters of exit exams argue that tests are neutral indicators of college-readiness”, but the counterargument shouldn’t be, “The purpose of our schools is to churn out college graduates and our GPAs predict that better than Regents exams, so get rid of them.” The counterargument is, “The median age of a tradesman in the USA is 55, and no amount of government infrastructure stimulus money is going to draw men from the ether who can actually keep the bridges from collapsing.”
~
My own observations are as follows - and I have no idea why these things are rarely (if ever) talked about in discussions regarding abolishing Regents exams.
Consider the following:
An exam could be constructed that nearly 100% of all high schoolers would fail. It could be something ridiculous like a single question asking students to recall pi to a thousand places, or an exam on building a helicopter, or something slightly less silly like neuroscience or a fluency test in Sanskrit. Can we agree, in principle, that it is possible to construct an exam that nearly all students would fail?
The opposite is also true. We could construct an exam in which nearly all students would pass. Recite your ABCs up to G, or something similar. Again, in principle, I think we can agree that this is true.
Therefore, if we accept that human diversity is real, and that a diversity in academic ability exists, it is possible to construct an exam that falls in the middle of these extremes. We could construct an exam in which some percentage of high school students are expected to fail, and some percentage of high school students are expected to succeed.
I don’t think many people spend much time with the Regents exam technical reports, but they go into great depth on how the exams are constructed. I spent quite a bit of time on the English report, since I am an English teacher and most familiar with the exam.
One of the first sections is on “Item Difficulty”. When constructing the test, the questions are first given a trial run (field testing) on a “representative sample” of students. In this context, “p-value” is simply the percentage (as a decimal) of students who answered each item correctly. The range of p-values during their field testing was .43 to .88, with an average of .70, which means that, on average, 70% of students will answer each question correctly. If answering were a completely random event, then we would expect students to answer, more or less, an average of 70% of the multiple-choice questions correctly. Alas, in order to have any validity at all as an instrument, the test must discriminate between “strong” and “weak” students, which means that, generally speaking, any student who does well on the test overall must have a higher than average chance of answering any of the individual items correctly (and vice versa).
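To make the p-value idea concrete, here is a minimal sketch with made-up field-test data (the student count, item count, and 0.70 probability are my invented numbers, not anything from the technical report):

```python
import random

# Hypothetical field test: 500 students answer 24 multiple-choice items,
# each answered correctly with probability 0.70. Purely illustrative.
rng = random.Random(0)
responses = [[1 if rng.random() < 0.70 else 0 for _ in range(24)]
             for _ in range(500)]

# An item's "p-value" is simply the proportion of students who got it right.
p_values = [sum(row[item] for row in responses) / len(responses)
            for item in range(24)]

print(min(p_values), max(p_values), sum(p_values) / len(p_values))
```

Run on real field-test data instead of this coin-flip model, the spread of p-values is what produces a reported range like .43 to .88.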
This is the concept known as “unidimensionality.” What this means is that there is an expectation that the test isn’t testing a set of different skills, but rather, a single dominant dimension. The group of students who do well on the entire exam ought to, on average, do well on each individual item, because the exam is testing one thing. Not the “ELA State Standards.” Not any particular writing prompt or genre of literature. One single dominant dimension is being tested, or else the test is considered invalid.
Returning to my original proposition: “We could construct an exam in which some percentage of high school students are expected to fail, and some percentage of high school students are expected to succeed.” This is what we’re doing. The exam is constructed with an expectation that it discriminates between strong and weak students on the basis of some dominant dimension (one that has not been articulated in the technical report, but seems to be “academic ability”). The test is constructed for some percentage of students to fail, which would be fine for a diagnostic tool of academic ability, but is brutal when used as a requirement for graduation.
Not in the technical report, but my own observation and hypothesis, is that there is a strong possibility that all of the required Regents exams (English, US History, World History, Living Environment, and Algebra) are testing the same single dominant dimension. I have access to the data from the district where I work. At least in our district (and likely everywhere) the English Regents exam scores are correlated with the average of all other exam scores at .81. In other words, there is consistency in the relationship between how students score on one exam, and how they score on average on all the others, and that relationship is positive and strong. The same thing is being tested on all the exams.
If the Regents exams were not used as a requirement for graduation, and were simply a diagnostic tool for “academic ability”, you probably wouldn’t need all five. You could likely get away with the Algebra and any one of the other four, and get a complete enough picture of student ability; US History and Global are correlated with the ELA exam at .76 and .73, respectively. In other words, US, Global, and ELA are basically all the same exam. Living Environment is correlated with ELA at .66, which is also very strong - I don’t know how strong a correlation needs to be before we can say it’s the same thing, but .66 is pretty strong. Algebra is correlated with ELA at .43 which is still a positive correlation, although if used as a tool to discriminate between ability, it would probably be useful to use the Algebra Regents plus any one of the others. If a statistician is reading this, I’d love for you to weigh in.
This is not actually surprising. If the five required “Regents” exams were changed from English, History, etc., to pull-ups, pushups, leg lifts, a two mile run, and a hundred meter swim, then we would expect that these things would be strongly correlated, and in effect, be testing the same dominant dimension (physical fitness), with “swimming” being the “algebra” of the lot (correlated, but less strongly). We would also expect that if we separated classes by “athletes” and so-called “emerging athletes” that the “athlete” classes would perform better than the “emerging athletes”, regardless of the teacher.
What is surprising is what is highlighted in yellow. These are four different 11th grade classes with four different teachers, four different co-teachers, and three different curricula (ELA Regents is typically administered in June of students’ Junior year).
The first observation of the classes highlighted in yellow is that student performance on the ELA Regents is almost the same. Even where there is a difference (the largest difference between averages being .31 points) those groups of students had nearly identical correlations with all other Regents exams. They didn’t just perform a bit better or worse on the ELA, they performed as expected based on their other exams. In other words, “11th Grade Honors” performed a bit better than one of the World Experience classes, but those Honors students also tended to perform better on all of their other exams.
That said, even if we just consider the average score and the distributions, the four highlighted classes performed nearly the same when held in contrast to the AP Language class, the Special Ed class, the ENL class, and the Night School class. The most significant variable in predicting ELA test outcomes does not seem to be the classroom teacher, class size, or the curriculum, but rather the student population.
To expand on this point, consider the first “World Experience” course and the “11th Grade Integrated, ELA” course, which is co-taught with the course highlighted in green (they are separated on the sheet because the exam results are sorted by “teacher of record”, however the integrated courses are one set of classes, co-taught by both teachers). Consider this:
The integrated course has 107 students, including 17 special ed students, divided into six sections. They are taught by an English teacher and a special ed teacher. The curriculum, I believe, is the same or similar as the 11th grade honors course.
The World Experience course has 57 students (almost half that of the integrated class), divided into three two-period block sections. They are taught by an English teacher and a Social Studies teacher. My understanding is that the curriculum is developed to parallel and support the 11th grade Global curriculum.
Despite being in the same classroom with the same teachers, the integrated students who have been identified as special ed performed much lower than the other students in their class, and the World Experience students.
Despite the differences in class size, scheduling, curriculum, and teacher background (and, if you can take my word for it, teacher experience, personality, and style), non-special ed students in the “Integrated” course performed within one one-hundredth of a point of the “World Experience” course (i.e., effectively the same).
Again - if a statistician is reading this, I would love your input.
This is not to say that teachers, curriculum, and class size don’t matter in terms of student experience. I want to emphasize that I truly don’t believe that Regents exam outcomes should be driving course offerings whatsoever. It is to say that teachers, curriculum, and class size seem to have a negligible impact on student performance on the ELA Regents, and, I would suppose, on all of the Regents exams.
Strictly in terms of success on the exam, there is certainly some benefit of test prep. I suspect the returns diminish precipitously after four to eight weeks of training, and I doubt that the “skills learned” are universally transferable. I’m an English teacher, but I’ve on occasion been involved in helping students prep for the Algebra Regents, and many of them are well trained in what sequence of buttons to push on the calculator to get the correct answer, yet have no math sense. A similar thing happens with the ELA exam.
~
To summarize my case against Regents exams as an exit criterion for high school:
It is important to get these arguments right. The Coalition for Multiple Pathways to a Diploma gets many things wrong in its case for decoupling Regents exams from graduation.
That said, we absolutely need to decouple Regents exams from graduation and have multiple pathways to graduation.
The Regents exams are constructed to discriminate between student ability in a singular, dominant dimension (“academic ability”). In order to be valid and reliable, the tests necessarily must result in some percentage of students failing every year. There’s nothing nefarious here, it’s the nature of test construction.
However, I repeat: the tests are designed for some students to fail. It is inevitable, which is inherently unfair when used as an exit requirement.
There is nothing wrong with discriminating based on academic ability; however, there are other dimensions of ability and skill that can also be taught and assessed. I gave the example of physical fitness; mechanical ability, as in the trades, is another, and there are plenty besides those. The crux of my argument has always been to differentiate educational experiences in order to maximize student potential.
I did not, though I should, give fair space to the purported purpose of the Regents exams, i.e., that without universal high-stakes assessments, teachers and administrators cannot be held accountable for providing a rich and rigorous educational experience to students. I don’t think the concern is completely unwarranted, nor do I believe it’s easily addressed. I have some thoughts for another post, on another day.