"The person doing the smiling is the person doing the learning"
It's nearly impossible to evaluate a teacher based on student outcomes, so instead, we evaluate teachers based on...whatever.
Years ago a local college held a poetry contest for teenagers, and the person running it asked me if I would have my students submit their own original poems. I had a flood of poems submitted to me on the day they were due. At the time all of my students were immigrants, so I did them the favor of correcting spelling and grammar errors, but otherwise I submitted the poems as they were. I didn’t read them very closely or alter them in any other way.
None of my students won first place, but two of them did get an honorable mention, which was essentially a three-way tie for third place (two of my students plus one other entrant). Curious, I looked up the students and their poems. One of them seemed particularly well written - too well written - and when I googled it, I discovered that the poem was “My Lover Asks Me” by Nizar Qabbani, who was, according to Wikipedia, “Syria’s national poet”; I hadn’t noticed that she had plagiarized, and neither had the judge of the contest.
Which raises the question: who was this first-prize winner? Who was this teenage poet from Albany who was a greater poet than the greatest contemporary poet of Syria? I do not have her original poem (she wasn’t my student), but I also don’t think that’s the point.
To be fair: maybe the winner wrote a single poem that was better than a single poem of Qabbani’s; maybe Qabbani’s poem, which was written in Arabic, lost something in the English translation; maybe the winner had plagiarized an even greater poet; maybe there’s an explanation.
Regardless, the end result was that Syria’s National Poet lost a poetry contest to an unnamed, unknown 16-year-old from Albany. For the sake of argument, let’s assume that this is an error in judgement. Where then did the error occur?
There are a couple of moving parts to this problem. There’s reliability: do your evaluation criteria give consistent results? There’s validity: do your evaluation criteria measure what you think they’re measuring? There are confounding variables: what are the biases of the evaluator? Were there issues in the English translation of the poem? Did a barking dog distract the evaluator while she was pinning a red ribbon on the poem of a man who won the Al Owais Award for Cultural & Scientific Achievements? And then there’s the inherent subjectivity of poetry; while I’m of the opinion that “subjectivity” in literature is overplayed, it also can’t be denied that determining a poem to be “good” or “bad” can be a messy business.
There are some parallels between evaluating poetry and evaluating “good” or “bad” teaching. You can read a bad poem and know it’s a bad poem; you can read a good poem and know it’s a good poem. The same typically goes for teaching. But with both, we run into all kinds of problems when we get down to brass tacks. How good is good? How bad is bad?
With teaching, what we ultimately want to see is good outcomes in students. However, there are so many confounding variables swallowing up so much of the variance that it’s nearly impossible to tease out a teacher’s impact on their specific students. The Bill and Melinda Gates Foundation spent $575 million trying to accurately assess teacher effectiveness and wasn’t able to build a model that was both reliable and valid. My favorite anecdote comes from this study, in which a teacher teaches facts incorrectly and still scores as an above-average teacher according to the model:
In every [lesson], there were significant problems with the basic mathematics of middle school. She reasons incorrectly about unit rates. She concludes that an answer of 0.28 minutes must actually be 0.28 seconds because one cannot have a fraction of a minute. She tells students that integers include fractions. She reads a problem out of the text as 3/8 +2/7 but then writes it on the board and solves it as 3.8 + 2.7. She calls the commutative property the community property. She says proportion when she means ratio. She talks about denominators being equivalent when she means the fractions are equivalent. (Hill et al., 2010, p. 820)
Hill et al. (2010) reported that this teacher’s value added score was in the second highest quartile. They thoughtfully considered whether her instruction might have had other redeeming features that compensated for her basic lack of mathematical competence and were able to find none.
If over half a billion dollars, spent against considerable political headwinds, wasn’t enough to develop a model that quantifies teacher quality in terms of student achievement, then I doubt such a model exists. There are too many externalities to make such a model work.
In the absence of a model that can accurately capture validity, we eke out fairness by capturing reliability. In the recent past, we’ve been told that “the person doing the talking is the person doing the learning.” This is “fair” in the sense that it is reliable; if students are talking during the majority of your class, but I’m talking during the majority of my class, then we can observe and quantify that. Of course, there are many reasons to doubt the efficacy of student-talk - here is a high school lecture on Sophistry with no student talk, one that closed with applause from the students - but an evaluator can observe and record student-talk, which is student activity, which suggests student engagement, which implies student learning, and so must be preferable to teachers...teaching?
More recently, “the person doing the talking is the person doing the learning” has been upgraded to “the person doing the smiling is the person doing the learning.” During a recent professional development session, we were told that we would know students are learning when they show “visible delight”:
CSDA: Student engagement is the degree to which students are attracted to their work, persist in their work despite challenges and obstacles, and take visible delight in accomplishing their work.
Much as my student tried to win the poetry contest by plagiarizing the Qabbani poem, this sentiment, which was not attributed to anyone except the school district (“CSDA”), also seems to be plagiarized - this time not from a great poet, but from a 2014 Edutopia blog post.
“Visible delight” is something that can be observed and recorded, and so we can reliably evaluate a teacher based on the frequency and duration of their students’ delight. If we do this consistently across classrooms, then we would have a fair and reliable tool for teacher evaluation. The problem, again, is that it’s unlikely there’s anything remotely resembling a 1:1 correspondence between “visible delight” and, say, “learning how to think critically” or “memorizing times tables” or “reading and understanding The Great Gatsby”; it’s unlikely students will be more engaged by and attracted to their work than to their phones (or to donuts, for that matter); but we don’t know how to quantify validity in teaching, so we settle for reliability.
The most effective way to evaluate the quality of literature is to allow two generations to pass and observe whether or not the poem or the story is still relevant. This is not the most actionable method, particularly for something like a poetry contest, but it is the most effective in evaluating art with validity. I think the same has to be said for teaching methods; we may struggle to quantify teacher quality with reliability and validity in the present, but we can look at larger trends over longer periods of time and see pretty clearly what works. We want, as much as possible, a knowledgeable expert to deliver content in terms that students understand; students should practice until they reach a level of proficiency; and then students should be assessed. Students need to be placed in classes based on their ability level. Disruptive students need to be removed. After we settle all of that, we should talk about diversity of programming (not every student should follow the same academic track).
It’s not about student talk or student delight or social-emotional or learning styles or anything else from the clown show. Teachers teach. Students practice. Students test. Make sure students are grouped by ability. Make sure discipline is a school-wide priority.