Data fixation in education
Why do school staff act as though they have skill sets and training that they clearly don't have?
Completing a graduate-level school administration program, watching the popularity of John Hattie’s Visible Learning (and the in-depth critique it has received), and my own work experience as a 7-12 ELA teacher have led me to the conclusion that when it comes to data, school staff are delusional. Public school administrators especially, in a dramatic game of The Emperor’s New Clothes, have somehow convinced themselves that they are statisticians with a firm grasp on research methodology.
I am not a statistician or a researcher. I also don’t know how data is collected, analyzed, or discussed in every single school in America. I strongly suspect, however, that school staff routinely commit several errors when discussing data: some mathematical, some methodological, and some analytical.
Here are some of the errors I have seen routinely committed in schools, usually without any awareness whatsoever that there may be an issue:
The question doesn’t assess the standard it purports to assess, or assesses multiple standards while claiming to assess only one.
On a multiple-choice question, there is no statistical control to differentiate between a student who knows the correct answer and a student who guessed. With a 25% chance of guessing correctly on a four-option question, some correct answers are going to be lucky guesses rather than a reflection of what a student actually knows. This error is more significant when only one or two questions assess that standard (the first sketch after this list shows the scale of the problem).
On a multiple-choice question, there is also no statistical control to differentiate between a student who chose an incorrect answer with no idea which answer was correct, and a student who chose an incorrect answer but had a plausible rationale (which arguably demonstrates understanding of the standard, if not the correct answer to that particular question). Think of a math question in which the student understands the concept but makes a simple computational error.
There is no statistical control to differentiate between a student who doesn't understand the standard being assessed, and a student who didn't comprehend the vocabulary on the assessment.
There is no statistical control to differentiate between two kinds of students: those who have built schema around the question (familiarity with the passage or poem, the historical event, etc.) and answer correctly from that background knowledge without being proficient in the standard being assessed, and those who have a decent grasp of the standard but, having no schema built around the question, do not answer correctly.
A pre- and post-test with exactly the same questions risks "practice effects." Completely different tests risk many of the above issues being present in one test and not the other, so any difference in score may be explained by something other than proficiency in the standard.
There is no statistical control to differentiate between students who learned the skills/content in the class being tested, in a different class, or at home.
The results are compiled into a simple average. In my experience, that is all anyone does with the data, and an average by itself is not a sophisticated or even helpful summary: it hides the distribution entirely.
The sample sizes are usually far too small for statistical significance, or sample size isn’t taken into account when interpreting whatever data is available (the second sketch after this list illustrates both problems).
The time between tests is usually much too short to be meaningful, and may not suggest much beyond a particular group of students’ ability to memorize for a test in the short term.
Baseline assessments favor a specific type of student: one who arrives with no exposure to the material and is determined to learn it. Bigger gains earn more credit, so a student who already knew the material, and therefore had little room to gain, receives the same credit as a student who knew nothing and demonstrated no learning on the post-assessment (for whatever reason). A student who moves from 95% to 97% earns the same credit as one who moves from 20% to 22%, even though only the first is proficient.
The stakes attached to the data are too high (I recommend the book The Tyranny of Metrics). I’ve known teachers who tell students "don’t try on the first test," knowing that their evaluations depend on the gains. In fact, I’ve been encouraged to do the same, and students, who know the game, have actually refused to put effort into a pre-test in my own class even when I encouraged them to try (I don’t like the idea of telling students not to try, even on a test I think is worthless). I don’t mean to make this seem diabolical; it’s perfectly predictable. The higher the stakes, the more likely there is to be "gaming" across the board.
This doesn’t always happen, but to the extent we incorporate demographic information into the data, not a lot of care is taken with groupings, especially with black students and with immigrants. For example, a third-generation Jamaican American, a Tanzanian immigrant, and an American-born “African American” would all be placed in the same category because of their skin color, despite often dramatic differences in culture as well as in test results. Another example is grouping all “English Language Learners” into one category, despite any number of dramatic cultural and linguistic differences that may show up in test scores.
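To make the guessing problem concrete, here is a minimal sketch in Python. It is entirely my own illustration; the class size, question count, and "proficiency" bar are invented assumptions, not anything schools actually run. It estimates how many students who know nothing at all would still be marked proficient on a standard assessed by just two four-option questions.

```python
import random

# Minimal sketch: how often does pure guessing look like proficiency?
# Assumptions (invented for illustration): 100 students who know nothing
# about the standard, which is assessed by only 2 four-option questions,
# with "proficient" meaning both questions answered correctly.
random.seed(1)
STUDENTS = 100
QUESTIONS = 2    # questions assessing this one standard
CHOICES = 4      # answer options per question
TRIALS = 10_000  # simulated classes

def lucky_proficients() -> int:
    """Count students who answer every question correctly by blind guessing."""
    return sum(
        all(random.randrange(CHOICES) == 0 for _ in range(QUESTIONS))
        for _ in range(STUDENTS)
    )

average = sum(lucky_proficients() for _ in range(TRIALS)) / TRIALS
print(f"'Proficient' by pure luck: about {average:.1f} of {STUDENTS} students")
# Analytically: 100 * (1/4)**2 = 6.25 students per class, on average.
# With a single question, that number jumps to 25 of 100.
```

Six "proficient" students out of a hundred who know nothing is exactly the kind of noise a simple percent-correct report silently absorbs.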
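And here is the second sketch, again with invented numbers, showing two problems at once: a simple average that hides a sharply split class, and the margin of error that a small class size carries.

```python
import math
import statistics

# Two hypothetical classes with the SAME simple average (scores invented).
clustered = [70, 72, 74, 74, 75, 75, 76, 76, 78, 80]   # everyone near 75
split = [45, 48, 50, 52, 55, 100, 100, 100, 100, 100]  # half lost, half aced it

for name, scores in (("clustered", clustered), ("split", split)):
    print(f"{name}: mean = {statistics.mean(scores):.0f}, "
          f"stdev = {statistics.stdev(scores):.1f}")

# The sample-size problem: a 60% pass rate in a class of 25 carries a
# rough 95% margin of error of about +/- 19 percentage points.
n, p = 25, 0.60
moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"pass rate: {p:.0%} +/- {moe:.0%} (n = {n})")
```

Same average, wildly different classrooms; and with only 25 students, the pass rate itself tells you far less than the single number implies.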
I’m sure there are more pitfalls in the use of data in education. Like I said, I know enough to know that I don’t know.
And to be clear, I’m less worried about the existence of pitfalls - there is no avoiding some of them, and everything is a trade-off - and more concerned about what seems to me a near-complete ignorance, or denial, that the pitfalls and trade-offs exist.
From my perspective, it looks as though a major American institution is playing make-believe, and using the results of their fantasies to drive curriculum and instruction.