At school, children get tested a lot. How do those tests impact their learning? How can tests be made fairer? Educational psychologist Jacqueline Leighton introduces the best books in the evolving field of educational testing.
Can you tell me a bit about what’s going on in the field of educational testing that’s important for us, as outsiders, to know about?
I have been in educational testing for close to 20 years. One of the aspects of testing that has really changed over the last two decades is the awareness that testing what people know is not as easy as it seems. I am a trained psychologist and I’m very interested in the psychology of it. What are the variables that can interfere and that we can identify to facilitate the measurement of what people know?
We are increasingly aware, as educational testing specialists, that it’s not as simple as assigning numbers to test questions and adding them up and saying, ‘Well, out of 100, somebody knows this much about the material.’ There are questions like: do the items actually measure what we want them to measure? Are the items in some way biasing how test-takers understand or respond to the questions? What emotions are associated with responding to test items, and are test scores a true reflection of what people know about a subject?
Those are really important questions that are increasingly being attended to in the field.
Aren’t standardized tests much more straightforward for subjects like math, where there is a clear ‘right’ answer, than they are for more subjective subjects like English?
A lot of math questions can be presented in really short sentences with very clear, objective responses. But even in domains such as math, it’s not straightforward to measure what students know. Part of the reason is that it’s not just about whether the answer is objectively right or wrong. The way we ask questions can direct students to think about things in better or worse ways. Questions can either prompt somebody to understand exactly what they need to do or throw them off.
Think about test items the same way that you think about conversations—the way that somebody asks you a question can direct the way that you think about the answer. One of the things that we always have to be really careful about is that we are asking questions, including creating test items, that are clear to students so that if they do have the answer, if they have learned the material, they are able to demonstrate that.
Another important issue is that we want to ensure that the questions we ask don’t penalise students and learners for generating responses using approaches that we may be unaware of, or that are innovative or creative. We don’t want to penalise students for thinking differently about problems, as long as they are also thinking about them accurately. We want to make sure that we’re measuring the right things.
In terms of the issue of fairness: isn’t there always the problem that children from privileged backgrounds will have had a lot more practice—and so are going to do much better than those whose parents are less well-off or less focused on their child doing well at school?
Familiarity with the way questions are asked is an important variable. If you’re a child or a student or any kind of learner from a background where you’re comfortable answering questions, understanding the intent of questions, being able to formulate responses in a certain way that you know is the right way to respond to a question—whether it’s multiple choice or a constructed response—then you are going to be at an advantage.
Furthermore, I would argue that would also give you confidence in terms of the emotions associated with being evaluated. You would be more comfortable, more at ease—whereas a learner or a student who doesn’t have that type of background may be more anxious, more concerned about the way that they will be evaluated if they misunderstand the question. So fairness is an important part of making sure that we get the test items right.
Do you think testing is moving in that direction now?
Yes, I do. I am really happy about that because it’s about time.
Does it take a while before the research that people like you are doing is actually visible in terms of the tests students are taking across the country or around the world?
Large-scale testing programmes with a long history are usually slower to adopt changes precisely because of that history, and change is always more difficult in large organisations running large programmes.
However, having said that, I’ve also seen large assessment programmes be among the first to invest resources in finding better ways to validate test items: making sure that the items are working properly, and funding research studies that produce evidence that the items are not penalising some students relative to others. So even though we have a long way to go, we’re getting there.
Let’s go through the books now and say a bit about each and why they’re important. So the first one on your list is by Denny Borsboom. It’s called Measuring the Mind (2005). Can you measure the mind?
He argues that we can, but we have to make sure that we articulate the way that we think the mind works so that we can actually create test items that tap into parts of the way the mind might be measurable.
I know that sounds circular, but the contents of the mind are not observed and so we need to find instruments to probe and make observable the contents of the mind—the way that it thinks or the way that it creates ideas, the way that it’s able to formulate ideas before they’re spoken. That can help us create tests that actually are the keys that can unlock the expression of the mind in written form.
What I love most about this book is that it challenges our assumptions about tests. It is solidly grounded in psychometric theory; mathematically and psychometrically it’s very good, and clearly an expert wrote it. But this expert is challenging the way we think about tests, especially in terms of how we think about validity.
Could you give me an example?
One of the ways that we can better measure the mind, according to the book, is by developing theories of the way that we expect people to think about things—like mathematics, for example. So before we start creating a math test that, say, measures someone’s understanding of trigonometry, creating questions that we think are the best way to tap into what they know, it would be really useful to understand, first of all, how people think mathematically.
How do people, especially those who do so correctly, solve problems in trigonometry? How do they think about numbers? How do they put this information together? How does the mind—with the limitations that we know it has, like working memory—select information, combine information, express information?
So before we even begin to ask questions, he argues, we need to understand exactly what we’re measuring. So if we want to measure mathematical reasoning, problem solving, in trigonometry, how do people think about that? That’s the first question to ask. Then, once we understand that, we can then begin to ask questions about different parts of the measurement process.
So how do you set about understanding that?
Well, that’s actually one of the other books on my list, Protocol Analysis (1993). One way that we can understand how people think is by conducting interviews for what are called ‘think aloud’ studies. You give somebody a problem, say a test item or a task, and say, ‘I want you now to think out loud as you solve this problem, and as you’re doing so, tell me everything that’s going through your mind as you try to figure it out.’
Now, that’s just one way of understanding it, and there are certain limitations with the approach. First of all, if the problem is too easy or too difficult, there won’t be anything to report. So the problem has to be of moderate difficulty for the individual to actually be able to report something. The approach also assumes that the thoughts or ideas pass through their short-term working memory, so that they’re then able to report them.
Another way is to use eye-tracking devices. This is done especially with reading comprehension problems. You give somebody a passage of text to read, then ask them some questions, and follow their eye movements as they try to respond. So you track the way that they read the text, and then you track the way that they go back to it and look at certain words or certain parts as they’re trying to answer the question. Those kinds of behavioural indicators give clues as to how people might actually go about discriminating between pieces of information, selecting information and combining it, ultimately to generate a response to a test item.
Let’s go on to your next book, which is Knowing What Students Know (2001). What’s this book about and why is it on your list?
This book had a big influence on me because it came out in 2001, just as I finished my post-doc and got my first job as a professor.
I was a psychologist interested in testing, and I knew that I had a real uphill battle. There weren’t many psychologists, it seemed, with a background in cognitive psychology who were actually working in the field, in the sense of really wanting to understand what test items measured in students. Most were quantitative psychologists working on just the numbers, mainly on measurement models.
This book was one of the first to clearly articulate that we really have to think about the cognitive psychology behind the tests that we create.
So one of the important parts of the book is what is called the ‘assessment triangle.’ We have to think about the observations we make, the interpretations we make, and also the cognition we are trying to measure. So it’s a book that very clearly emphasises the need for us, as testing specialists, to think not just about the quantitative, measurement-model aspects of creating tests but also about the psychological aspects of exactly what we’re measuring.
Are you saying that, at the time you were coming into the field, that was quite unusual?
Certainly there were some people writing about it. Susan Embretson, for example, is a psychologist and testing specialist whom I have admired for decades. She was one of the first scholars I read who was making this point: that we really need to create tests that are closely linked to the psychological constructs we’re interested in measuring.
But she was a lone voice in many ways. It wasn’t part of the everyday conversation about testing. Now it is. This book really opened the floodgates for a lot of research focused on cognition and assessment, with an increasing number of scholars taking Susan’s work and really running with it.
Let’s go on to book no. 4 on your list, which is How Children Fail (1964). What’s this about?
This is a book written by a schoolteacher back in the 1960s. It’s a really interesting book. It forces the reader to consider the unintended consequences of what we do in classrooms, including testing, and what we subject students to.
One of the things that John Holt talks about is how children can learn to game the system, because they begin to realise what it will take to do well in school. So they can ‘fake’ doing well, but that doesn’t mean they’re learning. They’re just learning to do well in the system, and the system rewards certain ways of thinking, behaving and communicating. That can actually turn children off the real learning that we need them to do.
The reason I think this is an important book for testing is because I think that we should, as testing specialists, be very aware of what the unintended consequences might be of the tests that we administer to children and frankly all learners. What are they really learning? What is the evidence that they are truly learning content or to think deeply? Or are they simply showing us what it is we want them to show us now but will soon forget as meaningless later? So they’re showing us what they know, but what might be the consequences or the unintended consequences of the way that we do it?
I’m intrigued this book is on your list about educational testing—wouldn’t the author be against testing altogether?
Yes, he would. This is just my opinion, but I think there is virtue in understanding why a segment of the population is against testing. There is value for testing specialists like myself in understanding that perspective, because I think we have a lot to learn from it. It highlights the things tests might do that are actually not very good. How can we improve on that?
He has some great lines. I think one of them is, ‘Kids love to learn but hate to be taught.’
Yes. And you know what I love about so many of the quotes from that book is that they also apply to adults. I’m a university professor, and I’m here in this great institution of higher learning. I love to learn, but even I don’t want to be taught, necessarily.
Shall we talk about your last book? This is Freedom to Learn (1969). One reviewer writes, ‘This book revolutionised the way we teach children.’ Is that true?
They might have a much broader understanding of the way that this book has influenced teaching. The reason I have it on my list is because it’s influenced the way that I think about testing. Because even though the book is about learning, assessment is part of the way that we currently teach students.
So you have teachers who are obviously teaching and hoping their students learn, but who are also part of a system of accountability, and so we have assessments. This book, more than any other I have come across, articulates the social and emotional variables that are important to consider not only when children are learning, but also when they’re being assessed.
So, for example, one of the great things about this book is that it addresses issues of trust. As instructors, as teachers, are we trusted by our students? Do they trust us to have their best interests at heart? Do they trust us to help them when they make mistakes, and to help them express themselves? Trust is important if we are then going to give them feedback about where they need to improve and how they should go about it.
And all of this is taking place within a social environment, with other children. Are those other children part of the learning process in a positive way? Is there empathy in that classroom?
All these aspects, ultimately, all these variables, are important to consider at the end of the semester when children are completing their assessments or their final exams.
Has a culture truly been created in the classroom, or in any kind of learning environment, that supports the learning an assessment is then designed to measure? I don’t think assessments can measure learning in isolation, although we try to. Learning is really a social and collective process. Children don’t learn in isolation, they are certainly not isolated in classrooms, and we’re testing them in classrooms. This is important to consider.
What does the author mean by the title, Freedom to Learn?
My understanding is that freedom to learn is the freedom to express yourself, and the freedom to make mistakes. Learning is a messy process. There are going to be ups and downs.
So part of really feeling free to learn is that you feel supported in your entirety as a learner, not just cognitively but emotionally and socially. Children learn in different ways. Do we give all children the freedom to learn in their own way? Of course, in classrooms it’s not possible to tailor instruction to each individual student, but is there at least a recognition that one size does not fit all, and do we try to take steps to accommodate children who learn differently?
Is this referring to the way, when I was little, I’d be told that my answer was wrong, whereas now my kids might be told, ‘Oh, that’s a very interesting answer’ instead?
We have to be careful with that. We don’t want to give children mixed messages. It’s very dangerous not to give children clear messages about where they are at, especially academically.
Personally, I think there is nothing wrong with saying, ‘That answer is not correct. But let’s find out how you came up with that answer, and let’s see where you might have gone astray.’ Furthermore, if a student and a teacher have a trusting relationship, then there is room to be supportive but honest.
So you were studying psychology. Why were you attracted to educational testing? Did you feel that it was an area where there was work to be done?
The reason I became interested in educational testing is that my psychological research was focused on measuring logical reasoning. My PhD and post-doctoral research were both focused on uncovering why, in the 1960s and 1970s, psychologists were using a task known as the Wason selection task to evaluate undergraduates’ level of logical reasoning competence.
One of the things that I found most intriguing with this research was that I considered the task to be very biased in the way it was prompting students to produce their responses. It assumed that these students who were responding to the task had a certain level of knowledge in the area of what is called ‘deductive logic.’ So I began to scrutinise and really think more deeply about the way that we assess knowledge and skills in the population.
We think about these items and tasks, even in psychology, as fairly direct measures. In fact, they’re anything but direct, especially when we are trying to measure things that are unobservable, like reasoning processes, problem solving, evaluation, analysis and creativity. These are all unobservable constructs.