Grading known students: An (unacknowledged) challenge for PBL assessment
Before I came to Maastricht in 2016, I worked in the Norwegian university system. There, exams at the BA and MA level (preceding the MA thesis) were graded blindly by two examiners. Coming to Maastricht University, where in most courses we grade the exams of the students in our own tutorial group, without the checkpoint of a second grader (except for BA and MA theses), was a big transition.
Especially in the first few periods, I was acutely aware of knowing the faces of the students I was grading, and sometimes not only their performance in the tutorial group, but also personal circumstances that might affect their exam performance. In my mind, this is a pressing problem for quality assurance, since grading known students means a higher risk of biased grading.
At first, I tried to continue my Norwegian grading habits by downloading exams with only student numbers visible. But that did not work, since students include their names on the cover page of the exam, and sometimes on every page. Forsaking the strategy of trying to grade blind in a system where most people do not, I sought to compensate for the risk of bias by maintaining an acute awareness of it, and by consulting colleagues when in doubt. I know most colleagues do the same, and those of us who coordinate courses also check the grades of tutors against each other. This important quality control procedure does reveal that examiners normally rate exams quite similarly.
It also shows, though, that there are minor differences in how we weigh various aspects of students’ exams. The importance given to quality of language is a recurring example. This is why it is so important that we do have two graders discussing and agreeing on the grade for the final theses. Since the practice of having two graders for every exam as a quality control mechanism is not feasible in our five-period system, we need to look for other measures to ensure the consistent and unbiased assessment of all our students’ exams.
In PBL, the importance of formative over summative assessment is often stressed. At UM we do have multiple instances of peer and teacher feedback, but graded exams are, like it or not, here to stay. Ensuring that exam assessments are conducive to students' learning is, therefore, key. An important aspect is consistency of feedback. Grading forms are increasingly being introduced for courses (we have long used them for final theses); having to address a set of specified criteria might reinforce the idea that a grade must always be clearly justifiable.
At the level of the individual grader, our colleague Ike Kamphof emphasised a few helpful measures: reading the first few exams again after having read and assessed all of them, "as I tend to be somewhat disappointed in the beginning and somewhat stricter", and especially reconsidering narrow fails or passes. While these measures might go some way toward remedying the downsides of a single examiner grading known students, in my opinion the problem is not fully solved.
In 2018, I started discussing with the tutors of the courses I coordinated whether we should attempt blind grading, or at least, as some of our colleagues in the BA ES and other programmes already do, grade each other's groups. While I think this solution beats grading one's own tutorial group, it still potentially means grading students known from lectures and other courses. Also, as one of my tutors pointed out, PBL tutorials are quite flexible and student-centred, and we cannot assume that the students in one tutorial group learn precisely the same things as the students of another group within the same course. Tutors grading their own students ensures that they can reasonably judge what the students should have retained from the tutorials.
A more serious objection, one that potentially questions the validity of our entire grading system, came from a colleague I consulted on the matter, who suggested that we should acknowledge that there is positive bias in our grading system. She argued that grading students who were not known to us could well lead to lower grades on average, due to the 'halo' effect of seeing students perform well in class. As long as other courses were graded by their own tutors, grading unfamiliar groups would be unfair to the students of my course.
While I sincerely hope that this effect is less pronounced than suggested, given these objections, I did end up continuing with tutors grading their own groups that year. However, I remain convinced that this is a major quality concern, for which we need to continuously search for better options. The increasing number of courses where tutors do grade each other's groups seems to indicate rising awareness of this issue, and I hope that this blog entry spurs further search for potential solutions.
About the author
Nora S. Vaage is assistant professor in philosophy of art and culture, and associate programme director of the BA Arts and Culture. Nora also teaches and supervises in the MA programmes Art, Literature and Society; Arts and Heritage; and Media Culture.