In search of the holy grade: Evaluating (under)graduate theses
The (under)graduate thesis is a challenging exercise not only due to its length, but also due to its unstructured format. There is no pre-determined set of questions that require answering, no pre-defined theory or method to arrive at an answer. These particular characteristics also make its assessment a complicated exercise.
The use of rubrics is a common solution. Such a template can work when theses are quite homogeneous in the sort of questions, topics, theories and methods covered; they are less instructive in interdisciplinary programmes such as those run at FASoS. While the established criteria are useful as a rough guideline (e.g. the inclusion of a well-formulated research question), they are open to interpretation (i.e. what is “well-formulated”?). To address this limitation, we assign multiple graders. This raises the additional challenge of discrepancies between reviewers’ assessments.
Three calibration sessions take place each year to foster convergence between BA ES thesis graders. Participants grade a sample thesis on a 10-point scale in advance of the sessions and discuss how they arrived at their assessment. For the academic year 2018-2019, I gathered these ‘mock grades’ at each session. Using those grades, I simulated all possible pairings of graders (380 for each calibration session). The absolute value of the grade differential varied between 0 and 4, as indicated in the figure below. In the majority of cases, the proposed grades are relatively close. At the third and final session, 78% of potential pairings had a grade differential equal to or lower than 1. Given the particular dynamics of a mock set-up, these numbers cannot simply be transferred to real gradings. Still, the range and distribution of grades found is not out of the ordinary and is confirmed in similar research (but also here and here).
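The pairing exercise above can be sketched in a few lines of code. This is a minimal illustration, not the original analysis: the grades below are hypothetical, and it assumes 20 graders per session, since all ordered pairs of 20 graders yield the 380 pairings mentioned.

```python
from itertools import permutations

def differential_distribution(grades):
    """Absolute grade differentials over all ordered pairs of graders,
    plus the share of pairings that differ by at most 1 point."""
    diffs = [abs(a - b) for a, b in permutations(grades, 2)]
    within_one = sum(d <= 1 for d in diffs) / len(diffs)
    return diffs, within_one

# Hypothetical mock grades from one calibration session (20 graders)
grades = [6.5, 7.0, 7.5, 6.0, 8.0, 7.0, 6.5, 7.5, 5.5, 7.0,
          8.5, 6.0, 7.0, 7.5, 6.5, 8.0, 7.0, 5.5, 6.5, 7.0]

diffs, share = differential_distribution(grades)
print(len(diffs))   # 20 graders -> 20 * 19 = 380 ordered pairings
print(max(diffs))   # widest grade differential in this sample
```

With real session data substituted in, `share` corresponds to the kind of figure reported above (78% of pairings within 1 point at the final session).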
The bigger issue is how to cope with such divergences in assessment. I would like to discuss this from the perspective of a grader, a researcher, and a supervisor.
As a grader, such divergences necessitate consultation. At FASoS we resolve divergences through a deliberative process between the first and second grader, who eventually submit a single feedback form with a single grade. This is usually rather easy when grades differ by less than 1 point. When grades are further apart, it leads to compromises whereby comments appear on the assessment form that may not have your full support, or a grade that, in your view, does not reflect the quality of the thesis. In less than 5% of cases, such differences cannot be resolved, and a third reader is assigned who receives the assessments of the two initial graders and renders a final verdict. The wider the divergence in assessment (and grade), the more demanding the conciliation process, both intellectually and emotionally. The reason, in my view, lies with the requirement that only a joint assessment form is to be presented to the student.
This conflicts with our role as academic researchers. While it is common for scientific reviewers to disagree in their assessment of journal submissions, each review is kept intact and communicated to the manuscript’s author. The grading process used at FASoS hides this plurality of views and suggests there is only one correct way to do (and evaluate) research. Moreover, it can be difficult to separate our capacity as a grader of undergraduate research from our capacity as an academic researcher. That is, the ‘strictest’ grader (implicitly) claims the moral high ground, as it suggests they apply a higher standard in assessing academic research. We see this dynamic regularly on display during calibration sessions, which can become an inhospitable environment in which to utter a positive assessment.
Finally, the assessment process also affects the way we fulfil our role as a supervisor. As a supervisor, you also share in the disappointment when a good thesis is skewered by a second reader. I have drawn two lessons from these experiences. First, the need to clarify the assessment procedures as a context in which students should interpret my feedback. While the assessment form does not allow graders to show their disagreements, students need to be aware that they cannot rely exclusively on their supervisor’s guidance. Second, these experiences have made me aware of blind spots in my own supervision. Blind spots are issues that are crucial to certain colleagues but which I find less important in assessing the quality of research. Unfortunately, as staff changes continuously, you are never fully able to cover all your blind spots.
Grading (under)graduate theses will always be challenging. By pretending there is a single correct assessment of the work, we only complicate this further. This is not to say that we should relinquish the pursuit of a shared understanding of what constitutes good thesis research. Rather it is an appeal to stop hiding the plurality of views that characterises thesis assessment. In so doing, we not only offer a more authentic assessment, we also train students to become self-critical and less dependent on their supervisor. Upon graduation, they will embark on a professional career where assessment is equally prone to interpretation.
About the author
Johan Adriaensen is an Assistant professor at FASoS, Maastricht University. Besides his research on the EU’s trade policy, he occasionally publishes on teaching and learning. Johan is currently the chair of the Educational Programme Committee for the Bachelor European Studies. Views expressed in this blog are his own.