
Introduction: Assessment in Crisis?


Professor Phil Race

Assessment, Learning and Teaching Visiting Professor, Leeds Metropolitan University (September 2006).

This volume contains a wealth of experience, and collects together the wisdom and talent of a wide range of practitioners who are trying to make assessment work better for their students. For most students, assessment is a principal driver of their learning. If there weren’t any assessment, perhaps not much learning would take place. However, there is a lot of dissatisfaction and frustration among staff internationally concerning assessment. We’re overburdened with it—and so are our students. Perhaps a good starting place for our efforts to improve assessment is to look at what we could do to make assessment a better driver for learning.

At the Northumbria Assessment Conference (very much an international event, with participants from round the globe), on 31st August 2006, as part of a symposium about ‘Changing hearts regarding assessment’, I asked participants to write on post-its their heartfelt completions of the starter: “Assessment would be better at making learning happen with my students if only …”. With their permission, I quote (in Table 1) all of their thoughts about what we need to do (and what students need to do) to make assessment a better driver for learning. There is, of course, some overlap, but it is interesting to see where the overlaps lie—signposting the most serious of the problems we face in trying to make assessment fit for purpose.

Table 1: Quotes from Assessment Symposium: Assessment would be better at making learning happen with my students if only …

I had more time to spend having individual assessments that truly met their personal learning requirements.

We could forget percentage marks and leave just feedback and pass/fail/merit instead.

I could spend more time on assessment and less on delivery.

We did not have grades at final level undergraduate study.

They realised the importance of the process to their future development.

There were less dilemmas and constraints in the assessment process.

They didn’t wait till the last minute to do any work.

They knew what is expected and they could steer themselves there with some guidance from me.

I had more time with individuals or groups rather than 200 at a time.

I spent more time on working with others on preparing them for assessment.

I could talk through drafts with them as part of the learning process, in a detailed manner.

My students became involved (immersed in) assessment—in collaboration with me and each other—right from the outset of the module.

We introduced a systematic regime of formative assessment.

They found a self-fulfillment value in the assessment.

The university would not impose dead criteria based on what is easy to measure rather than what we want students to do.

The assessment criteria were transparent and understandable.

They would recognise that I am not assessing them, their worth, but their ideas.

It was more person centred (individual, applicable) to the students.

We took into account their individual learning needs.

Both the students and myself could negotiate and discuss what matters in their learning.

Students themselves were more involved in the design of the assessment.

Feed-forward was more constructive across the board.

Assessment processes sequentially built students’ ‘process’ skills at levels 1, 2 and 3, enabling them to develop antecedent skills for self-assessment, peer-assessment, group assessment processes, etc.

It was a true reflection of their work.

We got the students to evaluate the success and impact of the chosen assessment process on their learning.

They were less anxious about it.

All colleagues would take the time to think more about learning and assessment.

They were aware of how and why it is done.

It involved self-assessment.

The feedback could be oral and one-to-one (for hundreds of students!).

I was free to choose the most appropriate assessment.

There weren’t so many students!

The overall module design allowed feedback to influence future learning.

All lecturers adopted similar principles relating to support and feed-forward—especially in the early stages.

It were more fun and more enjoyable.

My students understood the value of it in affecting their learning.

Course texts were more accessible in terms of language difficulty.

The system was more flexible—we are chained to percentages.

The learning outcomes were transparent to the students.

I would make the purpose clear and give clear instruction to the students about what they ought to do.

They truly valued the process and afforded the optimum amount of time needed for it.

It valued the students’ active involvement in the process—peer and self-assessment is known to have benefits, but not used enough!

I really knew what I was doing—and they really knew what they were doing.

Perhaps the last of these quotations sums it up? (… if only … I really knew what I was doing—and they really knew what they were doing). But some notable trends run through these delegates’ responses: pleas for more time and fewer students, for greater student involvement in the design and conduct of assessment, and for more formative feedback and feed-forward.

It should also be borne in mind that the participants at this particular conference were largely a self-selecting group of practitioners who already know a great deal about assessment, care a lot about making it work well, and are often fighting battles in their own institutions to improve assessment processes, practices and instruments. In other words, they are well placed to be expert witnesses regarding the problems encountered in the context of assessment.

Therefore, it is particularly useful that this volume has collected not just the ‘if only’ problems, but a rich collection of what is being done to address these problems. After reading the ‘if only’ statements reproduced above, there is no doubt that there are major problems to overcome in designing and implementing assessment. In other words, we know all too well why we need to change assessment, but we still need to learn how to do it. The case studies in this book provide welcome reassurance that there are indeed ways of going about this task.

The terminology of assessment

A lot has been written about assessment, and a vocabulary of specific terms has grown up around the subject. I suggest that it is useful to preface the collection of case studies in this book with a short digest of some of the terminology, hopefully in straightforward language. Assessment should be valid, reliable, transparent and authentic. Easily said, but harder to achieve! What do these four words actually mean in practice? I’ve abridged the discussion which follows from my own latest thinking about assessment, presented in ‘Making Learning Happen’ (Race, 2005).

Validity

Valid assessment is about measuring that which we should be trying to measure. But still too often, we don’t succeed in this intention. We measure what we can. We measure echoes of what we’re trying to measure. We measure ghosts of the manifestation of the achievement of learning outcomes by students. Whenever we end up just measuring what students write about what they remember about what they once thought (or what we once said to them in our classes), we’re measuring ghosts. If, instead, we were measuring what they could now do with what they’d processed from what they thought, it would be better.

“But surely we do measure this?” Ask students; they know better than anyone else in the picture exactly what we end up measuring. For a start, let’s remind ourselves that we’re very hung up on measuring what students write. We don’t say in our learning outcomes “when you’ve studied this module you’ll be able to write neatly, quickly and eloquently about it so as to demonstrate to us your understanding of it”. And what do we actually measure? We measure, to at least some extent, the neatness, speed and eloquence of students’ writing. What about those who aren’t good at writing? Or to be more critical, what about those students who have at least some measure of disability when it comes to writing?

For a long time already, there have been those of us strongly arguing the case for diversifying assessment, so that the same students aren’t discriminated against repeatedly because they don’t happen to be skilled at those forms of assessment which we over-use (such as, in some disciplines, tutor-marked, time-constrained, unseen written examinations; tutor-marked coursework essays; and tutor-marked practical reports).

So we’re not really in a position to be self-satisfied regarding the validity of even our most-used, and most practised assessment instruments and processes. But the situation isn’t new—we’ve used these devices forever it seems. That doesn’t make them more valid. But we’re experienced in using them. Admittedly, that makes us better able to make the best of a bad job with them. But should we not be making a better job with something else?

Reliability

For many, the word reliability is synonymous with ‘fairness’ and ‘consistency’. Reliability is easier than validity to put to the test. If several assessors mark the same piece of work and all agree (within reasonable error limits) about the grade or mark, we can claim we’re being reliable. This is not to be confused with mere moderation, of course. Reliability can only be tested by blind multiple marking. Double marking is about as far as we usually manage to get. Do we agree often enough? No, we don’t, in many disciplines.

There are some honourable exceptions. ‘Hard’ subjects, such as areas of maths and science, lend themselves to better measures of agreement regarding reliability than ‘softer’ subjects such as literature, history, philosophy and psychology. By ‘hard’ and ‘soft’ I don’t mean ‘difficult’ and ‘easy’—far from it. Not surprisingly, staff are resistant to the suggestion that they may need to undertake yet more marking. “But multiple marking just causes regression to the mean” can be the reply. “And after all, the purpose of assessment is to sort students out—to discriminate between them—so it’s no use everyone just ending up with a middle mark”. “And besides, we spend quite long enough at the assessment grindstone; we just haven’t room in our lives for more marking”.

So why is reliability so important? Not least, because assessing students’ work is the single most important thing we ever do for them. Many staff in education regard themselves as teachers, with assessment as an additional chore (not to mention those who regard themselves as researchers with teaching and assessing as additional chores). Perhaps if we were all to be called assessors rather than teachers it would help. And perhaps better still, we should all regard ourselves as researchers into assessment.

Transparency

One way of describing ‘transparency’ is the extent to which students know where the goalposts are. The goalposts, we may argue, are laid down by the intended learning outcomes, matched nicely to the assessment criteria which specify the standards to which these intended outcomes are to be demonstrated by students, and also specify the forms in which students will present evidence of their achievement of the outcomes. There is a nice sense of closure in matching up assessment criteria to intended learning outcomes.

How well do students themselves appreciate the links between assessment and evidence of achievement of the intended learning outcomes? How well, indeed, do assessors themselves consciously exercise their assessment-decision judgements to consolidate these links? Students often admit that one of their main problems is that they still don’t really know where the goalposts lie, despite our best efforts to spell out syllabus content in terms of intended learning outcomes in course handbooks, and to illustrate to students during our teaching the exact nature of the associated assessment criteria. Sometimes even our attempts to clarify the evidence indicators associated with achievement of the learning outcomes are not clear enough to students. In other words, students often find it hard to get their heads inside our assessment culture—the very culture which will determine the level of their awards.

Therefore, we’re not too hot on achieving transparency either. In fact, the arguments above can be taken as indicating that we rather often fail ourselves on all three—validity, reliability and transparency, when considered separately. What, then, is our probability of getting all three right at the same time? Indeed, is it even possible to get all three right at the same time?

Authenticity

This one seems straightforward. It’s about (on one level, at least) knowing that we’re assessing the work of the candidate, not other people’s work. In traditional, time-constrained, unseen written exams, we can be fairly sure that we are indeed assessing the work of each candidate, provided we ensure that unfair practices such as cheating or copying are prevented. But what about coursework? In the age of the Internet, word processing and electronic communication, students can download ready-made essays and incorporate elements from these into their own work. Some such practices can be detected electronically, but the most skilful plagiarists can remain one step ahead of us and make sufficient adjustments to the work they have found (or purchased) to prevent us from seeing that it is not their own work.

Plagiarism is becoming one of the most significant problems which coursework assessors find themselves facing. Indeed, the difficulties associated with plagiarism are so severe that there is considerable pressure to retreat into the relative safety of traditional unseen written exams once again, and we are coming round full circle, resorting to assessment processes and instruments which can guarantee authenticity, but at the expense of validity.

However, probably too much of the energy being put into tackling plagiarism is devoted to detecting the symptoms and punishing those found guilty of unfairly passing off other people’s work as their own. After all, where are the moral and ethical borderlines? In many parts of the world, to quote back a teacher’s words in an exam answer or coursework assignment is culturally accepted as ‘honouring the teacher’. When students from these cultures continue their studies in other countries and find themselves accused of plagiarism, they are often surprised by the accusation. Prevention is better than cure. We need to be much more careful to explain exactly what is acceptable, and what is not. While some students may indeed deliberately engage in plagiarism, many others find themselves in trouble because they were not fully aware of how they are expected to treat other people’s work. Sometimes they simply do not fully understand how they are expected to cite others’ work in their own discussions, or how to follow the appropriate referencing conventions.

It is also worth facing up to the difficulty of the question ‘where are the borderlines between originality and authenticity?’ In a sense, true originality is extremely rare. In most disciplines, it is seldom possible to write anything without having already been influenced by what has been done before, what has been read, what has been heard, and so on.

More assessment terminology: norm-referenced, criterion-referenced, formative, summative

Norm-referenced versus criterion-referenced

This is simple to describe—but hard to get right! Norm-referenced assessment could be described as creaming off a top layer of students to receive the highest grade, and so on down the cohort, with roughly the same proportion achieving each grade every time. Criterion-referenced assessment would allow all of the students to achieve the ‘highest’ award if they all reached the relevant standard. We all know that in some cohorts, many more students are worthy of achieving the top standard than in others. Criterion-referenced assessment is, therefore, more objective, but in the competitive world we live in, rank-ordering creeps in, and with it a tendency to revert to norm-referencing.

Summative and formative assessment

‘Summative’ assessment is often described as end-of-studies assessment—in other words, a measure of how far our students have got in their learning. ‘Formative’ assessment is more about using assessment along the journey of learning, so that students can learn from their mistakes, remedy their deficiencies, and advance their learning. In formative assessment, it is our feedback that is more important than the scores or grades. In particular, it is the feed-forward that is the critically useful part of feedback—the guidance to students about how exactly to go about improving their learning and their performance. But, in a way, all assessment is at least to some extent formative, as even exam marks or grades give students at least a little information about how their learning is going. And all the assessment elements which count towards students’ qualifications are, to some extent, summative. It would probably be wise for us to stop fussing about which assessment elements were intended to be formative or summative, and to concentrate on giving students useful feedback on all the elements of their work which are assessed.

Learning from experience

We all learn from our efforts at designing and implementing assessment. But this is slow, and many wheels end up being reinvented, and many mistakes are repeated. That’s where this collection of case studies comes in. You now have the opportunity to learn from the experiences of a worthy collection of colleagues, and you can avoid reinventing at least some of the wheels, and avoid repeating many of the mistakes.

I shall end this preamble by simply suggesting that as you peruse each case study in this book, you keep asking yourself “what can I use from this, with my own students in my own discipline, to make my own assessment systems, processes, practices and instruments better?”