Appendix A
Student Surveys: Some Technical Matters

A.1 Introduction

This Appendix is not intended to be a complete analysis of the technical problems associated with conducting student surveys. It deals only with issues that we have encountered and consider important.

A.2 Validity and Reliability

Sooner or later, the conversation at the committee meeting or in the faculty lounge turns to student ratings of instructors. It’s a sure bet that within six seconds, someone will announce that ratings are meaningless – students don’t know enough to evaluate the quality of their instruction … What is interesting is that these assertions are invariably offered without a scrap of evidence by individuals with well-deserved reputations for analytical thinking. If someone offered such unsupported arguments in a research seminar, most of us would dismiss both the arguments and the arguer out of hand. In discussions of teaching, however, we routinely suspend the rules of logical inference without a second thought.

— (Feldman, 1992)

Feldman goes on to analyse a number of myths which seem to be almost universal. The points below use the terminology we have adopted in this manual rather than Feldman’s.

Believers in the myths should simply read the comprehensive reviews by Cashin (1995), Marsh (1987) and Murray (1980), which together cover approximately 2000 research projects on the evaluation of teaching.

Marsh concludes his most thorough review with these words:

Research described in this article demonstrates that student ratings are clearly multidimensional, quite reliable, reasonably valid, relatively uncontaminated by many variables often seen as sources of potential bias, and are seen to be useful by students, faculty, and administrators. However, the same findings also demonstrate that student ratings may have some halo effect, have at least some unreliability, have only modest agreement with some criteria of effective teaching, are probably affected by some potential sources of bias, and are viewed with some scepticism by faculty as a basis for personnel decisions. It should be noted that this level of uncertainty probably exists in every area of applied psychology and for all personnel evaluation systems. Nevertheless, the reported results clearly demonstrate that a considerable amount of useful information can be obtained from student ratings; useful for feedback to faculty, useful for personnel decisions, useful to students in the selection of courses, and useful for the study of teaching. Probably, students’ evaluations of teaching effectiveness are the most thoroughly studied of all forms of personnel evaluation, and one of the best in terms of being supported by empirical research.

— (Marsh, 1987) [our emphases]

A.3 Data Processing

There is no simple answer to the question of which is the best method of processing the data obtained from student surveys. Choice will depend on the size of the project and the resources available. Table A.1 provides a comparison between the three main options.

Optical Mark Reader
- High-speed scanning: about 2000 sheets per hour.
- Inflexible: requires special pre-printed sheets.
- Equipment expensive.
- Questionnaires completed in class: high response rates, good security.

Computer Scanner
- Slow scanning: about 15 sheets per minute.
- Flexible: client can design questionnaires easily.
- Standard office equipment.
- Questionnaires completed in class: high response rates, good security.

Direct Input (e.g., web)
- Immediate: questionnaires completed on-line.
- Standard office equipment.
- May have poor response rates and security problems. Not all students may have ready access to the web.

Table A.1: Different Methods of Processing Survey Data

Thus, a teacher wishing to conduct a formative evaluation with a small class may be happy with slow scanning rates, especially as he/she can easily design the questionnaire without professional help. On the other hand, the sheer number of questionnaires generated by large classes may make an Optical Mark Reader the only practical option. Currently, web-based questionnaires are only suitable for classes that meet in a computer laboratory, where they can be completed under supervision.
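The practical difference between the two scanning rates quoted in Table A.1 is easy to quantify. The short sketch below compares processing times for a hypothetical class; the rates are the approximate figures from the table, and the class size of 1200 is purely illustrative:

```python
# Approximate scanning rates from Table A.1 (illustrative figures only).
RATES_PER_HOUR = {
    "Optical Mark Reader": 2000,   # about 2000 sheets per hour
    "Computer Scanner": 15 * 60,   # about 15 sheets per minute = 900 per hour
}

def processing_minutes(sheets: int, rate_per_hour: int) -> float:
    """Minutes needed to scan the given number of questionnaires."""
    return sheets / rate_per_hour * 60

# A hypothetical large first-year class of 1200 students:
for method, rate in RATES_PER_HOUR.items():
    print(f"{method}: {processing_minutes(1200, rate):.0f} minutes")
```

For a small formative evaluation of, say, 30 students, either method finishes in a couple of minutes, which is why the choice only matters at scale.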

A.4 Question Anchors

Most readers will be familiar with the following question format, where ‘Strongly Agree’ is rated 5 and ‘Strongly Disagree’ is rated 0:

The lecturer speaks clearly:

Strongly Agree

Strongly Disagree

There are two problems with this format: the agreement anchors are ambiguous (a student may agree that the lecturer sometimes speaks clearly while disagreeing that he/she always does), and agree/disagree scales invite acquiescence – the tendency to agree with a statement regardless of its content.

A rather better format is:

The lecturer speaks clearly:

All of the time

None of the time


The lecturer speaks:

Very clearly

Very unclearly

Note that here the anchors change with each question, which poses a problem for the developers of OMR software. It is, however, generally possible to adapt their standard software to accommodate this better practice.
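One way software can accommodate per-question anchors is to store the anchor pair alongside each question rather than hard-coding a single agree/disagree scale. The sketch below illustrates this idea; the class and field names are hypothetical, not taken from any particular survey package:

```python
from dataclasses import dataclass

@dataclass
class Question:
    """A survey question carrying its own scale anchors."""
    text: str
    high_anchor: str   # label for the top of the scale (rated 5)
    low_anchor: str    # label for the bottom of the scale (rated 0)

# Each question supplies its own anchors, as recommended above.
questionnaire = [
    Question("The lecturer speaks clearly", "All of the time", "None of the time"),
    Question("The lecturer speaks", "Very clearly", "Very unclearly"),
]

for q in questionnaire:
    print(f"{q.text}: 5 = {q.high_anchor}, 0 = {q.low_anchor}")
```

Because the anchors travel with the question, the same rendering and scoring code handles frequency anchors, intensity anchors, or any other behaviour-specific wording.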

A.5 Further Reading