Management Observation of Teaching – a critique

London Metropolitan University, in anticipation of the Teaching Excellence Framework, has begun to institute Management Observation of Teaching (MOT). Such observations have been a major cause of stress in schools and Further Education colleges and, as described below, are of dubious value to students. However, they have PR value because managements can trumpet how much they care about quality teaching. What follows is a modified version of an email sent from London Met UCU to the Deputy Vice Chancellor, Peter McCaffery.
Dear Peter
The following items are not necessarily an exhaustive list, but they indicate our current key concerns about the MOT scheme:
The current observations are already in breach of the scheme’s own principles
On page 4 of the scheme’s description you state that “Observees will be advised that they will be observed at the start of the relevant term”. Given this principle, we would be grateful if you could explain why you told staff that they must arrange observations when only two or three weeks of teaching remained.
Health and Safety law has been ignored
Employers are required to consult staff safety representatives “in good time” prior to the introduction of any changes to work practices that might have significant implications for health and safety (Safety Representatives and Safety Committees Regulations, 1977).
There is no doubt whatsoever that lesson observations (other than purely formative and confidential peer observation) do have such implications. The stress associated with such observations has been widely reported in both schools and Further Education colleges and you cannot be unaware of this. For example:
Daily Telegraph, 10th April 2007 – Inspection pressure drives teachers to suicide
Given the appalling record of ill-health and suicide associated with observations and inspections, the schools’ inspectors, OFSTED, recently abandoned graded lesson observations.
At London Met, we note that the MOT is potentially punitive, as described in your guidance: “[MOT] is also judgemental and as such may be used – where appropriate – with other University procedures relating to remuneration, discipline, probation or assessment of competence” (p.4). This punitive element, particularly in a context of continuing redundancies, is guaranteed to cause additional stress among a workforce that is already highly stressed. At the last Health and Safety Committee, the UCU rep calculated that sickness absence at London Met during 2014-15 had cost the university approximately £2,000,000, a figure that was not disputed. Mental health issues were the single largest reason for absence, and the incidence was above the sector average. Two separate surveys using the Health and Safety Executive’s Management Standards tool have shown that LMU management should be taking action to prevent, not increase, stress.
Therefore, given the widely publicised issues associated with lesson observations, can you please explain why you have failed to consult UCU safety reps as required by law, and why all previous requests to the VC for information about the scheme have been ignored?
The academic staff contract has been ignored
Performance appraisal is a contractual matter and changes to this need to be discussed and agreed with UCU. Moreover, staff workload allocations for 2015-16 do not include time for a new observation scheme.
Is this scheme evidence-based, as the values of the Strategic Plan require?
The London Met Strategic Plan specifies “Integrity” as one of our values, including the assertion that “We will base our decisions on robust data and on candid assessment” (p.7).
By contrast, the Sutton Trust’s 2014 review, “What makes great teaching?”, found little evidence for the effectiveness of teaching observations. In particular, a study by Jacob and Lefgren (2008) found a correlation of only 0.2 between the observational judgments of headteachers and value-added measures of their pupils (see p.36 of the Sutton Trust report). A correlation of 0.2 means that only 4% of variation in value-added scores could be predicted by the headteachers’ judgments. The idea that people might be remunerated, penalised or fired on the basis of such miserable predictive validity is beyond belief. It is highly questionable whether time spent by managers on this activity (and the training for it) is time well spent.
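The 4% figure follows from a standard rule: when one measure is used to make a simple linear prediction of another, the proportion of variance explained is the square of the Pearson correlation r. Assuming that interpretation applies to the Jacob and Lefgren result, the arithmetic is:

```latex
\[
R^2 = r^2 = (0.2)^2 = 0.04 \quad\Rightarrow\quad 1 - R^2 = 0.96
\]
```

On that reading, roughly 96% of the variation in value-added scores is left unaccounted for by the headteachers’ judgments.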
The Sutton Trust report concludes: “Ultimately, the definition of effective teaching is that which results in the best possible student outcomes. There is currently no guaranteed recipe for achieving this: no specifiable combination of teacher characteristics, skills and behaviours consistently predicts how much students will learn. It follows that the best feedback to guide the pursuit of effectiveness is to focus on student progress, and that requires high-quality assessment of learning” (p.47).
What consideration has been given to equalities?
Your guidance makes no mention of possible gender bias, bias against ethnic minorities, bias based on sexual orientation, or bias against staff with disabilities. Many studies have found that some students hold biases against lecturers based on these factors. Observers themselves may also hold biases, whether consciously or unconsciously. But what happens when an observer watches a class in which some students hold discriminatory attitudes towards their lecturer because of these (or other) factors? How is a lecturer to be protected against the judgments of an observer who fails to realise that some students’ apparently negative response to the class stems from discriminatory attitudes?
Problems with the suggested indicators for observations

In the Faculty of Life Sciences and Computing (FLSC), staff have received a set of “Suggested Indicators” for the MOT. We do not know whether this document is specific to FLSC or university-wide, but the indicators are highly problematic.
  1. Any scheme in any domain of life that leads to rewards or punishments must strive for consistency and lack of bias. The use of “suggested” indicators rather than a definitive checklist allows for all kinds of subjectivity, bias and inconsistency to creep in.
  2. The list of suggested indicators is almost unreadable due to the massive amount of information crammed onto an A4 sheet using tiny print. As lecturers, we are supposed to consider our audience in everything we do. Ironically, then, a process that is meant to assess our performance itself fails on this basic principle. We don’t wish to be overly critical of the individual who put this together, as any of us can occasionally fail to see the forest for the trees, but surely there were other people involved in quality control?
  3. Related to the above, there are far too many criteria on this list of indicators. At the broadest level there are 7 (excluding the one labelled “For University feedback”), but if you count the individual aspects that make these up, there are 66. This is entirely unrealistic. Decades of psychological research indicate that people are extremely limited in their ability to incorporate multiple items of information into a judgment, even though they may believe otherwise. To take just one example from the literature, British magistrates are meant to take multiple items of information into account when making bailing or jailing decisions. Yet in experimental and observational settings, Dhami and Ayton (2001, 2003) found that magistrates used about one item of information to make their judgments and were often inconsistent. In the observational setting, the information used typically had nothing to do with the characteristics of the case.
  4. If we look at the specific indicators, there are numerous aspects that are vague, contrary to evidence, or problematic in other ways. Many of the issues relate to the non-expert nature of the observer in relation to the topic being taught. The following critique is based on the descriptions in the “Excellent” column.
  • Preparation, planning and organisation. It isn’t clear why much of the associated description falls under this category. For example, why does the use of “real life examples” come into this category rather than (say) “Learner engagement and strategies to clarify understanding”? How does a non-expert observer know what counts as “up-to-date knowledge”? The fact that a paper is dated 2016 doesn’t mean it represents up-to-date knowledge; its very newness may mean that its findings are unreliable. Knowledge is a process of accumulation and rejection. Certainly at some levels it is more important to teach key theories that are older but have stood the test of time. In our School of Psychology, the professional body expects us to incorporate historical perspectives into our teaching, meaning that older material must to some degree be taught.
  • Session aims / objective / outcomes. What is meant by “contextualised” aims? The description also states: “Explicit links with prior learning and overall learning within the module”. Why must there be explicit links? Can’t you teach discrete topics within a module? And what about modules where different lecturers contribute because of their different expertise?
  • Teaching Methods and approaches employed. The description in this category states: “Responsive to learners and uses varied learning styles”. To borrow an idiomatic phrase, this is not even wrong. Even among those who believe in learning styles, the style is something held by the learner, not used by the instructor (as the guidance implies). In that framework, the instructor is advised to tailor teaching style to the learning style of the student. How this can be done when students in a class may have different learning styles is never made clear. More importantly, the entire concept of learning styles is now largely discredited. A 2008 review of the experimental literature on this topic, published in Psychological Science in the Public Interest, found that tailoring teaching to the supposed learning styles of students brings no additional benefit to learning. In other words, there is no robust evidence that the learning-styles approach improves learning.
Elsewhere in this section we find that an excellent lecturer “Engages all abilities to a very high level (active learning)”. The way this is phrased makes it appear that the words before the brackets define the term inside the brackets. They don’t. Active learning is a genuine concept, namely the participation of students in a learning process, as opposed to passively listening. This may overlap with the engagement of students, but isn’t necessarily the same thing. The first part also raises various questions. What constitutes a high level, and how does a non-expert observer know what a high level is? How does the non-expert observer, or even the lecturer, know what range of abilities exists within the classroom?
There is also a contradiction within this section. On the one hand we read: “Activities clearly matched to content and learners’ levels”, but on the other hand we are supposed to engage lower ability students to a “very high level” (as indicated in the first passage). The observer is apparently supposed to detect that the lecturer is achieving these contradictory aims within the space of an hour’s observation.
  • Quality of the teaching / learning materials. The description includes “Innovative/new”. Is it churlish to point out that something innovative and new is not necessarily good? If a lecturer were to borrow a chimpanzee from London Zoo to deliver an address on what it’s like to be a primate, this would be highly innovative but not much good (though an infinite number of chimps with typewriters might produce Shakespeare’s plays, given an infinite amount of time).
In this section we also read “Exceeds expectations”. Whose expectations are being referred to? The observer’s? The students’? How would the observer know what the students’ expectations are?
Here and elsewhere, the description defines a quality in terms of the same quality. Thus, to achieve a grading of “Excellent” you need to show “Excellent” practice, “Excellent use of…”, etc. What “excellent practice” and “excellent use of…” actually involve is left undefined. Similarly, to achieve a grading of “Excellent” your materials should be of “a high standard” and “professional”, but we are not told what “a high standard” involves (let alone “professional”).
  • Learner Engagement and strategies to clarify understanding. This section includes the passage “Excellent ability to “read” the group”. The placement of the word ‘read’ within quote marks indicates that this is not a properly defined term. More generally, how does an observer determine whether a lecturer is correctly understanding his or her audience? Let’s illustrate with an example. In a new class, a lecturer might notice a student sitting at the back of the room, making comments that can’t quite be made out and perhaps whispering to a neighbour. “Reading” this situation often takes a few sessions, but such students frequently turn out to be quite smart, likeable, and ready to ask good questions when given the opportunity. In these cases it is better not to treat the behaviour as a “behaviour problem” (though there are genuinely disruptive students who do need to be controlled). To an observer who is present for only an hour, this might look like an instance of a lecturer failing to read or control his/her class, whereas in fact the lecturer is fully aware of what is happening and is managing the situation appropriately.
In this section we also read “Referenced very well to context (i.e. employability, personal development practice, etc)”. In the current context we won’t apologise for being pedantic, but the “etc” would be more appropriate following “e.g.” rather than “i.e.”. The latter defines, whereas the former gives examples. But within the space of an hour’s observation, is it reasonable to expect that lecturers should be demonstrating a commitment to the business agenda (if they should be committed to that at all)? We’re all for students finding employment, and indeed for providing them with useful knowledge and skills, but we would like to point out that the Universal Declaration of Human Rights, which the UK has endorsed, describes the purpose of education as follows:


“Education shall be directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms. It shall promote understanding, tolerance and friendship among all nations, racial or religious groups, and shall further the activities of the United Nations for the maintenance of peace”. (Article 26 (2))

In short, the employment context is not the only context and a lecturer should not be downgraded for not mentioning it during the hour in which he or she is observed.
  • Delivery (style, pace, audibility, presence). This section also contains a term that relies on intuitive understanding rather than explicit definition, i.e. “Provides “performance””. One can only be thankful that Stephen Hawking and Noam Chomsky are not being graded on these criteria – both brilliant minds, but not known for their “performance” in delivery (one of us once looked up Chomsky on, and found that a student was complaining that the famous radical talked a lot about politics).
“Students engaged”. This is another indicator where appearances can be deceptive. Just because someone is looking at their instructor intently doesn’t mean they’re engaged. This is summed up in the phrase “The lights are on, but no-one’s home”. Conversely, a person who appears not to be fully attentive may well be concentrating hard on what is said. Where students are genuinely not well-engaged, this can be for reasons not to do with the lecturer. For example, students may be worrying about an impending deadline (a not infrequent occurrence).
“Can be heard”. This is good advice. However, we can think of one senior figure in management who has failed badly on this in recent times, though there has been some improvement since.
“Maintains excellent control”. Why is this descriptor not in the following category?
  • Management of the learning experience (classroom management). “Lecturer well-prepared” – why is this in this category? It duplicates the very first set of descriptors (Preparation, planning and organisation).
“Sessions start and finish on time”. Prompt starts sometimes do not happen because another lecturer or meeting fails to finish on time.
“Room set up in advance and equipment working”. How is this the lecturer’s responsibility? If a lecturer arrives 10 minutes before a morning class and the computer isn’t working, it isn’t their fault. Furthermore, with an increasingly stretched ISS (Information Systems and Services), the fault is unlikely to be fixed by the scheduled start time.
