**Thoughts on teacher evaluation from an assessment expert and member of Michigan’s governor-appointed task force**

A number of years ago, I was working on a project that focused on using portfolios high school students produced in their classes as a way of evaluating their level of achievement. This was exciting work because our team was able to see the types of assignments teachers were using to generate the student work, and the types of products students produced in response to those assignments. Although it wasn’t the direct goal of the project, we realized that students could not show high levels of achievement unless their class assignments gave them the opportunity to show that high level of work.

After we thought about this for a while, the result seemed obvious. When using portfolios of work to assess students’ achievement, the quality of their classroom activities is very important. There was another anecdotal finding that came out of this development project. The students of one teacher did not get very high evaluations for their level of achievement, possibly because the assignments in their portfolios did not call for that level of work.

When we reported the results, we received a number of negative comments from parents and school staff. The teacher was one of the most popular in the school, they argued, so there must be something wrong with the way we evaluated the portfolios: this was one of their “best” teachers. We reviewed the results, and it was still clear that the contents of the students’ portfolios were not up to the level of those from students in other classes and schools.

For years afterward, I have been puzzled by the result. Was that really a great teacher, and was our portfolio procedure simply not very good? I believe the answer is “no.” Or was that a really popular, entertaining teacher who did not demand a lot from the students and who used fairly formulaic assignments? Of course, I had a vested interest in this project, so I think the answer is “yes.” But this leads me to the challenge that has been presented to educational groups in many states: How can we identify the “good” or “great” teachers and distinguish them from those who are “poor” or worse?

**Start with definitions**

The answer has to begin with a few definitions. What is the definition of a teacher? What is the definition of a good teacher? Defining who is a teacher is more difficult than it would first appear. Should a school administrator be classified as a teacher for the purposes of evaluation? How about a school librarian? What about a teacher’s aide? There are also counselors, temporary teachers, specialty teachers, etc. There are tutors and others working outside the school to help students learn. Are the parents of home-schooled students teachers for the purposes of evaluation? From the perspective of state educational systems, the following definition of a teacher is proposed:

*A teacher is a person who is responsible for assisting a group of students to learn the content defined by a well-structured curriculum during a specified unit of instruction.*

There are a number of important concepts in this definition. First, the teacher is responsible for helping the students learn. Those who are assistants to the teacher, such as teacher aides, do not have that formal responsibility. The person who is assigned as the teacher for a group of students is the person who formally assigns grades, and grade-giving is one way of showing the level of responsibility.

Second, the teacher works with a group of students. This eliminates the case of one-on-one tutoring, but teachers may work individually with members of their assigned group.

Third, there is a structured curriculum. This means there are specific types of knowledge and skills that are the targets of instruction and it is the expectation that students will reach an acceptable level of competence in these areas.

Finally, the definition specifies a unit of instruction. This means that the teacher works with the students for an extended period of time. It might be the full academic year, or a semester or quarter, depending on how the school is organized.

If this definition of a teacher is accepted, then it leads to a definition of a “good” teacher. A good teacher creates an environment and develops or selects a series of activities that will facilitate and encourage students to achieve the goals specified in the curriculum. There is an expectation that there is a well-defined domain of knowledge and skills students are expected to acquire by the end of the unit of instruction. If they acquire them, the educational system has succeeded, and the teacher is usually considered at least “good”—and we would hope that all teachers are at least “good.”

**Student differences make a difference**

The components of these definitions provide a framework that is often used for evaluating the educational systems of states. Curriculum documents are developed that describe the desired level of achievement for each school subject and each grade level. Tests are produced for estimating students’ levels of achievement on the defined curriculum, and standards of performance are set on the score scales for the tests. If students exceed the desired standard, the educational system is judged to be good.

However, there is one very important point that is typically left out of this framework. It neglects to include the student as an active participant. Without the students’ cooperation, we cannot know how much they have achieved. They have to be willing to show us their skills and knowledge on tests and classroom activities. The students also have to be willing to learn the material in the curriculum. Unfortunately, there is an implicit assumption that students start at approximately the same place when they begin the work on a unit of instruction.

Once we bring the student characteristics into the discussion of what makes a good teacher, the concept of a “great” teacher can be considered. I propose that a great teacher is one who can help a student who faces challenges when entering the educational system to reach the desired level of achievement.

The challenge might be that the student enters the unit of instruction with less prior knowledge and fewer skills than other students. Or the student might have a native language that is different from the language of instruction. He or she may come from an environment that does not encourage education, etc. Teachers who demonstrate the capability to help these students reach the desired level of achievement deserve the label “great.”

**Observation and student growth = Only a rough estimate**

Given this framework for teachers and teaching, can you identify a good teacher by looking at one? On a more personal note, was the popular teacher I described earlier a good teacher? Is it sufficient for students, parents and administrators to like the teacher, or is it more important that estimates of student achievement exceed expectations?

Many of the commonly used teacher evaluation systems now use a combination of performance measures from the students assigned to the teacher and observational tools that try to capture what is going on in the classroom. In some cases, the results of the two approaches are averaged using weights related to the judged importance of the two types of information. There is strong logic behind this approach because there is research showing that time on task in a classroom is related to the amount learned, and that the way learning activities are structured makes a difference. Based on other research, I am convinced that the ability of the teacher to observe students and determine how to adapt instruction to the needs of the group is very important. Teachers who are flexible in their approach get better results than those who rigidly follow a lesson plan.
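To make the weighting idea concrete, here is a minimal sketch of how an observation score and a student growth score might be combined. The function name, the common 0-100 scale and the weights are all assumptions chosen for illustration; they are not taken from any particular state's system.

```python
def composite_score(observation, growth, observation_weight=0.5):
    """Weighted average of two scores assumed to be on the same 0-100 scale.

    observation_weight is the judged importance of the observation score;
    the growth score receives the remaining weight.
    """
    if not 0.0 <= observation_weight <= 1.0:
        raise ValueError("observation_weight must be between 0 and 1")
    growth_weight = 1.0 - observation_weight
    return observation_weight * observation + growth_weight * growth


# Hypothetical teacher: strong classroom observations, weaker growth estimate.
equal_weights = composite_score(80.0, 60.0, observation_weight=0.5)
growth_heavy = composite_score(80.0, 60.0, observation_weight=0.25)
print(equal_weights, growth_heavy)  # 70.0 65.0
```

The same teacher moves from 70 to 65 when the weight on growth rises from one half to three quarters, so the choice of weights is itself a judgment that shapes the final rating.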

Of course, classroom observational procedures used for evaluation need to focus on important features of the classroom activities and be conducted a sufficient number of times to get a sense of what is typical for the teacher. It seems unlikely that a five-minute peek into the classroom will yield a description that represents everything that happens in the classroom.

The use of student performance measures to evaluate teachers is somewhat more complex. The intent is to determine how much the teacher contributes to student growth after other factors are taken into account. The other factors typically include a student’s previous capabilities to learn the academic material, environmental components related to their home environment and peer groups, and school facilities that help a teacher accomplish educational goals. The intent of the statistical methods used to estimate the teacher contribution is to level the playing field so that all teachers are fairly evaluated. As part of a research team (with Jeffrey Wooldridge of MSU and Cassandra Guarino of Indiana University) that has been testing the trustworthiness of these methods, often called value-added models or VAMs, I know there are many features that cannot be brought into the statistical analysis because information is not available or it is too difficult to quantify.
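One simple member of the value-added family can be sketched as a regression of current test scores on prior scores plus an indicator for each teacher. The sketch below is an illustration only, not the specification used in our research. All of the data are simulated, and the three teachers, their effects and the large class sizes are assumptions chosen so the example is stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (not real results): 3 hypothetical teachers, 500 students each.
true_effects = np.array([-3.0, 0.0, 3.0])
n_teachers = true_effects.size
n_per_teacher = 500
teacher = np.repeat(np.arange(n_teachers), n_per_teacher)

prior = rng.normal(50.0, 10.0, size=teacher.size)   # prior-year score
noise = rng.normal(0.0, 5.0, size=teacher.size)     # everything unmeasured
current = 10.0 + 0.8 * prior + true_effects[teacher] + noise

# Design matrix: prior score plus one indicator column per teacher.
# There is no separate intercept, so each teacher column absorbs
# the common intercept plus that teacher's effect.
indicators = (teacher[:, None] == np.arange(n_teachers)[None, :]).astype(float)
X = np.column_stack([prior, indicators])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)

slope = coef[0]
teacher_intercepts = coef[1:]
# Center the estimates so each effect is relative to the average teacher.
estimated_effects = teacher_intercepts - teacher_intercepts.mean()

print("estimated slope on prior score:", round(float(slope), 3))
print("estimated teacher effects:", np.round(estimated_effects, 2))
```

With realistic class sizes and with factors that never make it into the model, the estimates are much noisier than in this tidy simulation.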

**Student growth measures, questions to consider:**

- How are students assigned to teachers?
- Do teachers work as a team or have an intact classroom?
- What professional development is supplied?
- How much parental involvement is there at the school?
- What is attendance like at the school?
- How many contact hours does the teacher have with the students?

All of these things have an influence on the way instruction is carried out.

A consequence of all this complexity is that observations and student growth estimates only give us a rough idea of the capabilities of the teacher. Most procedures are good at identifying the top and bottom 1-2 percent, but they are not very good at accurately classifying a teacher as above average or below average (see figure below).

Our own research shows an above-average teacher has a noteworthy probability of being classified in the bottom 20 percent because of the limitations in sampling and uncertainties in all of the variables used to assess growth. For this reason, it seems better to give a probability that the teacher is in each of the possible evaluation categories (such as poor, average or above average) rather than make a fixed classification.
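A rough calculation shows why this can happen. The sketch below assumes that true teacher effects and estimation errors are independent and normally distributed, and it summarizes precision with a single hypothetical "reliability" value (the share of variance in the estimates that reflects true differences among teachers). These are simplifying assumptions for illustration, and the numbers are not results from our study.

```python
from statistics import NormalDist


def prob_flagged_bottom(true_percentile, reliability, bottom_share=0.20):
    """Chance that a teacher's estimate lands in the bottom share.

    Estimates are scaled to have total variance 1, so
    var(true effect) = reliability and var(error) = 1 - reliability.
    Requires 0 < reliability < 1.
    """
    std_normal = NormalDist()
    true_sd = reliability ** 0.5
    error_sd = (1.0 - reliability) ** 0.5
    # Where this teacher really sits, in estimate units.
    true_effect = true_sd * std_normal.inv_cdf(true_percentile)
    # Estimates are N(0, 1), so the bottom-share cutoff is a standard quantile.
    cutoff = std_normal.inv_cdf(bottom_share)
    return std_normal.cdf((cutoff - true_effect) / error_sd)


# A teacher truly at the 60th percentile, under two hypothetical reliabilities.
noisy = prob_flagged_bottom(0.60, reliability=0.3)
precise = prob_flagged_bottom(0.60, reliability=0.9)
print(f"low reliability:  {noisy:.3f}")
print(f"high reliability: {precise:.3f}")
```

Under these assumptions, the less reliable the estimates, the larger the chance that a teacher who is truly above average is placed in the bottom 20 percent.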

If the probabilities are about the same for all the categories, the information about that teacher is very imprecise. If one category has a high probability, then the results can be trusted as being accurate. We have found the level of accuracy has a lot to do with the number of students assigned to the teacher, the number of contact hours and the accuracy of all of the other information collected about the students as well as the educational setting. Low accuracy is not the fault of the teacher, but of the environment within which the teacher works.
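Reporting a probability for each category can be sketched as follows. The sketch treats the uncertainty about a teacher's true effect as normal around the estimate, and it assumes the standard error shrinks with the square root of the number of students. The cutpoints, category labels, effect units and student-level spread are all hypothetical.

```python
from statistics import NormalDist


def category_probabilities(estimate, standard_error, cutpoints, labels):
    """Probability that the true effect falls in each evaluation category.

    cutpoints separate adjacent categories, so len(labels) == len(cutpoints) + 1.
    """
    if len(labels) != len(cutpoints) + 1:
        raise ValueError("need exactly one more label than cutpoints")
    belief = NormalDist(mu=estimate, sigma=standard_error)
    cumulative = [0.0] + [belief.cdf(c) for c in sorted(cutpoints)] + [1.0]
    return {label: cumulative[i + 1] - cumulative[i]
            for i, label in enumerate(labels)}


labels = ["poor", "average", "above average"]
cutpoints = [-0.5, 0.5]   # hypothetical category boundaries
student_sd = 3.0          # hypothetical spread of student-level results

# Same estimated effect, very different numbers of students.
small_class = category_probabilities(0.3, student_sd / 10 ** 0.5, cutpoints, labels)
large_group = category_probabilities(0.3, student_sd / 400 ** 0.5, cutpoints, labels)

for name, probs in [("10 students", small_class), ("400 students", large_group)]:
    print(name, {k: round(v, 2) for k, v in probs.items()})
```

With only 10 students, no category reaches a probability of one half, which signals that the result is too imprecise to act on. With 400 students, the same estimate puts most of the probability on a single category.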

So, can we identify a good teacher when we see one? Maybe, if we watch them for a long time and we agree on the characteristics of a good teacher. But if the goal is an efficient and fair system of teacher evaluation, a quick look or a subjective judgment is not good enough. Carefully collected information and accurate analyses are needed, and even then, fixed classifications will have uncertainty. We need to embrace the uncertainty as diagnostic information that shows how much we can trust the results. High levels of trust are needed before we begin using teacher evaluations to make high-stakes decisions about the careers and livelihoods of our nation’s teachers.

*About the author*

*University Distinguished Professor Mark D. Reckase was appointed by Gov. Rick Snyder to the Michigan Council for Educator Effectiveness (MCEE), a group tasked with defining the system for educator evaluation for the State of Michigan. In addition, Reckase is co-principal investigator on a research project evaluating value-added models funded by the U.S. Department of Education. He teaches courses on psychometric theory and applied educational measurement in the Measurement and Quantitative Methods area of the College of Education.*

For more about the MCEE, including its recommendations for educator evaluation in Michigan, go to www.mcede.org.

**References**

- Guarino, C. M., Reckase, M. D., & Wooldridge, J. M. (2012). Can value-added measures of teacher performance be trusted? Bonn, Germany: Institute for the Study of Labor (IZA) Discussion Paper No. 6602.
- Harris, D. N. (2011). Value-added measures in education: What every educator needs to know. Cambridge, MA: Harvard Education Press.
- McCaffrey, D. F. (2012). Do value-added methods level the playing field for teachers? Stanford, CA: Carnegie Foundation for the Advancement of Teaching.