Performance-Based Assessment

of Secondary Mathematics Student Teachers


Tami S. Martin

Illinois State University

Roger Day

Illinois State University


Abstract:  We describe the use of performance-based assessment of student teachers and discuss its influence on student teachers, their cooperating teachers, and their university supervisors.  We present action research that describes the development, implementation, evaluation, and refinement of a high-inference, criterion-referenced student teacher rating scale that is based on performance attributes.  We report feedback from constituents who have used the process and present an analysis of quantitative data to determine patterns in attribute ratings as well as in student teaching grades.

Statement of the Problem

As part of an effort to reform teaching and learning, national organizations have proposed significant changes in teacher preparation, including a move toward more reliable methods for evaluating teachers’ performance.  One such method, performance-based assessment (PBA), has been a focal point of new standards for the certification of novice teachers, such as those proposed by the Interstate New Teacher Assessment and Support Consortium (INTASC, 1995).  The National Council for Accreditation of Teacher Education (NCATE) has adopted the INTASC principles as benchmarks to be used in NCATE accreditation (Wise, Leibbrand, & Williams, 1997).  As a result, many programs are focusing attention on PBA (Diez, 1997; Kain, 1999).

In this report we describe action research undertaken to study the impact of PBA of secondary mathematics teacher candidates.  We have developed criterion-referenced assessments, linked to national standards, that are designed to provide reliable mechanisms by which to judge teacher candidates’ readiness for success.  We will describe the development, implementation, evaluation, and refinement of an instrument used to assess student teacher performance.  This instrument identifies mathematics student teacher attributes at three performance levels and includes standardized grade criteria linked to a candidate’s level of achievement within these performance levels.  Cooperating teachers, university supervisors, and student teachers use this instrument for midterm and final evaluations during a 12-week student teaching experience.  We also describe and analyze evidence we have collected related to the instrument.

Literature Review

One factor underscoring the need to reconsider methods of student teacher assessment is the routine inflation of student teachers’ grades (Brucklacher, 1998).  Hartsough, Perez, and Swain (1998) describe several types of rating bias that may contribute to this problem.  For example, the halo effect occurs when a rater who is impressed with a student teacher’s overall performance rates the student teacher highly in all categories regardless of observed weaknesses.  Another type of bias is due to logical error.  This may occur when a rater is convinced that two attributes are related and, consequently, rates a student teacher highly on both attributes, despite differing performance in these areas.  At Illinois State University, a review of student teaching grades across programs for two consecutive semesters revealed that 88% of all assigned grades were “A’s” (Kinsella & Stutheit, 1998).

To avoid these biases, performance assessments have been developed, including professional teaching portfolios and final exhibitions, that incorporate a checklist approach (Kain, 1999; University of Indianapolis, 1999).  More comprehensive schemes aimed at preservice or novice teachers have also been devised (e.g., California New Teacher Project, Beginning Teacher Support and Assessment Program [Yopp & Young, 1999], Classroom Observation and Assessment Scale for Teacher Candidates, Competency Based Teacher Education Scale, Preservice Teacher Rating Scale [Hartsough, Perez, & Swain, 1998]).  Users of these performance assessments have documented substantial contributions to preservice teacher development, including increased self-reflection and improved performance.  Most of these instruments, however, are not content specific.  They are designed for teachers at any grade level, teaching in any content area.  Through our performance attributes and associated criteria for grade assignment, we present a framework for evaluating the performance of preservice mathematics teachers.

Modes of Inquiry

Our collection and analysis of data have resulted from two years of an iterative process involving the development, implementation, evaluation, and revision of a student teacher performance assessment instrument.  Prior to using the criterion-referenced instrument, each university supervisor relied on his or her own knowledge, experiences, and preferences to grade student teachers, primarily from a norm-referenced perspective.  (University supervisors assign two grades to each student teacher, such as A/A, A/B, B/B, and so on.)  To move toward a more consistent evaluation scheme, as well as one with a more justifiable frame of reference, mathematics faculty who supervise secondary mathematics student teachers developed a high-inference, criterion-referenced rating scale.  In the first year, we developed 12 student teacher attributes (e.g., content mastery, planning, organization, and rapport with students) with behavior descriptors in categories of “exceptional,” “satisfactory,” and “inadequate.”

Upon completion of a pilot semester (spring 1999), we reviewed categorical ratings and final grades for student teaching, analyzed feedback from supervisors, cooperating teachers, and student teachers, and refocused our effort to align our program with national standards.  Based on our analysis of the data, we made several changes the second year (spring 2000), including the consolidation of some attributes and the addition of new ones.  These revisions resulted in 11 student teacher attributes (e.g., assessment: multiple forms, conceptual/procedural balance, communication, and professionalism) with accompanying behavior descriptors in categories of “exceeds expectations,” “meets expectations,” and “does not meet expectations.”  We also conducted a half-day training seminar for all university supervisors before the second year of implementation.  After reviewing second-year data, we may revise the attributes and performance descriptors again.  We also have obtained funding to bring the cooperating teachers to campus in early spring 2001 to discuss PBA.


Student Teaching Grades

A review of student teaching grades for the five years prior to the use of PBA and for the two years of PBA implementation suggests that student teaching grades may be beginning to change.  The percentages of A/A’s earned by student teachers during 1999 (64%) and 2000 (50%) are lower than the average percentage of A/A’s earned over the previous five years (70%), although the percentage of A/A’s for 1999 is fairly similar to the percentages in 1995 (64%), 1997 (67%), and 1998 (61%).  The remainder of the grade distributions for 1999 and 2000 are similar to those for the previous years.  Although the pattern of A/A grades for 1999 and 2000 may indicate some change, it is premature to draw conclusions based on two years of implementation of PBA.

Attribute Rating Assignment

To assess whether student teachers, cooperating teachers, and university supervisors were in agreement in their attribute rating assignments for student teachers, we compared raters’ final evaluations of each student teacher.  In the first year, we obtained cooperating teacher and university supervisor evaluations for 19 of the 26 student teachers.  For a variety of reasons, 7 of the 19 sets of evaluations contained only one evaluation form.  For the remaining 12 students, there was agreement among the raters on 104 of 144 possible attributes, constituting a rater agreement of about 72%.  Of the 40 disagreements among raters, a significant majority (72.5%) occurred because the cooperating teacher rating was higher than that of the university supervisor.  This may indicate that the tendency to inflate student teacher grades is more severe among practicing school teachers than it is among university faculty.

In the second year, we obtained evaluations for all 20 student teachers.  In addition to being rated by a university supervisor, each student teacher was self-rated and rated by at least one cooperating teacher.  For all 20 student teachers, the average overall rater agreement (the number of attributes for which there was agreement among all raters) was 54%.  We also computed pair-wise rater agreements between the student teacher and all cooperating teachers, between the student teacher and the university supervisor, and between all cooperating teachers and the university supervisor.  These percentage agreements were 73%, 63%, and 65%, respectively.
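The agreement measures reported above are straightforward to compute: overall agreement counts the attributes on which all raters coincide, and pair-wise agreement compares two raters at a time.  The Python sketch below illustrates one way such computations might be carried out; the function names, rating labels, and sample data are hypothetical illustrations, not drawn from our instrument or our records.

```python
# Illustrative sketch (hypothetical data): computing overall and pair-wise
# rater agreement across attribute ratings for one student teacher.

def overall_agreement(ratings_by_rater):
    """Fraction of attributes on which ALL raters assigned the same rating.

    ratings_by_rater: dict mapping rater -> list of ratings, one per
    attribute, in the same attribute order for every rater.
    """
    # zip(*...) yields one tuple of ratings per attribute
    columns = list(zip(*ratings_by_rater.values()))
    agreements = [len(set(column)) == 1 for column in columns]
    return sum(agreements) / len(agreements)

def pairwise_agreement(ratings_a, ratings_b):
    """Fraction of attributes on which two raters agree."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Hypothetical ratings on four attributes
ratings = {
    "student_teacher":       ["exceeds", "meets", "meets",   "exceeds"],
    "cooperating_teacher":   ["exceeds", "meets", "exceeds", "exceeds"],
    "university_supervisor": ["meets",   "meets", "exceeds", "exceeds"],
}

print(overall_agreement(ratings))   # all three raters agree on 2 of 4 attributes
print(pairwise_agreement(ratings["student_teacher"],
                         ratings["university_supervisor"]))
```

In this toy example, all three raters agree on two of the four attributes (overall agreement 0.5), while the student teacher and university supervisor agree pair-wise on two of four (0.5), mirroring the structure, though not the values, of the percentages reported above.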

Evaluation Questionnaires

At the end of each year, questionnaires were sent to all student teachers, their cooperating teachers, and the university supervisors.  Response rates were 31.7% and 80% for 1999 and 2000, respectively.  Here we provide some of the feedback that goes beyond general positive comments.

A common first-year expression of concern about the student teacher attribute descriptions was that the three levels of performance (exceptional, satisfactory, inadequate) did not adequately address performance differences.  Respondents suggested that one or more additional levels be included on the scale of attribute performance.  Responding to the revised instrument in 2000, with three re-named levels of performance (exceeds expectations, meets expectations, does not meet expectations), only 1 of 39 responses mentioned levels.

The 1999 feedback on the student teaching grading criteria focused on how a student teaching grade corresponded to the attribute categories.  Several people requested more specificity about how many “exceptional” attributes constituted a “significant majority” as stipulated for an A/A student teaching grade.  Others appreciated the leeway of the high-inference grading scale.  One experienced university supervisor commented on how these criteria caused him to give a lower grade than he would using his previous grade-assignment scheme.

In 2000, one supervisor and one classroom teacher commented on grade assignment.  The supervisor suggested that “cooperating teachers are concerned with lower-than-usual grades,” and the classroom teacher expressed concern about the apparent lack of ISU policy regarding the use of performance assessment: “ISU is putting their math graduates at a disadvantage.”  A classroom teacher suggested that letters of recommendation and a student’s professional portfolio make better assessment instruments than do letter grades.  Three student teachers commented on the university supervisor’s role in the grading process.  They noted that a supervisor cannot expect to see all the various attributes within a limited number of classroom visits or that, due to the high-inference nature of the grading scale, each university supervisor may evaluate differently.  One of these three suggested that one grade in the double-letter grade be assigned by the university supervisor and the other by the cooperating teacher.

Comments from 1999 on the mid-term and final evaluation instruments offered suggestions for format improvements.  One respondent also offered a suggestion about the process:

I think it would be helpful to be observed and evaluated by different people.  More opinions would give a more holistic and unilateral [sic] way to grade.  Even with this rubric, all teachers grade differently and have different expectations and standards.

In 2000, a university supervisor suggested that cooperating teachers meet with supervisors and program personnel before student teaching.  A student teacher/cooperating teacher pair asked whether the evaluation process was based on achievement throughout the semester or on the final achievement reached by a student teacher.  The cooperating teacher said, “If the student teacher has improved and is doing great work at the end, they should receive a high grade.”  This clearly had been an issue among that triad of raters.

Each year we solicited general comments on student teacher evaluations.  A 1999 student teacher raised the issue of accountability, describing a situation in which a colleague staged the replay of a previously taught lesson on the day that the supervisor was visiting.  Year 2000 respondents offered a variety of comments.  A university supervisor suggested that student teachers should not evaluate their own performance: “This puts him/her in a difficult position as he/she has little or no basis for comparison.” This comment suggests that the university supervisor may have been operating from a norm-referenced perspective.  Student teachers’ comments focused primarily on the roles of the university supervisor and the cooperating teacher.  These comments focused on grade assignment, training of university supervisors, and improved communication.

Conclusions


Two years’ experience implementing PBA has led us to some preliminary conclusions.  First, university supervisors have addressed grade inflation by decreasing the number of A/A’s assigned to student teachers during the past two years.  However, the same is not true for student teachers and cooperating teachers, who tend to rate the student teacher more highly and with greater agreement than does the university supervisor.  One explanation for the differential behavior of university supervisors and others may be that only the university supervisors were trained in the use of performance assessment.  Another explanation may be that the daily contact between student teachers and cooperating teachers generates more information on which to base rating decisions.  However, cooperating teachers and student teachers also have a greater personal investment in the process and may be less able to take an objective view.  Second, we have observed that it is difficult for some users to make the transition from a norm-referenced perspective on grading to a criterion-referenced perspective.  This was apparent in the cooperating teacher comment that the Mathematics Department’s use of PBA was a disadvantage for student teachers in the job market, and in the university supervisor opinion that student teachers are unable to rate themselves accurately without having a basis for comparison.  Finally, despite concerns about fairness to student teachers with respect to who assigns grades and the potential for job placement difficulties due to lower grades, many participants claimed that PBA is a valuable tool for making justifiable, equitable decisions about rating student teachers.



References

Brucklacher, B. (1998).  Cooperating teachers’ evaluations of student teachers: All “A’s”?  Journal of Instructional Psychology, 25(1), 67-72.


Diez, M. E. (Ed.) (1997).  Changing the practice of teacher education: Standards and assessment as a lever for change.  Washington, DC: American Association of Colleges for Teacher Education.


Hartsough, C. S., Perez, K. D., & Swain, C. L. (1998).  Development and scaling of a preservice teacher rating instrument.  Journal of Teacher Education, 49(2), 132-139.


Interstate New Teacher Assessment and Support Consortium. (1995).  Model standards for beginning teacher licensing and development: A resource for state dialogue (working draft).  Washington, DC: Council of Chief State School Officers.


Kain, D. L. (1999).  On exhibit: Assessing future teachers’ preparedness.  Action in Teacher Education, 21(1), 10-23.


Kinsella, J., & Stutheit, S. A. (1998).  A study of credit/no credit vs. traditional grades in the assessment of student teaching performance.  Unpublished manuscript, Illinois State University.


University of Indianapolis. (1999).  Teacher education program handbook 1999-2000.  Indianapolis, IN: Author.


Wise, A. E., Leibbrand, J. A., & Williams, B. C. (1997).  NCATE’s response to critical issues in teacher preparation today.  Action in Teacher Education, 19(2), 1-6.


Yopp, R. H., & Young, B. L. (1999).  A model for beginning teacher support and assessment.  Action in Teacher Education, 21(1), 24-36.