Landscape of Teacher Preparation Program Evaluation Policies and Progress1
STAFFORD L. HOOD, University of Illinois at Urbana-Champaign
MARY E. DILWORTH, Education Advisor
CONSTANCE A. LINDSAY, University of North Carolina at Chapel Hill
NATIONAL ACADEMY OF EDUCATION
500 Fifth Street, NW, Washington, DC 20001
NOTICE: The project and research are supported by funding from the Bill & Melinda Gates
Foundation. This paper was prepared for the National Academy of Education (NAEd) to inform
and support the work of the steering committee for Evaluating and Improving Teacher Preparation
Programs, including the consensus report. The opinions expressed are those of the authors and
not necessarily those of the NAEd or the Bill & Melinda Gates Foundation.
Digital Object Identifier: 10.31094/2021/3/5
Copyright 2022 by the National Academy of Education. All rights reserved.
Suggested citation: Hood, S. L., Dilworth, M. E., & Lindsay, C. A. (2022). Landscape of teacher
preparation program evaluation policies and progress. National Academy of Education Committee
on Evaluating and Improving Teacher Preparation Programs. National Academy of Education.
March 2022
CONTENTS
INTRODUCTION
FEDERAL, STATE, AND ORGANIZATION POLICIES
  Federal-Level Policies and Regulations
  State- and Local-Level Policies and Regulations
  Organization and Other Policies and Influence
EVALUATION MEASURES AND METRICS
  Proliferation of Program Designs
  Evaluation Measures, Metrics, and Data Misalignment
  Accountability Measures
  Predictive Effectiveness Measures
EQUITY AND SOCIAL JUSTICE
  Contributions to Providing More Teachers of Color
CONCLUSION
REFERENCES
AUTHOR BIOGRAPHIES
1 The authors would like to acknowledge the important contributions of Carrie Lynn James (doctoral candidate, curriculum and instruction, University of Illinois at Urbana-Champaign) in the review of the literature and related studies for this paper.
INTRODUCTION
The dialogue on what constitutes quality teacher preparation and how it should be assessed and evaluated is muddled. It begins neatly with universal agreement on the aspiration for quality teaching and enhanced PK-12 student achievement, then quickly scatters when there are attempts to define and weigh key components of academic excellence. No stakeholder group has the final say. The question of who is most responsible for improving student achievement invariably prompts thoughtful discussion wherein each sector faults another and each may accept some ownership, but no one accepts full responsibility. We are well served to fix the problem, not the blame.
Recognizing that accountability is key, particularly to the nation’s citizens, what
emerges is a host of public summative measures intended to satisfy everyone with
basic information and data points that have too often failed to substantively move the
needle toward improved practice and outcomes. Moreover, the evaluation of teacher
preparation programs (TPPs) does not occur in a vacuum isolated from the broader
accountability movement in education, particularly its intense focus on holding teachers, schools, and districts accountable. Questions about the effectiveness of teacher preparation, teacher classroom performance, and student achievement outcomes stem from a variety of sources that are inextricably linked to national, state, and local expectations, policies, and accountability systems. Those in the TPP sector of higher education are particularly nimble: keenly aware of probing questions and ever ready with responses and messaging intended to establish their credibility and to generate the support, time, and resources needed to design and redesign programs.
Our review reveals a multitude of issues that stifle useful evaluations, but we
choose to focus on four key areas primed to leverage equitable TPP evaluation for
future program improvement. First, the paper discusses the national and state policy
authorities that establish large-scale TPP goals and incentives with the power to drive TPP designs and agendas. Second, the paper turns to prominent professional standards-setting organizations and other groups and individuals that participate in the TPP evaluation sector and exert considerable influence on framing, creating metrics, and prioritizing what are deemed fruitful areas for inquiry. Third, the paper discusses the impact of rapidly emerging program models, which require evaluation criteria tailored to their various approaches and standards if evaluation is to be useful to TPPs in their day-to-day work. Last, the paper discusses the critical need to re-examine all areas of TPP evaluation so that they capture and employ effective strategies addressing equity and social justice. The paper includes recommendations for improved alignment and consistency, timeliness and access, and equity, which may influence TPP evaluation in the future, as well as
promising strategies for consideration.
As we provide an overview of the TPP evaluation landscape between 2013 (the
year of the prior National Academy of Education [NAEd] report on the evaluation of
TPP) and 2020, we also are cognizant of factors and conditions that may stall future
progress. We contend that there are several areas of need in order to enhance formal
and informal evaluations (i.e., better align public and private organizational policies
and regulations, provide more and timely data accessible to TPPs and the public, and
firmly establish an obligation to include matters of equity and social justice in all areas
of the TPP evaluation sector).
Evaluations are tools intended to filter fact from fiction by providing what may be
considered snapshots to inform decision-making. Unfortunately, this is not always what evaluations accomplish, though it should be their goal. Specifically, evaluations can identify a course of action for TPPs to make progress toward achieving their visions for immediate and long-term goals. Still, nothing is static in the education evaluation space.2
Since 2013, numerous changes in the TPP context (e.g., the proliferation of alternative routes to certification) have prompted different evaluation designs and methods. These new TPP formats have been significantly influenced by national, state, and local policymakers who are anxious about effectiveness, transparency, and speedy results. Furthermore, as one should reasonably expect, strategies for the evaluative inquiry of TPPs must seriously consider the nation's uncompromising and partisan views on social justice as well as the rapidly changing PK-12 student demographics, which require of teachers a repertoire of culturally responsive knowledge and skills. In addition,
there is widespread recognition that educators must consider students’ social and
emotional needs in order to advance their academic achievement.
The context of the “evaluand” (i.e., the object of the evaluation) is shaped by
economic, political, historical, and cultural factors and dispositions of its primary
stakeholders. This has, in some ways, been manifested in the continuing emphasis on accountability, which relies on indicators as evidence of student achievement: strategies preferred by executive branch initiatives, required by legislative mandates, framed by federal and state agencies, implemented by TPPs, and consumed by the public at large.
For their part, states, TPPs, and accrediting agencies roll with the tide of innovation and
reform in an effort to secure necessary resources to survive and thrive. All have played
major roles in shaping the context of any evaluation lens that we might use to deter-
mine how well TPPs have succeeded in producing quality teachers. Understanding the
context of a program is critically important to the validity of the evaluative findings
and their usefulness for making formative and/or summative judgments, particularly
if improvement is the priority.
In a predecessor to the current paper, Feuer et al. (2013) focused on five categories in their review of the TPP evaluation landscape at that time: federal government, national accreditation, states, media/independent organizations, and TPPs. In this
paper, we review the current TPP evaluation landscape with slightly different lenses
by casting attention on influential public policies and organizations that inform and/
or support TPP evaluation and prominent TPP formats, designs, methods, and assess-
ments. Our lenses come from three perspectives: one author is a long-standing insider
in the national teacher preparation and assessment policy arena, one is an academy-
based evaluator and researcher with substantial experience in program evaluation and
assessment (focusing on culture and cultural context), and one is an academy-based
researcher who is well acquainted with emerging education issues in the economic and
public policy domain. At the same time, this collaboration has established a certain level of symmetry among the co-authors and strengthened a more deliberate focus on issues of equity and access in the evaluation of TPPs.

2 There have been many events in the United States that have shifted and, in some cases, delayed TPP evaluation policies and priorities. Because the sector evolves rapidly, we acknowledge but have not addressed many of these changes. Certainly, the disruption of the COVID-19 global pandemic, the nationwide civil unrest during the summer of 2020, and the attendant incoherent delivery of PK-12 instruction, coupled with the still-unfolding agenda of the current U.S. presidential administration, will bring greater complexity to what had previously been a relatively predictable environmental context for evaluating TPPs.
FEDERAL, STATE, AND ORGANIZATION POLICIES
Federal-Level Policies and Regulations
It is clear that federal policymakers’ primary interest in teacher preparation has not
changed in decades—they seek quality and accountability. Similarly, the objectives in
the evaluation of TPPs are determining quality, responding to accountability, and iden-
tifying areas for improvement. The importance, energy, and resources devoted to each are prompted by a variety of factors that pressure TPP institutions and organizations, namely the sentiment that the current structure or system for teacher preparation is expendable because it fails to provide effective educators quickly enough and at a reasonable cost. It has been persuasively argued that U.S. public investment in the PK-20
enterprise is insufficient to provide the necessary inputs for system improvement, yet
those in business and industry expect that there should be a return on investment that is
evidenced and documented by quantitative outcomes (Anderson, 2019; Moeller, 2020).
Arguably, one important factor framing the current TPP evaluation environment has been the set of presidential initiatives designed to encourage innovation, entrepreneurship, private investment, and control in public schools generally and public support for
PK-12 charter schools specifically (Grossman & Loeb, 2016). The Higher Education
Act (HEA), through Title II, authorizes programs designated for improving TPPs, but
it has yet to be reauthorized. The annual HEA Title II report is a vehicle that was cre-
ated to provide the transparency and public access heralded by the Bush and Obama
administrations. These reports span more than 15 years, but their release is sporadic, their data are inconsistent over time, and they are challenging to use in evaluations that require somewhat more precise metrics.
Building on the bipartisan No Child Left Behind Act, the Obama administration's stimulus package (the American Recovery and Reinvestment Act of 2009) and its Race to the Top program established the need to better quantify TPP performance by calling for proficiency rankings and transparency. Cochran-Smith et al. (2017) assert that while
the Bush administration leveraged education accountability standards generally, it was
the Obama administration that raised the stakes for TPPs and teachers. It was
exacerbated by the Obama Administration’s Race to the Top policies and proposed fed-
eral requirements that states be required to rank teacher education institutions annually
according to metrics established by the federal government, especially measurements
of their graduates’ impact on students’ achievement. (p. 3)
The Obama administration’s agenda was articulated by then Secretary of Education
Arne Duncan in the U.S. Department of Education’s (ED’s) report Our Future, Our Teach-
ers (U.S. Department of Education, 2011). The agenda firmly established proficiency
rankings as desirable and transparency as a requirement. The preexisting annual HEA
Title II report was one tool to provide public access. At one time, states were free to
determine what data they provided to the federal government, which was reported
to be more than 600 pieces of information (Cochran-Smith et al., 2018). Criticism from the Obama administration and a U.S. Government Accountability Office (GAO) report (2015) indicated that states had identified few, if any, of their TPPs as low performing.
The Obama administration was unsuccessful in its attempts to strengthen the Title II
legislative language through the reauthorization of HEA, settling in 2014 for a strategy
of modifying the regulations that monitored its implementation. The proposed 2014
Title II regulations were reportedly opposed by both public and professional associations, with the American Association of Colleges for Teacher Education (AACTE) voicing the objection that the regulations
represented an unfunded mandate for schools, states, and higher education institutions;
they impeded the recruitment of a diverse teacher workforce, particularly in high need
areas; and they tied federal aid to preparation program evaluation based on expansion
of an untested system. (Cochran-Smith et al., 2018, p. 28)
The regulations were ultimately approved in 2016, only to be repealed early in
the Trump administration (Brown, 2017). The current HEA Title II Part A consists of a
competitive grant program for a select group of TPPs and reporting requirements for
accountability that are intended to track TPPs and improve program quality (Kuenzi,
2018).
The 2016 Title II regulations had established a framework for evaluating TPPs that
required states to extensively report data that included, for example, TPP graduates’
passing rates on state certification assessments, graduation rates, enrollments, student
demographics, and other related program data for the purpose of ranking their TPPs
and identifying those deemed to be low performing or at risk based on their criteria
(Hegji, 2018; U.S. Department of Education, 2016). The 2016 Title II regulations required
the establishment of a “federally mandated, state enforced data system designed to
measure teacher education quality by requiring significant and controversial new meth-
ods of scoring, ranking, and funding teacher preparation programs” (Cochran-Smith et
al., 2018, p. 55). These regulations provided directives for how states should evaluate their TPPs and then rank them, with federal funding awarded as a reward or withheld as a punishment. Primarily, the federal directives to evaluate TPPs were intended to use
“meaningful data” that are indicative of outcomes such as students’ performance on
measures of academic achievement (Cochran-Smith et al., 2018, p. 59).
Other policy attempts of note relative to teacher preparation evaluation are reflected in the Every Student Succeeds Act (ESSA) of 2015, which reauthorized the Elementary and Secondary Education Act.
ESSA’s Title II: Preparing, Training, and Recruiting High Quality Teachers, Principals, and
Other School Leaders Part A: Supporting Effective Instruction included a provision for state
education agencies to provide funding to TPPs with the requirement that they
award a certificate of completion (or degree) to a teacher only after the teacher has dem-
onstrated that he or she is an effective teacher, as determined by the state; and limiting
admission to the academy to prospective candidates who demonstrate “strong potential
to improve student achievement” (Section 2002(4)). (Skinner, 2019, p. 10)
In the absence of new mandates, TPPs continue to labor, building and submitting
reports that comply with the preexisting requirements. The Trump administration was relatively silent about the importance of teaching and teacher education reform. It signaled to the public that data collection and transparency were superfluous, and no comprehensive report on Title II data has been issued since Trump's first summer in office (the last reflecting state TPP reports from 2012-2013). Since 2017, the political temperament
toward TPPs can be characterized as one of benign neglect that has diminished interest
in leveraging evaluation as a critical activity.
Federal Data Systems
The federal sector could leverage current investments more effectively for TPP
evaluation. For instance, in the short term the federal sector could create a user-friendly
system in which researchers can link data sets such as the Integrated Postsecondary
Education Data System (IPEDS) and HEA Title II, and in the long term create a com-
prehensive data set that encompasses teacher preparation, accountability programs,
and competitive grant programs that can be used to drive innovation.
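As a concrete illustration of the short-term linkage idea, the sketch below joins hypothetical IPEDS and Title II extracts. The file names, the assumption that both extracts share an IPEDS UNITID, and the column layout are illustrative assumptions, not documented features of either system.

```python
# Illustrative sketch only: the kind of data linkage described above.
# File names, column names, and a shared UNITID in the Title II extract
# are assumptions for illustration.
import pandas as pd

# Hypothetical extracts downloaded from the IPEDS and HEA Title II sites.
ipeds = pd.read_csv("ipeds_completions.csv")        # e.g., UNITID, CIPCODE, CTOTALT
title2 = pd.read_csv("title2_program_reports.csv")  # e.g., UNITID, program_type, completers

# Keep education-related completions (CIP family 13 covers education fields).
ipeds_ed = ipeds[ipeds["CIPCODE"].astype(str).str.startswith("13")]

# Aggregate degrees awarded per institution, then join to Title II reporting.
degrees = (ipeds_ed.groupby("UNITID", as_index=False)["CTOTALT"]
                   .sum()
                   .rename(columns={"CTOTALT": "education_degrees_awarded"}))
linked = title2.merge(degrees, on="UNITID", how="left")

# Flag the mismatch the text notes: institutions reporting an education
# program to IPEDS that awarded no degrees in the subject year.
linked["zero_award_program"] = linked["education_degrees_awarded"].fillna(0).eq(0)
print(linked.head())
```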
IPEDS is one comprehensive federally sponsored program that is under-utilized.
Housed in ED’s National Center for Education Statistics, it serves as a primary source
for postsecondary education data and includes a variety of user-friendly tools (e.g., trend data that often are not made public elsewhere and are widely used for research studies). Although IPEDS data could provide important performance metrics for TPP evaluation, the system has a number of protocols that make it challenging for the average user to accurately disaggregate and analyze discipline-specific information such as teacher education (Dynarski et al., 2015). For instance, as an AACTE Issue Brief (King, 2020) states,
Institutions completing the IPEDS survey are instructed to include all degree programs
offered, even if no degrees were awarded in that field in the subject year. As a result,
these figures include institutions that reported having an education program but that
awarded no degrees in the subject year. (p. 7)
Unfortunately, there is no federal data set on enrollment in education programs, so
there is no systematic way to identify programs that award few degrees but have robust
enrollment. Furthermore, federal data sets fall short in tracking the demographics of teacher candidates and programs, with reports often relying on scores of other public
and private data sets to fill information gaps. The definition of terms, selection of items,
and schedules for data collection by ED make it challenging, if not impossible, for poli-
cymakers to identify and use certain data points with confidence. It is apparent that one
coherent federal data system that reflects TPP candidates’ demographic characteristics,
completion, and placement would provide critically important information for state
and local policy decisions and should be a federal priority investment.
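To make concrete what such a coherent system might minimally contain, the sketch below outlines a candidate-level record. Every field name and the flat single-table design are hypothetical simplifications for illustration, not a proposal from ED or from the authors.

```python
# Illustrative sketch only: a minimal candidate-level record for the kind of
# coherent federal TPP data system called for above. All field names and the
# flat structure are hypothetical simplifications.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TPPCandidateRecord:
    candidate_id: str                      # de-identified candidate key
    tpp_id: str                            # stable program identifier (linkable to IPEDS UNITID)
    program_type: str                      # "traditional" or "alternative route"
    race_ethnicity: str                    # consistent categories across states
    completed: bool                        # finished the preparation program
    certification_state: Optional[str]     # state of initial licensure, if any
    placement_district_id: Optional[str]   # first teaching placement, if employed

# Uniform records of this kind would let analysts compute, for example,
# completion and placement rates disaggregated by race/ethnicity across
# all program types.
```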
State- and Local-Level Policies and Regulations
There has been a long-standing question in the U.S. educational policy arena about
the extent to which the federal government should be involved in and influence state
education policies as well as their implementation. State and federal education policies both have a considerable impact on the evaluation of TPPs, even with the recognition that the responsibility for education constitutionally resides with the
states. At the same time, the federal government is often seen as encroaching on states’
responsibility for education with its considerable influence and funding.
Congress authorizes, appropriates, and targets funding to states and postsecond-
ary institutions for specific areas of operation, including educator preparation and
professional development and student financial aid. The federal government develops
guidelines and regulations aligned with these policies, which to a greater or lesser
extent require performance assessment as an accountability measure in the evalua-
tion of outcomes. Because there is no national evaluation system per se, the legislative
branch of government directly and indirectly incentivizes a fragmented system of TPP
evaluation.
The individual states provide the most likely examples of TPP evaluation systems,
as they approve TPPs, determine how they are evaluated, and decide what assess-
ments or other tools are used for these purposes. Each state and territory holds quality
teaching and learning to be of utmost importance in its responsibility for education. In
their efforts toward quality teacher preparation, they work in close collaboration with
regional organizations such as the Southern Regional Education Board and national
ones that include the Council of Chief State School Officers (CCSSO) and the National
Association of State Directors of Teacher Education and Certification (NASDTEC) to
promote key principles of practice while advocating for state and federal legislation
that will support their agendas. In this effort there is a heavy reliance on the standards and expertise of specialty groups and organizations, such as the National Association for the Education of Young Children, the Council for Exceptional Children, and the National Council of Teachers of English, to name a few, which review and fine-tune requirements in approving TPPs.
The indicators of TPP quality are elusive, although they are typically grouped
around basic, well-established principles of instruction and student learning that
include subject-matter knowledge and student engagement. However, these valued
principles proliferate into a wide assortment of indicators depending on whose judg-
ments and preferences are prioritized in setting the standards that form the basis for
these judgments. State longitudinal data systems can and should play important roles in informing these judgments, but there is considerable need for their refinement.
State and Local Longitudinal Data Systems
Di Carlo and Cervantes (2018) highlight concerns about the consistency of and access to state data that could contribute to research on and evaluation of TPPs, particularly on matters of educator diversity. The limited racial and ethnic representation in the PK-12 educator workforce is widely recognized as a national issue, yet ED's Office for Civil Rights' biennial Civil Rights Data Collection does not require states to report this information. The authors effectively argue that a central, nationwide collection and promulgation of these data is the best way to ensure comprehensive availability to the public and can contribute to a more complete view of areas of need and of the resources required to effectively fund programs and policies. Furthermore, the majority of states collect this information, but they are free to define demographic categories (e.g., to include or omit "mixed race," a rapidly growing cohort in this nation's population). Lastly, the
absence of a national and transparent data set can stifle TPP recruitment efforts for
candidates of color as well as interstate reciprocity for licensed educators. The chal-
lenge here is that the fractured nature of education governance does not ensure the
consistency of data collection across states.
While there is recent work suggesting that matching teachers and students by race
has a positive impact on PK-12 students of color, in particular (Cherng & Halpin, 2016;
Egalite et al., 2015; Gershenson et al., 2018; Redding, 2019), capturing diversity data and a program's diversity impact presents formidable challenges. Fenwick's (2021) compre-
hensive review of TPP evaluation in the states highlights the wide range of authorities
and directives that are intended to inform policy but at the same time distract TPPs
from their essential teaching and learning missions.
Even though the states hold the primary authority for TPP approval and evaluation,
the influence of local school districts cannot be overlooked—particularly large urban
and suburban school districts—as they also engage in formal and informal evaluations
of teacher preparation. Lastly, one frequently untapped data source is human resources
data that can be found at the district or state level (Goings et al., 2021). That these data can now link TPPs to student performance is itself a result of efforts to develop summative evaluations for teachers.
Clearly, greater coordination of national, state, and local data collection efforts will
yield TPP evaluations that are useful and meaningful to institutions and to the con-
stituents that policymakers serve. At the same time, prominent professional standards-setting organizations and others also significantly influence the frames, metrics, and priorities for TPP evaluation.
Organization and Other Policies and Influence
Accrediting Organizations
The most recognized players in the evaluation context are the two federally approved
TPP accrediting groups (the Council for the Accreditation of Educator Preparation
[CAEP] and the Association for Advancing Quality in Educator Preparation [AAQEP])
and other organizations that rate TPP performance. Program accreditation is often frustrating to the institutions subject to its requirements. Yet, the enterprise continues to grow. Some TPPs do not see the necessity of national accreditation, given the financial costs and labor-intensive exercises associated with it, when state program approval suffices for teacher credentialing within the state. In recent years,
there has been a reconfiguration of accrediting agencies, with new entities arguing that
their approach is what the universe of TPPs needs to move forward. Generally, each
agrees that TPP quality is important and that TPPs should be engaging in ongoing
improvement resulting in the enhanced academic and life success of U.S. students, but
this sentiment does not distinguish one organization from another.
One key player is CAEP, which represents a “strategic union” between its predeces-
sor accrediting agencies—the National Council for Accreditation of Teacher Education
(NCATE) and the Teacher Education Accreditation Council (TEAC). It proclaims a new
direction in the accreditation of TPPs that is more evidence based and congruent with
the national trend of data-driven accountability, while also endorsing the revisions
to the Title II regulations with its existing standards (Cochran-Smith et al., 2018). The
standards initially developed by CAEP were widely publicized to be congruent with
the call for accountability that was strongly echoed by the Obama administration’s
programs and initiatives. Cochran-Smith et al. strongly assert:
The CAEP standards seemed intended to appease both policy makers who worked
from the neoliberal logic underlying the era of accountability and members of the pro-
fession who were resistant to the logic. (2018, p. 85)
However, Cochran-Smith et al.’s further assessment of CAEP was that in its “claims
to be revolutionizing accreditation in terms of the content dimension of accountability,
it was similar in many ways to accreditation through NCATE and TEAC at least on
the surface” (2018, p. 85).
One newcomer in the TPP accreditation arena is AAQEP. Founded in 2017, AAQEP
reports accrediting 25 TPPs and in 2021 received Council for Higher Education Accredi-
tation recognition as an accrediting organization. Clearly, AAQEP is intended to provide an accreditation alternative to CAEP as the main accreditor of TPPs—one that is more inclusive through strong collaborative partnerships with TPPs and intentional and direct involvement with PK-12 educators and administrators. Therefore, it is reasonable
to suggest that CAEP left room for a new player to enter the game. In articulating its
standards, the AAQEP website uses terms such as “culturally responsive practice” and
“community/cultural context,” conveying the message of inclusiveness that overlaps
the TPP and the community that its graduates serve.
Cochran-Smith et al. suggest that AAQEP could have promise as it
emphasizes diversity and equity in their procedures suggesting that standard solutions
to local challenges will not suffice.… [There is] emphasis on teacher candidates’ class-
room performance rather than their impact on tested achievement of eventual students;
and support of innovations and variations in keeping with diverse local contexts and
communities. (2018, p. 179)
Both CAEP and AAQEP continue to tweak their messaging, but their ability to survive
and thrive hinges to a great extent on state and local policymakers' assessment of their cost-benefit value for the communities that they represent.
The National Council on Teacher Quality (NCTQ), created in the early 2000s as a
private advocacy organization for improving the quality of teacher preparation, has
the loudest voice within the education sector. While it is not an accrediting organiza-
tion, it is closely affiliated with influential, conservative, and reform-minded groups
and policymakers, such as the Thomas B. Fordham Institute, that have been critical of
the teacher education establishment for many years. NCTQ’s mark continues to be its
highly publicized TPP rating and ranking system and subsequent reports, which are
criticized by researchers and teacher educators based on their allegedly flawed meth-
odology, minimal samples, and unsubstantiated conclusions. NCTQ initially focused
primarily on input-based standards such as entry criteria, syllabi, and student teaching. The TPPs were rated on a five-point system for each of the
standards, which then provided a composite score to determine program ranking
(Cochran-Smith et al., 2018).
The 2015 NCTQ report State of the States: Teaching, Leading and Learning was conspicuously released toward the end of the second Obama term and is perceived as an attempt to shape the TPP evaluation space amid the proposed revisions of the Title II regulations. The NCTQ report responded positively to the more performance-based
approach in evaluating teacher effectiveness, indicating that this was broadly evident
in state policy.
NCTQ’s January 2017 report Running in Place: How New Teacher Evaluations Fail to
Live Up to Promises was not as favorable about the progress that had been made in the
evaluation of TPPs since its 2015 report. This is not surprising because the revised Title
II regulations of the Obama administration had only been approved in October 2016
after the failed approval of the revised 2014 regulations. Therefore, uncertainty about whether the 2016 revised regulations would be implemented by the next presidential administration likely resulted in a holding pattern for TPP evaluation.
The NCTQ 2017 report noted that some progress had been made by the states to “sig-
nificantly” use student academic growth in teacher evaluation, with 30 states making
it a major priority and 10 states somewhat requiring it, but still another 10 states and
the District of Columbia did not require any “objective” measure of student growth.
The report also argued that 18 of the state education agencies (SEAs) had lax regu-
lations in the credentialing of teachers because the SEAs still provided some teachers
with an “effective” summative rating even if the teachers received a “less than effective”
score on their student learning evaluations. As expected, this report was not received
well by the TPP community. It should also be apparent that there is not full participation by TPPs in the NCTQ process: the organization continues to generate controversial ranking reports and is considered an agitator by many in the TPP community, and its work, given its methodology and politicized positioning, can be characterized as limited evaluative inquiry of TPPs (Cochran-Smith et al., 2018).
Perhaps the most dominant shadow in this work is cast by AACTE. The asso-
ciation represents more than 700 colleges and universities in the teacher preparation
enterprise, with its current “who we are” statement reporting that it is “dedicated to
high-quality, evidence-based preparation that assures educators are ready to teach all
learners.” Collaborating with other national groups, AACTE generates research and
policy briefs while serving as the primary advocate for TPPs in federal educational
policy and in state educational policy through its affiliate groups. Yet, there have been
tensions between AACTE and the TPP community, particularly around connecting TPP quality to graduates' effectiveness as indicated by the subsequent performance of their students on standardized achievement measures, such as through a value-added measure (VAM) approach. Cochran-Smith et al. (2017) suggest that a coalition of AACTE and
other professional associations contributed to the demise of the proposed 2014 Title II
revised regulations. AACTE maintains professional interest in TPP accreditation and
evaluation, but no longer financially supports related activities as it did in prior years.
Nongovernmental Organizations
Aydarova (2020) effectively argues that absent policy limits, certain nongovernmen-
tal, intermediary organizations (IOs) constitute closely knit accountability regimes that
“allow IO actors to amass material, informational, and relational resources to advance
their agendas despite seeming opposition to the measures they propose from the edu-
cational community” (p. 4). There are a number of organizations that have a legacy and
thus prestige in the development of assessments that accumulate useful data for TPP
evaluations. Key among them are the Educational Testing Service (ETS), Pearson, and
research and development organizations such as the American Institutes for Research
(AIR), Westat, RAND Corporation, and Mathematica. These organizations stand to
advise the federal government, states, and districts and create assessment data systems
on demand. They are often invisible knowledge brokers, but their work is often filtered
by sponsors and access is restricted. Pertinently, there are a number of national non-
profit organizations that over time have had a keen interest in how TPPs are evaluated.
Aside from the mission of establishing a quality teaching force, their interests range from
responding to the needs and safeguarding the viability of their member constituents to
having some say in the financial resourcing of state and federal policies that may impact
their work. They include but are not limited to CCSSO, NASDTEC, the American Fed-
eration of Teachers, and the National Education Association.
In addition to various organizations and professional associations, the involvement
and influence of philanthropic entities cannot be overlooked. For instance, the Bill &
Melinda Gates Foundation (the Gates Foundation) has made significant funding con-
tributions at multiple levels since 2013, with $34.7 million going to fund five teacher
preparation transformation centers to “develop, pilot and scale effective teacher prepa-
ration practices to help ensure that more teacher-candidates graduate ready to improve
student outcomes in K-12 public schools” (Bill & Melinda Gates Foundation, 2015). The
Gates Foundation announced that this was its “first investment as part of its teacher
preparation strategy … focused on supporting programs that:
Give candidates authentic opportunities to build and refine their skills;
Commit to continuous improvement and accountability;
Ensure that those who prepare new teachers are effective; and
Are shaped by K-12 systems and the communities they serve” (Bill & Melinda
Gates Foundation, 2015).
Yet, it is also important to note Will’s 2018 article in Education Week titled An Expen-
sive Experiment: Gates Teacher Effectiveness Program Shows No Gains for Students. The
Gates Foundation had invested $212 million into the Memphis, Tennessee; Pittsburgh,
Pennsylvania; and Hillsborough County, Florida, school districts as well as in a school
consortium in California beginning in 2009-2010 with matching funds from the districts,
which reportedly totaled $575 million for the initiative to design teacher evaluation
systems that would include both observation rubrics and measures of “growth in stu-
dent achievement." However, after 5 years, a study by RAND and AIR (funded by the Gates Foundation) reported no improvement in student outcomes. Will further noted that
the study “found no evidence that low-income minority students had greater access to
effective teachers than their white, more affluent peers, which had been another stated
goal of the Gates Foundation” (2018, p. 9).
It is possible that the Measures of Effective Teaching (MET) Project, the Gates
Foundation’s investment in a 3-year study “on fair and reliable measures of effec-
tive teaching—improving student test scores” whose findings were reported in 2013
(Measures of Effective Teaching Project, 2013), was running in parallel with the afore-
mentioned teacher evaluation project. Unquestionably, these investments by the Gates
Foundation have made significant contributions to the evaluative inquiry of teacher
preparation and teacher effectiveness. Grants to certain organizations do have the
potential to leverage criteria on TPP evaluation components. For example, the William
and Flora Hewlett Foundation's support for the National Commission on Social, Emo-
tional, and Academic Development and its final report From a Nation at Risk to a Nation
at Hope effectively advanced the need for social and emotional learning in more than
200 pieces of legislation (Shriver & Weissberg, 2020).
Foundations also have the wherewithal to test and substantiate certain research
methods that find their way into evaluation. The concept of value added, for instance,
rooted in the work of the agricultural statistician William Sanders, was effectively estab-
lished as a key criterion in a number of TPP state and federal grant programs until
its effectiveness was disavowed by researchers in the field (Amrein-Beardsley, 2008;
McCaffrey et al., 2003). As Smith and Smith (2009) contend, many foundations carry a
reputation of bipartisanship, have the opportunity to fund policy-changing strategies
over a sustained period of time, and can serve as a countervailing force in society by
representing views and providing financial support in areas that are different from
those of the government. This situates them in a powerful place.
There are an increasing number of highly regarded professional educators and
economists who have stepped out of the fray to establish organizations that allow
them to promote new TPP evaluation methods that have utility. For example, Edward
Crowe’s Teacher Prep Inspection–US has adapted the British inspection method to the
U.S. context, using inspection teams. It conducts on-site visits, interviews, reviews,
examinations of data quality, and observations of teacher candidates. It has completed
inspections of 180 TPPs in 21 states. Often, TPPs are invited to participate in these and similar initiatives, typically identified by reputation and/or through professional acquaintances. Rarely is there an open call for programs to apply. The process tends to
include the same TPPs (i.e., large research institutions) and omits many minority-serv-
ing and small private colleges. At the same time, there has been some progress made
in the training and participation of evaluators of color who are increasingly involved
in major evaluation projects (Collins & Hopson, 2014). However, their participation is
not as evident in major TPP evaluations and particularly not as lead contractors for
these evaluations.
Influential Reports
Notably, a number of reports have also influenced the TPP evaluation sector. For instance,
one report, Approaches to Evaluating Teacher Preparation Programs in Seven States (Meyer
et al., 2014), provides a glimpse of how TPPs in one region began to adjust their evalu-
ation priorities in response to the Obama administration’s 2011 publication Our Future,
Our Teachers (U.S. Department of Education, 2011). Focusing on the seven states in the
Regional Educational Laboratory (REL) Central region—Colorado, Kansas, Missouri,
Nebraska, North Dakota, South Dakota, and Wyoming—the report suggests that the
evaluation of TPPs mirrors findings in the 2013 NAEd report in that evaluations are "primarily state program approval processes, which vary substantially" (Feuer et al., 2013, p. 2). It was noted that TPPs in the REL Central region were increasingly emphasizing mea-
sures “that focus more closely on program outcomes for teacher candidates, practicing
teachers, and their students” (Meyer et al., 2014, p. 18).
A 2015 report by GAO, Teacher Preparation Programs: Education Should Ensure States
Identify Low-Performing Programs and Improve Information-Sharing, is also important in
the context of TPP evaluation. This report was published shortly after the failure to
approve the major revisions to HEA Title II in the 2014 regulations and reinforced that
a major purpose for the Title II report was for states to identify TPPs that were low
performing. However, the GAO found that the identification of these TPPs was minimally evident in state reporting and that the exercise was viewed as inefficient or even meaningless. The report not only found that seven states had no process for
identifying their low performing TPPs but also that ED officials had not adequately
verified the processes used by states to identify low-performing TPPs. The report fur-
ther strengthened the argument that more useful data need to be collected from states in their annual Title II reports to contribute to assessing TPP quality. Both the
inadequate identification of low-performing and at-risk TPPs by the states and the
less than useful data submitted by states in their annual Title II reports were major
aspects of the revised regulations of 2016 that were approved for the short term. It is
also important to note that a review of the 2017-2018 reported data (a report has yet
to be published) on the Title II website (https://www.ed.gov) indicates that 162 TPPs
were identified as at risk or low performing, a 260 percent increase compared to 2014.
EVALUATION MEASURES AND METRICS
Although TPP formats are vastly different, there are critical components that virtu-
ally all program models purport to include, such as some measure of basic subject-mat-
ter knowledge and clinical field experiences. While accrediting organizations remain a
predominant model for program evaluation, the proliferation of TPP designs has called
forth additional factors for consideration.
Proliferation of Program Designs
The phrase “traditional teacher education” is a misnomer. Since the mid-1980s, the
initial and continuing professional development of teachers has shifted from being
firmly situated in college and university-based programs to a host of new venues
designed to swiftly fill state and local needs in certain disciplines (e.g., science, technol-
ogy, engineering, and mathematics; special education) and rectify the broadening racial,
ethnic, and linguistic gap between PK-12 students and the quality educators who teach them (Dilworth & Coleman, 2014; McFarland et al., 2018; U.S. Department of
Education, 2016b). Once challenged by postsecondary institutions as competitors in the sector, alternative providers now are hosted by and/or collaborate with many schools, colleges, and departments of education. Today, the roughly 30 percent of TPPs that are classified as alternative
route are hosted by local public school districts; public and for-profit charter schools;
state, regional, and local education agencies; community college systems; foundations;
and nonprofit programs (Fenwick, 2021; U.S. Department of Education, 2016b; Wilson
& Kelly, 2021). These programs vary significantly in design and delivery and operate
under various state authorities; thus, “in practice, all states are not requiring that all
providers and programs meet the same standards” (Fenwick, 2021, p. 19). Debatably,
there are no apparent efforts to craft measures that recognize distinctions between and
among program types and at the same time signal program quality.
As TPP formats proliferate, so too grows the need for useful and reliable evalua-
tion frameworks (Bartell et al., 2018). In a comprehensive review of alternative models
of teacher education programs, Cochran-Smith and Villegas (2016) find that studies
address one or more of the following questions:
Is this particular teacher preparation program successfully doing what it claims
to be doing (or wants to be doing)?
What is the evidence for this (and how could it be demonstrated to outsiders)?
How can program faculty and administrators use this evidence or the explanatory
frameworks developed in conjunction with it in order to improve the program
and/or to contribute to the broader knowledge base about teacher preparation?
(p. 463)
These are important questions, but to what extent do they prompt the development
of new qualitative and quantitative measures, as well as evaluative insights, that are of
the most interest to the communities they serve (Wells & Roda, 2016)?
There are a multitude of intersecting entities that direct and inform TPP evaluation.
Key among them are state governing boards and authorities and program accreditation
and licensing organizations. Fenwick (2021), in the comprehensive report A Tale of Two
Cities: State Evaluation Systems of Teacher Preparation Programs, provides a useful com-
parison of “typical” traditional and alternative route provider and program approval
processes and standards (e.g., admissions, institutional mission, quality of instruc-
tion) (see Table 1). The comparison suggests that the evaluative evidence provided to
decision-makers for determining TPP quality varies by program type with traditional
programs carrying a heavier burden of proof than others.
Teacher residency programs are a case in point. This popular TPP model is highly
regarded as it offers a universally supported preparation component of clinical expe-
rience and at the same time employs individuals as they prepare, which makes the programs more attractive to individuals of color than traditional programs (Cochran-Smith & Villegas, 2016; Dilworth & Coleman, 2014; Guha et al., 2016; Papay et al., 2012; Rice & Brent, 2002). Residencies are often framed within a "third space" (Beck, 2016), in other words, hybrid spaces that provide an authentic teaching and learning environment bridging campus-based and school-based work (Zeichner, 2010).
One element for comparison is a TPP’s effectiveness in preparing new teachers
who are employable and stay in the field. Generally, here traditional programs offer
pass rates on licensure exams and/or hiring and retention data while alternative
route programs offer an assessment and evaluation of candidates for certification and
TPP improvement. Acceptance to teacher residency programs typically requires formal agreements to work in cooperating PK-12 schools while in training and to commit to work in these districts upon program completion. TPP reports to authorizing agencies may be useful documentation but are of minimal use for evaluation. The length of time teachers
from the respective program types remain in the field may provide critically important and useful information to consider as well. Therefore, it is reasonable to explore the identification of criteria that may better inform the evaluation of emerging alternative models and the measures and metrics to be used. Yet, we must also address the limitations of current measures and metrics used to evaluate TPPs, particularly the misalignment of the data that are available.
TABLE 1 Comparison of Typical Traditional and Alternative Route Provider and Program Approval Processes and Standards

Admissions criteria (Traditional) / Admission and recruitment criteria (Alternative, not IHE-based)
  Traditional: GPA of incoming class; average licensure/entrance exam scores
  Alternative: Bachelor's degree from an accredited institution; average licensure/entrance exam scores; target cohort size and a plan for recruiting candidates

Institutional mission, vision, goals, conceptual framework
  Traditional: Narrative evidence of alignment of unit conceptual framework with institutional mission, vision, and goals
  Alternative: Ownership, governance, and physical location/address; budget and revenue sources

Quality and substance of instruction (Traditional) / Coursework (Alternative)
  Traditional: Coursework and syllabi aligned with CAEP/state standards, with special emphasis on diversity, equity, and inclusion and assessment/data-driven instructional decision making; planned program of study with required course content and hours; student and program rubrics, assessments, and data aligned with standards
  Alternative: Description of instructional modules (typically online modules) aligned with targeted categories of certificates; description of how students are evaluated

Quality of student teaching experience (Traditional) / Clinical training (Alternative)
  Traditional: Fieldwork policies, including requisite hours in handbook; qualifications of fieldwork supervisor and mentor teacher; record of regularly scheduled observations of student teaching by university supervisor
  Alternative: Evidence of support during training, clinical teaching, internship, and practicum; description of support and communication between students, cooperating teachers, and the alternative certification program; description of conditions under which clinical teaching may be implemented

Faculty qualifications and orientation (Traditional) / Selection criteria for supervisors and cooperating teachers (Alternative)
  Traditional: Percentage of faculty with advanced degrees and PK-12 teaching experience; percentage of full-time, part-time, and adjunct faculty; profile of clinical and internship partner schools; university orientation for university supervisor, adjunct faculty, and cooperating teachers
  Alternative: Selection criteria for clinical supervisors; selection criteria for cooperating teachers; code of professional conduct of staff and students

Effectiveness in preparing new teachers who are employable and stay in the field
  Traditional: Pass rates on licensure exams; hiring and retention data
  Alternative: Assessment and evaluation of candidates for certification and TPP improvement

Success in preparing high quality teachers
  Traditional: Teacher performance assessments administered near end of program; ratings of graduates by principals/employers; program completers' self-assessment of knowledge, skills, and dispositions; impact on PK-12 learning outcomes
  Alternative: Certification procedures

Quality assurances (Traditional) / Complaint procedures (Alternative)

Review cycle
  Traditional: Typically 5- to 7-year cycle
  Alternative: Typically a 3-year cycle, which can range up to 7 years

NOTE: IHE = institution of higher education.
SOURCE: Fenwick, 2021.
Evaluation Measures, Metrics, and Data Misalignment
The aforementioned VAMs have represented a field-wide shift in the conception of teacher and school quality. These measures of teacher performance undergird a larger
movement in education that seeks to rank schools using data generated from test scores
and provide transparent metrics for multiple sets of stakeholders. Teacher quality has
come to mean a teacher’s ability to grow student learning over time as measured by
these models. As data proliferate, all elements of the education system have been influ-
enced by this concept of teacher quality.
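To ground the discussion, the display below gives a stylized version of the kind of model that underlies these measures. The exact specification varies across states and vendors, so this form is an illustrative assumption rather than any particular system's model:

$$A_{it} = \lambda A_{i,t-1} + X_{it}\beta + \theta_{j(i,t)} + \varepsilon_{it}$$

where $A_{it}$ is student $i$'s test score in year $t$, $A_{i,t-1}$ is the prior-year score, $X_{it}$ collects student (and sometimes classroom or school) characteristics, $\theta_{j(i,t)}$ is the estimated "value added" of the teacher assigned to student $i$ in year $t$, and $\varepsilon_{it}$ is an error term. Applications to TPP evaluation typically aggregate the estimated teacher effects $\hat{\theta}$ across each program's recent graduates.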
TPPs are not exempt from the movement to provide performance metrics indicative
of their production of quality teachers entering the teaching profession. A significant
element of the Race to the Top program required that states produce report cards for each TPP (Crowe, 2011). These report cards were to use data about programs and their
graduates that would ideally link their performance to the academic performance of
the students in the schools where they are initially placed. There was also a desire that
state TPPs should be rated and ranked based on these metrics. Here, the Obama admin-
istration sought to induce improvement in the quality of these programs by making
these report cards public and using summative measures as indicators of quality for
the consumers (e.g., districts, principals, parents) of TPP graduates. Many states imple-
mented these systems and continue to use some form of public reporting for their TPPs.
It is not surprising that these efforts were not without controversy within the TPP
community. In particular, many programs felt strongly about the inappropriateness
of using value-added estimates from their candidates’ students to judge their pro-
grams. Indeed, one might imagine a scenario where certain metrics have unintended
consequences that harm programs and do not induce improvement, particularly if a program places its teacher candidates at high-needs and hard-to-staff schools (Cochran-Smith et al., 2016).
The shift in how TPP quality would be evaluated had begun prior to the 2013 NAEd
report, from input indicators to outputs and outcomes based on some form of a per-
formance metric. While the more input-focused metrics for TPPs (admission criteria,
curriculum, faculty, etc.) continue to be argued as important, it is clear that the outcome
and performance types of metrics of TPP effectiveness, such as graduates’ successful
performance on state teacher certification tests, VAMs, and student growth, are more
highly valued through the persistent lens of accountability. At the same time, there is
some appreciation for the value of surveys of TPP graduates and principals as important indicators of consumer satisfaction. There does appear to be some consensus that there should be evidence of a teacher's contribution to their students' learning, but there is no consensus about what that evidence should look like or who determines what evidence is acceptable to show this impact. All of the accrediting entities agree that
teacher and student performance are indicative of teacher effectiveness and TPP quality.
Accountability Measures
An examination of methods and assessment with regard to TPP evaluation should focus primarily on the pressures of accountability now facing TPPs. As mentioned in the previous sections, the major push for accountability measures largely comes from ED reporting requirements as formerly espoused in the Title II regulations, CAEP standards, and the development and widespread adoption of portfolio-based assessments (Cochran-Smith et al., 2016). What these calls for
accountability have in common is a focus on public summative measures (i.e., mea-
sures that seek to distill performance into a summative rating that captures program
performance). This focus on a single, summative rating represents a true shift in how
these programs are evaluated and is consistent with trends in education accountability
systems. Prior to this current focus, the field relied on state approval of programs, pass
rates on licensure exams, and whether programs and schools met accreditation require-
ments (Donovan et al., 2014).
Predictive Effectiveness Measures
The increased use of student and teacher data in evaluating TPP performance is a
result of many states now having longitudinal data systems and other infrastructure
that make it possible to link teacher preparation candidates directly to the performance
of their students. In particular, the use of student value-added metrics as measures of
TPPs is a natural outgrowth of their use in teacher evaluation systems. However, as
much as value-added models have proven controversial in the PK-12 space, they are
also contested in the teacher preparation space. Additionally, their use as evaluative
measures for programs has not been empirically borne out in the data. For example,
Goldhaber (2019) uses administrative data from the state of Washington to show that
there are minor differences in value added among graduates of preparation programs.
He notes that there are few studies that capture the actual features of preparation pro-
grams and workforce outcomes. Similarly, Lincove et al. (2014) find that statistically robust value-added metrics can be estimated but that the estimates are sensitive to the selection of teachers into programs and jobs, decisions about accountability criteria, and the selection of control variables.
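To make this sensitivity concrete, the following minimal sketch (in Python, using entirely simulated data) shows how program-level value-added estimates can move when a single school-context control is added. The program labels, the free/reduced-price lunch indicator, and all coefficients are hypothetical illustrations of the general specification issue described by Lincove et al., not a reconstruction of any state's actual model.

```python
# Illustrative sketch only (simulated data, hypothetical names): how TPP
# value-added estimates can shift when a school-context control is added.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
tpp = rng.choice(["A", "B", "C"], n)          # hypothetical preparation programs
prior = rng.normal(0, 1, n)                   # students' prior achievement
# Nonrandom placement: program C's graduates teach higher-poverty students
frl = np.where(tpp == "C", rng.random(n) < 0.9, rng.random(n) < 0.4).astype(int)
# The simulated data-generating process has NO true program effect at all
score = 0.7 * prior - 0.5 * frl + rng.normal(0, 1, n)
df = pd.DataFrame({"tpp": tpp, "prior": prior, "frl": frl, "score": score})

# Sparse specification: program C appears harmful because context is omitted
sparse = smf.ols("score ~ prior + C(tpp)", data=df).fit()
# Adding one context control moves the spurious program estimates toward zero
rich = smf.ols("score ~ prior + frl + C(tpp)", data=df).fit()
print(sparse.params.filter(like="tpp").round(3))
print(rich.params.filter(like="tpp").round(3))
```

In this simulation no program is truly better or worse, yet the sparse model penalizes the program whose graduates serve higher-poverty classrooms, which is exactly the unintended-consequence scenario raised earlier.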
In addition to value-added metrics, some scholars have investigated how other
elements of state-level teacher evaluation systems might be used to judge TPP effec-
tiveness. Some studies show that few individual program requirements are positively associated with achievement gains (Preston, 2017) and that rating instruments each measure a single underlying construct rather than multiple constructs (Henry et al., 2013). Bastian et al. (2018) analyze the evaluation ratings of program graduates, find significant differences across TPPs, and show that it is critical to control for school context. They argue that evaluation ratings provide evidence on the performance of TPPs that is distinct from value added. Using data from the North Carolina Educator Effectiveness System, they uncovered large variation among and within programs and found that ratings on the observation rubrics based on North Carolina teacher and administrator standards are good predictors of performance because they capture elements of the preparation program in practice.
A report of the National Academies of Sciences, Engineering, and Medicine (2020)
concludes:
The research base on preservice teacher preparation supplies little evidence about its
impact on teacher candidates and their performance once they are in the classroom.
Preservice programs in many states assess the performance of teacher candidates for
purposes of licensure, but few states have developed data systems that link information
about individual teachers’ preservice experiences with other data about those teach-
ers or their performance. Overall, it is difficult to assess the causal impact of teacher
preparation programs. (p. 6)
Another promising measure is observation ratings. Using a sample of 44 providers offering 184 programs across Tennessee, Ronfeldt and Campbell (2016) find that observational ratings, such as those from the state teaching evaluation rubric, are associated with student achievement gains.
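A minimal sketch of this kind of analysis appears below, again in Python with simulated data. The rubric scale, the school-poverty measure, and the coefficients are hypothetical; the sketch is meant only to illustrate why omitting school context can distort the association between observation ratings and achievement gains, the adjustment issue Bastian et al. emphasize.

```python
# Illustrative sketch only (simulated data, hypothetical names): relating
# graduates' observation ratings to achievement gains, with and without a
# school-context adjustment.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1200
poverty = rng.uniform(0, 1, n)                        # school-context measure
# Ratings are lower, on average, in higher-poverty placements
rating = 3.7 - 0.4 * poverty + rng.normal(0, 0.4, n)  # rubric score, 1-5 scale
# Gains depend on the rating AND on school context
gain = 0.2 * rating - 0.3 * poverty + rng.normal(0, 0.5, n)
df = pd.DataFrame({"rating": rating, "poverty": poverty, "gain": gain})

# Omitting context inflates the rating coefficient; adjusting recovers ~0.2
unadjusted = smf.ols("gain ~ rating", data=df).fit()
adjusted = smf.ols("gain ~ rating + poverty", data=df).fit()
print(round(unadjusted.params["rating"], 3), round(adjusted.params["rating"], 3))
```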
Portfolio assessment (e.g., edTPA and PPAT) is a widely used tool for gauging a beginning teacher's readiness to practice. As of 2018, 45 states had adopted some form of portfolio assessment (Whittaker et al., 2018). These assessments serve a dual purpose:
to measure candidate performance and to evaluate program performance. These assess-
ments come with recommended cut scores that are aligned with a state’s professional
standards and are subject to local needs and political intent. TPPs can use evidence from portfolio assessments for continuous improvement when the scores exhibit construct validity and reliability and have predictive power (Admiraal et al., 2011). The scores from
the exams can also be used by programs for continuous improvement via compari-
sons to other programs in their home state (Bastian et al., 2016). Bastian et al. (2018)
demonstrate that the edTPA in particular can be a useful way to understand profiles
of instructional practices by TPPs. They also find statistically significant relationships
between the edTPA and the Education Value-Added Assessment System, meaning that
the edTPA can be a useful predictor of eventual teacher performance. Though the edTPA
is most widely used, there are a variety of portfolio assessments available to the field,
including the PPAT developed by ETS and loosely aligned with the Interstate Teacher
Assessment and Support Consortium standards, the Texas-sponsored and ETS-devel-
oped Pre-Admission Content Test, the California Teaching Performance Assessment
hosted by the California Commission on Teacher Credentialing, the Resident Educator
Summative Assessment hosted by the Ohio Department of Education, and the recently
defunct Washington State Professional Educator Standards Board portfolio. Critics of portfolio assessment contend that it is an additional tool in a movement to privatize public education because it is often used as a high-stakes accountability assessment that can place significant burdens on candidates (Whittaker et al., 2018). The edTPA is grounded in an earlier, well-regarded portfolio assessment, that of the National Board for Professional Teaching Standards. Like the National Teacher Examination of the 1970s and the Praxis® examinations of the 1990s, the edTPA has been highly scrutinized for a host of issues, including its relevance to current teaching and learning theories, its psychometric properties, and its impact on underrepresented racial, ethnic, and linguistically diverse groups (Gitomer et al., 2019). More
recent criticisms of the edTPA focus on challenges around norming and validity, and
the lack of sustained oversight by technical committee members (Gitomer et al., 2021).
Although there is a fair amount of controversy surrounding the merits of the assess-
ment (Gitomer et al., 2019; Goldhaber et al., 2017; Peck et al., 2014; Tuck & Gorlewski,
2016), it remains firmly established in the domain of initial teacher performance assessment. It is apparent that considerable debate continues regarding the measures and metrics used to provide meaningful information in the evaluation of TPPs.
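As one illustration of what "predictive power" means operationally, the sketch below (Python, simulated data) computes the correlation between a hypothetical portfolio score and a later in-service evaluation rating, then compares outcomes across a hypothetical cut score. The score distribution, cut score, and effect size are invented for illustration and do not describe the edTPA's actual psychometrics.

```python
# Illustrative sketch only (simulated data, hypothetical names): a minimal
# predictive-validity check. "edtpa_score" and "eval_rating" are stand-ins
# for a portfolio total score and a later in-service evaluation rating.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 800
edtpa_score = rng.normal(45, 5, n)                  # simulated portfolio scores
# Later rating is weakly related to the portfolio score, plus noise
eval_rating = 0.04 * edtpa_score + rng.normal(0, 1, n)

# Predictive validity as the correlation between the preservice score and
# the in-service rating
r, p = stats.pearsonr(edtpa_score, eval_rating)
print(f"predictive correlation r = {r:.2f} (p = {p:.3f})")

# A cut score turns the continuous score into a pass/fail licensure decision;
# comparing mean later ratings across the cut shows what the decision preserves
cut = 41
passed = edtpa_score >= cut
print(f"mean later rating: pass = {eval_rating[passed].mean():.2f}, "
      f"fail = {eval_rating[~passed].mean():.2f}")
```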
EQUITY AND SOCIAL JUSTICE
Targeting groups’ (stakeholders’) positionality relative to school reform and social
justice is particularly important. Underlying this movement toward public summative
measures as evaluators of program success is a critical discussion of what should be
used to evaluate teachers. Cochran-Smith et al. (2016) describe this as a tension between
“thin equity” and “thick equity,” where the former focuses solely on in-school condi-
tions as drivers of educational disparities and the latter focuses on both in-school and
out-of-school factors. The public generally and racially and ethnically marginalized
communities specifically are increasingly weary of evaluation findings that state and
restate the existence of a PK-12 achievement gap between and among White students
and others. They have come to understand that well-prepared teachers and more teach-
ers of color in particular are key drivers of better student performance. Yet, rarely is
this quantitative or qualitative information explicit in proposed or existing legislation
or acted on (Dilworth, in press).
Evaluation is typically recognized as a tool for TPP accountability and program improvement, but this view fails to appreciate its possibilities as a vehicle to advance institutional equity and/or the nation's social justice agenda (Hood et al., 2015a; House, 2019, 2020).
The extent to which TPPs prepare educators who successfully support PK-12 academic
achievement, particularly for racially, ethnically, and linguistically diverse underserved
students, is arguably an important metric that should influence the allocation of finan-
cial and other resources. Therefore, it seems reasonable to recognize and review TPPs with an evaluative lens that assesses whether they meet quality practice and productivity thresholds. It is apparent that minority-serving institutions (MSIs) should be included in this group.
As Petchauer and Mawhinney (2017) posit “policy demands facing teacher education
at this contemporary moment also make this the right time to see MSIs as a collective
unit in teacher education” (p. 6).
Contributions to Providing More Teachers of Color
MSIs are a subset of the postsecondary sector and are distinguished by their
missions, goals, and affiliation. Notably, historically Black colleges and universities
(HBCUs) and American Indian Tribally Controlled Colleges and Universities have
historical roots that bind them in significant ways. Together with Asian American and
Native American and Pacific Islander–serving institutions and Hispanic-serving insti-
tutions, these institutions generate a significant number of educators generally and
teachers of color specifically (Dilworth, 2012; Dilworth & Brown, 2008; Gasman et al.,
2016; Lindsay & Lee, 2018).
The need for and merits of a diverse teaching force are well documented, most recently by Cherng and Halpin (2016), Gershenson et al. (2018, 2021), and Gist (2017). There is a critical need to increase the number of Black, Indigenous, and other teachers of color as the racial, ethnic, and linguistic diversity of the nation's PK-12 student population has grown rapidly. The societal expectation is that all TPPs should recruit and prepare educators from various cultures and that school districts should do a better job of retaining them in PK-12 classrooms. At the same time, it is evident that this responsibility has not been fully shared across TPPs, as MSIs continue their long-standing tradition of being more responsive to this need than other institutions.
The reasons for the under-representation of educators of color are complex and varied and have changed somewhat over time; they include inadequate financial support to pursue teaching, poorly constructed career ladders, and a limited number of individuals pursuing teaching degrees who came from distressed urban and rural areas, completed college, and returned to their home communities. Furthermore, accountability measures that include challenging teacher licensure examinations, together with the dominance of a postbaccalaureate licensure format that adds the cost of a fifth year of study, are further deterrents (Carter & Goodwin, 1994; Carver-Thomas, 2018; Dilworth & Coleman, 2014; King, 1993).
One factor that has influenced the number of potential PK-12 educators generally
and those of color specifically is an increased interest and participation in alternative
routes to licensure. These programs are hosted by IHEs, states, school districts, and nonprofit organizations and typically allow individuals to train and work in classrooms while being compensated. The merit of this pipeline is that individuals enter PK-12 classrooms quickly and qualify for school positions. The shortcoming is that those trained through these alternative routes tend to leave the classroom sooner than those prepared in traditional college- and university-based TPPs (Espinoza et al., 2018). One can reasonably assume that enrollment trends favoring alternative route programs will continue to rise in MSIs, boosting efforts to diversify the teaching force. King and Mahaffie (2016) document the contribution of HBCUs, noting that 16 percent of Black or African American individuals who enrolled in IHE-based TPPs matriculated at HBCUs, and that alternative IHE-based programs had a higher percentage of their students enrolled at HBCUs (4 percent) than traditional IHE-based programs did (2 percent).
Secretary of Education Arne Duncan’s 2013 annual report (U.S. Department of
Education, 2013) to Congress on teacher quality notes that 69 percent of TPPs are clas-
sified as traditional, 21 percent are alternative route TPPs based at IHEs, and 10 percent
are alternative route TPPs not based at IHEs. Approximately 37 percent of enrollees in IHE-based alternative programs and 53.7 percent of enrollees in non-IHE-based alternative programs are people of color.
In their review of effective teacher diversity state initiatives, Dilworth and Coleman
(2014) suggest that there is merit in embracing alternative route teaching and learning
formats, but at the same time there is a need to establish clear and universal standards
and guidelines. Given the successes of MSIs in generating a diverse corps of educators
in any format, evaluation criteria that reflect their work and are grounded in culturally responsive program principles should be developed and utilized.
A number of studies and reports have sparked interest in factors that broaden
thinking, theory, and practice in educational evaluation to address issues of access,
equity, inclusion, and social justice. For example, Hood et al. (2015a, 2015b) argue for
the importance of and critical need for viewing evaluation through a culturally
responsive lens; the National Academies of Sciences, Engineering, and Medicine’s
Monitoring Educational Equity (2019) promotes the quantification of equity indicators
for large-scale data collection; and Wimberly’s 2015 volume LGBTQ Issues in Educa-
tion: Advancing a Research Agenda includes the use of large-scale data sets in examining
LGBTQ education. In addition, there are a number of recent, highly publicized works
that have expanded the discussion of access, equity, inclusion, and social justice in
TPPs and TPP evaluation, including Who Believes in Me?: The Effect of Student–Teacher
Demographic Match on Teacher Expectations (Gershenson et al., 2016); The Importance of
Minority Teachers: Student Perceptions of Minority Versus White Teachers (Cherng & Halpin,
2016); and The Long-Run Impacts of Same-Race Teachers (Gershenson et al., 2018). Lastly,
Dilworth (2018) promotes the idea that there is merit in considering the intersectionality
of teachers’ race, ethnicity, and age as a factor in program assessment and evaluation.
Efforts to provide the public with summative measures and reliance on publicly
generated databases too often omit important qualitative data that can provide con-
temporary and culturally responsive lenses. These data are rarely valued in the state
and federal policymaking domain. As Toldson (2019) states:
Today, researchers routinely separate numbers from people. We use deficit statistics, test
scores, achievement gaps, graduation rates, and school ratings, without a humanistic
interpretation. We also create false dichotomies between qualitative and quantitative
research. (p. 3)
Some advocacy and special interest organizations (e.g., Excelencia in Education, the Urban Institute, and the Albert Shanker Institute) and publications (notably Diverse Issues in Higher Education), with and without private support, fill a void by extracting quantitative data from large databases and analyzing the information for consumption and consideration in policy initiatives that target education issues of race, ethnicity, language, exceptionality, and inclusion. They do so in user-friendly technology formats, but also provide technical reports to inform those in the research and evaluation sector.
It is not necessary to create new models on how to include members of the com-
munity in the evaluation of TPPs, as extensive examples can be found in the literature
on evaluation theory and practice. There are encouraging examples in health, social work, Indigenous evaluation, and some sectors of education in which community stakeholders are more substantively included in the evaluation process (i.e., design, implementation, and interpretation of results), but such examples are not yet clearly apparent in the evaluation of TPPs nationwide.
The substantive inclusion of community stakeholders in the program evaluation
process is most closely aligned with multicultural validity (Kirkhart, 1995), delibera-
tive democratic evaluation (House & Howe, 1999), culturally responsive evaluation
(Frierson et al., 2010; Hood et al., 2015b), and the Indigenous evaluation framework
(LaFrance & Nichols, 2008). This call for the inclusion of community stakeholders has
also been accompanied by the long-standing one to increase the number of evaluators
of color and those with “shared lived experiences” when conducting evaluations in
culturally diverse communities to strengthen evaluative validity (Collins & Hopson,
2014; Hood, 2001; Hood et al., 2005; Reid et al., 2020). House and Howe (2000) provide
examples of what the deliberative democratic evaluation approach looks like in practice, and Cochran-Smith et al. (2017) offer this approach for consideration as a way to address democratic accountability in teacher education. Frazier-Anderson et al. (2011) provide
the African American Culturally Responsive Evaluation System for Academic Settings, applying the lens of culturally responsive evaluation to the inclusion of community stakeholders throughout the evaluation process. Numerous chapters in Hood et al.
(2015a) provide examples as to how community stakeholders have been included in
program evaluation in culturally diverse settings. However, the most robust examples
are evaluations conducted in Indigenous communities, primarily by Indigenous evalu-
ators (Cram et al., 2014; LaFrance et al., 2012).
CONCLUSION
For a variety of reasons, evaluations intended to inform policymakers and the public
on TPP performance typically do not meet their goals. Public and private initiatives
that are designed to promote quality teacher preparation, improve PK-12 instruction,
and enhance student learning are advanced, absent thoughtful consideration of evalua-
tion findings. It is counterproductive for TPP institutions and organizations to respond
to various accountability directives without the time and opportunity to understand
their meaning and to make reasonable adjustments in operations before moving to one
politically fueled concept after another.
There are examples of TPP evaluations having an impact on federal or state policies
intended to improve TPPs and TPP procedures (Bastian et al., 2016; Sykes & Dibner,
2009). Yet, since 2013, we find that there is limited information suggesting that these
initiatives have met their program improvement goals. It can be argued that the Trump
administration’s immediate repeal of the Obama administration’s 2016 revisions to the
Title II regulations may have created a vacuum, resulting in a pause in the attention
to the evaluation of TPPs. At the same time, repealing these regulations seems to have
signaled that those priorities for TPP accountability were no longer important and were
being left to be addressed by the states. One could surmise that this vacuum hindered
innovation and change at the state and institutional level.
Research has indicated that there is as much variation in teacher outcomes within
TPPs as there is among programs (Goldhaber et al., 2013). The fundamental purpose of
TPP evaluation should be to provide valid and useful information for making evaluative judgments about TPP performance and for supporting program improvement. As we have described, countervailing notions and movements in education policy often work at cross-purposes with these goals for TPPs. Good, sound evaluations offer a
clear path to program improvement if the system allows. What appears to be lacking
are clear, consistent, and transparent goals defined by all stakeholders (i.e., state and
national policymakers, program accrediting agencies, organizations, and the public).
At the same time, it is clearly apparent that there should be a central, nationwide col-
lection of useful data to improve the evaluative inquiry of TPPs that includes a current
and accurate compilation of state data. The availability of more comprehensive data
to the public can contribute to a more complete view of areas of need and resources
to effectively fund programs and policies. Key to a more fruitful investment of time,
money, and resources is to retreat from public summative measures by establishing
data systems that accommodate quantitative and qualitative indicators that explicitly
target community needs—candidate outcomes and TPP improvement—and incorpo-
rate equity indicators that are often overlooked.
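As a toy illustration of what such a data system must do at minimum, the following Python sketch links invented program records to invented completer records and carries quantitative, qualitative, and equity indicators side by side. Every table, field, and value here is hypothetical; the point is only that the basic linkage and the co-existence of indicator types are straightforward once the data are collected consistently.

```python
# Illustrative sketch only (invented tables, fields, and values): linking
# program records to completer outcomes while retaining quantitative,
# qualitative, and equity indicators together.
import pandas as pd

programs = pd.DataFrame({
    "program_id": ["P1", "P2"],
    "state": ["NC", "IL"],
    "msi": [True, False],                      # equity indicator at program level
})
completers = pd.DataFrame({
    "completer_id": [1, 2, 3],
    "program_id": ["P1", "P1", "P2"],
    "portfolio_score": [48, 41, 45],           # quantitative indicator
    "placement_school_type": ["high-needs", "high-needs", "suburban"],
    "mentor_narrative": [                      # qualitative indicator, kept alongside
        "strong culturally responsive practice",
        "needs support with assessment design",
        "effective classroom community",
    ],
})

# Linking completers back to their preparation programs is the basic operation
# that state longitudinal data systems enable
linked = completers.merge(programs, on="program_id", how="left")
print(linked[["program_id", "state", "msi", "portfolio_score",
              "placement_school_type"]])
```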
We believe that this paper provides a reasonably clear snapshot of the TPP evaluation landscape's complexity, which exists within the context of the federal and state education policy environment, varying TPP models, standards-setting accreditation groups, and influential organizations and individuals. We offer our observations with examples of how each of these entities influences the development and operation of data systems that
too often generate information with limited utility. In addition, we promote a message
to all that there is a critical need to re-examine all areas of TPP evaluation in order to
capture and employ effective strategies that address equity and social justice.
Certainly, there is more ground to be covered as researchers and practitioners con-
tinue to interrogate, articulate, explore, and refine the TPP evaluation landscape. We
believe one place to start is with a clear and deliberate understanding that TPP evalu-
ation is an essential tool for meaningful program improvement that is the primary
responsibility of TPP providers. Of course, this evaluation of TPP quality and utility
for program improvement must rely on sound evaluation measures and metrics that
do not reify quantitative information as the only real truth or minimize the importance
of TPPs’ social responsibility. We expect that more than a few will disagree with our
call to substantively increase the participation of highly trained and experienced evalu-
ators from marginalized communities in the TPP evaluation landscape. We believe
such participation not only is important for bringing diverse and culturally relevant knowledge and experiences into the evaluation process but also, more importantly, can contribute to the validity of the findings from these evaluations, particularly when these TPPs are major providers of teachers in those communities. The challenge before
us shall not be an easy one to undertake. Nor should it be.
With these concluding reflections in mind, we offer the following recommendations
as a place to start the next phase of this important discourse to improve and evolve the
evaluation of TPPs.
Recommendations
Data Alignment, Consistency, Timeliness, and Access
Public- and private-sector agencies and influencers should work to establish a coherent TPP data collection system. This system should:
•	Establish and adhere to data collection schedules that are calibrated with similar information-gathering efforts and initiatives
•	Define terminology and metrics that are current and accommodate the needs and capacity of states, local school districts, and the communities they serve
•	Expand the capacity for decision-making on the ground (e.g., tailor rankings and report cards for consumer knowledge and use)
•	Align TPP state program approval and professional accreditation data collection and reporting processes into more rapid cycles that allow for ongoing continuous improvement and the formative evaluation of TPPs
•	Make readily available to TPPs, states, and school districts assistance on methods for appropriately interpreting quantitative and qualitative data
Equity and Social Justice
Publicly supported TPP data collection activities should:
•	Encourage the involvement of researchers from all TPP levels and types (e.g., liberal arts, teacher residency) in evaluation initiatives
•	Identify and incentivize TPPs in MSIs that are successful in producing teachers of color
•	Prioritize the participation of evaluators from marginalized communities who have substantive evaluation training and experience
•	Encourage and support nongovernmental organizations' data review and analysis, particularly those whose missions focus on traditionally disenfranchised teacher candidates and communities
•	Explicitly prioritize diversifying the PK-12 teaching force as one of the most important goals and establish substantive criteria as a requirement in competitions for research, practice, and evaluation grants and contracts
REFERENCES
Admiraal, W., Hoeksma, M., van de Kamp, M-T., & van Duin, G. (2011). Assessment of teacher competence
using video portfolios: Reliability, construct validity, and consequential validity. Teaching and Teacher
Education: An International Journal of Research and Studies, 27(6), 1019-1028.
Amrein-Beardsley, A. (2008). Methodological concerns about the education value-added assessment
system. Educational Researcher, 37(2), 65-75.
Anderson, L. (2019). Private interests in a public profession: Teacher education and racial capitalism. Teach-
ers College Record, 121(6), 1-38.
Aydarova, E. (2020). Shadow elite of teacher education reforms: Intermediary organizations’ construction
of accountability regimes. Educational Policy. OnlineFirst. https://doi.org/10.1177/0895904820951121.
Bartell, T., Floden, R., & Richmond, G. (2018). What data and measures should inform teacher prep-
aration? Reclaiming accountability. Journal of Teacher Education, 69(5), 426-428. https://doi.
org/10.1177/0022487118797326.
Bastian, K. C., Lys, D., & Pan, Y. (2018). A framework for improvement: Analyzing performance-assessment
scores for evidence-based teacher preparation program reforms. Journal of Teacher Education, 69(5),
448-462.
Bastian, K. C., Henry, G. T., Pan, Y., & Lys, D. (2016). Teacher candidate performance assessments: Local
scoring and implications for teacher preparation program improvement. Teaching and Teacher Educa-
tion, 59, 1-12.
Beck, J. S. (2016). The complexities of a third-space partnership in an urban teacher residency. Teacher
Education Quarterly, 43(1), 51-70.
Bill & Melinda Gates Foundation. (2015, November 18). Gates Foundation awards over $34 million in grants
to help improve teacher preparation programs [Press release]. https://www.gatesfoundation.org/
Media-Center/Press-Releases/2015/11/Teacher-Prep-Grants.
Brown, E. (2017, March 8). Senate overturns Obama-era regulations on teacher preparation. The Washington
Post. https://www.washingtonpost.com/local/education/senate-overturns-obama-era-regulations-
on-teacher-preparation/2017/03/08/b8cf127a-041c-11e7-b9fa-ed727b644a0b_story.html.
Carver-Thomas, D. (2018). Diversifying the teaching profession: How to recruit and retain teachers of color.
Learning Policy Institute.
Carter, R. T., & Goodwin, A. L. (1994). Chapter 7: Racial identity and education. Review of Research in
Education, 20(1), 291-336.
Cherng, H-Y. S., & Halpin, P. F. (2016). The importance of minority teachers: Student perceptions of minor-
ity versus White teachers. Educational Researcher, 45(7), 407-420.
Cochran-Smith, M., Baker, M., Burton, S., Chang, W-C., Cummings Carney, M., Fernández, M. B., Stringer
Keefe, E., Miller, A. F., & Sánchez, J. G. (2017). The accountability era in US teacher education: Look-
ing back, looking forward. European Journal of Teacher Education, 40(5), 572-588.
Cochran-Smith, M., Carney, M., Keefe, E., Burton, S., Chang, W., Fernandez, M. B., Miller, A. F., Sanchez,
J. G., & Baker, M. (2018). Reclaiming accountability in teacher education. Teachers College Press.
Cochran-Smith, M., Stern, R., Sánchez, J. G., Miller, A. F., Stringer Keefe, E., Fernández, M. B., Chang, W-C.,
Cummings Carney, M., Burton, S., & Baker, M. (2016). Holding teacher preparation accountable: A review
of claims and evidence. National Education Policy Center.
Cochran-Smith, M., & Villegas, A. M. (2016). Research on teacher preparation: Charting the landscape of a
sprawling field. In D. H. Gitomer & C. A. Bell (Eds.), Handbook of research on teaching (5th ed.) [eBook].
American Educational Research Association.
Collins, P. M., & Hopson, R. (Eds.). (2014). Building a new generation of culturally responsive evaluators through AEA's Graduate Education Diversity Internship program. New Directions for Evaluation, Number 143. John Wiley & Sons.
Cram, F., Kennedy, V., Paipa, K., Pipi, K., & Wehipeihana, N. (2014). Being culturally responsive through
Kaupapa Māori evaluation. In S. Hood, R. Hopson, & H. Frierson (Eds.), Continuing the journey to
reposition culture and cultural context in evaluation theory and practice (pp. 289-311). Information Age
Publishing.
Crowe, E. (2011). Getting better at teacher preparation and state accountability: Strategies, innovations,
and challenges under the federal Race to the Top program. Center for American Progress. https://
cdn.americanprogress.org/wp-content/uploads/issues/2012/01/pdf/teacher_preparation.
pdf?_ga=2.193825517.1948718756.1607544112-746937442.1607544112.
Di Carlo, M., & Cervantes, K. (2018, September). The collection and availability of teacher diversity data: A state-
by-state survey [Research brief]. Albert Shanker Institute. https://www.shankerinstitute.org/sites/
default/files/teacherracedataFINAL.pdf?_ga=2.232721567.1848603042.1607545253-1978149100.
Dilworth, M. E. (2012). Historically Black colleges and universities in teacher education reform. The Journal
of Negro Education, 81(2), 121-135.
Dilworth, M. E. (Ed.). (2018). Millennial teachers of color. Harvard Education Press.
Dilworth, M. E. (In press). The absence and probability of effective public policies for teacher diversity.
In C. Gist & T. Bristol (Eds.), Handbook of research on teachers of color. American Educational Research
Association.
Dilworth, M. E., & Brown, A. L. (2008). Teachers of color: Quality and effective teachers one way or
another. Handbook of Research on Teacher Education, 424-467.
Dilworth, M. E., & Coleman, M. J. (2014). Time for a change: Diversity in teaching revisited. National Educa-
tion Association.
Donovan, C. B., Ashdown, J. E., & Mungai, A. M. (2014). A new approach to educator preparation evalua-
tion: Evidence for continuous improvement? Journal of Curriculum and Instruction, 8(1), 86-110.
Dynarski, S. M., Hemelt, S. W., & Hyman, J. M. (2015). The missing manual: Using National Student
Clearinghouse data to track postsecondary outcomes. Educational Evaluation and Policy Analysis, 37(1
Suppl), 53S-79S.
Egalite, A. J., Kisida, B., & Winters, M. A. (2015). Representation in the classroom: The effect of own-race
teachers on student achievement. Economics of Education Review, 45, 44-52.
Espinoza, D., Saunders, R., Kini, T., & Darling-Hammond, L. (2018). Taking the long view: State efforts to solve
teacher shortages by strengthening the profession. Learning Policy Institute.
Fenwick, L. (2021). A tale of two cities: State evaluation systems of teacher preparation programs. American Asso-
ciation of Colleges of Teacher Education. https://3e0hjncy0c1gzjht1dopq44b-wpengine.netdna-ssl.
com/wp-content/uploads/2021/10/AACTE_Final.pdf.
Feuer, M. J., Floden, R. E., Chudowsky, N., & Ahn, J. (2013). Evaluation of teacher preparation programs: Pur-
poses, methods, and policy options. National Academy of Education.
Frazier-Anderson, P., Hood, S., & Hopson, R. K. (2011). Preliminary considerations of an African American
culturally responsive evaluation system. In S. D. Lapan, M. T. Quartoli, & F. J. Riemer (Eds.), Qualita-
tive research: An introduction to methods and designs (pp. 347-372). Jossey Bass.
Frierson, H., Hood, S., Hughes, G., & Thomas, V. (2010). A guide to conducting culturally responsive
evaluation. In J. Frechtling (Ed.), The 2010 user-friendly handbook for project evaluation (Report No. REC
99-12175, pp. 75-96). Division of Research and Learning in Formal and Informal Settings, Directorate
for Education and Human Resources, National Science Foundation.
Gasman, M., Castro Samayoa, A., & Ginsberg, A. (2016). A rich source for teachers of color and learning: Minor-
ity serving institutions. Penn Center for Minority Serving Institutions.
Gershenson, S., Hansen, M., & Lindsay, C. A. (2021). Teacher diversity and student success: Why racial repre-
sentation matters in the classroom. Harvard Education Press.
Gershenson, S., Hart, C., Hyman, J., Lindsay, C., & Papageorge, N. W. (2018). The long-run impacts of same-
race teachers (Report No. w25254). National Bureau of Economic Research. https://www.nber.org/
system/files/working_papers/w25254/w25254.pdf.
Gershenson, S., Holt, S. B., & Papageorge, N. W. (2016). Who believes in me? The effect of student–teacher
demographic match on teacher expectations. Economics of Education Review, 52, 209-224.
Gist, C. D. (2017). Voices of aspiring teachers of color: Unraveling the double bind in teacher educa-
tion. Urban Education, 52(8), 927-956.
Gitomer, D. H., Martínez, J. F., & Battey, D. (2021). Who’s assessing the assessment? The cautionary tale of
the edTPA. Phi Delta Kappan, 102(6), 38-43.
Gitomer, D. H., Martínez, J. F., Battey, D., & Hyland, N. E. (2019). Assessing the assessment: Evidence of
reliability and validity in the edTPA. American Educational Research Journal, 58(1), 3-31.
Goings, R. B., Walker, L. J., & Wade, K. L. (2021). The influence of intuition on human resource officers’
perspectives on hiring teachers of color. Journal of School Leadership, 31(3), 189-208.
Goldhaber, D. (2019). Evidence-based teacher preparation: Policy context and what we know. Journal of
Teacher Education, 70(2), 90-101.
Goldhaber, D., Cowan, J., & Theobald, R. (2017). Evaluating prospective teachers: Testing the predictive
validity of the edTPA. Journal of Teacher Education, 68(4), 377-393.
Goldhaber, D., Liddle, S., & Theobald, R. (2013). The gateway to the profession: Assessing teacher prepara-
tion programs based on student achievement. Economics of Education Review, 34, 29-44.
Grossman, P., & Loeb, S. (2016). Improving the teacher workforce. In M. Hansen & J. Valant (Eds.), Memos
to the president on the future of U.S. education policy. The Brookings Institution.
Guha, R., Hyler, M. E., & Darling-Hammond, L. (2016). The teacher residency: An innovative model for prepar-
ing teachers. Learning Policy Institute.
Hegji, A. (2018). The Higher Education Act (HEA): A primer (Report No. 7-5700 R43351). Congressional
Research Service. https://fas.org/sgp/crs/misc/R43351.pdf.
Henry, G. T., Campbell, S. L., Thompson, C. L., Patriarca, L. A., Luterbach, K. J., Lys, D. B., & Covington,
V. M. (2013). The predictive validity of measures of teacher candidate programs and performance:
Toward an evidence-based approach to teacher preparation. Journal of Teacher Education, 64(5), 439-453.
Hood, S. (2001). Nobody knows my name: In praise of African American evaluators who were responsive.
In J. Greene & T. Abma (Eds.), Responsive evaluation: Roots and wings (pp. 31-43). New Directions for
Evaluation, no. 92: Winter 2001. Jossey-Bass.
Hood, S., Hopson, R. K., & Frierson, H. T. (Eds.) (2005). The role of culture and cultural context: A mandate
for inclusion, the discovery of truth and understanding in evaluative theory and practice. Information Age
Publishing.
Hood, S., Hopson, R., & Frierson, H. (Eds.). (2015a). Continuing the journey to reposition culture and cultural
context in evaluation theory and practice. Information Age Publishing.
Hood, S., Hopson, R. K., & Kirkhart, K. E. (2015b). Culturally responsive evaluation. In K. E. Newcomer,
H. P. Hatry, & J. S. Wholey (Eds.), Handbook of practical program evaluation (pp. 281-317). Wiley.
House, E. R. (2019). Evaluation with a focus on justice. New Directions for Evaluation, 2019(163), 61-72.
House, E. R. (2020). Evaluating in a fragmented society. Journal of MultiDisciplinary Evaluation, 16(36), 26-36.
House, E., & Howe, K. R. (1999). Values in evaluation and social research. Sage Publications.
House, E. R., & Howe, K. R. (2000). Deliberative democratic evaluation. New Directions for Evaluation, 85,
3-12.
King, J. (2020, October 26). Institutions offering degrees in education: 2009-10 to 2018-19 [Issue brief]. American
Association of Colleges for Teacher Education.
King, J., & Mahaffie, L. (2016). Preparing and credentialing the nation’s teachers: The secretary’s 10th report on
teacher quality. Office of Postsecondary Education, U.S. Department of Education. https://files.eric.
ed.gov/fulltext/ED576185.pdf.
King, S. H. (1993). The limited presence of African-American teachers. Review of Educational Research, 63(2),
115-149.
Kirkhart, K. E. (1995). 1994 conference theme: Evaluation and social justice. Seeking multicultural validity: A postcard from the road. Evaluation Practice, 16(1), 1-12.
Kuenzi, J. J. (2018). Teacher preparation policies and issues in the Higher Education Act (CRS Report R45407,
Version 3). Congressional Research Service. https://fas.org/sgp/crs/misc/R45407.pdf.
LaFrance, J., & Nichols, R. (2008). Reframing evaluation: Defining an Indigenous evaluation framework.
The Canadian Journal of Program Evaluation, 23(2), 13.
LaFrance, J., Nichols, R., & Kirkhart, K. E. (2012). Culture writes the script: On the centrality of context in
indigenous evaluation. New Directions for Evaluation, 2012(135), 59-74.
Lincove, J. A., Osborne, C., Dillon, A., & Mills, N. (2014). The politics and statistics of value-added mod-
eling for accountability of teacher preparation programs. Journal of Teacher Education, 65(1), 24-38.
Lindsay, C. A., & Lee, V. J. (2018, September 5). Which colleges are helping create a diverse teacher work-
force? Urban Institute. https://www.urban.org/features/which-colleges-are-helping-create-diverse-
teacher-workforce.
McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., & Hamilton, L. S. (2003). Evaluating value-added models for
teacher accountability. Monograph. RAND Corporation.
McFarland, J., Hussar, B., Wang, X., Zhang, J., Wang, K., Rathbun, A., Barmer, A., Forrest Cataldi, E., &
Mann, F. B. (2018). The condition of education 2018 (Report No. NCES 2018-144). National Center for
Education Statistics, U.S. Department of Education. https://nces.ed.gov/pubs2018/2018144.pdf.
Measures of Effective Teaching Project. (2013). Ensuring fair and reliable measures of effective teaching: Culmi-
nating findings from the MET Project’s three-year study. Bill & Melinda Gates Foundation. http://www.
metproject.org/downloads/MET_Ensuring_Fair_and_Reliable_Measures_Practitioner_Brief.pdf.
Meyer, S. J., Brodersen, R. M., & Linick, M. A. (2014). Approaches to evaluating teacher preparation programs
in seven states (Report No. REL 2015-044). Regional Educational Laboratory Central, U.S. Department
of Education. https://ies.ed.gov/ncee/edlabs/regions/central/pdf/REL_2015044.pdf.
Moeller, K. (2020). Accounting for the corporate: An analytic framework for understanding corporations
in education. Educational Researcher, 49(4), 232-240. https://doi.org/10.3102/0013189X20909831.
National Academies of Sciences, Engineering, and Medicine. (2019). Monitoring educational equity. The
National Academies Press.
National Academies of Sciences, Engineering, and Medicine. (2020). Changing expectations for the K-12
teacher workforce: Policies, preservice education, professional development, and the workplace. The National
Academies Press.
Papay, J. P., West, M. R., Fullerton, J. B., & Kane, T. J. (2012). Does an urban teacher residency increase
student achievement? Early evidence from Boston. Educational Evaluation and Policy Analysis, 34(4),
413-434.
Peck, C. A., Singer-Gabella, M., Sloan, T., & Lin, S. (2014). Driving blind: Why we need standardized per-
formance assessment in teacher education. Journal of Curriculum and Instruction, 8(1), 8-30.
Petchauer, E., & Mawhinney, L. (2017). Teacher education across minority-serving institutions. Rutgers Uni-
versity Press.
Preston, C. (2017). University-based teacher preparation and middle grades teacher effectiveness. Journal
of Teacher Education, 68(1), 102-116.
Redding, C. (2019). A teacher like me: A review of the effect of student–teacher racial/ethnic matching on
teacher perceptions of students and student academic and behavioral outcomes. Review of Educational
Research, 89(4), 499-535.
Reid, A. M., Boyce, A. S., Adetogun, A., Moller, J. R., & Avent, C. (2020). If not us, then who? Evaluators
of color and social change. New Directions for Evaluation, 2020(166), 23-36.
Rice, J. K., & Brent, B. O. (2002). An alternative avenue to teacher certification: A cost analysis of the path-
ways to teaching careers program. Journal of Education Finance, 27(4), 1029-1048.
Ronfeldt, M., Brockman, S. L., & Campbell, S. L. (2018). Does cooperating teachers’ instructional effective-
ness improve preservice teachers’ future performance? Educational Researcher, 47(7), 405-418.
Ronfeldt, M., & Campbell, S. L. (2016). Evaluating teacher preparation using graduates’ observational
ratings. Educational Evaluation and Policy Analysis, 38(4), 603-625.
Shriver, T. P., & Weissberg, R. P. (2020). A response to constructive criticism of social and emotional learn-
ing. Phi Delta Kappan, 101(7), 52-57. https://doi.org/10.1177/0031721720917543.
Skinner, R. R. (2019, October 17). The Elementary and Secondary Education Act (ESEA), as amended by the Every
Student Succeeds Act (ESSA): A primer (Report No. CRS R45977, version 2). Congressional Research
Service. https://crsreports.congress.gov/product/pdf/R/R45977/2.
Smith, M. S., & Smith, M. L. (2009). Research in the policy process. In G. Sykes, B. Schneider, & D. N. Plank
(Eds.), Handbook of education policy research (pp. 372-398). Routledge.
Sykes, G., & Dibner, K. (2009). Fifty years of federal teacher policy: An appraisal. Center on Education Policy.
Toldson, I. A. (2019). No BS (Bad stats): Black people need people who believe in Black people enough not to believe every bad thing they hear about Black people. Brill | Sense. https://brill.com/view/title/54716?language=en.
Tuck, E., & Gorlewski, J. (2016). Racist ordering, settler colonialism, and edTPA: A participatory policy
analysis. Educational Policy, 30(1), 197-217.
U.S. Department of Education. (2011). Our future, our teachers: The Obama administration’s plan for teacher edu-
cation reform and improvement. https://www.ed.gov/sites/default/files/our-future-our-teachers.pdf.
U.S. Department of Education. (2013). Preparing and credentialing the nation’s teachers: The secretary’s ninth
report on teacher quality. Office of Postsecondary Education.
U.S. Department of Education. (2016a, October 31). Teacher preparation issues. 34 CFR Parts 612 and 686
[Docket ID ED-2014-OPE-0057] RIN 1840-AD07. Office of Postsecondary Education. Final regulations.
Federal Register, 81(210), 75494-75622.
U.S. Department of Education. (2016b). The state of racial diversity in the educator workforce. Office of Planning,
Evaluation and Policy Development, Policy and Program Studies Service. http://www2.ed.gov/
rschstat/eval/highered/racial-diversity/state-racial-diversityworkforce.pdf.
U.S. Government Accountability Office. (2015, July). Teacher preparation programs: Education should ensure
states identify low performing, programs and improve information sharing (Report No. GAO-15-598). House
of Representatives, Committee on Education and the Workforce, Subcommittee on Health, Employ-
ment, Labor, and Pensions. https://www.gao.gov/assets/680/671603.pdf.
Wells, A. S., & Roda, A. (2016). The impact of political context on the questions asked and answered: The
evolution of education research on racial inequality. Review of Research in Education, 40(1), 62-93.
Whittaker, A., Pecheone, R. L., & Stansbury, K. (2018). Fulfilling our educative mission: A response to
edTPA critique. Education Policy Analysis Archives, 26(30), 1-20.
Will, M. (2018, June 21). “An expensive experiment”: Gates teacher-effectiveness program shows no
gains for students. Education Week. https://www.edweek.org/teaching-learning/an-expensive-
experiment-gates-teacher-effectiveness-program-shows-no-gains-for-students/2018/06.
Wilson, S., & Kelly, S. L. (2022). Landscape of teacher preparation programs and teacher candidates. National
Academy of Education Committee on Evaluating and Improving Teacher Preparation Programs.
National Academy of Education.
Wimberly, G. L. (2015). Use of large-scale data sets and LGBTQ education. In G. L. Wimberly (Ed.), LGBTQ
issues in education: Advancing a research agenda. American Educational Research Association.
Zeichner, K. (2010). Rethinking the connections between campus courses and field experiences in college-
and university-based teacher education. Journal of Teacher Education, 61(1-2), 89-99. https://doi.
org/10.1177/0022487109347671.
AUTHOR BIOGRAPHIES
Stafford L. Hood is the founding director of the Center for Culturally Responsive
Evaluation and Assessment (CREA) and the Sheila M. Miller Professor of Education/
Curriculum & Instruction emeritus in the College of Education at the University of
Illinois at Urbana-Champaign (UIUC). Hood’s research and scholarly activities have
focused primarily on the role of culture/cultural context in program evaluation and
educational assessment and the contributions of African American evaluators during
the pre-Brown v. Board of Education (1930-1954) period. Over the past two decades, he has collaboratively established CREA as an international and interdisciplinary community
of researchers, scholars, and practitioners advocating the use of a culturally respon-
sive lens in systematic inquiry across evaluation, assessment, policy analysis, applied
research, and action research. Hood is a fellow of the American Educational Research
Association (2016), a recipient of the American Evaluation Association’s 2015 Paul
F. Lazarsfeld Evaluation Theory Award, an honorary adjunct professor at Dublin City University (School of Education Studies) in Dublin, Ireland (2014), and a fellow of the American Council on Education (2001-2002). His
membership on many advisory boards and committees includes the Educational Test-
ing Service’s Visiting Panel for Research, the National Board for Professional Teaching
Standards’ Assessment Certification Advisory Panel, and the American Indian Higher
Education Consortium’s Building an Indigenous Framework for STEM Evaluation. He
earned a B.A. in political science, an M.A. in counseling from the University of Wiscon-
sin–Whitewater, and a Ph.D. in education (emphases program evaluation, administra-
tion, and policy analysis) from UIUC.
Mary E. Dilworth is a senior education policy and research advisor to nonprofit edu-
cation organizations and institutions and the chair of the District of Columbia Higher
Education Licensure Commission. Her work is keenly focused on matters of teacher
quality and preparation, particularly as they intersect with race and ethnicity. Dilworth
has a host of professional experiences that inform her work, including vice president
for research and higher education at the National Board for Professional Teaching Stan-
dards and senior vice-president of the American Association of Colleges for Teacher
Education (AACTE). She is a frequent contributor to national and state forums (e.g., the
National Academies of Sciences, Engineering and Medicine and the Council of Chief
State School Officers). She has written, edited, and contributed to scores of scholarly
books, articles, policy, and research reports. She is the author of a chapter on the pres-
ence and absence of policies to diversify the teaching force for the upcoming Handbook
of Research on Teachers of Color (Bristol & Gist) and the editor of Millennial Teachers of
Color (Harvard Education Press), which received the AACTE Outstanding Book of the Year award. Dilworth holds and has held a number of elected and appointed positions
on boards and commissions, including the American Educational Research Association,
the Educational Testing Service, the National Education Association, the American
Federation of Teachers, and the Ford Foundation. She earned a B.A. and an M.A. from
Howard University and a doctorate from The Catholic University of America, each in
the field of education.
Constance A. Lindsay is an assistant professor at the University of North Carolina at
Chapel Hill. Lindsay earned a doctorate in human development and social policy from
Northwestern University, where she was an Institute of Education Sciences’ predoctoral
fellow. Since leaving Northwestern, Lindsay has worked in education policy in vari-
ous contexts, applying her research training in traditional studies and in creating and
evaluating new systems and policies regarding teachers. Lindsay’s areas of expertise
include teacher quality and diversity, analyzing and closing racial achievement gaps,
and adolescent development. Her work has been published in such journals as Edu-
cational Evaluation and Policy Analysis and Social Science Research. Lindsay received a
bachelor’s degree in economics from Duke University and an M.P.P. from Georgetown
University. Before her doctoral study at Northwestern, she was a presidential manage-
ment fellow at the U.S. Department of Education.
The National Academy of Education (NAEd) advances high-quality research to improve education
policy and practice. Founded in 1965, the NAEd consists of U.S. members and international associates
who are elected on the basis of scholarship related to education. The Academy undertakes research
studies to address pressing educational issues and administers professional development fellowship
programs to enhance the preparation of the next generation of education scholars.
naeducation.org