
Teacher Knowledge and Instructional Quality of Beginning Teachers: Growth and Linkages

by Laura M. Desimone, Eric D. Hochberg & Jennifer McMaken (2016)

Background/Context: We lack strong and consistent information about which measures of knowledge matter most for good teaching and student learning, and about the trajectories of improvement for novice teachers.

Research Questions: We explore the level, variation, and change in teacher knowledge and instruction in the first two years of teaching, and the relationship between Mathematical Knowledge for Teaching (MKT) and more distal measures such as certification.

Sample: We studied 45 middle school math teachers in their first two years of teaching, in 11 districts of varying size and urban status in two southeastern and two mid-Atlantic states.

Research Design and Analysis: This is a longitudinal (two-year) study of natural variation, which includes descriptive, correlational, individual growth curve, and regression analyses.

Data Collection: Based on multiple administrations of survey data, MKT assessments, and classroom observations using the Instructional Quality Assessment (IQA), we developed measures of (a) the rigor of lesson activities and classroom discussion, (b) the quality of classroom discussion, (c) the relative emphasis on procedural versus higher-order cognitive demands, (d) the proportion of time spent on basic versus advanced math topics, and (e) the number of topics covered, or instructional breadth.
Findings: Key findings are as follows: (a) many beginning math teachers in our sample had neither a degree in math nor substantial coursework in math; (b) teachers generally had low MKT scores, a balanced approach to emphasizing cognitive demands, low levels of discussion quality, and substantial across-teacher variation in topic coverage; (c) teachers improved in some but not all measures of instructional quality; (d) there were no direct relationships between MKT and instructional quality; (e) we found little evidence that MKT is a better predictor of instructional quality than distal measures, but we did find suggestive evidence that MKT may help to explain their predictive power; (f) we found suggestive evidence that taking more advanced math courses predicts desirable teaching practices; and (g) the number of weeks of student teaching in math was consistently related to more rigorous instruction and less emphasis on basic topics.

Conclusions: These results have implications for shaping teacher preparation programs, teacher in-service professional development, and certification policies, as well as for how we study new teachers and calibrate our expectations for improvement in novice teachers.

Do teacher knowledge and instructional quality grow in the first two years of teaching? Is teacher knowledge related to instructional quality in the first two years of teaching? We address these questions in this study. There are several good reasons for studying linkages between teachers’ knowledge and instructional quality growth. First, many educational reforms rely on improvements in instructional quality to mediate the effect of teacher knowledge on increased student learning (Cohen, McLaughlin, & Talbert, 1993). Second, recent national and state policies require teachers to establish their subject-matter content knowledge, either through credentials, professional development, or assessments (U.S. Department of Education, 2013).
But we lack strong and consistent information about which measures of knowledge matter most for good teaching and student learning (Met Project, 2010). Such information could help shape teacher preparation programs and certification exams. Third, in an effort to retain good teachers and help them improve their practice, policymakers, administrators, and others are paying considerable attention to policies supporting new teachers (New Teacher Center, 2013). A better understanding of how teacher knowledge and instructional quality grow in the first few years of teaching could enhance such efforts and help shape the mentorship of novice teachers. This is especially salient to urban areas, where there are usually a disproportionate number of novice teachers (Ingersoll & Perda, 2010). In short, to better shape teacher development and instruction policy, we need to better understand what type and level of teacher knowledge supports high-quality instruction that fosters student learning. The promise of high-quality instruction will not be realized “unless systematic thought and research is devoted to questions of teacher subject-matter knowledge and how both current and desired levels of teacher knowledge impact instructional practice” (Stein, Baxter, & Leinhardt, 1990, p. 642).

CONCEPTUAL FRAMEWORK: TEACHER KNOWLEDGE AND MATH INSTRUCTION

We derive the conceptual grounding of our study from a set of related hypotheses that posit that (a) teachers’ knowledge, broadly construed, is related to the quality of their instruction (e.g., Ball, 1990; Shulman, 1987); (b) teachers’ knowledge grows through their undergraduate and teacher preparation experiences, in-service mentoring, professional development, and experience teaching (e.g., Putnam & Borko, 1997); and (c) growth in teachers’ knowledge improves the quality of their classroom instruction, which in turn increases their students’ learning (e.g., Carpenter, Fennema, Peterson, Chiang, & Loef, 1989).
The overarching conception of all of these relationships reflects the multidimensional aspects of teacher knowledge and instruction. For example, teachers’ knowledge can include knowledge of the subject they teach, pedagogy, classroom instruction, or the curriculum (Shulman, 1987). Similarly, instructional quality has many dimensions, including the amount and type of topic coverage, emphasis on cognitive demands (e.g., memorization vs. problem solving), quality of discussion, rigor of tasks, identification and treatment of student mistakes, quality of student assignments, and more (Ball, 1991; Carpenter, Fennema, & Franke, 1996; Good & Brophy, 2003; Hiebert et al., 1996). Our study is grounded in the conceptual notion that teacher knowledge and instructional quality grow and are related to each other. However, while there is general agreement that teacher knowledge plays an important role in the effectiveness of instruction (Ball, Thames, & Phelps, 2008; Hill & Ball, 2009; Kennedy, 1997; Phelps, 2009), there is much less agreement regarding the amount of knowledge teachers must possess, the characteristics of knowledge that enable effective teaching, the growth trajectories of teachers’ knowledge and instruction, and the role of teacher knowledge in instructional practice and student achievement. Our analysis addresses some of these issues. Within this broad conception of the growth of and the relationships among teacher learning experiences, knowledge, instruction, and student learning (Ball, Sleep, Boerst, & Bass, 2009), we suggest hypotheses that relate different forms of knowledge measures. 
Specifically, we propose that “distal,” or more distant, measures of teacher knowledge—such as credentials, course taking, professional development, student teaching experience, and college major—might predict more “proximal” measures of teacher knowledge, meaning those that are closer to what teachers actually know—such as subject-specific knowledge and subject-specific pedagogical knowledge. We examine distal proxies in part because they are often policy targets, such as requirements for coursework and certification. While linking these distal knowledge proxies to student achievement has been elusive, it could be that their influence is mediated by instruction or by measures of knowledge more closely related to what a teacher knows about her subject area. Understanding how well (or not) distal knowledge measures relate to more proximal knowledge measures and instruction has implications for how we would use them as targets for training or selecting teachers. Although student achievement is a fundamental part of the conceptual framework, our data do not include measures of student learning. However, we do include measures of instructional quality that have been shown in the literature to be related to student learning.

RESEARCH QUESTIONS

We answer three main research questions in our study, guided by our conceptual framework: (1) What are the level, variation, and change in teacher knowledge and instructional quality for math teachers in their first and second years of teaching? (2) To what extent is a proximal measure of teacher knowledge associated with instructional quality, and does this relationship change from the first to the second year? and (3) To what extent is a proximal measure of teacher knowledge a better predictor of instructional quality than frequently used distal measures of teacher knowledge?
To better grasp the relationship between teacher knowledge and instructional quality (research question 2), we explore several alternatives, which include tests of immediate and delayed effects: (a) Does teacher knowledge predict instructional quality in the fall of the first year of teaching? (b) Does teacher knowledge predict instructional quality in the spring of the first year of teaching? and (c) Does teacher knowledge predict instructional quality in the spring of the second year of teaching? Questions (b) and (c) allow for the possibility that a teacher’s initial focus on classroom management and logistics may interfere with her ability to apply her subject-matter knowledge to the quality of instruction until later in her first or second year of teaching. While we do expect to find a relationship between subject-matter knowledge and instruction (Hill, 2007), we suspect that these links might be quite weak in the early years of teaching, given the multitude of struggles and challenges of first-year teachers (e.g., Grossman, 1990). Next, we discuss the literature that serves as the foundation for these questions.

CONCEPTUALIZING AND MEASURING TEACHER KNOWLEDGE

Shulman’s (1986) delineation of aspects of teacher knowledge, and his hypotheses about how each aspect might improve instruction, pushed the field to better conceptualize types of teacher knowledge (Ball, 2000) and to measure them in more sophisticated ways (Hill, Ball, & Schilling, 2008; Hill, Schilling, & Ball, 2004). A multifaceted conception of teacher knowledge now has a strong basis in the literature, although scholars describe its dimensions in different ways (Ball & Rowan, 2004; Krauss et al., 2008; Magnusson, Krajcik, & Borko, 1999; Morine-Dershimer & Kent, 1999). Challenges to conceptualizing and measuring teacher knowledge include capturing the “complexity, subtleties and nuances of teacher knowledge” (Stein et al., 1990, p.
640) as well as measuring a construct that researchers are still exploring and defining (Alonzo, 2007; Gearhart, 2007; Kersting, Givvin, Sotelo, & Stigler, 2010; Schoenfeld, 2007).

Proximal Measures of Teacher Knowledge

Recently, researchers have developed multiple-choice assessments of knowledge for teaching in different content areas (Hill et al., 2004; Phelps, 2009; Piasta, Connor, Fishman, & Morrison, 2009; Rohaan, Taconis, & Jochems, 2009). One notable effort is the Mathematical Knowledge for Teaching (MKT) measures (Hill et al., 2004). The MKT measures include a set of items that represent specialized knowledge for teaching, such as teachers’ ability to provide mathematical explanations that children can comprehend and to select appropriate representations of mathematical symbols and content. The assessment also has a set of items that represent the mathematical tasks of teaching, such as analyzing student errors and addressing unconventional solutions to problems (Hill & Ball, 2009). The MKT is the first measure of its kind, and it holds promise for larger-scale comparative work. One of the MKT’s key strengths is its theoretical foundation. The MKT measure reflects Ball, Thames, and Phelps’s (2008) practice-based theory of knowledge for teaching math. Drawing on systematic observations of teachers and analyses of math-related problems that occur during teaching, the MKT’s developers contend that “mathematical knowledge for teaching” (Hill & Ball, 2009) theoretically represents three main categories of knowledge: (a) content knowledge enmeshed with knowledge of how students think about or learn particular content; (b) content knowledge enmeshed with knowledge of effective teaching, such as the selection of appropriate examples; and (c) knowledge of how to use instructional materials (Ball et al., 2008; Phelps & Schilling, 2004).
The MKT assesses knowledge on several dimensions, including explaining terms and concepts to students; interpreting students’ statements and solutions; judging and correcting textbook treatments of particular topics; using representations accurately in the classroom; and providing students with examples of math concepts, algorithms, or proofs (Hill, Rowan, & Ball, 2005). The MKT has been shown to predict student achievement (Ball, Lubienski, & Mewborn, 2001; Hill et al., 2005) and to predict a measure of instruction developed to be highly aligned to it (Hill et al., 2008). Previous work has demonstrated the discriminative, measurement, and predictive validity of the MKT (Hill et al., 2004). Given this demonstrated value of the MKT, it seems an ideal tool for studying within-teacher change during the beginning years of teaching.

Distal Measures of Teacher Knowledge

We consider various distal knowledge measures: content knowledge and pedagogy courses taken in undergraduate preparation and in-service, the type of credential earned, whether the teacher has a math or education major, and the number of weeks spent student teaching. The field has criticized the use of distal proxy measures of teacher knowledge, arguing that subject-matter knowledge is distinct from pedagogical content knowledge and knowledge for teaching (e.g., Ball, 1990, 1991). For example, across content areas the literature substantiates that general content knowledge, such as what might be taught in a college-level math course, provides an insufficient knowledge base for teaching content to children (Baumert et al., 2010; Hill & Ball, 2009; Morine-Dershimer & Kent, 1999). And researchers agree that general pedagogical knowledge is a separate domain that includes principles of classroom organization, behavior management, and clear communication (Baumert et al., 2010; Magnusson et al., 1999; Morine-Dershimer & Kent, 1999; Shulman, 1987).
It is no wonder, then, that studies show mixed results for how well college preparation and credentials translate into better teaching and student learning. One hypothesis is that formal schooling and teacher preparation are related to more proximal aspects of teacher knowledge and, therefore, performance in the classroom; however, studies of how well such proxies predict student achievement have had mixed results, often finding little or no relationship between certification and student achievement (Begle, 1972, 1979; Goldhaber & Brewer, 2000; Greenwald, Hedges, & Laine, 1996; Hanushek, 1981, 1986; Phillips, 2010). Yet studies that use knowledge measures that directly assess teachers, such as certification exams or subject-matter exams, have shown positive relationships with student achievement (Boardman, Davis, & Sanday, 1977; Ferguson, 1991; Hanushek, 1972; Harbison & Hanushek, 1992; Mullens, Murnane, & Willett, 1996; Rowan, Chiang, & Miller, 1997; Strauss & Sawyer, 1986; Tatto, Nielsen, Cummings, Kularatna, & Dharmadasa, 1993; for reviews, see Greenwald et al., 1996; Hanushek, 1986; Wayne & Youngs, 2003). Student teaching is clearly an opportunity for teachers to build their knowledge and instructional effectiveness. The empirical research on links between student teaching and in-service teacher effectiveness, however, is minimal. While teachers commonly report that student teaching is the most influential part of their preparation (Hollins & Guzman, 2005; Wilson, Floden, & Ferrini-Mundy, 2001), few studies have linked the content and quality of preservice student teaching to later content or pedagogical knowledge, or to instructional effectiveness (Cochran-Smith & Zeichner, 2005; Wilson & Floden, 2003).
Notable exceptions are Boyd, Grossman, Lankford, Loeb, and Wyckoff’s (2009) study of institutions preparing New York City teachers, which found that field experiences with program oversight, experienced supervising teachers, and a minimum of five observations were significantly associated with better teacher effects on student achievement; and Ronfeldt’s (2012) study of 3,000 New York City teachers, which found that teachers who did their student teaching in “easier to staff” schools were more effective in raising student test scores when they became full-time teachers. Ronfeldt suggests that easy-to-staff schools are likely to provide more of a constructive professional community conducive to teacher learning. Other aspects of student teaching, beyond the scope of this study, warrant further study, namely the relationship of the placement to the broader activities of the teacher’s preparation program, and how they are theoretically and practically linked. For example, placement experiences may be more productive if they are tied systematically to content taught in the preparation program. While other studies problematize student teaching—in terms of the clash of cultures between the university and the school (Wideen, Mayer-Smith, & Moon, 1998); the similarity of field experiences across programs (Boyd et al., 2008); and the negative repercussions of lack of collegiality (Korthagen, Loughran, & Russell, 2006)—we have seen no study that analyzed length of student teaching placement and its relationship to knowledge or instructional quality. These distal indicators—major, certification, and time spent in student teaching—may not predict instruction and achievement consistently because of the variation in their quality and also because no single indicator is likely to capture the complex nature of teacher knowledge (which is something that the MKT attempts to do).
We examine links among these common distal proxies and the MKT, and compare how well the MKT and distal measures perform in predicting instruction that has been shown in the literature to be predictive of student learning.

CONCEPTUALIZING AND MEASURING MATHEMATICS INSTRUCTION

As Shulman (1986) so aptly explained, “to conduct a piece of research, scholars must necessarily narrow their scope, focus their view, and formulate a question far less complex than the form in which the world presents itself in practice. This holds for any piece of research; there are no exceptions” (p. 6). This dictum rings true for our work. Mathematics instruction is multidimensional, involving teaching students the cognitive demands of various tasks, managing classroom discussion, determining the depth and range of topic coverage, responding to student errors, preparing and grading student work and assessments, motivating students, and much more. In this analysis, we do not study math instruction in all of its depth and complexity; instead, we narrow the definition of knowledge and instruction to a small slice of these deep and complex constructs. The aspects of instruction we study are grounded in ideas from the math community about the importance of teaching mathematics for understanding; providing rigorous tasks that require students to estimate, conjecture, plan, investigate, and learn through trial and error; engaging in discussions that necessitate in-depth explanations and build on one another’s ideas; focusing on problem solving and in-depth understanding rather than relying solely on memorization and procedures; covering a smaller set of topics with greater depth; and covering more advanced topics (e.g., Ball, 2000; Cochran-Smith & Lytle, 1999; Stein & Lane, 1996). We target aspects of instruction that have been associated with student achievement but that differ from teacher to teacher even given the same curriculum and circumstances.
In this way we identify malleable aspects of instruction that are at least somewhat in the control of the teacher and so can be affected by policy. Although our description of instruction is by no means exhaustive, the aspects we include represent major components of mathematics instruction as recommended by the National Council of Teachers of Mathematics (NCTM, 2000, 2006) and the National Research Council (NRC; Kilpatrick, Swafford, & Findell, 2001). The dimensions of instruction that we examine are a subset of those hypothesized to be related to teacher knowledge. In studying links between teacher knowledge and instruction, we believe that it is more important to study aspects of teaching that include how math is used during a lesson and the extent to which it comports with NCTM protocols (e.g., Junker et al., 2006; Learning Mathematics for Teaching, 2006; Schoen, Cebulla, Finn, & Fi, 2003; Sawada et al., 2000), rather than identifying lesson characteristics (e.g., seatwork/group work) that are less likely to be linked with variation in teacher knowledge.

LINKING TEACHER KNOWLEDGE TO MATHEMATICS INSTRUCTION

The literature reflects a strong set of hypotheses about how teachers’ knowledge of math and of how to teach math is likely to translate into several types of desirable instruction, such as the ability to construct better mathematics representations, to better understand students’ methods, to have a clearer understanding of student errors, and to better grasp the structures underlying mathematics and how they connect (e.g., Ball, 1993; Borko et al., 1992; Carpenter et al., 1989; Leinhardt & Smith, 1985; Ma, 1999; Thompson & Thompson, 1994). Some evidence suggests that teachers with more explicit and better-organized knowledge tend to provide instruction that features conceptual connections, appropriate and varied representations, and active and meaningful student discourse (Cohen, 1990; Putnam, Heaton, Prawat, & Remillard, 1992).
Fennema and Franke (1992) contend that teachers with more knowledge differ in the “richness of the mathematics available for the learner” (pp. 149–150); conversely, they argue that “. . . teachers with limited knowledge have been found to portray the subject as a collection of static facts; to provide impoverished or inappropriate examples, analogies and/or representations; and to emphasize seatwork assignments and/or routinized student input as opposed to meaningful dialogue” (p. 164). Similarly, Stein et al. (1990) posit that teachers with limited knowledge overemphasize “limited truths” and miss opportunities for connecting important concepts and representations. Analyzing videotaped instruction, Hill, Ball, Blunk, Goffney, and Rowan (2007) noted that teachers who scored high on MKT measures avoided math inaccuracies and provided instruction characterized by appropriate representations, strong explanations, and mathematical reasoning to a greater extent than did teachers who scored low on MKT measures. Analysis from the Bill and Melinda Gates-sponsored Measures of Effective Teaching (MET) project (metproject.org) also suggested that higher MKT scores were associated with higher scores on an aligned observation measure, although the researchers did not find that higher MKT scores were associated with increased student achievement (Mihaly, McCaffrey, Staiger, & Lockwood, 2013). Charalambous (2010), in his multi-case analysis of two teachers, suggested a link between teachers’ MKT scores and the cognitive level of task enactment. This research has been essential for illustrating and advancing theoretical notions of the role that subject-matter knowledge plays in the quality of teaching; however, the studies are often fine-grained, qualitative case studies of one or a few teachers, or small-scale comparison studies (e.g., Grossman, 1990; Leinhardt & Smith, 1985; Stein et al., 1990).
They are exploratory, meant to uncover relationships between knowledge and instruction rather than formally describe or test these relationships (Darling-Hammond & Baratz-Snowden, 2005; Haertel, 1991). The natural next step is to conduct systematic analyses of larger samples of teachers to examine patterns and test hypotheses; few such studies have been conducted. A notable exception is Baumert et al.’s (2010) yearlong study of a representative sample of 10th-grade German mathematics classrooms. They found that teachers’ pedagogical content knowledge (PCK) was empirically distinguishable from their content knowledge, and that the significant relationship between PCK and student achievement was completely mediated by the quality of the teachers’ instruction, as defined by cognitive demand and individual learning support. Recently, several researchers have indicated that examining the ways that teacher knowledge relates to different aspects of instruction is relatively unexplored in research (Charalambous, 2010; Hill et al., 2005). Our study helps fill this gap by analyzing how teacher knowledge may be related to multiple dimensions of instruction in middle school mathematics. Additionally, we are not aware of any evidence documenting a relationship between the MKT and broader measures of instruction, even though it has the conceptual potential to be related to aspects of instruction besides those it is specifically aligned with; we investigate this issue in our study. Specifically, based on research that links high-quality instruction to teachers with a deeper understanding of math (Hiebert et al., 1997; Resnick & Hall, 1998), we hypothesize that teachers with greater content knowledge cover fewer topics (due to more in-depth coverage of advanced topics and less time on basic topics) more rigorously and with higher-quality discussion that places a greater emphasis on higher-order cognitive demands.
DATA AND STUDY DESIGN

Our analytic sample comprises 45 middle school math teachers in their first two years of teaching. The target sample was all beginning seventh- and eighth-grade math teachers in 11 districts of varying size and urban status in two southeastern and two mid-Atlantic states. A total of 66 (out of a possible 70) teachers volunteered to enroll in the study. Four of those 66 teachers discontinued participation before we collected baseline measures, reducing the sample to 62. We have complete data on all 62 teachers for the first fall and on 60 teachers in the spring of Year 1. We lost 17 of our 62 teachers from Year 1 to Year 2, due to their attrition from teaching. Sample size reduction over time reflects a combination of resignation and termination, changes in teaching assignment to a non-eligible grade level or content area, and reassignment or relocation to other schools or districts. Nonresponse and study withdrawal comprise less than 10% of the teachers who left the study. Between the first and second years, eight teachers are known to have stayed within teaching but changed assignments, resulting in ineligibility for study participation; seven teachers are known to have left teaching altogether. To see if the 17 teachers lost to attrition differed meaningfully from teachers who stayed, we compared baseline measures between those in the study in Year 2 and those in the study in Year 1, and found no significant differences. This suggests that those teachers who left the study did not differ systematically from stayers, at least on the measures in our study (e.g., content knowledge, instruction, preparation), though they may differ on unobservables such as commitment to teaching. Since our focus is on change, we base our analysis on the 45 teachers who were in our study for all three time points.
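The attrition check described above amounts to a two-sample comparison of baseline measures between teachers who stayed and teachers who left. The sketch below illustrates the idea with Welch's t-statistic, which is appropriate for groups of unequal size and variance; the scores are invented, and the study compared many baseline measures, not just one:

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t-statistic for two independent samples of unequal size/variance."""
    v_a, v_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(v_a / len(a) + v_b / len(b))

# Invented baseline MKT z-scores for teachers who stayed vs. left the study.
stayers = [-0.2, 0.1, 0.4, -0.5, 0.3, 0.0, -0.1, 0.2]
leavers = [-0.3, 0.2, -0.4, 0.1]

t = welch_t(stayers, leavers)
# |t| well below ~2 is consistent with no significant baseline difference.
```

In practice such a comparison would be run for each baseline measure (content knowledge, instruction, preparation), with a p-value from the t-distribution at Welch-Satterthwaite degrees of freedom; the sketch only shows the core statistic.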
Table 1 shows that about 10 of our 45 teachers (22%) have a math education degree, four teachers (8%) have a math or science major or minor, the mean number of weeks spent student teaching is about 12, and the average number of advanced math courses that teachers in our sample took is 2.20. This indicates that a substantial percentage of beginning teachers did not have a strong background in mathematics. As a robustness check, we conducted our correlational and regression analyses with the larger sample of 62 teachers and found virtually identical results (not shown).[1]

Table 1. Study Descriptives
1. These numbers are based on those for whom we had complete data at all three time points, as our sample was too small to use missing data imputation techniques. The number of teachers who completed the IQA, MKT, and SEC survey differs at each time point (due to maternity leaves, nonresponse, and attrition after completing some but not all data collection). The overall sample of teachers who completed the survey is 62 in fall of Year 1, 60 in spring of Year 1, and 43 in spring of Year 2. Fifty-six teachers completed the SEC in fall of Year 1, 55 in spring of Year 1, and 42 in spring of Year 2. The number of teachers with IQA data was 59 in the fall of Year 1, 58 in the spring of Year 1, and 44 in the spring of Year 2. The number of teachers who completed the MKT is 61 in fall of Year 1, 60 in spring of Year 1, and 45 in spring of Year 2.

a. Significantly different from Fall Year 1 at p < .001
b. Significantly different from Fall Year 1 at p < .05

Although a 45-teacher sample is modest in size, our study design has several advantages. First, it is longitudinal. We charted levels and changes in MKT and instructional quality in the first two years of teaching, something no other study we know of has done. This provides a rare look at empirical measures of teacher change. Second, we used mixed methods and multiple measures to compare various ways of representing teacher knowledge and instruction. Third, the sample is midway between a small case study and larger-scale representative samples, allowing us to collect observation and teacher assessment data while also examining patterns and relationships.

We draw our data from a larger, staged, longitudinal study of mentoring and induction of middle school math teachers that began in 2007. This study used consecutive cohorts of beginning teachers in 2007–2008, 2008–2009, and 2009–2010, and followed them for their first two years of teaching.
We administered surveys to teachers in the fall and spring of their first and second years of teaching. At both time points in the first year and in the spring of the second year, we administered the MKT and observed each teacher twice, on consecutive days, using the Instructional Quality Assessment (IQA), a classroom observation protocol developed at the University of Pittsburgh. We used two parallel forms of the MKT, given alternately. At the first administration, we randomized which teacher received which form, and then we alternated at the next wave so that no teacher repeated the same form consecutively. In the first year of teaching, we observed two consecutive lessons in the fall and two consecutive lessons in the spring; in the second year of teaching, we observed two consecutive lessons in the spring. We administered our surveys via mail; the response rate was 100% for the fall survey the first year and over 90% on subsequent survey administrations. In our analyses, we use the survey data from the first-year fall teacher survey administrations in October 2007, 2008, and 2009; the IQA lesson observations in November/December 2007, 2008, and 2009 and March/April 2008, 2009, and 2010; and the MKT administrations from October 2007, 2008, and 2009 and May 2008, 2009, and 2010. Table 1, which contains the descriptive statistics for variables in the study, also includes the data collection time points and sample sizes at each point.

VARIABLES USED IN THE ANALYSIS

We ground our measures of instruction in three main ideas. First, we sought measures consistent with the field’s description of what knowledge is important for teaching math and what constitutes “high quality” math instruction, as described earlier. Second, we chose dimensions of instruction that had been validated in previous research.
Third, to triangulate, we used measures that employ different modes of data collection—a paper-and-pencil teacher assessment (MKT), a third-party observation rubric (IQA), and a teacher self-report survey. Proximal Knowledge Measure: Mathematical Knowledge for Teaching (MKT) We use a version of the MKT that includes items on middle school number concepts and operations and middle school patterns, functions, and algebra (Hill, 2007). In this analysis, we convert MKT scores to z-scores. Although multiple-choice assessments such as the MKT are objective and can be implemented efficiently, Baxter and Lederman (1999) have expressed some concern about their construct validity and their ability to discriminate among teachers with a range of knowledge. Our study offers an opportunity to test this relatively new knowledge measure on a moderately sized sample of beginning teachers to help establish its properties for charting growth and showing variation among teachers. Distal Knowledge Measures We use data from our teacher self-report surveys to measure commonly used proxies for teacher knowledge and experience. The proxies are math or science major or minor (we include “hard” science majors such as chemistry and physics with the math major because those science majors require substantial math coursework); math education major or minor; number of weeks of student teaching in a math classroom; and number of advanced math courses taken in college. The math major and math education variables are dummy variables coded 1 if a teacher had a major or minor in a math or “hard” science field or completed a math education program and 0 otherwise. The student teaching variable is a continuous variable of the number of weeks the teacher taught in a classroom as part of an apprenticeship (i.e., student teaching) before becoming a teacher. We do not have information about how the placement was related to the teacher’s coursework or other preparation activities.
Advanced math courses is a continuous variable representing the number of advanced math courses teachers took during their postsecondary education. Measures of Instruction The complexity of teacher instruction is reflected in the variety of methods researchers use to examine it. Typical methods include teacher surveys, classroom observations, and teacher logs (which are daily surveys); in recent years, researchers have begun to use artifact analysis and video coding as well (Borko et al., 2006; Kilday & Kinzie, 2009). Each of these approaches is limited in some way. For example, teacher self-reports are useful for measuring specific instructional behaviors but less valid and reliable for measuring most dimensions of instructional quality (Mayer, 1999; Porter, 2002; Quint, Akey, Rappaport, & Willner, 2007). And although observation protocols reliably determine dimensions of quality (Kennedy, 1999), they provide only a snapshot of instruction rather than a complete picture of what is taught during the year (Porter, 1989), and there is no consensus about which quality dimensions such measures should include. To address the trade-offs of using different ways of measuring instruction, we measure instructional quality with multiple measures and modes: a classroom observation protocol that takes into account task rigor and discussion quality; and a self-report survey from which we derive instructional breadth (number of topics covered), emphasis on advanced topics, and emphasis on higher-order cognitive demands. Instructional Quality Measures from the IQA We used the IQA (Boston & Wolf, 2006; Junker et al., 2006) classroom observation protocol to measure the quality of each teacher’s math instruction. The IQA is designed to code math lessons that involve students in a problem-solving activity and subsequent whole-class discussion related to this activity (Matsumura et al., 2006).
It focuses less on the accuracy of teachers’ instruction and more on the level of the task (i.e., how cognitively demanding is it?) and the level of classroom discussion (i.e., are students participating in discussion, and are teachers pressing students to think more deeply about the concepts at hand?). Results from an initial pilot study showed low to moderate levels of rater reliability (Junker et al., 2006). In response to these low reliability levels, developers lengthened the rater training procedure and added components; a subsequent validation study showed improvements in rater reliability (Matsumura et al., 2006). Preliminary evidence from several studies of the IQA’s predictive validity indicates that it discriminates among teachers with respect to student achievement outcomes (Matsumura et al., 2006; Quint et al., 2007), although this relationship should not be interpreted as causal. The IQA’s measure of task rigor encompasses the degree to which the task is cognitively demanding, requires students to problem solve, and draws upon prior knowledge. All of these facets of academic rigor are clearly defined and measured by the IQA and closely map to tasks that have been deemed “worthwhile” by NCTM and important by other researchers (Hiebert et al., 1997; Putnam, Lampert, & Peterson, 1990; Stein, Grover, & Henningsen, 1996). Acknowledging that a comprehensive measure of instructional quality is not feasible, the IQA’s developers focused on a set of measures that research identified as most closely linked to student achievement.
The measures are (a) rigor of lesson activities and the ensuing class discussion (e.g., the degree to which students are supported to build deep conceptual understanding and engage with the demands of high-level tasks); (b) the quality of class discussion (e.g., the extent to which teachers press students to build deep conceptual understanding in mathematics and engage with the demands of high-level tasks); and (c) the quality of the teacher’s expectations (e.g., the clarity and rigor of expectations for the quality of student work and how teachers communicate these expectations to students) (Matsumura et al., 2006). We used the first two dimensions of the IQA in our study. For these two dimensions, the IQA has eight rubrics, each of which is scored separately. Three of the rubrics measure academic rigor; they focus on (a) the level of cognitive demand required by the tasks provided to students, called “potential of the task”; (b) the extent to which the teacher maintains this cognitive demand throughout students’ completion of the task, called “implementation of the task”; and (c) the extent to which the class discussion includes students sharing work and explaining their thinking about the mathematical content inherent in the task, referred to as “student discussion following the task.” The five remaining rubrics measure the following aspects of student discussion: (d) proportion of class participating in the discussion; (e) teachers’ linking moves (the extent to which the teacher makes connections among mathematical ideas voiced by different speakers); (f) students’ linking moves; (g) teachers’ pressing for reasoning based on student contributions (the extent to which the teacher requires justification of contributions to the discussion); and (h) student responses (the extent to which students provide reasoning and justification for their contributions).
Each rubric is scored on a 5-point scale (4 = exemplary, 1 = poor, 0 = not observed), with scores of 3 or 4 indicating instruction that is more consistent with the vision of teaching and learning advocated by the NCTM (NCTM, 2000, 2006). In our initial descriptive analyses, we examined scores on all eight rubrics, but for our multivariate analyses we created indices to provide more reliable measures (Mayer, 1999). We created a “rigor” scale based on the first two rubrics, and a “discussion quality” scale based on the five discussion-related rubrics and the discussion dimension from the rigor rubric. Cronbach’s alphas of the indices were .86 and .96, respectively. CONDUCTING THE OBSERVATIONS
To improve reliability, we use the two consecutive lessons we observed at each time period to produce a single set of IQA scores; recent research stresses the importance of conducting two or more observations to ensure adequate reliability and validity (Ho & Kane, 2013). At each of these time points, a member of the research team videotaped two lessons in the same math class. Two independent raters viewed and rated each recorded lesson. When scheduling the IQA observations, we did not ask teachers to cover a particular topic or set of topics. Instead, consistent with the target of the IQA, we told the teachers that we wanted to observe a lesson in which students were provided with a problem-solving task, given time to work on this task, and called together for a whole-class discussion following work time on the task. We observed on days when teachers indicated that their lessons would follow this format. The mathematical topics covered during the observation varied by teacher and, in most cases, corresponded to the point in the curriculum sequence where the class was on the particular dates we observed; in other words, teachers generally chose topics that were closely related to what they would have been covering on the dates preceding and following our observations. Examples of topics, which run the gamut of those typically covered in middle school mathematics, include proportional relationships, solutions of linear equations and pairs of linear equations, surface area and volume, and probability of compound events. Particular tasks in which students engaged during the lessons were similarly varied. For example, in one seventh-grade lesson that we observed in May, students solved a problem requiring them to determine the amount of paint needed to cover an irregularly shaped stage set.
Another lesson, at the eighth-grade level, required students to use algebraic representations of a scenario involving rates of change and to justify whether particular expressions appeared to be equivalent. Students in another eighth-grade class that we observed were shown the algorithm for computing the volume of a cylinder and subsequently solved routine problems using the algorithm. One potential concern with allowing teachers to choose the lessons to be observed is that they may opt to present lessons for which they are better prepared than usual. As part of the Measures of Effective Teaching (MET) study, Ho and Kane (2013) investigated this type of bias by comparing observation scores on videos chosen by researchers with videos selected by teachers. They found that when teachers picked the videos, scores were higher, but differences among teachers in their scores were just as salient and, in fact, the reliability of the scores was higher. IQA RATER TRAINING AND CODING RELIABILITY Consistent with research that emphasizes the importance of thorough training for observers (Ho & Kane, 2013), our observers, who were doctoral students or postdoctoral researchers, were trained in a multi-step process. Each rater participated in at least two full days of training conducted by IQA experts. Then, before doing any coding, each rater participated in inter-rater reliability exercises. Groups of coders viewed the same classroom video, coded it, and then discussed their coding with one another to reach mutual understanding and agreement. Raters did not begin individually coding lessons until they reached 80% inter-rater agreement. We used multiple methods to assess the reliability of our IQA coding. Exact agreement between paired coders—calculated as the total number of agreements divided by the total number of agreements and disagreements—was 60% overall.
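Agreement rates of this kind (exact match, and the more lenient within-one-point variant) can be computed with a short sketch like the following; the paired ratings here are hypothetical, not taken from the study's data:

```python
def agreement_rates(rater_a, rater_b):
    """Exact and within-one-point agreement between two raters' rubric scores."""
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one

# Hypothetical paired ratings for one lesson across the eight IQA rubrics (0-4 scale).
scores_a = [2, 3, 1, 2, 2, 3, 1, 0]
scores_b = [2, 2, 1, 3, 2, 1, 1, 0]
exact, within_one = agreement_rates(scores_a, scores_b)
print(exact, within_one)  # 0.625 0.875
```

Exact agreement is the stricter criterion; with ordinal rubrics, a one-point tolerance credits raters who land on adjacent scale points.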
One-point agreement, in which we considered raters to be in agreement if individual scores were within 1 point of one another on each 5-point scale, was 88%. In addition, we conducted a generalizability study (G-study) at three time points during the coding process to verify that our design for rating lessons—with two raters each rating two lessons at each time point—provided a stable estimate of instructional quality. In each G-study, each rater involved in the project at the time when the G-study was conducted independently scored two lessons from each teacher in a random sample; we used GENOVA software to analyze the ratings given to the lessons (Crick & Brennan, 2001). At the teacher level, generalizability coefficients with two raters and two observations for each teacher ranged from .74 to .86, with an average of .80, indicating sufficient reliability. Each time a G-study was conducted, the team of coders met to discuss any discrepant ratings and to come to consensus on application of the rating scheme. INSTRUCTION MEASURES FROM THE TEACHER SURVEY Consistent with our earlier discussion, we sought to measure dimensions of math instruction defined by recent NCTM and NRC publications as well as those hypothesized in the literature to be associated with teacher knowledge. And, in accordance with research on survey data, we sought behaviorally oriented constructs (e.g., what teachers do in the classroom) rather than affective or relational constructs (e.g., teacher emotions or opinions about their relationships with students), which are more difficult to reliably collect from teacher self-reports (Mayer, 1999). Our self-report measures of instruction are derived from the “content grid” from the Survey of Enacted Curriculum (SEC) (see Council of Chief State School Officers, 2005; Porter, 2002); the grid provides a survey measure of teachers’ relative emphasis on topics by cognitive demand. The instrument uses a comprehensive taxonomy of mathematics concepts.
The columns of the grid list five cognitive demands in mathematics: (a) memorize facts, definitions, and formulas; (b) perform procedures; (c) demonstrate understanding of mathematical ideas; (d) solve non-routine problems/make connections; and (e) conjecture/generalize/prove. The rows of the grid list a comprehensive set of detailed topics, such as multiple-step equations, inequalities, and linear equations. For mathematics, the content grid identifies 16 general areas (e.g., operations, measurement); each is further defined by four to 19 specific topics, for a total of 183 specific topics. On our survey, we asked teachers to indicate the math topics by cognitive demand that they covered during a semester of instruction. Other studies have shown the content matrix to be valid and reliable (e.g., Blank, Porter, & Smithson, 2001; Blank et al., 2005; Porter, Polikoff, & Smithson, 2009; Porter, Smithson, Blank, & Zeidner, 2007; Porter, 2002) and an important mediator of student achievement (Gamoran, Porter, Smithson, & White, 1997). Using the SEC content matrix, we can measure three constructs that reflect important dimensions of instruction: (a) the relative emphasis on procedural versus higher-order cognitive demands, an important part of a balanced instructional approach (National Mathematics Advisory Panel, 2008); (b) the proportion of time spent on basic versus advanced math topics (Loveless, 2001); and (c) the number of topics covered, or instructional breadth, which lets us compare “mile wide, inch deep” topic coverage to more in-depth treatment of fewer topics (Schmidt, McKnight, & Raizen, 1997). We derived all three of these measures from the instructional content matrix that we included on the teacher survey.
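As a concrete illustration, the three content-grid measures listed above can be derived roughly as follows. This is a minimal sketch: the topics, time allocations, and basic-topic classification are hypothetical stand-ins for the SEC's actual 183-topic taxonomy.

```python
# Hypothetical content-grid responses for one teacher: topic -> time allocated to
# each of the five cognitive demands, in the order (memorize, perform procedures,
# demonstrate understanding, solve non-routine problems, conjecture/generalize/prove).
grid = {
    "adding fractions":      [2, 4, 1, 0, 0],  # number sense/operations (basic)
    "single-step equations": [1, 3, 2, 1, 0],  # basic
    "linear equations":      [0, 2, 2, 1, 1],  # advanced
}
BASIC_TOPICS = {"adding fractions", "single-step equations"}
HIGH_DEMANDS = slice(2, 5)  # the three higher-order demand columns

total_time = sum(sum(v) for v in grid.values())
high_share = sum(sum(v[HIGH_DEMANDS]) for v in grid.values()) / total_time
basic_share = sum(sum(v) for t, v in grid.items() if t in BASIC_TOPICS) / total_time
breadth = sum(1 for v in grid.values() if sum(v) > 0)  # unique topics covered
print(high_share, basic_share, breadth)  # 0.4 0.7 3
```

Because each measure is an additive share or count rather than a multi-item scale, internal-consistency reliability (alpha) does not apply, as noted below.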
To determine the proportion of time spent on high cognitive demands, we developed an additive index of the amount of time teachers indicated that they spent on “demonstrate understanding,” “conjecture/generalize/prove,” and “solve non-routine problems.” Proportion of time spent on basic topics represents the time teachers spent on the topics of number sense and operations (rather than more advanced topics such as basic and advanced algebra, geometry, and statistics). The breadth of topics is an additive index indicating how many unique topics (e.g., angles, absolute value, line graphs, multiplying fractions) teachers covered during each semester. We do not consider alpha reliabilities relevant here, as these are additive indices of how much time teachers spend on various approaches and topics. These constructs are grounded in math literature that places them among the important indicators of quality math instruction (e.g., Stein & Lane, 1996). These dimensions of instruction complement the task rigor and discussion quality measured by the IQA rubric. CORRELATIONS OF MEASURES OF INSTRUCTIONAL QUALITY Correlations among our instruction variables offer evidence that the measures are distinct and operate generally in the way that we conceptualized them. Table 2 shows correlations among our variables in the spring of Year 2 (for efficiency, fall and spring of Year 1 correlations are not shown). For example, in the spring of Year 2, teachers’ spending more time on high cognitive demands has a .31 correlation coefficient with covering more topics. Even though this relationship is not significant, it bears some discussion, since the well-known “mile wide, inch deep” content coverage description of most U.S. teachers implies covering many topics in a procedural way. Table 2. Spring Year 2 Correlations
^{a }Significant at p < .001 ^{b }Significant at p < .01 ^{c }Significant at p < .05 We do not see breadth of topic coverage as necessarily incompatible with emphasis on higher cognitive demands. One could cover a topic very in-depth using only memorization and procedural approaches. Similarly, one could cover many topics but employ predominantly problem-solving and conceptual approaches when teaching those topics. Time spent on basic topics and time spent on higher cognitive demands (correlation coefficient of .30, but not significant) also are not incompatible. A teacher could focus on simple topics—adding fractions, single-step equations—but use conceptual approaches to teach these basic topics. Previous work has indicated that effective teachers cover more material (Hanushek, 2011; Hanushek & Rivkin, 2010) and use higher-order demands (Ball, 2000). Taken together, these findings imply that effective teachers are able to cover more topics at higher levels of cognitive demand. Still, we would not expect strong correlations on average between breadth and depth, given generally low to moderate teaching quality, and our nonsignificant correlations are in line with this idea. Related to this, covering more topics was correlated (not significantly) with less press in discussion (−.26) and less student knowledge provision in discussion (−.29), which is consistent with the idea that breadth of coverage allows less in-depth discussion. As an added robustness indicator, correlations for the full sample of 60 teachers in the spring of the first year were very similar to those for the truncated longitudinal sample of 45 teachers (results not shown). ANALYSIS TECHNIQUES We conducted a three-part analysis. First, we undertook descriptive analysis to examine the relationships between instructional quality and pedagogical content knowledge.
We examined summary statistics of instructional quality and teacher knowledge to make sense of how teachers vary in their scores compared to one another and over time. Next, we used a correlational analysis to examine the relationships among these variables and our distal teaching knowledge measures. We estimated means and correlations for the sample of 45 teachers at three time points: the fall of the first year of teaching, the spring of the first year of teaching, and the spring of the second year of teaching.[2] To examine whether the teachers’ instructional quality and knowledge improved over time, we used individual growth curve analysis (Duncan, Duncan, Strycker, Li, & Alpert, 1999). We modeled both dimensions of teachers’ IQA score trajectories, their MKT trajectories, and time spent on basic topics instruction across the three time points using the following multilevel individual growth curve approach: Y_{ij} = γ_{00} + γ_{10}time_{ij} + U_{0j} + U_{1j}time_{ij} + ε_{ij}, where i represents the time of the observation, i = 1 to n, and j represents the teacher, j = 1 to k. Growth curve modeling accounts for the correlated nature of the data by nesting the repeated measures within individuals over time at Level 1. We used these repeated observations to model a latent growth trajectory for individual teachers, which we in turn fed into the Level 2 model to estimate between-teacher differences. Diagnostic tests of model fit revealed that a random coefficients model, in which both the intercept and slope are allowed to vary randomly, fit the data best; as can be seen in the preceding equation, this allows for the estimation of individual teacher intercepts (U_{0j}) and different rates of growth across time (U_{1j}time_{ij}) plus the random error term (ε_{ij}). We first fit the unconditional model without teacher background variables so that we could examine the variance structure before modeling any significant growth by background characteristics, such as mathematics and education background and student teaching experience.
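A minimal sketch of fitting a random-coefficients growth model of this kind, using simulated data and statsmodels' `MixedLM` (the study's software, data, and effect sizes differed; the simulated intercept of −0.4 and slope of 0.2 below are purely illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for j in range(45):                  # 45 teachers, as in the analytic sample
    u0 = rng.normal(0, 0.5)          # random intercept U_0j
    u1 = rng.normal(0, 0.1)          # random slope U_1j
    for t in (0, 1, 2):              # fall Y1, spring Y1, spring Y2
        y = -0.4 + 0.2 * t + u0 + u1 * t + rng.normal(0, 0.2)  # + error eps_ij
        rows.append({"teacher": j, "time": t, "mkt": y})
df = pd.DataFrame(rows)

# Random intercept and random slope for time, with repeated measures nested
# within teacher (Level 1) and teacher-level deviations at Level 2.
model = smf.mixedlm("mkt ~ time", df, groups=df["teacher"], re_formula="~time")
result = model.fit()
print(result.fe_params)  # gamma_00 (average intercept), gamma_10 (average growth)
```

The fixed-effect slope estimates the average per-period growth; the estimated random-effect variances describe how much individual teachers' intercepts and growth rates deviate from that average.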
Finally, we employed regression analyses to examine how well math knowledge predicts instruction. We ran a series of bivariate regressions at each time point to investigate whether MKT predicts the IQA dimensions of rigor and high-quality discussion, the number of topics teachers covered, the emphasis on higher-order cognitive demands, and the percentage of instruction time spent on basic topics. We further developed these models to test whether the study’s proximal measures of teacher knowledge are able to explain teaching quality better than the distal measures that are more frequently used in the literature. We fit the following model, where the set of distal knowledge measures predicts the outcome of interest independent of MKT scores: Y_{j} = β_{0} + β_{1}MathMajor_{j} + β_{2}MathEdMajor_{j} + β_{3}StudentTeaching_{j} + β_{4}AdvancedCourses_{j} + ε_{j}. In the second step, we added teachers’ MKT scores to this model to see how the fit and patterns between models differ: Y_{j} = β_{0} + β_{1}MathMajor_{j} + β_{2}MathEdMajor_{j} + β_{3}StudentTeaching_{j} + β_{4}AdvancedCourses_{j} + β_{5}MKT_{j} + ε_{j}. RESULTS In this section we describe the results of our descriptive, growth, and regression analyses. We do not interpret effect sizes for marginally significant variables, although we do mention marginally significant findings. In the tables and text, we report standardized coefficients to allow for comparison across variables. To calculate effect size, we multiply the unstandardized beta coefficient by the standard deviation of the dependent variable (we include the calculation in the endnotes). RESEARCH QUESTION 1: WHAT ARE THE LEVEL, VARIATION, AND CHANGE IN TEACHER KNOWLEDGE AND INSTRUCTIONAL QUALITY FOR MATH TEACHERS IN THEIR FIRST AND SECOND YEARS OF TEACHING? We examined level and variation for the distal teacher knowledge measures; for the MKT and instructional quality, we also looked at change over the first two years of teaching. Table 1 provides means and standard deviations for all three time points for the variables in our analysis (teacher background, MKT scores, IQA scores, and content matrix instruction variables).
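The two-step (nested) comparison described in the analysis section, distal measures alone versus distal measures plus MKT, can be sketched with simulated data; the predictor names mirror our distal measures, but the values and effect sizes are hypothetical, and this is not the study's actual estimation code:

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 from an OLS fit with intercept (columns of X are predictors)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 45
# Hypothetical distal predictors: math/science major (0/1), math ed major (0/1),
# weeks of student teaching, number of advanced math courses.
distal = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.normal(12, 5, n),
    rng.poisson(3, n),
]).astype(float)
mkt = rng.normal(0, 1, n)                      # MKT z-score
rigor = 1.5 + 0.02 * distal[:, 2] + 0.1 * mkt + rng.normal(0, 0.3, n)

r2_distal = r_squared(distal, rigor)           # step 1: distal measures only
r2_full = r_squared(np.column_stack([distal, mkt]), rigor)  # step 2: add MKT
print(r2_distal, r2_full)
```

Since the second model nests the first, adding MKT can only raise in-sample R^2; the substantive question is whether the increment is meaningful and whether the distal coefficients shrink once MKT is included.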
Distal Teacher Knowledge Measures (Teacher Background) To get a sense of the qualification distribution of teachers in our sample, and to see if there were meaningful typologies of teacher qualifications, we conducted a cluster analysis of our four background variables and identified a typology of four types of teachers. Table 3.1 shows these four types: Cluster 1: Student teaching, no formal math—those with little formal math background but with an average of 17 weeks of student teaching in math; Cluster 2: Math education, high student teaching, and advanced math courses—those with 16 weeks of student teaching in math and more than four advanced math courses, on average; Cluster 3: Moderate student teaching and advanced math courses—those with a math education degree, about 11 weeks of student teaching in math, and an average of 2.7 advanced math courses; and Cluster 4: No experience, no math—those with essentially no math background and no student teaching. Table 3.1. Cluster Analysis of Distal Teacher Knowledge Variables
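A cluster analysis along these lines can be sketched as follows; we use k-means via scipy on standardized background variables, but both the simulated data and the choice of k-means are assumptions for illustration (the text does not specify the clustering algorithm used):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(2)
n = 45
# Hypothetical background variables: math/science major (0/1), math education
# major (0/1), weeks of student teaching, number of advanced math courses.
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.normal(12, 6, n).clip(min=0),
    rng.poisson(3, n),
]).astype(float)
Xz = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so weeks don't dominate
centroids, labels = kmeans2(Xz, 4, minit="++", seed=0)
print(np.bincount(labels, minlength=4))    # teachers per cluster
```

Standardizing first matters because the variables are on very different scales; without it, the weeks-of-student-teaching column would dominate the distance computation.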
We examined whether relationships with knowledge and instructional quality differed according to the teacher’s qualification cluster (results not shown). We analyzed all six outcomes (three from the SEC, two from the IQA, and the MKT) at all three time points for differences by cluster. We found no differences on either of the IQA dimensions by cluster. Not surprisingly, we found that Cluster 2, which represents teachers with the most math education and math courses, had higher MKT scores at all three time points compared to the average across other clusters (p < .001 at time 1, p < .05 at times 2 and 3). We also found that those with little formal math background but with an average of 17 weeks of student teaching in math (Cluster 1) spent significantly less time than the other three clusters (p < .01) on basic topics in the spring of Year 2. Our analytic sample had 32 traditional-route teachers who completed their teacher preparation coursework at a total of 22 different colleges and universities in eight states. While we do not have detailed data on the content and quality of each preparation program, we do have data on several key aspects of their preparation, presented in Table 3.2. Of the 32 traditional-route teachers in our sample, 22 obtained their degree at the undergraduate level, while 10 did so at the graduate level. Very few had five or more math methods or math content courses (two and five teachers, respectively). Table 3.2. Frequency of Teacher Preparation Characteristics among Teachers in the Study (n = 45)
Note. Survey items regarding the number of courses teachers completed in each area were asked across all college or university coursework completed as of the date of the survey; consequently, coursework frequency data are included for teachers regardless of route to certification, and courses may have been completed in conjunction with either traditional or alternative certification programs. MKT, a More Proximal Knowledge Measure To interpret MKT results, we converted raw scores on the MKT measure to standardized z-scores using a conversion table provided by the MKT developers. Thus, MKT scores as we report them indicate the number of standard deviations from the mean of the pilot sample used in scaling the instrument. An MKT score of 0 indicates a teacher with the same MKT score as the average of teachers in the pilot sample. A score of 1 indicates performance that is one standard deviation above the average of teachers in the pilot study, and a score of −1 indicates performance that is one standard deviation below the average. As Table 1 shows, mean MKT scores were −.42 in the fall of Year 1, −.38 in the spring of Year 1, and −.13 in the spring of Year 2. A few teachers scored high on the test, and a few extremely low, with most falling in the −2 to 1 range. The distribution of scores remains the same for all three time points. Given this similarity in the distribution of scores across all teachers, aggregated MKT scores did not significantly change from one time point to the next, even when examining fall Year 1 to spring Year 2 (though this change was marginally significant). However, individual teachers tended to exhibit positive growth in MKT scores over time; thus, teacher growth trajectories from the fall of their first year of teaching to the spring of their second year were significant. On average, teachers increased their MKT score about .2 points per time period (effect size = .2, p < .01).
Figure 1 shows the observed individual teacher MKT growth trajectories as well as the predicted growth trend. Figure 1. Growth of MKT scores IQA We separately analyzed each of our eight IQA dimensions as well as the index variables of rigor and discussion that we created. Table 1 shows the mean scores for the IQA variables. Scores are generally low, between 1 and 2 on a scale from 0 to 4; teachers tended to cluster around a score of 2, and none were rated overall as 4 on any dimension of the IQA.[3] The correlation matrix in Table 2 shows that for the discussion-related indicators, teachers who rank high on one dimension tend to be high on all of them. Similarly, the two dimensions of rigor (task potential and task implementation) are highly correlated (.78). Figure 2. Growth of IQA task rigor There was no significant teacher growth on any of the IQA dimensions. Figures 2 and 3 show the growth trajectories for the rigor and discussion indices, respectively. Trend lines moved both upward and downward, but usually by only .1 or .2 on the 0 to 4 scale. To investigate the extent to which measurement error played a role in our findings, we conducted several sensitivity analyses. Careful comparison of raters determined that no raters or rater pairs consistently rated teachers lower or higher.[4] Still, for at least 40% of the individual lessons we observed, raters’ paired ratings differed on at least one of the eight IQA dimensions by more than 1 point on the 5-point coding scale. Further, these discrepancies were not confined to one set of teachers, schools, or districts. There tended to be more agreement on the dimensions of Task Potential and Task Implementation than for the discussion-related dimensions. Figure 3.
Growth of IQA discussion We conducted sensitivity analyses that included estimating trends and relationships using (a) the highest IQA score in the pair; (b) the average of the two; (c) having a third coder code a subset of video observations, and then averaging the three codes; and (d) taking the average of the two (of three) codes that were closest together. None of these strategies made a difference in terms of significance of growth over time in the IQA. Further, to increase the reliability of the measure we averaged the fall and spring scores for Year 1 and compared averaged Year 1 scores to spring Year 2 scores. The results still suggested no significant gains or losses. Topic Coverage and Cognitive Demand Emphasis As illustrated in Table 1, from the fall of Year 1 to the spring of Year 2, teachers showed no significant change in cognitive demand emphasis, remaining around 48% to 49% at all three time points. From the fall of Year 1 to the spring of Year 2, teachers did significantly increase the number of topics they covered and significantly decrease their emphasis on basic topics. (Figure 4 illustrates the growth curve for basic topics.) As Table 1 shows, in the first semester of teaching, teachers averaged about 52 topics, compared to 72 in the spring of the first year and 71 in the spring of the second year. Coverage of basic topics moved from 47% to 29% to 27% of instruction time over the first two years of teaching. Course content and student level likely affected the dimensions of instruction we measured; however, after controlling for these variables, we found no significant patterns. (We controlled for course content using the title of the course [e.g., algebra, pre-algebra]; for student level, we used the teachers’ rating of the percentage of students at or below grade level.) Though there is within-teacher variation in the amount of time spent on higher-order cognitive demands, on average, across-teacher variation remains quite consistent. Figure 4.
Growth curve of time on basic topic instruction RESEARCH QUESTION 2: TO WHAT EXTENT IS A PROXIMAL MEASURE OF TEACHER KNOWLEDGE ASSOCIATED WITH INSTRUCTIONAL QUALITY? DOES THIS RELATIONSHIP CHANGE FROM THE FIRST TO THE SECOND YEAR? Here we tested the link between knowledge and instruction. Specifically, we examined our hypotheses that higher scores on the MKT would predict more rigor and high-quality discussion, coverage of fewer topics, more emphasis on higher-order cognitive demands, and less emphasis on basic topics. We expected teacher knowledge (as measured by the MKT) to translate into higher-order instruction, but we did not see much evidence of this. Correlational and regression analysis results do not show any consistent relationship between knowledge and instruction. The only marginally significant correlations between knowledge and instruction were in the fall of Year 1 (not shown), where the MKT score marginally correlates with student linking and teachers’ pressing for reasoning on student responses. Ordinary least squares regression results are compiled in Table 4. A teacher’s MKT score was not a significant predictor of either the rigor or discussion index in the first or second year of teaching. Table 4 also shows that MKT is not a significant predictor of the number of topics teachers covered, the proportion of time they spent on higher-order cognitive demands, or the proportion of time they spent on basic topics. Table 4. Standardized Coefficients Predicting Instruction from MKT Score
These findings could reflect true weak links between the MKT and instructional quality as we measured it; alternatively, they could result from both MKT and instructional quality being at such low levels. We examined whether restriction of range was influencing our results. We explored whether we could detect a relationship between MKT and instructional quality among high-scoring teachers (MKT scores above 1) and low-scoring teachers (MKT scores around 0, which is the average in the national sample). We still found no relationship. Our results suggest that we do not have a restriction-of-range problem, but rather that there is no clear pattern between high MKT scorers and the IQA dimensions.

RESEARCH QUESTION 3: TO WHAT EXTENT IS A PROXIMAL MEASURE OF TEACHER KNOWLEDGE A BETTER PREDICTOR OF INSTRUCTIONAL QUALITY THAN FREQUENTLY USED, MORE DISTAL MEASURES OF TEACHER KNOWLEDGE?

To answer this question, we first asked, Do distal teacher knowledge measures predict instruction in the first two years of teaching? We hypothesized that a math education major might be associated with less rigor and a math major with more rigor (Rivkin, Hanushek, & Kain, 2005), but in our data both majors predicted less rigor. As Table 5 shows, in the fall of Year 1, having a degree in math education (as opposed to having a degree in any other discipline) predicts less rigor (β = −.445, p = .01 for fall Year 1, which translates to a .29 decrease in the rigor score, on the 0 to 4 scale[5]). In the spring of Year 2, having a math or science major or minor was associated with less rigor in the lesson (β = −.481, p = .01); this translates to a reduction of the IQA rigor score by .49.[6] Math/science major/minor approached significance in the fall of Year 1 (β = −.330, p = .07), similarly predicting less rigor. We discuss possible explanations for this in the next section.

Table 5. Standardized Coefficients Predicting Instruction from Distal Knowledge Variables and MKT Scores
Standardized beta coefficients; t statistics in parentheses. *p < .05. **p < .01. ***p < .001.

In contrast, student teaching experience tended to be related to better instructional quality—more rigor, fewer topics (marginally), and fewer basic topics. Specifically, the more time a teacher spent student teaching, the more rigorous her observed lessons were in the fall of the first year (β = .406, p = .01). This suggests that an increase of one week of student teaching translates into a very small (.01) increase in the teacher’s rigor score.[7] The number of weeks of student teaching approached significance for predicting the coverage of fewer topics (β = −.303, p = .09) in the spring of Year 2. It was significant in predicting less coverage of basic topics in the spring of Year 2 (β = −.571, p = .001), which translates to an additional week of student teaching decreasing the emphasis on basic topics by 1% (the unstandardized coefficient is −.01). In the fall of Year 1, the number of advanced math courses approached significance for predicting higher cognitive demand emphasis (β = .408, p = .07). None of the background variables at any time point predicted teachers’ IQA discussion scores. Next, we correlated the distal measures with the MKT. Table 6 shows correlations and t tests between the fall and spring MKT scores and our four measures of math background. Previous work showed that higher scores on the MKT were associated with teachers’ experience teaching math, subject-matter certification, and math coursework (Hanushek & Rivkin, 2006; Hill, 2007; Hill et al., 2005). Our results are consistent with this research and also consistent with work showing the importance of student teaching (e.g., Papay, West, Fullerton, & Kane, 2012).
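The raw-scale translations reported above (and detailed in the notes) follow a simple conversion. As a sketch, assuming the reported unstandardized coefficients are expressed in standard-deviation units of the outcome (our reading of the footnote arithmetic, not stated explicitly by the authors), the effect in raw IQA points is:

```latex
% Raw-scale effect of a one-unit change in a predictor:
% b is the reported coefficient, SD_Y the standard deviation of the outcome.
\[
\text{raw effect} \;=\; b \times SD_Y,
\qquad \text{e.g. } 0.91 \times 0.54 \approx 0.49
\]
% The example uses note 6: a math/science major/minor in spring of Year 2
% corresponds to roughly half a point on the 0--4 IQA rigor scale.
```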
Table 6 shows that the number of advanced math courses is significantly correlated with the MKT score at each of our three time points (.45 in fall of Year 1, .39 in spring of Year 1, and .40 in spring of Year 2). Further, a teacher with a degree in math or science was significantly more likely to have a higher MKT score in the spring of Year 2. The number of weeks of student teaching is correlated with MKT in the fall of Year 1 (.33) and the spring of Year 1 (.40), and is marginally correlated with the MKT in the spring of Year 2 (.29). To test these relationships in a multivariate model, we included both the distal and proximal measures of teacher knowledge in the same model to compare how well the MKT and the more distal knowledge measures predicted the quality of instruction. MKT was not a significant predictor of any of our five types of instruction at any of the three time points of analysis. We include the results in Table 5 and discuss them below.

Fall of Year 1

When we added MKT to the first-year models, the number of advanced courses in mathematics became significant in predicting more emphasis on higher-order cognitive demands (b = .540, p = .02). This result suggests that when MKT is held constant, teachers who take more advanced math courses may emphasize higher-order cognitive demands more often. There were very small changes in the coefficients predicting rigor: having a math education degree became associated with even less rigor (b = −.458, p = .01), and student teaching had a stronger association with more rigor (b = .418, p = .01). If we interpret as “real” these changes in coefficients from the model without MKT to the model with MKT, they suggest that MKT scores explained a small part of the negative relationship between a math education degree and rigor, and a small part of the positive relationships of student teaching and advanced courses with rigor.
Spring of Year 1

None of the distal knowledge variables are significant predictors of any of our instruction variables in the spring of Year 1. During this time period the coefficients for the distal knowledge variables did not change when we added MKT to the model.

Spring of Year 2

In the spring of Year 2, the entry of MKT into the equation only slightly alters two sets of relationships. The negative relationship between having a math or science degree and rigor becomes a bit stronger (b = −.493, p = .01), and the negative relationship between the number of weeks of student teaching and time spent on basic topics becomes a bit weaker (b = −.540, p = .002). This implies (weakly) that among teachers with a math or science degree, those with higher MKT would have more rigorous instruction; and similarly, that among teachers with a similar number of weeks of student teaching, those with higher MKT would focus less on basic instruction. All other relationships (or lack of relationships) between distal measures of knowledge and dimensions of instruction remained the same. Below we discuss the results. Table 7 provides a heuristic summary of key results across our analyses, to aid in synthesizing the findings.

Table 7. Heuristic Summary of Key Results

RQ 1
RQs 2 and 3. Patterns in How MKT and Distal Knowledge Measures Predict Instruction Quality: Models with and without MKT
DISCUSSION

Before delving into the implications of our study, we first want to call attention to some caveats that should be considered when interpreting the results. First, the sample is a volunteer sample of teachers and is not nationally representative, although arguably it is representative of new middle school math teachers in each district, given the 90% or higher uptake rate for our study. Further, instruction is a multidimensional concept, and we measure only particular aspects of instruction, leaving many important components of instruction unexplored. And although the study provides useful information about how MKT scores may change over time and be linked to instructional quality, the longitudinal design, while an improvement over cross-sectional correlational analyses, still does not allow for a rigorous causal conclusion that knowledge caused a change in instruction. Finally, our work does not consider student achievement, so we do not know whether the variation and growth that we found produce differences in student performance.

LEVEL, VARIATION, AND CHANGES IN KNOWLEDGE AND INSTRUCTIONAL QUALITY

Against the backdrop of several years of No Child Left Behind’s focus on upgrading teacher qualifications, especially for middle school math, we were surprised to find so many beginning math teachers without either a degree or substantial coursework in math. Recall that this study included teachers from 11 districts, several of which are moderately sized, in the southeast and mid-Atlantic regions of the country. These results suggest that despite recent attempts to upgrade the subject-area expertise of middle school teachers, states and districts still have significant room for improvement in ensuring that math teachers have a firm grounding in subject-matter knowledge.
This could be done through hiring and assignment practices (i.e., not hiring or assigning teachers with no math background to teach math) where there are no shortage issues; in other locations, districts could provide intensive in-service subject-matter professional development for teachers with weak math backgrounds. We also found that beginning teachers in our study generally had low levels of knowledge (as measured by the MKT), a balanced approach to cognitive demands, low levels of discussion quality, and substantial across-teacher variation in topic coverage. The national norms available for the MKT are not directly comparable to our sample of first-year middle school mathematics teachers, but Hill (2007) reports that teachers in their first and second years of teaching scored on average slightly below .03, with scores increasing along with teaching experience. Also, pilot studies for both the MKT and IQA showed similarly low scores in the same ranges we found (Hill et al., 2004; Matsumura et al., 2006). In line with recent policy recommendations (National Mathematics Advisory Panel, 2008), teachers in our study reported a balanced approach to cognitive demands, spending about half of their time on higher-order cognitive demands (such as estimation, conjecture, and solving novel problems), and half on procedures and memorization. One hypothesis for why discussion quality was generally low is that the rigor of teachers’ lessons is constrained by the rigor of the curriculum they are required to teach (e.g., Remillard, 2005). In this study, however, we found that teachers often degraded the initial level of rigor of the task, pointing to teacher rather than curriculum influence. This suggests that a teacher may provide lessons that require problem solving and conjecture, but she may not be especially skilled in maintaining the quality of discussion about such a lesson.
This conclusion seems reasonable to us, and is consistent with the idea that teachers need a deep understanding of math teaching knowledge to integrate the complex dimensions of high-quality mathematics instruction (e.g., Cohen, 1990). We found substantial variation in the percentage of time teachers spent teaching basic topics (from 27% to 47%) and in the number of topics covered (from 52 to 72). Education professionals continue to debate how much time teachers should spend on basic topics before moving on to more advanced material, and how to balance breadth with depth (Loveless, 2001); our findings are consistent with the idea that there is considerable variation in how teachers approach these dimensions of instruction. It remains to be seen whether the Common Core Standards will reduce across-teacher variation in content coverage. Our analysis provides empirical evidence documenting that in their first two years of teaching, middle school math teachers improved in their math knowledge and on some but not all measures of instructional quality. Specifically, on average teachers showed small but significant growth on the MKT in the first two years of teaching, significantly decreased their coverage of basic topics, and increased their instructional breadth (number of topics covered). The increase in the number of topics in the context of a decrease in time on basic topics suggests that teachers are teaching more advanced topics in their spring and second year of teaching, and spending less time on basic topics. This is consistent with research suggesting that as teachers become more comfortable with and knowledgeable about the material they are teaching, they tend toward more advanced and conceptual approaches (e.g., Ball, 1990; Ball et al., 2009). Our study showed no growth in the rigor and quality of classroom discussion or in cognitive demand emphasis, however.
We know of little empirical research that charts quantitative measures of teacher knowledge and dimensions of instructional quality over time, so we cannot compare our results to others. But we do have several working hypotheses that we offer as possible explanations for our findings. It could be that teachers are growing in ways not captured by the narrow and focused measures in our study. For example, in their first years, teachers may grow on more affective or organizational dimensions (La Paro, Pianta, & Stuhlman, 2004) or on other dimensions of the quality of instruction, such as identifying and diagnosing student errors or presenting material in multiple ways (Leinhardt & Smith, 1985). Additionally, measurement error might influence our ability to detect growth. Still, we are cautiously confident in our measures as a whole, given that we chose previously validated measures and triangulated them with three different modes of data collection (teacher self-report, third-party observation, and a paper-and-pencil assessment). We also suspect that more pronounced changes in instruction occur after the second year of teaching. While previous research suggests that instructional quality on average remains at disappointingly low levels throughout a teacher’s career (e.g., Cobb, Wood, Yackel, & McNeal, 1993; Smith, 2005; Stigler, Gallimore, & Hiebert, 2000), other work suggests a steep learning curve in the first two years and improvement after that (Feiman-Nemser & Parker, 1990; Grossman, 1990; Rockoff, 2004; Papay & Kraft, 2010). Also, it could be that teachers are learning in their first years of teaching but are not able to apply what they learn systematically to their teaching until later in their careers.
Just as teachers are more likely to engage in meaningful discussions as they gain experience in the classroom (Ball, 1990), it may be that only after they master logistics and organization are teachers intellectually able to apply what they are learning in order to change how they approach task rigor, discussion, cognitive demand emphasis, and topic coverage. We might address this by incorporating into teacher learning opportunities (e.g., mentoring, induction, professional development) explicit instruction in how teachers can translate what they learn into their classroom instruction (e.g., Penuel, Gallagher, & Moorthy, 2011). Longitudinal studies of teacher learning that extend to three years and beyond would increase our understanding of teacher growth trajectories. Our findings have implications for policies designed to train and support teachers. The low average levels of mathematics knowledge and instructional quality, and the low levels of improvement, suggest that we need to improve how we train teachers before they come to the classroom as well as how we mentor them once they are in the classroom. Findings from the study suggest that preservice preparation and in-service mentoring would do well to provide opportunities for teachers to practice and receive feedback on the use of particular instructional strategies, in combination with building knowledge for teaching mathematics.

THE LINKS BETWEEN KNOWLEDGE AND INSTRUCTION

We found no direct relationships between MKT and instructional quality, although our findings suggest possible relationships. To allow for the fact that teachers might not be able to apply their knowledge to the quality of instruction until they master classroom management and student–teacher interaction, we tested immediate and delayed effects. Still, we found no direct relationships between teachers’ MKT scores and task rigor, the use of higher cognitive demands, or less time on basic topics.
This is contrary to what we expected, given previous research that indicates a strong conceptual link between teachers’ knowledge of math and the quality of their math instruction (e.g., Leinhardt & Smith, 1985). Given the complex and multidimensional nature of instruction, it is conceivable that, to show effects, the MKT must be more closely aligned with the measure of instructional quality. For example, Hill et al. (2007) developed an observation protocol to measure the same dimensions of instruction that are tested on the MKT and found that the two were significantly related. The developers of the MKT clearly state that they believe the measure captures the content knowledge that teachers need to teach math effectively, and that higher scores on the scales are related to higher-quality math instruction—and thus student achievement (Hill et al., 2007). They do not, however, make broad claims about what aspects of math instruction the MKT might be related to beyond the specific types of knowledge that it measures. In our study, we build on their work to investigate how well the MKT predicts other aspects of instruction. This is consistent with other work that aims to add to our knowledge of the relative reliability and validity of different measures of math instruction, and their strengths, limitations, and validity for predicting gains in student achievement (e.g., the MET project). From our analysis, we do not conclude that teacher knowledge is unrelated to the quality of instruction. Rather, we interpret our findings as evidence that in the first two years of teaching, teachers may be unable to access their knowledge in a way that improves the quality of instruction on the dimensions we measured. This may be because they are focusing on classroom management and other issues that are especially challenging for beginning teachers.
While we know of no study that empirically analyzes the relationship between knowledge and instruction over several time points during the first two years of teaching, Baumert et al. (2010) found such links, suggesting that our findings may not hold for veteran teachers. The weak links between the MKT and IQA suggest, as we already knew, that teacher knowledge and instruction are complex, and not all types of knowledge are related to all the important aspects of instruction. This has implications for how we certify and test teachers and identify knowledge “necessary” for teaching. We are amassing a body of evidence that casts doubt on the ability of any particular measure to predict multiple aspects of good teaching. This provides support for alternate ways of identifying teacher quality, such as one that involves a measure of the applicant’s actual teaching, either through a submitted video or a practice lesson, as part of the interview process (see Kennedy, 2006).

USING DISTAL KNOWLEDGE MEASURES TO PREDICT INSTRUCTION

We found little evidence that MKT is a better predictor of instructional quality than distal measures, but we did find suggestive evidence that MKT may help to explain their predictive power. We found that MKT explained a small part of the relationship between advanced coursework and use of higher cognitive demands, and between student teaching and less time on basic instruction. This is consistent with the idea that more advanced coursework and more experience foster a deeper understanding of mathematics, which in turn helps to improve teachers’ instruction in various ways (Ball, 1990). We found suggestive evidence that taking more advanced math courses predicts desirable teaching practices. Our findings are loosely consistent with previous work showing that a math major and advanced coursework are the background characteristics most likely to be associated with student and teacher outcomes (Goldhaber & Brewer, 2000; Hanushek, 1986).
This lends support to the practice of requiring teachers in their preparation programs to take in-depth, subject-specific coursework to develop expertise in subject areas.
A possibly novel finding (if only because student teaching is not often used as a predictor of instruction or achievement) is that the number of weeks of student teaching in math was consistently related to more rigorous instruction and less emphasis on basic topics. We found that having 17 or more weeks of student teaching, even with no formal mathematics training, was related to less use of basic instruction. Even though the effect sizes are quite small, these findings call attention to the possible importance of experiential learning as a dimension separate from subject-area expertise (Kolb, 1984). Longer apprenticeships in teacher preparation programs could go a long way toward building teachers’ knowledge and skills. Student teaching experiences can be terrific opportunities for preservice teachers to practice strategies and receive useful feedback about their practice. That said, the effectiveness of a teacher’s student teaching placement is likely influenced by its relationship to the teacher’s broader teacher education program, and this link should be considered in future studies. Our finding here is consistent with calls for reforming teacher education, such as the expert panel commissioned by the National Council for Accreditation of Teacher Education that concluded that one of the fundamental shifts needed in teacher education was to “move to programs that are fully grounded in clinical practice” (Blue Ribbon Panel, 2010, p. ii). Also, the importance of the length of placement is likely related to providing increased opportunities for observation and feedback (Boyd et al., 2009) and other on-site professional learning (Ronfeldt, 2012), which have been shown to matter for later teacher effectiveness. Thus, we emphasize that increasing the length and quality of preservice teacher apprenticeships could be a fruitful avenue for improving teacher preparation programs.
Ideally, longer apprenticeships in the classroom would also provide teachers with practice addressing behavior and management issues that may interfere with their ability to translate content knowledge into practice. These findings have implications for how we think about alternative certification programs such as Teach for America (TFA), which place teachers in the classroom without any student-teaching experience. Recent studies have shown that TFA teachers “catch up” to their traditionally certified counterparts in two to three years (Boyd et al., 2009), which has sparked debate about how the program’s model of training might be improved. While much debate has focused on course content, our findings suggest one avenue for improvement might be to provide alternatively certified teachers at least some level of experience in the classroom before they become the “teacher of record” in their own classroom.

NEXT STEPS

We offer our findings as a starting point to suggest how teacher knowledge and instructional improvement trajectories may operate. Given the relative newness of both the MKT and the use of observation protocols with substantial samples of teachers, such as in the MET project, there are as yet no national norms for what to expect in terms of growth in teacher knowledge in the early years of teaching (or even for veteran teachers). Some research has documented small change and substantial variation (Parise & Spillane, 2010), but our field does not have empirical, systematic data on the growth of instructional quality over time. While decades of work in education suggest we should expect modest effect sizes (see Bloom, Hill, Black, & Lipsey, 2008), we encourage studies that chart teacher knowledge and instruction growth using multiple measures, with different samples of teachers, for more than two years, to see how consistent effect sizes are across different samples, contexts, and instructional quality measures.
We hope that our study, along with recent work by others (Baumert et al., 2010; Hill, 2007), will allow other researchers to benchmark their results so that we can begin to understand growth in knowledge and instructional quality, and how these two factors are related to each other on particular dimensions. Such information will allow us to better shape and evaluate teacher preparation programs and policies to support the learning of in-service teachers.

Notes

1. There are debates about how large a sample should be to justify imputation (Graham, 2009; Graham & Schafer, 1999; Hardt, Herke, & Leonhart, 2012). We took a conservative approach in not imputing data with a modest sample size. As a sensitivity test, we ran a multiple imputation model; the resulting parameter estimates and significance levels remained the same.

2. We consider this work exploratory in an area that has been given little attention (i.e., empirical documentation of the relationship between knowledge and instructional quality growth in new teachers). As such, multiple comparison corrections such as Benjamini–Hochberg are not necessarily called for as they would be in an experimental setting, where the focus is on the significance of the treatment parameter while ignoring all the covariates in the model (see What Works Clearinghouse, 2010).

3. As noted above, IQA scores on each dimension were based on the average of two independent raters’ ratings across two separate lessons. Using this approach, no teachers scored close to a 4 on any dimension of the IQA. In some instances, individual raters assigned a score of 4 to individual lessons, but these individual scores decreased when we aggregated scores across raters and lessons.

4. For each lesson, we correlated IQA ratings from the two raters, and we conducted a one-way ANOVA to determine whether interrater correlations differed significantly by rater pair. They did not: F(41, 315) = 1.14, p = .27.
We also analyzed the difference between raters across lessons. Although we found statistically significant differences in average ratings on four IQA dimensions for some rater pairs when we looked across all lessons coded by each rater, these differences were not attributable solely to the raters because lessons were not randomly assigned for rating, and individual raters were members of multiple rater pairs. Thus, we did not find clear evidence that would identify an outlier rater or pair of raters.

5. The unstandardized coefficient of .56 for fall of Year 1 × the standard deviation of .53 = .29.

6. .91 (unstandardized coefficient) × .54 (SD for IQA rigor in spring of Year 2) = .49.

7. .03 (unstandardized coefficient) × .53 (SD for IQA rigor in fall of Year 1) = .01.

References

Alonzo, A. C. (2007). Challenges of simultaneously defining and measuring knowledge for teaching. Measurement: Interdisciplinary Research and Perspectives, 5(2–3), 131–137. doi:10.1080/15366360701487203

Ball, D. L. (1990). The mathematical understandings that prospective teachers bring to teacher education. Elementary School Journal, 90(4), 449–466.

Ball, D. L. (1991). Research on teaching mathematics: Making subject matter part of the equation. In J. Brophy (Ed.), Advances in research on teaching (Vol. 2, pp. 1–48). Greenwich, CT: JAI Press.

Ball, D. L. (1993). With an eye on the mathematical horizon: Dilemmas of teaching elementary school mathematics. Elementary School Journal, 93(4), 373–397.

Ball, D. L. (2000). Bridging practices: Intertwining content and pedagogy in teaching and learning to teach. Journal of Teacher Education, 51(3), 241–247. doi:10.1177/0022487100051003013

Ball, D. L., Lubienski, S. T., & Mewborn, D. S. (2001). Research on teaching mathematics: The unsolved problem of teachers’ mathematical knowledge. In V. Richardson (Ed.), Handbook of research on teaching (4th ed., pp. 433–456). Washington, DC: American Educational Research Association.

Ball, D. L., & Rowan, B. (2004).
Introduction: Measuring instruction. Elementary School Journal, 105(1), 3–10.

Ball, D. L., Sleep, L., Boerst, T. A., & Bass, H. (2009). Combining the development of practice and the practice of development in teacher education. Elementary School Journal, 109(5), 458–474.

Ball, D. L., Thames, M. H., & Phelps, G. (2008). Content knowledge for teaching: What makes it special? Journal of Teacher Education, 59(5), 389–407. doi:10.1177/0022487108324554

Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., & Tsai, Y. (2010). Teachers’ mathematical knowledge, cognitive activation in the classroom, and student progress. American Educational Research Journal, 47(1), 133–180. doi:10.3102/0002831209345157

Baxter, J. A., & Lederman, N. G. (1999). Assessment and measurement of pedagogical content knowledge. In J. Gess-Newsome & N. G. Lederman (Eds.), Examining pedagogical content knowledge (pp. 147–161). Dordrecht, The Netherlands: Kluwer.

Begle, E. G. (1972). Teacher knowledge and pupil achievement in algebra (NLSMA Technical Report No. 9). Palo Alto, CA: Stanford University, School Mathematics Study Group.

Begle, E. G. (1979). Critical variables in mathematics education: Findings from a survey of the empirical literature. Reston, VA: National Council of Teachers of Mathematics.

Blank, R. K., Birman, B., Garet, M., Yoon, K. S., Jacobson, R., Smithson, J., & Minor, A. (2005). Longitudinal study of the effects of professional development on improving mathematics and science instruction, Year 2 progress report (MSP PD study). Washington, DC: Council of Chief State School Officers.

Blank, R. K., Porter, A., & Smithson, J. (2001). New tools for analyzing teaching, curriculum and standards in mathematics & science: Results from the Survey of Enacted Curriculum Project. Washington, DC: Council of Chief State School Officers.

Bloom, H. S., Hill, C. J., Black, A. R., & Lipsey, M. W. (2008).
Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. Journal of Research on Educational Effectiveness, 1(4), 289–328.

Blue Ribbon Panel on Clinical Preparation and Partnerships for Improved Student Learning. (2010). Transforming teacher education through clinical practice: A national strategy to prepare effective teachers (Report commissioned by the National Council for Accreditation of Teacher Education). Washington, DC.

Boardman, A. E., Davis, O. A., & Sanday, P. R. (1977). A simultaneous equations model of the educational process. Journal of Public Economics, 7, 23–49.

Borko, H., Eisenhart, M., Brown, C. A., Underhill, R. G., Jones, D., & Agard, P. C. (1992). Learning to teach hard mathematics: Do novice teachers and their instructors give up too easily? Journal for Research in Mathematics Education, 23(3), 194–222.

Borko, H., Stecher, B. M., Martinez, F., Kuffner, K. L., Barnes, D., Arnold, S. C., & Gilbert, M. L. (2006). Using classroom artifacts to measure instructional practice in middle school science: A two-state field test (CSE Technical Report 690). Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.

Boston, M., & Wolf, M. K. (2006). Assessing academic rigor in mathematics instruction: The development of the Instructional Quality Assessment Toolkit (CSE Technical Report 672). Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.

Boyd, D., Grossman, P. L., Hammerness, K., Lankford, R. H., Loeb, S., McDonald, M., . . . Wyckoff, J. (2008). Surveying the landscape of teacher education in New York City: Constrained variation and the challenge of innovation. Educational Evaluation and Policy Analysis, 30(4), 319–343.

Boyd, D., Grossman, P. L., Lankford, H., Loeb, S., & Wyckoff, J. (2009). Teacher preparation and student achievement. Educational Evaluation and Policy Analysis, 31(4), 416–440.

Carpenter, T.
P., Fennema, E., & Franke, M. L. (1996). Cognitively guided instruction: A knowledge base for reform in primary mathematics instruction. Elementary School Journal, 97(1), 3–20. Carpenter, T. P., Fennema, E., Peterson, P. L., Chiang, C.P., & Loef, M. (1989). Using knowledge of children’s mathematics thinking in classroom teaching: An experimental study. American Educational Research Journal, 26(4), 499–531. doi:10.3102/00028312026004499 Charalambous, C. Y. (2010). Mathematical knowledge for teaching and task unfolding: An exploratory study. Elementary School Journal, 110(3), 247–278. Cobb, P., Wood, T., Yackel, E., & McNeal, E. (1993). Mathematics as procedural instructions and mathematics as meaningful activity: The reality of teaching for understanding. In R. Davis & C. Maher (Eds.), Schools, mathematics, and the world of reality (pp. 119–133). Boston, MA: Allyn & Bacon. CochranSmith, M., & Lytle, S. I. (1999). Relationships of knowledge and practice: Teacher learning in communities. Review of Research in Education, 24(1), 249–305. doi:10.3102/0091732X024001249 CochranSmith, M., & Zeichner, K. M. (Eds.). (2005). Studying teacher education: The report of the AERA panel on research and teacher education. Mahwah, NJ: Lawrence Erlbaum. Cohen, D. (1990). A revolution in one classroom: The case of Mrs. Oublier. Educational Evaluation and Policy Analysis, 12(3), 311–329. Cohen, D. K., McLaughlin, M. W., & Talbert, J. E. (Eds.). (1993). Teaching for understanding: Challenges for policy and practice. San Francisco, CA: JosseyBass. Council of Chief State School Officers. (CCSSO). (2005). Surveys of enacted curriculum: A guide for SEC state and local coordinators. Washington, DC: Author. Crick, J. E., & Brennan, R. L. (2001). GENOVA (Version 3.1) [Computer software]. DarlingHammond, L., & BaratzSnowden, J. (2005). A good teacher in every classroom: Preparing the highly qualified teachers our children deserve. San Francisco, CA: John Wiley & Sons. Duncan, T. D., Duncan, S. 
C., Strycker, L. A., Li, R., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Lawrence Erlbaum. FeimanNemser, S., & Parker, M. B. (1990). Making subject matter part of the conversation in learning to teach. Journal of Teacher Education, 41(3), 32–43. doi:10.1177/002248719004100305 Fennema, E., & Franke, M. L. (1992). Teachers’ knowledge and its impact. In D. A. Grouws (Ed.), Handbook of research on mathematics teaching and learning: A project of the National Council of Teachers of Mathematics (pp. 147–164). New York, NY: Macmillan. Ferguson, R. F. (1991). Paying for public education: New evidence on how and why money matters. Harvard Journal on Legislation, 28(2), 465–498. Gamoran, A., Porter, A. C., Smithson, J., & White, P. A. (1997). Upgrading high school mathematics instruction: Improving learning opportunities for lowachieving, lowincome youth. Educational Evaluation and Policy Analysis, 19(4), 325–338. doi:10.3102/01623737019004325 Gearhart, M. (2007). Mathematics knowledge for teaching: Questions about constructs. Measurement: Interdisciplinary Research and Perspectives, 5(2–3), 173–180. doi:10.1080/15366360701487617 Goldhaber, D. D., & Brewer, D. J. (2000). Does teacher certification matter? High school teacher certification status and student achievement. Educational Evaluation and Policy Analysis, 22(2), 129–145. doi:10.3102/01623737022002129 Good, T. L., & Brophy, J. E. (2003). Looking in classrooms (9th ed.). Boston, MA: Allyn & Bacon. Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. Graham, J. W., & Schafer, J. (1999). On the performance of multiple imputation for multivariate data with small sample size. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1–29). Thousand Oaks, CA: Sage. Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). 
The effect of school resources on student achievement. Review of Educational Research, 66(3), 361–396. doi:10.3102/00346543066003361 Grossman, P. L. (1990). The making of a teacher: Teacher knowledge and teacher education. New York, NY: Teachers College Press. Haertel, E. H. (1991). New forms of teacher assessment. Review of Research in Education, 17, 3–29. Hanushek, E. A. (1972). Education and race: An analysis of the educational production process. Lexington, MA: DC Heath. Hanushek, E. A. (1981). Throwing money at schools. Journal of Policy Analysis and Management, 1(1), 1941. doi:10.2307/3324107 Hanushek, E. A. (1986). The economics of schooling: Production and efficiency in public schools. Journal of Economic Literature, 24(3), 1141–1177. Hanushek, E. A. (2011). The economic value of higher teacher quality. Economics of Education Review, 30(3), 466–479. Hanushek, E. A., & Rivkin, S. G. (2006). School quality and the blackwhite achievement gap (NBER W12651). Cambridge, MA: National Bureau of Economic Research. Hanushek, E. A., & Rivkin, S. G. (2010). The quality and distribution of teachers under the No Child Left Behind Act. Journal of Economic Perspectives, 24(3), 133–150. Harbison, R. W., & Hanushek, E. A. (1992). Educational performance of the poor: Lessons from rural Northeast Brazil. New York, NY: Oxford University Press. Hardt, J., Herke, M., & Leonhart, R. (2012). Auxiliary variables in multiple imputation in regression with missing X: A warning against including too many in small sample research. BMC Medical Research Methodology, 12(1), 184. Hiebert, J., Carpenter, T. P., Fennema, E., Fuson, K., Human, P., Murray, H., . . . Wearne, D. (1996). Problem solving as a basis for reform in curriculum and instruction: The case of mathematics. Educational Researcher, 25(4), 12–21. doi:10.3102/0013189X025004012 Hiebert, J., Carpenter, T., Fennema, E., Fuson, K., Wearne, D., Murray, H., & Human, P. (1997). 
Making sense: Teaching and learning mathematics with understanding. Portsmouth, NH: Heinemann. Hill, H. C. (2007). Mathematical knowledge of middle school teachers: Implications for the No Child Left Behind policy initiative. Education Evaluation and Policy Analysis, 29(2), 95–114. doi:10.3102/0162373707301711 Hill, H., & Ball, D. L. (2009). The curious—and crucial—case of mathematical knowledge for teaching. Phi Delta Kappan, 91(2), 68–71. Hill, H. C., Ball, D. L., Blunk, M., Goffney, I. M., & Rowan, B. (2007). Validating the ecological assumption: The relationship of measure scores to classroom teaching and student learning. Measurement: Interdisciplinary Research and Perspectives, 5(2–3), 107–118. doi:10.1080/15366360701487138 Hill, H. C., Ball, D. L., & Schilling, S. G. (2008). Unpacking pedagogical content knowledge: Conceptualizing and measuring teachers’ topicspecific knowledge of students. Journal for Research in Mathematics Education, 39(4), 372–400. Hill, H. C., Blunk, M. L., Charalambous, C. Y., Lewis, J. M., Phelps, G. C., Sleep, L., & Ball, D. L. (2008). Mathematical knowledge for teaching and the mathematical quality of instruction: An exploratory study. Cognition and Instruction, 26(4), 430–511. doi:10.1080/07370000802177235 Hill, H. C., Rowan, B., & Ball, D. L. (2005). Effects of teachers’ mathematical knowledge for teaching on student achievement. American Educational Research Journal, 42(2), 371–406. doi:10.3102/00028312042002371 Hill, H. C., Schilling, S. G., & Ball, D. L. (2004). Developing measures of teachers’ mathematics knowledge for teaching. Elementary School Journal, 105(1), 11–30. Ho, A. D., & Kane, T. J. (2013, January). The reliability of classroom observations by school personnel (A Measures of Teaching Effectiveness Research Paper). Seattle, WA: Bill and Melinda Gates Foundation. Hollins, E., & Guzman, M. T. (2005). Research on preparing teachers for diverse populations. In M. CochranSmith & K. 
Zeichner (Eds.), Studying teacher education: The report on the AERA panel on research and teacher education (pp. 477–548). Mahwah, NJ: Erlbaum. Ingersoll, R. M., & Perda, D. (2010). Is the supply of mathematics and science teachers sufficient? American Educational Research Journal, 47(3), 563–594. Junker, B. W., Matsumura, L. C., Crosson, A., Wolf, M. K., Levison, A., Wiesberg, J., & Resnick, L. (2006). Overview of the Instructional Quality Assessment (CSE Technical Report 671). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Kennedy, M. (1997). Defining optimal knowledge for teaching science and mathematics. Research monograph no. 10. Madison, WI: National Institute for Science Education. Kennedy, M. M. (1999). Approximations to indicators of student outcomes. Educational Evaluation and Policy Analysis, 21(4), 345–363. doi:10.3102/01623737021004345 Kennedy, M. M. (2006). From teacher quality to quality teaching. Educational Leadership, 63(6), 14–19. Kersting, N. B., Givvin, K. B., Sotelo, F. L., & Stigler, J. W. (2010). Teachers’ analyses of classroom video predict student learning of mathematics: Further explorations of a novel measure of teacher knowledge. Journal of Teacher Education, 61(1–2), 172–181. doi:10.1177/0022487109347875 Kilday, C. R., & Kinzie, M. B. (2009). An analysis of instruments that measure the quality of mathematics teaching in early childhood. Early Childhood Education Journal, 36(4), 365–372. doi:10.1007/s1064300802868 Kilpatrick, J., Swafford, J., & Findell, B. (Eds.). (2001). Adding it up: Helping children learn mathematics. Washington, DC: Mathematics Learning Study Committee, National Research Council. Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development Englewood Cliffs, NJ: Prentice Hall. Korthagen, F., Loughran, J., & Russell, T. (2006). Developing fundamental principles for teacher education programs and practices. 
Teaching and Teacher Education, 22, 1020–1041. Krauss, S., Brunner, M., Kunter, M., Baumert, J., Blum, W., Neubrand, M., & Jordan, A. (2008). Pedagogical content knowledge and content knowledge of secondary mathematics teachers. Journal of Educational Psychology, 100(3), 716–725. La Paro, K. M., Pianta, R. C., & Stuhlman, M. (2004). The Classroom Assessment Scoring System: Findings from the prekindergarten year. Elementary School Journal, 10(5), 409–426. Learning Mathematics for Teaching. (2006). A coding rubric for measuring the mathematical quality of instruction (Technical report LMT1.06). Unpublished technical report, University of Michigan, School of Education. Leinhardt, G., & Smith, D. (1985). Expertise in mathematics instruction: Subjectmatter knowledge. Journal of Educational Psychology, 77(3), 247–271. doi:10.1037/00220663.77.3.247 Loveless, T. (Ed.). (2001). The great curriculum debate: How should we teach reading and math? Washington, DC: Brookings Institution. Ma, L. (1999). Knowing and teaching elementary mathematics: Teachers’ understanding of fundamental mathematics in China and the United States. Mahwah, NJ: Lawrence Erlbaum. Magnusson, S., Krajcik, J., & Borko, H. (1999). Nature, sources, and development of pedagogical content knowledge for science teaching. In J. GessNewsome & N. G. Lederman (Eds.), Examining pedagogical content knowledge (pp. 95–132). Dordrecht, The Netherlands: Kluwer. Matsumura, L. C., Slater, S. C., Junker, B., Peterson, M., Boston, M., Steele, M., & Resnick, L. (2006). Measuring reading comprehension and mathematics instruction in urban middle schools: A pilot study of the Instructional Quality Assessment (CSE Technical Report 681). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Mayer, D. P. (1999). Measuring instructional practice: Can policymakers trust survey data? Educational Evaluation and Policy Analysis, 21(1), 29–45. 
doi:10.3102/01623737021001029 Met Project. (2010, June). Working with teachers to develop fair and reliable measures of effective teaching. Seattle, WA: Bill and Melinda Gates Foundation. Mihaly, K., McCaffrey, D. F., Staiger, D. O., & Lockwood, J. R. (2013, January). A composite estimator of effective teaching (Measures of Effective Teaching Study Research Paper). Seattle, WA: Bill and Melinda Gates Foundation. MorineDershimer, G., & Kent, T. (1999). The complex nature and sources of teachers’ pedagogical knowledge. In J. GessNewsome & N. G. Lederman (Eds.), Examining pedagogical content knowledge (pp. 21–50). Dordrecht, The Netherlands: Kluwer. Mullens, J. E., Murnane, R. J., & Willett, J. B. (1996). The contribution of training and subject matter knowledge to teaching effectiveness: A multilevel analysis of longitudinal evidence from Belize. Comparative Education Review, 40(2), 139–157. National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author. National Council of Teachers of Mathematics. (2006). Curriculum focal points for prekindergarten through grade 8 mathematics: A quest for coherence. Reston, VA: Author. National Mathematics Advisory Panel. (2008). Foundations for success: The final report of the National Mathematics Advisory Panel. Washington, DC: U.S. Department of Education. New Teacher Center. (2013). Crossstate analysis of results of 201213 Teaching Empowering Leading and Learning (TELL) Survey Research Report. Santa Cruz, CA: Author. Papay, J. P., & Kraft, M. A. (2010, October). Do teachers continue to improve with experience? Evidence of longterm career growth in the teacher labor market. Paper presented at the annual meeting of the Association for Public Policy Analysis and Management, Boston, MA. Papay, J. P., West, M. R., Fullerton, J. B., & Kane, T. J. (2012). Does an urban teacher residency increase student achievement? Early evidence from Boston. 
Educational Evaluation and Policy Analysis, 34(4), 413–434. Parise, L. M., & Spillane, J. P. (2010). Teacher learning and instructional change: How formal and onthejob learning opportunities predict changes in elementary school teachers’ instructional practice. Elementary School Journal, 110(3), 323–346. Penuel, W. R., Gallagher, L. P., & Moorthy, S. (2011). Preparing teachers to design sequences of instruction in earth systems science: A comparison of three professional development programs. American Educational Research Journal, 48(4), 996–1025. Phelps, G. (2009). Just knowing how to read isn’t enough! Assessing knowledge for teaching reading. Educational Assessment, Evaluation and Accountability, 21(2), 137–154. doi:10.1007/s1109200990706 Phelps, G., & Schilling, S. (2004). Developing measures of content knowledge for teaching reading. Elementary School Journal, 105(1), 31–48. Phillips, K. J. R. (2010). What does “highly qualified” mean for student achievement? Evaluating the relationships between teacher quality indicators and atrisk students’ mathematics and reading achievement gains in first grade. The Elementary School Journal, 110(4), 464–493. Piasta, S. B., Connor, C. M., Fishman, B. J., & Morrison, F. J. (2009). Teachers’ knowledge of literacy concepts, classroom practices, and student reading growth. Scientific Studies of Reading, 13(3), 224–248. doi:10.1080/10888430902851364 Porter, A. C. (1989). A curriculum out of balance: The case of elementary school mathematics. Educational Researcher, 18(5), 9–15. (Also Research Series No. 191, East Lansing, MI: Michigan State University, Institute for Research on Teaching.) doi:10.3102/0013189X018005009 Porter, A. C. (2002). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31(7), 3–14. doi:10.3102/0013189X031007003 Porter, A. C., Polikoff, M. S., & Smithson, J. (2009). Is there a de facto national intended curriculum? Evidence from state content standards. 
Educational Evaluation and Policy Analysis, 31(3), 238–268. doi:10.3102/0162373709336465 Porter, A. C., Smithson, J. L., Blank, R., & Zeidner, T. (2007). Alignment as a teacher variable. Applied Measurement in Education, 20(1), 27–51. doi:10.1080/08957340709336729 Putnam, R. T., & Borko, H. (1997). Teacher learning: Implications of new views of cognition. In B. J. Biddle, T. L. Good, & I. F. Goodson (Eds.), International handbook of teachers and teaching (Vol. II, pp. 1223–1296). Dordrecht, The Netherlands: Kluwer. Putnam, R., Heaton, R., Prawat, R., & Remillard, J. (1992). Teaching mathematics for understanding: Discussing case studies of four fifthgrade teachers. The Elementary School Journal, 93(2), 213–228. Putnam, R., Lampert, M., & Peterson, P. (1990). Alternative perspectives on knowing mathematics in elementary schools. In C. Cazden (Ed.), Review of research in education (Vol. 16, pp. 57–150). Washington, DC: American Educational Research Association. Quint, J. C., Akey, T. M., Rappaport, S., & Willner, C. J. (2007). Instructional leadership, teaching quality and student achievement: Suggestive evidence from three urban school districts. New York, NY: MDRC. Remillard, J. T. (2005). Examining key concepts in research on teachers’ use of mathematics curricula. Review of Educational Research, 75(2), 211–246. doi:10.3102/00346543075002211 Resnick, L. B., & Hall, M. W. (1998). Learning organizations for sustainable education reform. Daedalus, 127, 89–118. Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools and academic achievement. Econometrica, 73(2), 417–458. doi:10.1111/j.14680262.2005.00584.x Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252. Rohaan, E. J., Taconis, R., & Jochems, W. M. G. (2009). Measuring teachers’ pedagogical content knowledge in primary technology education. Research in Science & Technological Education, 27(3), 327–338. 
doi:10.1080/02635140903162652 Ronfeldt, M. (2012). Where should student teachers learn to teach? Effects of field placement school characteristics on teacher retention and effectiveness. Educational Evaluation and Policy Analysis, 34(1), 3–26. Rowan, B., Chiang, F. S., & Miller, R. J. (1997). Using research on employees’ performance to study the effects of teachers on students’ achievement. Sociology of Education, 70(4), 256–284. Sawada, D., Piburn, M., Falconer, K., Turley, J., Benford, R., & Bloom, I. (2000). Reformed Teaching Observation Protocol (ACEPT Technical Report No. IN001). Tempe, AZ: Arizona Collaborative for Excellence in the Preparation of Teachers. Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of U.S. science and mathematics. Executive summary. Lansing, MI: Michigan State University, U.S. National Research Center for the Third International Mathematics and Science Study. Retrieved from http://ustimss.msu.edu/splintrd.pdf Schoen, H. L., Cebulla, K. J., Finn, K. F., & Fi, C. (2003). Teacher variables that relate to student achievement when using a standardsbased curriculum. Journal for Research in Mathematics Education, 34(3), 228–259. Schoenfeld, A. H. (2007). The complexities of assessing teacher knowledge. Measurement: Interdisciplinary Research and Perspectives, 5(2–3), 198–204. doi:10.1080/15366360701492880 Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14. doi:10.3102/0013189X015002004 Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1–21. Smith, P. S. (2005). Assessing teacher learning about science teaching: Report of project activities and findings, year two (EHR0335328). Chapel Hill, NC: Horizon Research, Inc. Stein, M. K., Baxter, J. A., & Leinhardt, G. (1990). Subjectmatter knowledge and elementary instruction: A case from functions and graphing. 
American Educational Research Journal, 27(4), 639–663. doi:10.3102/00028312027004639 Stein, M. K., Grover, B. W., & Henningsen, M. (1996). Building student capacity for mathematical thinking and reasoning: An analysis of mathematical tasks used in reform classrooms. American Educational Research Journal, 33(2), 455–488. doi:10.3102/00028312033002455 Stein, M. K., & Lane, S. (1996). Instructional tasks and the development of student capacity to think and reason: An analysis of the relationship between teaching and learning in a reform mathematics project. Educational Research and Evaluation, 2(1), 50–80. doi:10.1080/1380361960020103 Stigler, J. W., Gallimore, R., & Hiebert, J. (2000). Using video surveys to compare classrooms and teaching across cultures: Examples and lessons from the TIMSS video studies. Educational Psychologist, 35(2), 87–100. Strauss, R. P., & Sawyer, E. (1986). Some new evidence on teacher and student competencies. Economics of Education Review, 5(1), 41–48. doi:10.1016/02727757(86)901615 Tatto, M. T., Nielsen, H. D., Cummings, W., Kularatna, N. G., & Dharmadasa, K. H. (1993). Comparing the effectiveness and costs of different approaches for educating primary school teachers in Sri Lanka. Teaching and Teacher Education, 9(1), 41–64. doi:10.1016/0742051X(93)900148 Thompson, P. W., & Thompson, A. G. (1994). Talking about rates conceptually, Part I: A teacher’s struggle. Journal for Research in Mathematics Education, 25(3), 279–303. U.S. Department of Education. (2013). Preparing and Credentialing the Nation’s Teachers: The Secretary’s Ninth Report on Teacher Quality. Washington, DC: Author. Available at http://www2.ed.gov/about/reports/annual/teachprep/index.html. Wayne, A. J., & Youngs, P. (2003). Teacher characteristics and student achievement gains: A review. Review of Educational Research, 73(1), 89–122. doi:10.3102/00346543073001089 What Works Clearinghouse. (2010). Procedures and standards handbook, v. 2.1. 
Washington, DC: Institute for Education Sciences. Wideen, M., MayerSmith, J., & Moon, B. (1998). Critical analysis of the research on learning to teach: Making the case for an ecological perspective on inquiry. Review of Educational Research, 68, 130–178. Wilson, S. M., & Floden, R. E. (2003). Creating effective teachers: Concise answers for hard questions (An addendum to the report Teacher preparation research: Current knowledge, gaps, and recommendations). Washington, DC: AACTE Publications. Wilson, S. M., Floden, R., & FerriniMundy, J. (2001). Teacher preparation research: Current knowledge, gaps, and recommendations (A research report prepared for the U.S. Department of Education). Seattle, WA: Center for the Study of Teaching and Policy, University of Washington.


