A study on pre-service mathematics teachers’ criteria of proof evaluation

Proof is foundational to mathematics, and constructing proofs and establishing their validity are both important mathematical activities. Determining the validity of a proof is part of the process of proof evaluation, which contributes to students' ability to construct and revise their own proofs. The purpose of this study was to determine the criteria pre-service mathematics teachers take into account when evaluating a proof. The study was carried out with 50 first-year university students enrolled in an elementary mathematics teacher education program. The data were collected through proving activities. The results revealed that, when evaluating a proof, the participants regarded the use of appropriate definitions, axioms, or theorems in the steps of the proof as the most important criterion with regard to justification. With regard to mathematical language, they regarded the appropriate use of symbolic language as the most important criterion. However, they tended to ignore situations where non-symbolic language was used. To address this issue, it is recommended that mathematics-learning environments include the use of non-symbolic language, as well as symbolic representations, in the definition of mathematical concepts.


Introduction
Mathematical proof, being foundational to mathematics, is a concept that distinguishes mathematics from other disciplines. For this reason, scholars have often described proof as the heart of mathematics (Anderson, 1996; Hanna, 2000; Herbst & Brach, 2006; Hoyles, 1997; Öztürk, 2020; Rav, 1999). Flores (2002) notes that recognizing the relationships between mathematical concepts, engaging in mathematical thinking, and making sense of mathematical concepts all depend on proof. As such, the stages of proof writing have an important place in the formation of the specific language of mathematics (Forman, Larreamendy-Joer, Stein & Brown, 1998), and proving should be an essential part of learning mathematics for students of all ages (Stacey & Vincent, 2009). Furthermore, reasoning and proof are both significant activities that characterize working mathematically (Clarke, Goos & Morony, 2007). Reasoning proficiency is directly associated with proof at all levels of mathematics education (Lesseig, Hine, Na & Boardman, 2019). This proficiency includes actions such as analyzing, proving, evaluating, explaining, inferring, justifying, and generalizing (Clarke, Clarke & Sullivan, 2012). The National Council of Teachers of Mathematics [NCTM] (2000) particularly recommended that importance be attached to the teaching of proof at all levels of mathematics education.
Proving is a comprehensive process that involves actions such as exploring, conjecturing, reasoning, and formulating arguments (Stylianides & Ball, 2008). This process occurs in stages that correspond to these actions (Hanna, Bruyn, Sidoli & Lomas, 2004). Many researchers have identified stages of the proving process, and these stages can be followed in proof-related activities (e.g., Boero, 1999; Edwards, 1997; Perry, Molina, Camargo & Samper, 2011). Thompson, Senk and Johnson (2012) state that proof-related activities include stages such as making a conjecture, investigating a conjecture, evaluating an argument or proof, and finding a counterexample to a mathematical claim. Öztürk (2016), noting that proving is a multi-step process, defined an instructional model that can be used in teaching proof. This model consists of seven stages that reflect the nature of mathematical proof: understanding the problem, constructing a structure, working on the structure and conjecturing, postulation of the relationship, proving, investigation of the coherence of the proof, and formalization of the proof. The stage of investigating the coherence of the proof involves examining the validity and coherence of proofs and expressing any deficiencies and errors; it therefore represents the action of proof evaluation. Proof evaluation is thus one of the significant actions that shape the final version of a proof.
According to Pfeiffer (2011a), the most significant part of the process of accepting a proof is validation, that is, determining the correctness of an argument. In determining when a mathematical argument becomes a proof of a theorem, mathematicians employ several criteria to decide whether it constitutes new mathematical knowledge (Dickerson & Doerr, 2014). Bass (2009) emphasizes the social aspect of proof, noting that the act of proving and accepting new proofs constitutes certification of new knowledge within a community of practice. This notion is upheld by Manin's (2010) assertion that the acceptance of a proof is a social process. Given the critical role of proof in the mathematical community, Powers, Craviotto and Grassl (2010) emphasize that proof validation skills are important for everyone who will be involved in the teaching of mathematics.

Proof validation and evaluation in mathematics practice
Examining a mathematical proof to determine whether it is valid is a significant activity for pre-service teachers and for those who may be involved in instruction or training as graduate students or supervisors (Powers et al., 2010; Selden & Selden, 2000); however, Inglis and Alcock (2012) indicate that validation is only one of the reasons for reading a proof. As Pfeiffer (2011b) points out, proof validation falls within the scope of proof evaluation, which Selden and Selden (2003a) describe as a complex process involving evaluating statements, posing and answering questions, constructing sub-proofs, and recalling definitions and theorems. Thus, evaluating a proof is useful in determining its appropriateness with reference to a range of features, such as its relation to the functions of proof (exploring, conjecturing, generalizing, association, etc.), as well as in establishing the truth of the statement (validation) (Pfeiffer, 2011a). In other words, proof evaluation involves assessing the significance and merits of a proposed proof.
In this respect, Tabach et al. (2010) emphasize that proof evaluation embraces not only the validity of a proof but also an assessment of the evaluators' knowledge as it comes to light during the process. Similarly, Pfeiffer (2011b) states that proof evaluation includes not only determining whether a proof is correct but also establishing the truth of a statement (validation). Proof evaluation is thus more comprehensive than proof validation: while proof validation is restricted to determining whether a proof succeeds in establishing the truth of a statement, proof evaluation examines proofs along dimensions such as mathematical language, justifications, and proof steps (Moore, 2016).
The activities of validating and evaluating proofs in an educational context enable students to gain insight into their own and others' existing mathematical knowledge, as well as to learn the steps necessary to carry out a proof (Stylianides & Stylianides, 2009). The ability to validate proofs (i.e., to determine their correctness) has a positive effect on proof writing (Powers et al., 2010). The more adept writers of proofs become at determining whether a peer's proof is correct, the better able they will be to apply this ability to their own proofs, consistent with Selden and Selden's (2003a) contention that the ability to validate proofs is related to the ability to construct them. Furthermore, as a more comprehensive activity than proof validation, proof evaluation contributes significantly to students' ability to construct and revise their own proofs. However, many researchers have emphasized that not only the construction but also the validation and evaluation of proofs present a significant challenge for many students (Inglis & Alcock, 2012; Lin, Yang & Chen, 2004; Pfeiffer, 2011a; Selden & Selden, 1995, 2003a; Varghese, 2011). Pfeiffer (2011a) noted that students generally focus on the presence or absence of examples or of algebraic formalism in their evaluations, while Selden and Selden (2003a) observed that students mainly concentrate on surface features such as algebraic notation and computation when evaluating proofs. Inglis and Alcock (2012) assert that there are several reasons for students' difficulties in proof validation, including an insufficient understanding of what constitutes a proof and a tendency to focus only on certain proofs. Furthermore, according to Knuth (2002), even teachers have serious difficulties with proof validation, and many are prepared to accept flawed arguments as valid mathematical proofs.
Both mathematics students and pre-service mathematics teachers should therefore be exposed to proof validation and evaluation tasks in the course of their learning. In this way, students at all educational levels may be supported in making sense of their mathematical knowledge, as well as in understanding what constitutes a proof and how a proof is constructed. Pre-service teachers in particular should be exposed to these tasks so that they are better equipped to develop their future students' ability to construct and evaluate proofs. In this respect, it is important to determine what factors pre-service teachers take into account when evaluating proofs. In addition, studies of proof evaluation (Knuth, 2002; Stylianides & Stylianides, 2009; Weber, 2009) are limited, and they have generally presented a group of participants with various proofs and asked them to evaluate these proofs. Their mathematical context has also mainly been algebra, number theory, and calculus. The originality of the present study lies in carrying out both proof construction and proof evaluation activities, implementing them through group work, and situating them in geometry.
Proving is one of the general objectives of geometry education (Marrades & Gutiérrez, 2000; MEB, 2013; Nasibov & Kaçar, 2005). Moreover, geometry provides rich content for developing mathematical reasoning, which includes deductive and inductive reasoning, making assumptions, and evaluating the validity of these assumptions (NCTM, 2000). Geometry is therefore a course in which the actions inherent in proving can take place. Imamoğlu and Yontar-Togrol's (2015) study included both proof construction and proof evaluation practices. However, with respect to proof evaluation, they focused on investigating whether a mathematical argument can be accepted as a valid proof, so proof evaluation was examined only in general terms in that study. It is therefore necessary to investigate in depth which criteria are taken into account during proof evaluation. Since pre-service mathematics teachers will be responsible for instilling the fundamentals of mathematics in students after completing their education (Lesseig, 2016), it is worth examining their criteria for proof evaluation in the proving process. In light of this, we aimed to determine the criteria that pre-service mathematics teachers take into account when evaluating their peers' proofs.

Method
The data for this study were drawn from twenty activities in a geometry course in an elementary mathematics teacher education program at a state university in Turkey. The activities were designed in alignment with the weekly topics of the course, and two or three of them were implemented each week. The geometry course and the activities are described below.

The structure of the geometry course
Geometry is a course taught in the spring semester of the first year of the elementary mathematics teacher education program. The research participants were 50 first-year students enrolled in this program, all of them pre-service teachers who will teach middle school students in the future. Having recently completed their high school education, the pre-service mathematics teachers were not likely to have structured ideas about mathematical proof. However, they had taken an abstract mathematics course in the fall semester and thus had sufficient knowledge of basic concepts of proof, proof methods, and proving. Within the geometry course, previously designed activities on angles, congruence and similarity of triangles, angle bisectors, altitudes, parallel lines, the orthocenter, centroid, incenter, excenter, and circumcenter, and polygons were implemented. Through these activities, the pre-service teachers gained experience in constructing and evaluating proofs.

The content of the activities
In this study, twenty activities were conducted over nine weeks to determine which criteria pre-service mathematics teachers consider in the process of proof evaluation. The activities were developed with consideration of the opinions of experts in the field and included questions related to geometric proof on topics such as angles, congruent and similar triangles, polygons, altitude, angle bisector, median, and circles. In the activities, finished proofs were not presented to the pre-service teachers directly; instead, they were asked to construct the proofs step by step. The stages of the activities were based on the instructional model designed by Öztürk (2016). Leaving out the final stage of this model, each activity consisted of six stages: understanding the problem, constructing a structure, working on the structure and conjecturing, postulation of the relationship, proving, and evaluating the proof. Each activity was implemented through group work, as follows. In the stage of understanding the problem, the groups are given a geometry problem involving unfamiliar mathematical propositions, theorems, or relations and examine its content. In the stage of constructing a structure, a geometric structure based on the statements in the problem is constructed with dynamic geometry software. In the stage of working on the structure and conjecturing, measurements are performed on the structure, and relationships are then sought. In the stage of postulation of the relationship, the hypothesis and conclusion are determined, and the proposition is stated. In the stage of proving, each group member first considers individually how to prove the proposition and then shares his or her ideas with the group; the group then designs a common proof plan and writes the steps of the proof. This takes approximately 25 minutes.
In the stage of evaluating the proof, the groups exchange their proof plans and point out errors and deficiencies in each other's proofs. The groups spend about 15 minutes in this stage. At the end of each activity, the proofs and their evaluations were critically discussed by the whole class, and errors in the proof steps and in the proof evaluations were deliberated. The aim of these discussions was to enable the students to recognize other errors and deficiencies that had been overlooked during the evaluation of proofs. After each discussion, the researchers summarized the errors and deficiencies in the proofs. In this way, the pre-service teachers gained experience in both constructing and evaluating proofs.

Data analysis
First, the researchers evaluated the pre-service teachers' proofs in terms of the participants' justifications and proper use of mathematical language, independently of the students' evaluations of the proofs. The researchers' evaluations made it possible to identify certain dimensions of proof evaluation and their subcategories. These evaluations were then compared with the students' evaluations of the proofs in terms of the identified dimensions and subcategories, and the final version of the subcategories was constructed on the basis of both the researchers' and the pre-service teachers' evaluations. In other words, the data from these proof evaluations were categorized through content analysis, a method for making replicable and valid inferences from texts to the contexts of their use (Krippendorff, 2004).
Proving requires certain competencies, such as reasoning, giving justifications, and using mathematical language. In this sense, a proof has a structure that includes justifications for each step (Weber, 2008), proper use of mathematical symbols and expressions (Mercer, Dawes, Wegerif & Sams, 2004), and deductive steps written systematically (de Villiers, 1999). Apart from these, Focusing on only the results was identified as a dimension through the evaluation of the participants' proofs. This dimension means that a proof is evaluated on the assumption that "if a proposition is expressed correctly in an activity, the proof of this proposition is constructed correctly." In addition to these main categories of proof evaluation, subcategories were identified for the dimensions of justification and mathematical language.
After the proof evaluations were analyzed, the frequency of each criterion was determined. The frequency of a category is the total number of distinct proof evaluations, by the pre-service teachers or by the researchers, that belong to that category; that is, each distinct remark about an error or deficiency in a proof counts toward the frequency of the category it falls under. Because the whole-class discussions were intended to help the pre-service teachers recognize all of the errors and deficiencies in the proofs, the frequencies for the categories of proof evaluation were determined from their evaluations of their peers' proofs rather than from the discussions.
To check coding reliability, a researcher experienced in mathematics education also evaluated 30% of the pre-service teachers' proofs. The agreement between the researchers' codings was 87%.
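The paper reports 87% agreement without stating how it was computed. As a minimal sketch, assuming simple percentage agreement (the share of coded units to which both coders assigned the same category), the calculation can be written in Python as follows; the function name and the eight codes are hypothetical illustrations, not the study's data.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of units that two coders assigned to the same category.

    Assumes simple percentage agreement; the study does not state its formula.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same units")
    if not codes_a:
        raise ValueError("no coded units")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * agreements / len(codes_a)


# Hypothetical codes for eight evaluation remarks (not the study's data).
coder_1 = ["J1", "J4", "L1", "J2", "L3", "J4", "L1", "J3"]
coder_2 = ["J1", "J4", "L1", "J3", "L3", "J4", "L2", "J3"]
print(f"{percent_agreement(coder_1, coder_2):.1f}%")  # 6 of 8 match -> 75.0%
```

Simple percentage agreement does not correct for agreement expected by chance; Cohen's kappa is a common chance-corrected alternative.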

Results
An examination of the pre-service teachers' proof evaluations revealed that 58% concerned justification, while 35.9% concerned the proper use of mathematical language. The criteria relating to justification were divided into the following subcategories: Including explanations for mathematical statements (J1), Expressing stages between the steps of proof (J2), Qualifying as a proof (J3), and Using appropriate definitions, axioms or theorems in the steps of the proof (J4). J1 concerns giving justifications for the mathematical statements used in each step of a proof, that is, explaining how each step was obtained. J2 concerns accounting for the relationship between the steps of a proof; it also requires that there be no gaps from one step to the next. J3 concerns whether all the statements written to prove a mathematical relation can be accepted as correct steps of a proof or as a complete proof. J4 concerns selecting appropriate mathematical definitions, axioms, or theorems for proving a mathematical relation. With respect to mathematical language, three subcategories were identified: Using symbolic language appropriately (L1), Using non-symbolic language appropriately (L2), and Proof writing in a systematic way (L3). L1 concerns using appropriate mathematical symbols in the proof process; L2 concerns using appropriate mathematical concepts and statements in the proof process; and L3 concerns writing the steps of a proof systematically, in an appropriate order. In addition, a separate category, Focusing on only the results, was defined. Table 1 summarizes the frequency distribution of the pre-service mathematics teachers' criteria for proof evaluation and of the researchers' proof evaluations.
In Table 1, the frequencies of the researchers' evaluations represent the number of evaluations that should have been made across all of the pre-service teachers' proofs. As Table 1 shows, among the subcategories of justification, J4 had the highest frequency (f = 98), followed by J1 (f = 81). Among the subcategories of mathematical language, L1 had the highest frequency (f = 96). On the other hand, J2 had the lowest frequency (f = 4) among the subcategories of justification, while L2 had the lowest frequency (f = 9) among the subcategories of mathematical language. A comparison of the frequencies of the pre-service teachers' evaluations with those of the researchers' evaluations shows that the pre-service teachers overlooked some situations in the proofs. Among the subcategories of justification, they most often overlooked situations relating to J1. Situations relating to J2 were overlooked more often than those relating to J3, although the numbers of overlooked situations for J2 and J3 were similar. The pre-service teachers' evaluations relating to J4 had a higher frequency than the researchers' evaluations relating to J4 because the pre-service teachers mostly used general expressions such as "The steps of proof are reasonable. Because theorems were used appropriately." in their evaluations, which increased the frequency of the J4 criterion. Similarly, general statements such as "The sequence of the steps of proof is written correctly." caused the frequency of L3 in the pre-service teachers' evaluations to exceed that in the researchers' evaluations. Compared with the researchers' evaluations, the pre-service teachers also overlooked several situations relating to mathematical language, namely those concerning L1 and L2, and they overlooked situations relating to L1 more often than those relating to L2.
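The comparison between the researchers' and the pre-service teachers' frequencies can be sketched as a per-subcategory difference. The following Python sketch assumes that each coded remark carries exactly one subcategory code and that an overlooked count is the researchers' frequency minus the pre-service teachers' frequency; the function names and sample codes are hypothetical and do not reproduce Table 1.

```python
from collections import Counter

# Subcategory codes as defined in the study (J1-J4 justification, L1-L3 language).
SUBCATEGORIES = ["J1", "J2", "J3", "J4", "L1", "L2", "L3"]


def frequencies(codes):
    """Count the coded evaluation remarks falling into each subcategory."""
    counts = Counter(codes)
    return {c: counts[c] for c in SUBCATEGORIES}


def compare(researcher_codes, student_codes):
    """Researcher frequency minus pre-service teacher frequency per subcategory.

    Positive: issues the researchers noted that the students apparently missed.
    Negative: students coded more remarks than the researchers did.
    Hypothetical helper; the study does not describe its computation.
    """
    r = frequencies(researcher_codes)
    s = frequencies(student_codes)
    return {c: r[c] - s[c] for c in SUBCATEGORIES}


# Hypothetical coded remarks for a small set of proofs (not the study's data).
researcher = ["J1", "J1", "J1", "J2", "J4", "L1", "L1", "L2"]
students = ["J1", "J4", "J4", "L1", "L3"]
print(compare(researcher, students))
# {'J1': 2, 'J2': 1, 'J3': 0, 'J4': -1, 'L1': 1, 'L2': 1, 'L3': -1}
```

Because this is a net difference, a negative value (as the study reports for J4 and L3) can mask overlooked issues within the same subcategory, so the result indicates direction rather than identifying which issues were missed.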
According to these frequencies, when evaluating a mathematical proof, the pre-service teachers considered that a proof should include explanations for each mathematical statement, along with appropriate definitions, axioms, or theorems. With regard to mathematical language, the participants emphasized the appropriate use of mathematical symbols. In addition to these criteria, some of the participants used Focusing on only the results as a criterion for proof evaluation, judging a proof according to whether the mathematical statement was correct.

Pre-service mathematics teachers' criteria for proof evaluation relating to justifications
In the process of proof evaluation, the pre-service mathematics teachers primarily emphasized the criterion of Using appropriate definitions, axioms, or theorems in the steps of the proof (J4); this subcategory was cited more often than the other subcategories of justification. The criterion Including explanations for mathematical statements (J1) was also given considerable importance.
On the other hand, Expressing stages between the steps of proof (J2) and Qualifying as a proof (J3) were rarely cited, with J2 receiving the least attention. This finding is not unexpected, as most of the pre-service mathematics teachers' proofs did not have many reasoning gaps between the steps. A comparison of the pre-service mathematics teachers' proof evaluations with those of the researchers shows that not all of the evaluations were carried out appropriately. Namely, the participants did not fully evaluate the proofs with regard to justification; even where they took into account the suitability of definitions, axioms, or theorems and the presence of explanations for mathematical statements, their evaluations were deficient with respect to these criteria. Moreover, their proof evaluations in relation to the other criteria for justification were incomplete. One example for each criterion identified by the pre-service teachers is presented below.
It is important that a proof include adequate explanations for each mathematical operation or statement in order to clarify how a mathematical relationship has been determined. As such, J1 is one of the important criteria for proof evaluation. However, although the pre-service mathematics teachers considered J1 in their evaluations, their results were not always in accord with the researchers' evaluations, in that they did not remark on certain issues relating to J1. For instance, in the process of proof construction, questions such as What do the variables in a proof mean?, How is the mathematical result reached?, and Which axioms, definitions, or theorems are used? should be answered. However, the participants did not always seek answers to these questions while evaluating their peers' proofs. Figure 1 presents an example of an evaluation that ignored the criterion of J1.
When proving a mathematical theorem, it is necessary to write all of the steps that show how the result is reached. In this way, a proof becomes more descriptive, and the steps are connected to one another. J2 is therefore an essential criterion for proof evaluation. In the proofs constructed by the pre-service mathematics teachers, unexpressed stages between the steps occurred less often than other errors, so references to problems concerning J2 were less frequent in the evaluations. However, a comparison with the researchers' evaluations shows that fewer than half of the existing issues were noted in the participants' evaluations. For example, in constructing their proofs, some of the pre-service teachers performed mathematical operations (finding the measures of angles or the lengths of sides, drawing lines or line segments, etc.) on geometric figures or described variables in the problems, and then wrote the resulting mathematical relationships directly. When evaluating these proofs, their peers generally recognized the gaps in the steps, and their evaluations accordingly reflected the J2 criterion, as in the example below (see Figure 2). When mathematical operations were missing from the proofs, however, the participants had difficulty recognizing the gaps between the steps, and their evaluations overlooked some aspects of the J2 criterion. An example of this is as follows (see Figure 3):

Results relating to the criterion qualifying as a proof (J3)
Efforts such as checking the validity of a mathematical theorem, examining special cases, or testing numerical values cannot be accepted as a proof; nor can stating the steps required for proving without the related mathematical operations. Rather, the criterion of Qualifying as a proof depends on including the required steps for proving, along with their justifications. It is also important to examine whether a proof is valid for all cases or only in special instances. As such, J3 is considered one of the important criteria for proof evaluation. Although certain issues relating to J3 were overlooked in the participants' evaluations, the researchers' evaluations in this respect were sometimes in accord with those of the pre-service mathematics teachers. In some cases, the pre-service teachers described the variables required for a proof but then wrote the proof in the form of a check of the validity of the mathematical theorem. In evaluating such proofs, they overlooked the required remarks more often than in their other evaluations relating to J3. One such example is given below (see Figure 4). On the other hand, most of the pre-service mathematics teachers stated that the proofs they evaluated that examined only special cases or relied on trying numerical values did not qualify as a proof. In other words, they recognized that such attempts cannot be accepted as a proof, in line with the J3 criterion. In addition, proofs that contained the required steps but did not express the mathematical operations, or that stated only the mathematical relationships, were generally evaluated with reference to J3, and the participants were able to explain that these did not qualify as a proof. One example of this is as follows (see Figure 5):
The process of proving is based on the use of appropriate definitions, axioms, or theorems. In order to construct a qualified proof, the mathematical operations for each step should be provided and supported by appropriate definitions, axioms, or theorems. Therefore, proof evaluation requires examining the suitability of the definitions, axioms, or theorems according to the criterion of J4. Compared with the researchers' evaluations, the pre-service mathematics teachers' evaluations referenced J4 more frequently. This was because they mainly evaluated the proofs with respect to the best mathematical statement for proving, or on the basis of similarity to their own proofs; that is, they generally checked whether the mathematical statements in their own proofs matched those in the proofs developed by their classmates. On the other hand, they overlooked other aspects of evaluation relating to J4. An example of these evaluations is below (see Figure 6). Students' evaluations also addressed the use of inappropriate definitions, axioms, or theorems; however, such remarks were limited to the evaluations in which they were necessary. An example of the students' identification of inappropriate definitions, axioms, or theorems is provided below (see Figure 7). In addition, the students sometimes focused on insignificant details relating to J4, such as whether triangles were described as similar or congruent; they did not realize that congruent triangles are also similar triangles, as the example below demonstrates (see Figure 8):

Pre-service mathematics teachers' criteria for proof evaluation in relation to mathematical language
In their evaluations, the pre-service mathematics teachers considered the criterion of Using symbolic language appropriately (L1) more frequently than the other subcategories of mathematical language. However, as the comparison with the researchers' proof evaluations shows, many of the issues in the proofs relating to L1 were not noted in the students' evaluations. On the other hand, they considered L2, Using non-symbolic language appropriately, less frequently than the other criteria. This is partly because such issues occurred less often in their proofs, so this result is to be expected. Proof writing in a systematic way (L3) occurred at an intermediate frequency in the proof evaluations. The criteria for proof evaluation concerning mathematical language are presented in the following sections.

4.2.1 Results relating to the criterion using symbolic language appropriately (L1)
The appropriate use of mathematical symbols makes a mathematical proof, including its steps and the connections between them, easier to comprehend. As such, L1 is considered one of the subcategories of mathematical language for proof evaluation. In this study, the pre-service mathematics teachers took this criterion into account more often than the other subcategories of mathematical language. However, compared with the researchers' proof evaluations, most of the pre-service mathematics teachers overlooked some of the errors or deficiencies in this respect, or accepted some incorrect uses of mathematical symbols as correct notation. An example of this is provided here (see Figure 9):
LUMAT 228
Figure 9. An example of an evaluation that ignored the criterion of L1.
The pre-service teachers also tended to notice L1 errors involving certain mathematical symbols, namely those for angle and length. However, they overlooked other L1 issues that called for comment. An example of this is shown in Figure 10.

4.2.2 Results relating to the criterion Using non-symbolic language appropriately (L2)
When justifications are given for the steps of a proof, mathematical concepts and explanations should be expressed in appropriate mathematical language. To construct a quality proof, these concepts and explanations should also be comprehensible. Non-symbolic language must therefore be used appropriately throughout the steps of a proof, which makes L2 one of the criteria for proof evaluation. In the current study, while evaluating their peers' proofs, the pre-service mathematics teachers generally remarked on discrepancies between mathematical symbols and statements. For example, when the congruence symbol was used but the similarity theorem was cited, the evaluators noted that the congruence theorem should have been applied. An example of this is shown in Figure 11. However, unlike the researchers, they overlooked some issues relating to L2. Although fewer L2 errors occurred in their proofs, the pre-service teachers sometimes failed to recognize an error or made errors in their own evaluations. For example, one evaluation stated that a statement should have been written as "ABC isosceles triangle" rather than simply "isosceles." During proof construction, the students sometimes did not state mathematical definitions, axioms, or theorems appropriately, and in some of these cases the evaluations did not reflect the L2 criterion. An example is shown in Figure 12.

4.2.3 Results relating to the criterion Proof writing in a systematic way (L3)
When the successive stages of a proof are written out, a reader can see which mathematical operations have been performed and which steps have been followed. In this way, the connections between the steps become easy to see, and necessary steps can be distinguished from unnecessary ones. This is what makes a proof comprehensible. For this reason, it is important to write a proof systematically, and L3 can therefore be considered a criterion for proof evaluation. Accordingly, when evaluating their peers' proofs, the pre-service mathematics teachers generally considered whether the appropriate steps had been followed. When the steps of a peer's proof were similar to their own, they stated that the peer had followed the appropriate steps. Most of their remarks relating to L3 concerned the suitability of the proof steps. However, unlike the researchers, they overlooked cases in which the mathematical operations were vague. There were also instances in which they did not remark on unnecessary steps. An example of this is shown in Figure 13.

Pre-service mathematics teachers' criteria for proof evaluation in relation to focusing on only the results
Aside from justification and mathematical language, the pre-service mathematics teachers also applied a criterion identified as Focusing on only the results. That is, they sometimes judged a proof solely by its result: when a peer's proof reached the same result as their own, they deemed it valid. One such instance is shown in Figure 14.

Discussion and Conclusion
At the end of the series of activities on proof construction and evaluation in this study, there were cases in which the pre-service mathematics teachers either identified or overlooked errors or deficiencies in the proofs. However, observations of the class discussions suggest that they became better at identifying errors and deficiencies over the course of the proof evaluation process. By the end of the activities, some criteria appeared with higher frequency: J1 and J4 among the subcategories of justification, and L1 among the subcategories of mathematical language. In other words, as the participants both constructed and evaluated proofs, the frequency with which they stated certain criteria changed, as they became better at elaborating their criteria for evaluation. Powers et al. (2010) noted a similar pattern, arguing that specific proof validation activities may contribute to the development of proof writing and validation abilities. Since proof evaluation is broader than proof validation, their argument supports the results of the current study. In this study, three main categories for proof evaluation emerged: justification, mathematical language, and focusing on only the results. Furthermore, the subcategories identified for justification and mathematical language together provide a framework for proof evaluation. Although focusing on only the results is one of the main categories, it is a criterion that only the pre-service teachers took into account when evaluating a proof. In other studies on proving (e.g., Lee, 2011; Pulley, 2010; Senk, 1983), proofs were generally evaluated in terms of reasoning, mathematical language, and justification, but these were not identified as separate categories.
That justification and mathematical language emerged as categories for proof evaluation in the current study is consistent with the attention paid to these dimensions in those studies. However, this study differs from the others in identifying specific dimensions and subcategories for proof evaluation.
Moreover, the criteria for proof evaluation in this study took shape in line with the reasoning errors that occur during the proving process. For example, the justification subcategories J1 and J2 relate to what Selden and Selden (2003b) call holes. A hole occurs when a statement is claimed to follow immediately from previously established results when, in reality, a considerable argument is required. In addition, Demir (2017) classified reasoning errors into three categories, one of which is reasoning gaps. Some subcategories of reasoning gaps concern failing to give justifications and omitting steps of a proof. J1 and J2 are therefore directly associated with these subcategories.
In the process of proof evaluation, the pre-service teachers generally expected the steps of a proof to be related to one another and the proof to include appropriate mathematical statements (axioms, theorems, etc.). These expectations correspond to J1 and J4 as subcategories of justification. A deductive approach originates from justification (de Villiers, 1990; Hanna, 1990), and proof is a clear example of deductive logic (Remillard, 2009). As a result, pre-service teachers are likely to give more weight to some justification criteria (J1, J4) when evaluating proofs. Proving can also be characterized as a problem-solving task (Weber, 2005). Completing such a task requires performing appropriate mathematical actions and operations. Similarly, proving is a mathematical task involving actions such as using initial information (e.g., assumptions, axioms, definitions) and applying rules of inference (Anderson, 2000). In both problem solving and proving, these actions play an explanatory role. Thus, the pre-service teachers probably paid more attention during proof evaluation to the explanations given for mathematical statements.
Furthermore, for a sequence of mathematical steps to be accepted as a proof, the steps must explain how a mathematical relationship is reached, using appropriate mathematical symbols and concepts. Justification and mathematical language are therefore important considerations when evaluating a proof. In this respect, the pre-service teachers considered some of the required criteria but ignored others. In the instances where issues of justification or mathematical language were overlooked, the errors or deficiencies were generally not obvious. It may therefore be concluded that the pre-service teachers were more aware of obvious errors. Imamoglu and Yontar-Togrol (2012) likewise found that pre-service teachers have difficulty evaluating a proof when an argument contains no obvious errors. Compared with the researchers' evaluations, this pattern held for all of the criteria for proof evaluation relating to justification and mathematical language. The differences between the researchers' and pre-service teachers' evaluations were smaller for J3 than for the other criteria. Even so, the pre-service teachers generally overlooked gaps between proof steps that were not immediately apparent.
Moreover, the researchers referred to J1 and L1 more often than the pre-service teachers did, so it was concluded that the pre-service teachers largely overlooked these criteria. By contrast, J4 and L3 featured more prominently in the pre-service teachers' evaluations than in the researchers'. The pre-service teachers' evaluations addressed J4 by asking whether a peer's proof resembled their own in method, theorems, and so on. For L3, they mainly considered whether the steps of a proof were neatly written. It can thus be inferred that their evaluations were sometimes superficial and that they assumed their peers' proofs were correct when those proofs resembled their own. This relates to the work of Selden and Selden (2003a), who found that undergraduate students generally focused on the surface features of arguments rather than their underlying logical structure.
In terms of mathematical language, using symbolic language appropriately was the criterion the pre-service teachers primarily considered when evaluating proofs. It can therefore be argued that they paid more attention to mathematical symbols than to non-symbolic statements during proof evaluation. Hill, Ball and Schilling (2008) stated that teachers can be unaware of their students' verbal justifications. Moreover, Stylianides, Stylianides and Philippou (2004) argued that it is plausible to assume symbolic reasoning is a focal point of the collegiate mathematics curriculum. Greater emphasis on symbolic mathematical statements during undergraduate education may lead students to disregard verbal justifications. As a result, the pre-service teachers may also tend to evaluate proofs mainly in terms of symbolic language.
On the other hand, when evaluating their peers' proofs, the participants generally did not accept specific examples as proof. This is consistent with Imamoglu and Yontar-Togrol (2012), who found that most pre-service teachers were good at identifying arguments that did not establish the truth of a statement for all cases, and understood that specific examples cannot be accepted as proof. Goetting (1995) similarly found that pre-service teachers were aware of the distinction between proofs and empirical arguments. By contrast, Weber (2010) reported that some university students accepted empirical arguments as valid proofs. In Weber's study, apart from arguments based on specific cases or tested numerical values, students were generally unsuccessful at determining whether a proof was valid. This is in line with Selden and Selden's (2003a) finding that undergraduate students had difficulty distinguishing invalid proofs from valid ones.
In the present study, when there were discrepancies between mathematical symbols and statements in a proof, the pre-service teachers generally managed to note the use of inappropriate statements in their evaluations. However, when inappropriate statements appeared without any accompanying symbols, they sometimes had difficulty recognizing them. They also sometimes focused on insignificant details in their evaluations. For example, some pre-service teachers remarked that a statement about similar triangles was incorrect in a proof involving congruent triangles, even though congruent triangles are also similar.
Finally, the pre-service teachers treated focusing on only the results as a valid criterion for proof evaluation, which reveals a deficiency in their evaluation skills. To overcome this, activities involving proof construction and evaluation should be included in the learning process. Furthermore, as Pfeiffer (2011a) notes, interaction and discussion can contribute to skills in validating, evaluating, and constructing proofs. Students should therefore be given opportunities to discuss their proofs.
The main contributions of this study can be summarized in four points.
1. The activities of this study require both proof construction and proof evaluation.
In other studies on proof validation and evaluation, activities generally consisted of determining whether the written steps of a given proof were valid or sufficient (Inglis, Mejia-Ramos, Weber & Alcock, 2013; Panse, Alcock & Inglis, 2018; Pfeiffer, 2010, 2011a, 2011b; Powers et al., 2010; Segal, 1999). In other words, the participants in these studies judged proofs as valid or invalid, sufficient or insufficient, and so on. An exception was Inglis et al. (2013), who allowed the participants to explain the reasons for their evaluations. In these studies, moreover, the participants did not construct the proofs themselves, and finished proofs were generally evaluated without going into detail. Participants in those studies were also not asked to suggest revisions to the steps of the proofs. The present study, by contrast, enabled the participants to identify errors or deficiencies in the proofs.
2. This study offers a diverse set of categories for proof evaluation.
In other studies, proofs were generally not evaluated in terms of distinct categories (mathematical language, concepts, etc.). Mejia-Ramos et al. (2012) presented a multidimensional model for assessing proof comprehension in undergraduate mathematics. The model aimed to assess students' understanding of proofs through criteria related to seven aspects of a proof. Its conceptual focus therefore sets that study apart from the others, as it brings the context of the proofs to the forefront. In brief, studies on validating and evaluating proofs (Goetting, 1995; Inglis et al., 2013; Pfeiffer, 2010, 2011a; Segal, 1999; Stylianides & Stylianides, 2009; Weber, 2010) generally evaluated proofs by determining the validity of the mathematical arguments, statements, and conjectures put forward as proofs. In this study, however, the evaluations thoroughly addressed the actions involved in a proof (giving justifications, using mathematical language, writing the steps deductively, etc.). The study also helped participants become aware of errors or deficiencies in their own proofs and correct them in later evaluations. In this way, it supported the participants in developing their proof construction and evaluation skills.
3. The categories for proof evaluation in this study can serve as a guide to how a proof should be written.
In other words, the study can help reveal the qualities a proof needs in order to be accepted. It can therefore draw attention to the requirements of proof construction and contribute to the categorization of proof evaluation criteria.
4. This study can help address the scarcity of research on proof evaluation.
Studies on proof validation and evaluation make up a small share of research on proof. Mejia-Ramos and Inglis (2009) and Moore (2016) support this, noting that studies on proof construction dominate the published research.
The comparison between the pre-service teachers' and the researchers' proof evaluations shows that the pre-service teachers are not yet fully capable of evaluating proofs. Proof activities in undergraduate education should be implemented interactively. That is, instructors should encourage students to think about proof and allow them to discuss proofs thoroughly in class.
Proof writing requires the use of justification and mathematical language, and in the current study both are among the dimensions for proof evaluation. It is critical that students evaluate their own and their peers' proofs along several dimensions in the learning environment. Instructors are therefore encouraged to guide their students to provide justifications and to analyze proofs. In this way, students may improve their skills in both constructing and evaluating proofs.
In this study, the proof construction and evaluation activities were implemented through group work, which was both a strength and a weakness of the instructional strategy. Its strength was that students could construct and evaluate proofs interactively, so they could learn from one another and overcome their deficiencies. However, group work can sometimes allow a single member to dominate. The corresponding weakness was that it may have prevented some students from thinking and reasoning about proof construction and evaluation on their own.