The Finnish matriculation examination in biology from 1921 to 1969 – trends in knowledge content and educational form

The history and evolution of science assessment remains poorly known, especially in the context of the exam question contents. Here we analyze the Finnish matriculation examination in biology from the 1920s to 1960s to understand how the exam has evolved in both its knowledge content and educational form. Each question was classified according to its topic in biology, and its cognitive level by Bloom’s taxonomy. Overall, the exam progressed from a rather dichotomous test of botany and zoology to a modern exam covering biology from biochemistry to environmental science, reflecting the development of biology as a scientific discipline. The contribution of genetics increased steadily, while ecology witnessed a decline and a renaissance during the same time period. The biological profile of the questions was established by the 1950s. The educational standard and cognitive demand of the questions were always high and were established by the 1940s.


Introduction
Current educational research focuses on contemporary issues, and the historical development of education is often overlooked (Eymard-Simonian, 2000; Sáez-Rosenkranz, 2016). However, as in all social sciences and humanities, a historical viewpoint can complement educational research with a systematic synthesis of ideas, facts, and past events to answer problems, identify future trends and delineate different interactions and causalities (Cohen et al., 2013; Gall et al., 1996; Sáez-Rosenkranz, 2016). Research in the history of education has concentrated on the history of educators as well as educational institutions and general practices, and only a few studies have looked into the history of educational content and assessment, let alone in biology (Caroli, 2019; Jenkins, 1979; Rosenthal, 1990; Sáez-Rosenkranz, 2016; Virta, 2014). In order to study the history of biology assessment successfully, expertise from three distinct fields, biology, education, and history, must be integrated in an interdisciplinary way. In this article, we examine the history of formal assessment in biology in view of both knowledge content and educational form by analyzing the biology questions of the Finnish matriculation examination over a timespan of five decades from 1921 to 1969.

The Finnish matriculation examination
The Finnish matriculation examination (FME) is the final exam of the upper secondary school and was instituted in 1852 (Virta, 2014). Initially, the exam sessions were held at the University of Helsinki (known as the Imperial Alexander University before 1919), but from 1874 onwards directly in schools (Virta, 2014). According to Kaarninen and Kaarninen (2002), only four compulsory subjects were tested before 1921: Finnish, Swedish, one elective foreign language (Latin, German, French or Russian) and mathematics. The authors note that the exam questions were prepared by an autonomous Matriculation Examination Committee (from 1921 named the Matriculation Examination Board, MEB), which consisted of academics from the University of Helsinki and senior teachers from upper secondary schools. Therefore, Kaarninen and Kaarninen (2002) emphasize that the exam has always been influenced by the latest advances in Finnish academia, which here is understood as higher or tertiary education, universities and research institutions in Finland. In 1921, the test battery in humanities and natural sciences (Fin. Reaalikoe, Swe. Realprovet) was introduced, including physics, chemistry, biology, geography, history, and religion. The test battery was unique among school systems in Western Europe, as the examinee could freely choose questions from different subjects in both humanities and sciences, which Kaarninen and Kaarninen (2002) see as reflecting the Humboldtian ideal of the Finnish education system. The test batteries consisted almost exclusively of essay-based questions, in accordance with the written exams in Finnish and other languages. Towards the end of the century, this format was criticized for being obsolete and favoring languages over sciences, and the test battery was divided into independent subject-specific tests in 2005, including a separate exam in biology with a more diverse set of question formats.
Moreover, the examinee could now choose whether to take the exams in the spring or autumn, whereas before this the autumn exam had been for resits only. In the 2010s, the examination has gradually been digitalized, and in 2018 the exam in biology was taken online for the first time (Tuulosniemi, 2019). Over the years, the number of students matriculating each year has risen from about 1000 in the 1920s to 30 000 today (Tuulosniemi, 2019).
Throughout its entire existence, the MEB and the exam have elicited respect and fear alike in both students and teachers (Vuorio-Lehti, 2007). The questions have always been criticized for being overly academic and demanding, unlinked to normal school teaching, although the exam has also been praised for these very same reasons (Kaarninen & Kaarninen, 2002). In addition, the exam has been claimed to direct the teaching and learning more than the formal curriculum, commonly known as the backwash effect in educational research (Ahvenisto et al., 2013; Virta, 2014). However, backwash is not necessarily negative, as it may clarify and strengthen the formal curriculum, but if the exam assesses some areas disproportionately, it can adversely skew the curriculum (Ahvenisto et al., 2013; Virta, 2014).

The theory of educational assessment
McTighe and Ferrara (1998) subdivide the assessment of learning into three types: diagnostic, formative and summative. Diagnostic assessment includes, e.g., pre-exams to clarify the starting level of students; formative assessment encompasses routine assignments, e.g., homework, self-evaluations and learning diaries; and summative assessment denotes the final comprehensive assessment, e.g., exams and theses. Historically, as a final essay-based exam of upper secondary school, the FME can be considered to represent a summative assessment of learning, and therefore the history of FME in biology can specifically be seen as the history of summative assessment in science education. The theoretical framework of assessment by McTighe and Ferrara (1998) has been the most popular in characterizing FME in corresponding contemporary studies (Lindholm, 2017; Rostila, 2014; Tikkanen, 2010).
Bloom's taxonomy or hierarchy is widely used to quantify the success and standard of teaching and learning, and it has become the main approach to study the questions of the FME (Bloom, 1956; Lindholm, 2017; Rostila, 2014; Tikkanen, 2010; Vitikainen, 2014). The taxonomy encompasses six cognitive levels: knowledge, comprehension, application, analysis, synthesis and evaluation (Bloom, 1956). The taxonomy is also called a hierarchy, as the levels are ranked in the order of increasing cognitive difficulty, complexity and abstractness (Bloom, 1956). Bloom's original taxonomy analyzes only the level of cognition, but not the level of facts to be processed, and therefore Krathwohl and Anderson (2009) have complemented the taxonomy by adding a second dimension, the knowledge dimension including facts, concepts, methods and metacognition, and by modifying the cognitive process dimension so that creation (synthesis in Bloom's original taxonomy) is ranked over evaluation (Table 1). Facts and concepts overlap to some extent, but facts include single details and terminology ("Name the organelles of the cell"), while concepts encompass more general understanding ("What is the function of cell organelles?"). Methods include the knowledge of research methodology of an academic discipline ("How have the cell organelles been discovered?"), while metacognition denotes the students' knowledge of the relevance of the knowledge for themselves (Krathwohl & Anderson, 2009). Furthermore, metacognition encompasses the students' awareness of their learning styles and techniques with regard to a given study topic (Krathwohl & Anderson, 2009).

The education of biology
In the 20th century, biology as a scientific discipline underwent a drastic change from natural history to modern life science (Mayr, 1982). Therefore, the content of the biological curriculum has been gradually revised, mostly due to the advances in genetics, and even today it is being discussed which biological novelties should be included in the revised curriculum (Goldenfeld & Woese, 2007; Kinchin, 2010). Here, we define a biological novelty as a general term encompassing biological discoveries, concepts, and theories. We follow Mayr (1997) and regard a discovery as a single item of novel experimental knowledge of a biological phenomenon and a concept as an item of theoretical knowledge explaining a given discovery and linking it to biological theory. Lastly, a theory is seen as an explanation of a biological phenomenon that integrates a multitude of biological concepts. It is widely assumed that science and especially biology develops faster than ever before, but whether the time of introducing biological novelties into exams has changed over the years has never been properly clarified (Kurzweil, 2014; National Research Council, 2009). In recent years, 21st-century biological novelties have been introduced into Finnish science curricula and exams rather quickly, e.g. CRISPR-Cas9 and induced pluripotent stem cells (iPSC) (Happonen et al., 2016).
The field of biology can be divided into subdisciplines in different ways, e.g. by stressing the studied organism group such as zoology or microbiology, the biological phenomenon such as genetics or physiology, or stressing applied research such as clinical microbiology or conservation biology. A couple of general classification schemes have been devised, but in educational research of the biology exam in the FME the framework of the National Research Council (2012) has emerged as the most popular (Lindholm, 2017; Rostila, 2014). As a simple and general classification, it is well suited for categorizing and comparing biological questions from different historical periods. The classification system subdivides biology into four broad categories:
1. LS1 From molecules to organisms
2. LS2 Ecosystems: interactions, energy, and dynamics
3. LS3 Heredity: inheritance and variation of traits
4. LS4 Biological evolution: unity and diversity

Aims and study questions
Was old school a good school? In this article, we inspect the FME in biology to understand how the exam changed in both biological knowledge content and educational form from 1921 to 1969. Previously, the modern FME in biology (2009-2015) and its educational characteristics have been analyzed by Rostila (2014) and Lindholm (2017), who applied Bloom's taxonomy and McTighe and Ferrara's assessment model to the exam questions. The exam in chemistry has been analyzed by Tikkanen (2010) and Vilhunen and Hopia (2012), in religion by Vitikainen (2014), and in history and social studies by Ahvenisto et al. (2013) and Virta (2014). Only Virta (2014) analyzed the exam from a historical perspective, and therefore this study aims to shed light on the development of a science exam in the FME for the first time. The study questions are as follows:

Knowledge content:
- What trends can be found in the biological knowledge content of the FME from 1921 to 1969?
- At what timeframe were biological novelties introduced to the exam?

Educational form:
- What types of questions were asked?
- What trends can be seen with respect to Bloom's revised taxonomy?
- Were questions in different biological categories (National Research Council) equal with respect to Bloom's revised taxonomy? In other words, is there an interaction between the question topic and its cognitive demand?
The answers to these questions will help us understand how the science curriculum in the Finnish upper secondary school has evolved alongside both national and international trends, which in turn helps us predict the future of biological teaching. Furthermore, this study helps us identify questions of high educational standard, which may be used as an inspiration for devising future exams. Finally, this study is important for seeing whether certain biological subdisciplines have exhibited a certain educational profile compared with other subdisciplines, which will help us identify the special educational character of biology as a whole.

The exam material
The exams both in Finnish and Swedish from 1921 to 1969 were obtained from the free-access Digital Archives of the National Archives of Finland as scanned images from the online repository (http://digi.narc.fi/ylioppilastehtavat.html). Almost all exams had been preserved; only the exams from autumn 1921 to autumn 1923 were missing. From the test battery in sciences and humanities, only the biological questions from the section of biology and geography were chosen for further analysis. In the first years of the FME, biology and geography were taught as a single subject, making it challenging in some instances to distinguish the biological questions from the geographical. Therefore, all geographical questions with some biotic component were considered to also be biological in character ("the nature of Iceland"), but questions with a clear abiotic component were left out (see Supplementary Material for chosen questions). It is possible that some questions from the section of physics and chemistry or psychology had a biological component, but these were not included in this study, as the number of these interdisciplinary questions is known to be minute during the studied time period (Kaarninen & Kaarninen, 2002). Furthermore, the exam questions in Finnish and Swedish as a first language have included essay-type questions on biological and other scientific themes. However, these questions were not included in this study, as the primary purpose of this assessment is to evaluate the student's literacy and language skills rather than scientific knowledge (Kaarninen & Kaarninen, 2002).

Trends in knowledge content
Content analysis combining both qualitative and quantitative aspects was applied to all exam questions. Content analysis is a common approach in educational and social sciences to reduce and synthesize disorganized documents and identify the most important characteristics of the material (Neuendorf, 2016). Furthermore, content analysis can be applied to historical documents; for example, the history of the Finnish chemistry curriculum has previously been studied with this method (Vaskuri, 2017). The exam questions are convenient material in that they constitute a limited source of historical material, and therefore the drawback of overlooking important documents does not arise (Faire, 2016). Only some questions from the early 1920s were missing, but there is no reason to believe that these questions would have been radically different from the other questions of the decade.
Content analysis was applied in two regimes, here termed qualitative and quantitative. For qualitative content analysis, the biological knowledge content of each question was interpreted, analyzed and characterized as a representation of a biological subdiscipline. For example, several questions of the form "the plant family x" in the 1920s were synthesized to reflect an emphasis on plant systematics and taxonomy. In our qualitative content analysis, the questions were encoded into open categories of biological knowledge content that were considered to best characterize a given question.
For quantitative content analysis, the exam questions were strictly classified into one of four biological categories according to the National Research Council (2012). If the question had an integrative character, the main category was chosen (see Supplementary Material for classification and detailed criteria). For example, the structure of chromosomes was considered to belong to Genetics and not Molecules to organisms, while the drought tolerance of plants was seen as Ecology and not Molecules to organisms. The questions were classified independently by each author (Rater A and Rater B), and the interrater reliability was evaluated with cross-tabulation and kappa analysis (Hallgren, 2012; McHugh, 2012). Kappa analysis is a common statistical technique used to evaluate whether two or more independent researchers agree on a given classification. The frequencies of each category were calculated for each decade, and the interdecadal (ID) change in question frequencies was compared with the chi-squared test of independence using Yates' correction and Fisher's exact test. Fisher's exact test was used when the sample size was not large enough for the chi-squared test of independence, and Yates' correction was used to prevent overestimation of statistical significance for small data samples (Ross, 2017). Both the chi-squared test of independence and Fisher's exact test are standard statistical tests used to compare frequencies of two or more categories (Ross, 2017).
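The two calculations described above can be illustrated with a short sketch. The statistical tests in this study were run in R, so the Python code below is not the authors' script; the function names `cohens_kappa` and `yates_chi_squared` and the example labels are assumptions made purely for illustration, and the chi-squared function covers only the 2x2 case.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def yates_chi_squared(table):
    """Chi-squared statistic with Yates' correction for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # The continuity correction never exceeds the deviation itself.
            deviation = abs(observed - expected)
            corrected = deviation - min(0.5, deviation)
            statistic += corrected ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Hypothetical category labels, NOT data from the study.
    labels_a = ["LS1", "LS1", "LS2", "LS3", "LS4", "LS1", "LS2", "LS3"]
    labels_b = ["LS1", "LS2", "LS2", "LS3", "LS4", "LS1", "LS2", "LS3"]
    print(round(cohens_kappa(labels_a, labels_b), 3))
    # Hypothetical counts: rows are two decades, columns are
    # "category of interest" versus "all other categories".
    print(round(yates_chi_squared([[10, 20], [20, 10]]), 3))
```

The statistic returned by `yates_chi_squared` would still have to be compared against the chi-squared distribution with one degree of freedom to obtain a p-value, a step that a statistics package normally performs.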
To quantify the rate of introduction of biological novelties, all questions testing novelties were selected. For each biological novelty, the approximate year of academic establishment (AE) of the novelty was estimated from the history of science literature. Here, AE is understood as the approximate year when the biological novelty was broadly and internationally acknowledged. First, the primary scientific reference of the biological novelty was identified, after which succeeding literature was analyzed. For discoveries, AE is the year when the discovery had been conceptualized and linked to biological theory, while for concepts AE is the year when the concept had been linked to biological theory. Lastly, for biological theories, AE is the year when the theory had been acknowledged in preference of other alternatives. The explanation for how AE was estimated is presented in the Supplementary Material for each biological novelty. The authors estimated AE independently, and the mean of these estimates was used to reduce interrater variability. The time of introduction (T) was calculated as the difference between the year of appearance in the FME and AE (Equation 1).

T = FME - AE (Equation 1)
In order to see whether there was a temporal change in the rate of introduction, linear regression analysis was performed by having the year of appearance as the independent variable and the time of introduction (T) as the dependent variable (see Supplementary Material for details). Linear regression is commonly used to fit a linear model to continuous data and to assess whether the trend has been increasing or decreasing (Ross, 2017). All the statistical tests were performed in the R environment (v. 3.6.0) (R Core Team, 2019).
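As a rough illustration of Equation 1 and the regression step, the sketch below computes T from the mean of the raters' AE estimates and fits an ordinary least-squares line by hand. It is a minimal Python sketch rather than the R code used in the study; the function names and the three example novelties are hypothetical, and no significance test for the slope is included.

```python
def time_of_introduction(fme_year, ae_estimates):
    """T = FME - AE, with AE taken as the mean of the raters' estimates."""
    if not ae_estimates:
        raise ValueError("at least one AE estimate is required")
    academic_establishment = sum(ae_estimates) / len(ae_estimates)
    return fme_year - academic_establishment


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    if sum_sq_x == 0:
        raise ValueError("x values must not all be identical")
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sum_xy / sum_sq_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical novelties, NOT data from the study:
    # (year of first appearance in the FME, AE estimates of the two raters)
    novelties = [
        (1925, (1900, 1910)),
        (1945, (1930, 1930)),
        (1965, (1955, 1955)),
    ]
    years = [year for year, _ in novelties]
    lags = [time_of_introduction(year, ae) for year, ae in novelties]
    slope, intercept = fit_line(years, lags)
    # A negative slope would mean that novelties reached the exam faster over time.
    print(lags, slope, intercept)
```

Year of appearance is the independent variable and T the dependent variable, matching the set-up described above.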

Trends in educational form
As for knowledge content, both qualitative and quantitative content analysis was performed on the exam questions in order to capture their educational form. For qualitative content analysis, the questions were classified into open categories of educational form and the types of assessment, according to McTighe and Ferrara (1998). For quantitative analysis, the questions were classified into the cognitive categories of the revised Bloom's taxonomy by Krathwohl and Anderson (2009). The questions were classified independently by each author (Rater A and Rater B), and the interrater reliability was evaluated with cross-tabulation and kappa analysis (Hallgren, 2012; McHugh, 2012).
To test temporal changes quantitatively, the frequencies of each question type were calculated for each decade, and the ID change was tested with Fisher's exact test. Furthermore, the frequencies of question types were calculated for each biological category, and the category-wise frequencies were compared with Fisher's exact test (see Supplementary Material for details).
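For readers unfamiliar with Fisher's exact test, the following sketch shows the two-sided test for the simplest 2x2 case, using the hypergeometric distribution directly. It is an illustrative Python sketch and not the authors' R analysis: the function name and example counts are assumptions, and the tables in the study may have more than two rows or columns, which this simplified version does not handle.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    if n == 0:
        raise ValueError("table must contain at least one observation")

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = probability(a)
    # Sum all tables that are at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the study: rows are two decades,
    # columns are "knowledge-level" versus "higher-level" questions.
    print(round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))
```

Because the test enumerates every table with the same margins, it stays valid for the small per-decade counts for which the chi-squared approximation is unreliable.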

General qualitative patterns of educational form from the 1920s to the 1960s
During the studied time period, almost all questions were essays, i.e. performance-based assessment of the product format according to McTighe and Ferrara (1998), and only a few crossing experiments were presented as solvable problems from the 1940s onwards. Many of the essay-type assignments were simply listed as headings, such as "The circulatory system of fish" or "The plant family Orchidaceae", rather than phrased as questions. If the essays were written as bona fide questions, the language was rather consistent and only the verbs selittää 'explain', tietää/veta 'know', kertoa/berätta om, redogöra 'tell', and tehdä selkoa/redogöra 'clarify' were used. No figures or illustrations were included in the exam, and therefore all the decade-specific figures (Figures 1-5) have been collected from contemporary schoolbooks to present how the exam topics were visualized in the study material.

The 1920s – Plant systematics and comparative zoology
From the 1920s, 46 questions in total had been preserved. During the 1920s, the questions in biology and geography comprised five to six questions, of which usually three to four were devoted to biology and the rest to geography. The focus on botany and zoology was clearly visible, as about 80 % (35/46) dealt with these topics, while the remainder examined more general biological areas, including genetics, biogeography, evolutionary theory, microbiology, and anthropology (Figure 1).
In botany, a typical question of the decade inspected plant systematics, and altogether 12 taxa were tested (Table 2). Thus, the systematical questions constituted about half of all the botanical questions. Meanwhile, the other questions examined plant physiology, morphology, development, and also, some ecological aspects were included (Table 2).
In zoology, the emphasis was on different aspects of morphology, physiology and embryology (Table 3). A noteworthy proportion of these questions were from a comparative viewpoint, integrating evolutionary thought into the exam, and only a few tested comprehension of human physiology in particular. The other questions inspected animal behavior, community ecology, and systematics (Table 3).
[Figure 1. Source: Kivirikko, 1923.]
Regarding other subdisciplines of biology, there were a few questions on microbiology, but only one question was asked on genetics, namely an essay on Mendelism and its relevance for biology (Table 4). In ecology, the tasks focused on biogeographical, faunistic and floristic aspects (Table 4). The final question of the decade was the ominous "What do you know about negroes?", reflecting the attitudes toward human races of the time (Table 4). Many of the questions tested only factual knowledge, e.g. "The plant family Orchidaceae," but many questions already asked for comprehension of biological concepts, such as "What do you know about the structure of seeds and germination?" or "The structure and function of the human eye." Interestingly, some of the questions were cognitively rather advanced and involved elements of analysis. For example, the examinees were presented with the following questions: "How do phanerogams and cryptogams compare to each other," "The structure of the mouthparts of insects and their adaptations," and "The structure and morphology of mammalian teeth in relation to diet."

The 1930s – Genesis of genetics and diverse Darwinism
From the 1930s, 83 questions were asked on various aspects of biology. The emphasis on botany and zoology continued from the previous decade, but more questions were asked on both genetics and evolutionary theory. The botanical and zoological questions were asked from more diverse perspectives compared to the previous decade (Figure 2).
As for the botanical questions, plant systematics had a lesser role than previously, and instead there was a stronger emphasis on plant morphology, physiology and development (Table 5). Moreover, there was also one applied question on the cultivation of coffee, tea and cocoa (Table 5).
In zoology, there were more systematic questions than in the previous decade (Table 6). With respect to physiology, morphology and development, classical zoology still outnumbered human biology in terms of the number of exam questions (Table 6). The decade saw a rise in the number of conceptual questions in ecology in contrast to the biogeographical questions that had been prevalent in the previous decade (Table 7). Also, some behavior-related questions were included (Table 7). In the previous decade, evolutionary issues had been integrated through systematics and comparative morphology, but in the 1930s, the examinees had to analyze the concepts of the evolutionary theory itself (Table 7). The genetics questions tested knowledge on Mendelism and sex determination (Table 7). In addition, the decade witnessed the rise of biochemical, cytological and microbiological questions (Table 7). Lastly, there was one question on human races at the beginning of the decade (Table 7). In the 1930s, there were still questions testing simply knowledge, but questions testing comprehension and analysis increased in number. For example, the examinees were expected to find answers to analytic questions such as "What is the biological basis of plant and animal breeding and what methods are used for this," "How does parasitism affect the structure of the animal," "Darwinism, natural selection and the modern perceptions of the importance of selection for the origin of species," "Compare homology and analogy," and "Explain Linné's and Darwin's perceptions on the origin of species." In addition, there was one question of an evaluative character: "Plants as the foundation of animal and human existence."

The 1940s – Mendelism, nutrition and developmental biology
In the 1940s, 83 biological questions were asked. During WWII, the MEB took advantage of any cease-fire and organized several extraordinary exam sessions near the frontline whenever possible. In this decade, crossings established themselves as standard questions in almost all exams, both plant and human physiology concentrated on nutrition, and the zoological questions had an emphasis on developmental biology (Figure 3).
[Figure 3. Source: Marklund & Jalas, 1943.]
In contrast to the two preceding decades, there were no systematic questions in botany, and the focus was firmly on plant morphology, physiology and development (Table 8). The decade can be best characterized by the focus on water transport, nitrogen sources and the nutrition of plants, as this theme was inspected several times from both a purely physical-chemical perspective (the mechanism of osmosis) and an applied perspective (the use of fertilizers). Lastly, there were some questions of an ecological character (Table 8). Interestingly, all purely zoological questions inspected developmental biology (Table 9). In contrast to previous decades, the physiological questions were all on humans, or mammals and vertebrates in general (Table 9). Several of the physiological questions focused on nutrition and food processing, specifically in the human digestive system. In this decade, the rise of genetics was even more evident, and the examinees faced several questions on different aspects of genetics (Table 10). Also, evolutionary theory, cell biology and biochemistry were well represented (Table 10). Lastly, there were only a few questions on ecological themes (Table 10). The general trend was still essays, but some of the crossing experiments were presented as solvable problems. There were few questions testing simply knowledge, as most required comprehension, application and analysis. For instance, the decade included several crossing experiments testing the application of Mendel's laws. The analytically most complex questions were likely "On extinct organisms that combine characters from different systematic groups and their relevance for our view on evolution," "How is it determined whether the egg cell develops into a boy or girl and how can the equal number of boys and girls be explained?," and "Compare respiration and fermentation in plants."
Finally, there were a couple of questions where the examinees were asked to evaluate ideas and concepts: "How does modern research view Darwinian selection as the force of evolution?" and "Overview of the cell concept throughout history."

The 1950s – Cytogenetics, human physiology and ecology
In the 1950s, 79 biological questions were included in the FME. In this decade, the focus of genetics shifted increasingly from Mendelism to cytogenetics. In zoology, there were few questions on the physiology of animals since most were examining human physiology. Some renaissance of ecological and systematic questions could also be observed, having been more or less absent since the mid-1930s (Figure 4). In botany, most questions were inspecting plant physiology, although a few ecological questions were also included (Table 11). A new theme was phototropism, which had not been encountered in previous decades (Table 11). During the 1950s, a few systematic questions were asked for the first time since the 1930s (Table 12). In addition, some assignments were on animal physiology and development, but otherwise, all the other assignments were on human physiology (Table 12). In this decade, the focus of genetics turned increasingly from Mendelism and crossings to cytogenetics (Table 13). Interestingly, the examinees were asked for the first time to evaluate the negative effects of inbreeding and consanguineous marriages (Table 13). As for evolutionary theory, central evolutionary concepts were tested as in previous decades (Table 13). In addition, the decade witnessed a renaissance of ecology, as community ecology, biogeography and ecosystems were examined from different perspectives (Table 13). In terms of biochemistry, cell biology and microbiology, an overarching theme of the decade was energy and the physical and chemical limitations of life on earth (Table 13). Educationally, the exam did not change from the 1940s, and most questions were essays, although some crossing problems were presented as well. As in the 1940s, only a few questions were testing solely knowledge, as most assignments involved comprehension, application and analysis. 
The cognitively most challenging questions involving analysis and evaluation were likely "What does genetics say about consanguineous marriages?," "Compare natural and artificial classification systems," "How does evolution result in speciation?," "How do organisms differ from the nonliving nature?" and "Biogeography as evidence for the evolutionary theory." Furthermore, some questions included creative elements such as "Is human breeding possible in the view of genetics?"

The 1960s – Towards modern biology and establishment
In the 1960s, 92 biological questions were included in the FME. The decade is characterized by further modernization and the inclusion of novel genetic concepts, but otherwise, the biological and educational profile of the exam was similar to the trend established in the 1950s (Figure 5). Here, we define modern biology as the integrative discipline of biology encompassing all the fields from biochemistry to ecology that formed during the latter half of the 20th century.
[Figure 5. Source: Sorsa et al., 1966.]
In the 1960s, the significance of botany and plants started to decline in the exam, but the exam nonetheless covered classical concepts of plant physiology (Table 14). Novel elements were plant hormones and the regulation of growth (Table 14). In addition, a few ecological questions were included (Table 14).
In the 1960s, there were only a couple of questions on classical zoology; otherwise, the questions tested human anatomy, histology and physiology (Table 15). In addition, there was the first purely clinical question when the examinees were asked to explain transplantations and tissue cultures (Table 15). In genetics, there were assignments on classical crossings, cytogenetics and other novel genetic concepts such as polyploidy (Table 16). Interestingly, there were a few questions on eugenics for the first time since the 1930s (Table 16). As for evolution, the questions examined the foundations and evidence for the evolutionary theory as well as the evolutionary history of life on Earth (Table 16). Interestingly, there were relatively many questions about the Carboniferous period. In terms of ecology, the test asked for knowledge of ecosystems as well as ecological concepts (Table 16). With respect to cell biology and microbiology, the test asked classical questions on cellular structure, while the biochemical assignments focused on metabolism (Table 16). The educational profile of the exam was similar to that of the 1950s. Nonetheless, there were more tasks asking for the comprehension of experimental methods than in previous decades. Some of the more challenging questions were "How can you study the genotype of an individual if it expresses a dominant trait?," "Changes in the genotype and its relevance for the evolution of organisms," and "Twins and their role in genetic research."

Quantitative trends in knowledge content
The kappa statistic for classifying the questions into categories of knowledge content was 0.89, which can be regarded as strong agreement on the profile of the questions. The cross-tabulation of interrater classifications reveals that there was some disagreement between Molecules to organisms and the other categories (Supplementary Material, Figure 1).
As seen from the stacked area graph (Figure 6, Table 17), most questions in the 1920s were in the category Molecules to organisms and the second most in Ecology.
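For readers unfamiliar with interrater agreement, the following is a minimal sketch of how a kappa value can be computed for two raters. It assumes Cohen's kappa (the text does not specify the variant used), and the function name `cohen_kappa` and the example labels are hypothetical illustrations, not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six questions (NOT data from the study).
rater_a = ["Molecules", "Molecules", "Genetics", "Ecology", "Molecules", "Genetics"]
rater_b = ["Molecules", "Genetics", "Genetics", "Ecology", "Molecules", "Genetics"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

Kappa corrects the raw share of matching labels for the agreement expected by chance, which is why it is lower than the simple proportion of agreement.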
In the 1930s, questions on Genetics increased at the expense of Ecology, and in the 1940s only a few questions on Ecology were asked. In the 1950s and 1960s, the profile stabilized and the percentage of questions in Molecules to organisms decreased. In 1944, 1945 and 1946 extraordinary exams were held, which slightly shifts the results in these years and likely explains the smaller variation in categories during this period. Data for 1922, 1923, 1940, 1942 and 1943 are missing. When inspecting frequencies of biological categories across decades, there was no significant change in the proportions between the 1920s and 1930s or between the 1930s and 1940s (Table 17), but as a whole, there was a significant change when moving from the 1920s to the 1940s (χ² = 11.47, p = 0.01**, Fisher p = 0.01**). Between the 1920s and 1940s, the increase in genetics and decrease in ecology explained this trend (Table 17). When moving from the 1940s to the 1950s, there was a significant change in category proportions, while the 1950s and 1960s were similar in their knowledge content (Table 17). A decrease in Molecules to organisms and an increase in Genetics and Ecology accounted for this result (Table 17).
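The decade-to-decade comparison above can be illustrated with a short sketch of a chi-square test on a decade by category table. The degrees of freedom are not stated in the text; the sketch assumes a comparison of two decades across four categories (3 degrees of freedom). The counts in `hypothetical_counts` and both function names are invented for illustration and are not the values of Table 17.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if category proportions were equal across rows.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical question counts (NOT Table 17); columns are the four categories:
# Molecules to organisms, Genetics, Ecology, Evolution.
hypothetical_counts = [
    [30, 5, 12, 4],   # earlier decade
    [28, 14, 3, 6],   # later decade
]
stat, dof = chi_square_statistic(hypothetical_counts)

# Under the 3-degrees-of-freedom assumption, the reported statistic of 11.47
# corresponds to a p-value of roughly 0.01.
p_reported = chi2_sf_df3(11.47)
print(dof, round(stat, 2), round(p_reported, 3))
```

With small expected counts in some cells, an exact test such as Fisher's, which the text reports alongside the chi-square result, is the more cautious choice.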

Biological novelties
In total, 23 biological novelties were found in the exam questions, and the year of academic establishment was delineated for each novelty (Table 18). The mean time of introduction to the exam was 15-5 years, while the median was 9. The bimodal nature of the density plot of the time of introduction is explained by the fact that approximations were usually made to the closest start of the decade (Figure 7). Otherwise, the density plot and the Poisson test clearly indicate that the rate of introduction is Poisson distributed (p < 0.0001***) with the event rate 15.

Quantitative trends in educational form
The kappa statistic for classifying the questions according to Bloom's taxonomy was 0.83, which can be regarded as strong agreement on the educational form of the questions. Nonetheless, the cross-tabulation of interrater classifications reveals that there was some disagreement between knowing and comprehending as well as comprehending and applying (Supplementary Material, Figure 2). Data for 1922, 1923, 1940, 1942 and 1943 are missing. When counting frequencies for Bloom's taxonomy, most questions represented the Concept class, and therefore only the cognitive dimension was chosen for further analysis. As seen from the stacked area graph (Figure 8), the proportion of knowledge-testing questions was high in the 1920s but decreased already in the 1930s. In contrast, the number of comprehensive and analytic questions increased over time, and in the 1940s, the educational profile had settled (Figure 8). Interestingly, there was a peak in the cognitive demand of the questions between 1936 and 1944, when as many as 44% of the questions were ranked at level 3 or higher.
As seen from the frequency data, there was a significant change in the educational form between the 1920s and 1930s, and between the 1930s and 1940s, but not after the 1940s (Table 19). The differences are attributable to the change in the number of questions requiring comprehension, application and analysis. Interestingly, the percentage of assignments of different cognitive levels varied between biological categories (Figure 9, Table 20). Evolution had more analytic and evaluative questions than the other categories, while Ecology and Genetics had the most applicative tasks. In Molecules to organisms, there were a few analytic and applicative tasks, but most tested comprehension. Lastly, pairwise comparisons of the frequencies of the four categories show that Evolution stands out from the other categories, and Genetics differs from Molecules to organisms, whereas the difference between Genetics and Ecology as well as

Trends in knowledge content – the FME reflects Finnish academia
The MEB has always consisted of censors from Finnish academia (mostly from the University of Helsinki but later also from other universities), and hence it was assumed that the FME reflects both contemporary national and international academic trends (Kaarninen & Kaarninen, 2002). However, this connection has never been systematically demonstrated for biology. Here, we show that the FME mirrors the Finnish history of biology both in Finnish academia and the upper secondary school system. The changes in question content further reflect the major advancements in biology as a field, and a great number of questions concerned novel topics discovered during the study period (Table 18). In addition, political, social, and economic trends in Finnish history can be seen in the question content.
In the 1920s, the focus on comparative zoology in the FME nicely reflects the emphasis on this subject in Finnish zoology. In the beginning of the 20th century, Finnish zoology was greatly influenced by scholars in Germany, where evolutionary morphology had a strong foothold at the time (Levit et al., 2014). However, it is not clear why so few systematic questions were asked in zoology, as zootaxonomy was firmly established in Finnish academia in the 19th century and well represented in contemporary school books (Kivirikko, 1923; Leikola, 2011). As the MEB does not produce protocols or other documents of the exam preparation, the work of the MEB must be interpreted from secondary sources (Kaarninen & Kaarninen, 2002). It seems that there was an agreement in the MEB that the botanical questions should focus on systematic aspects, as botanical research in Finnish academia concentrated heavily on taxonomic research rather than plant physiology in the beginning of the 20th century (Morton et al., 1999). This is also reflected by the fact that the professorship of plant physiology at the University of Helsinki was not instated until 1939, while several positions were already devoted to plant systematics (Autio, 2000). In contrast, the professorship of zoophysiology had been instated already in 1910 alongside positions in systematics and ecology (Autio, 2000). In the 1930s, the interest in plant physiology rose in Finnish academia thanks to the works of Fredrik Elfving, which apparently led to more assignments on plant physiology in the FME and likely left room for taxonomic questions in zoology (Autio, 2000; Morton et al., 1999). However, further investigation into the history of both university and secondary school teaching is required to assess whether this focus on plant systematics and comparative zoology was a general trend in Finnish universities and schools or only a peculiarity of the FME.
In the 1930s and 1940s, the increased focus on genetics and the evidence and foundations of the evolutionary theory reflected the ongoing academic debate and establishment of the Modern Synthesis both internationally and in Finland (Gayon, 2016). Interestingly, the only purely racist and eugenic questions were asked in 1929 and 1930, which we take to mirror both the academic and political history of Finland. As for academia, Mattila (1999) reports that eugenic thoughts were introduced and advocated in the 1910s mainly by three leading Swedish-speaking scholars: Ossian Schauman, professor of internal medicine, Jarl Hagelstam, lecturer in neurology, and Harry Federley, the first professor of genetics, all based at the University of Helsinki. He notes that their eugenic ideas were at first met with suspicion by several contemporary physicians, although he remarks that this was mostly due to unfamiliarity with the new field of genetics.
Towards the 1920s, eugenics became more widely acknowledged in Finnish academia, and the Finnish eugenicists collaborated with colleagues at the State Institute for Racial Biology in Sweden and the Kaiser Wilhelm Institute of Anthropology, Human Heredity, and Eugenics in Germany (Hietala, 2009; Mattila, 1999). Schauman, Hagelstam and Federley were all involved in leading the committee on sterilization legislation that prepared the Finnish sterilization law passed in 1934 (Hietala, 2009; Mattila, 1999). Furthermore, eugenics was presented as a part of human heredity in Finnish schoolbooks of biology from the 1920s well into the 1940s, as was the case in both Sweden and Germany (Mattila, 1999; Wendt, 2015).
Taken together, eugenic thoughts were not uncommon in Finnish academia or school material from the 1920s to the 1940s, and therefore the lack of eugenics in the FME after 1930 is interesting. In the 1930s, Väinö Lassila, professor of anatomy, and Erkki Vala, chief editor of the periodical Tulenkantajat, criticized the sterilization law and warned how similar legislation was abused by the Nazi Party in Germany (Mattila, 1999). The far-right Lapua Movement was suppressed in 1932, and therefore we speculate that the Finnish political climate might have influenced the MEB's willingness to ask eugenic questions later in the 1930s. Nonetheless, how eugenics was manifested in Finnish secondary school teaching in the 1930s amidst both academic and political trends would need further clarification.
Interestingly, several of the physiological questions in the 1940s examined nutrition in both plants and humans, which may have to do with academic as well as political and economic factors. As for academic factors, the Finnish Nobelist A.I. Virtanen performed foundational research on nitrogen metabolism and biochemistry in the 1930s and initiated several projects on public nutrition and health together with his colleagues (Heikonen, 1990; Perko, 2014). However, this trend was not unique to Finland, as public nutrition and health programs were planned and started in other Western countries as well (Mayhew, 1988). As for political and economic factors, one may also speculate whether wartime scarcity affected this trend.
Moreover, the wartime may have contributed to the decline in ecological questions, while biological questions with medical relevance such as genetics and physiology were emphasized. The interrelationship between physical sciences and wartime is widely recognized: science affects weapons and warfare, and warfare steers science in an applied direction to produce better weapons, the typical example being the Manhattan project (Roland, 1985). As for medicine, the relationship is more controversial, with some authors supporting the view that war may also direct and advance medical research (Cooter, 1990). Nonetheless, an interesting theme for further research is whether warfare affects biological research and teaching by emphasizing themes important for warfare, such as public nutrition and medical aspects.
The new focus on developmental biology may reflect the rise of experimental embryology epitomized by Spemann's induction experiments and followed by several embryologists in Finland (Leikola, 2003). Gunnar Ekman and Sulo Toivonen were both prominent experimental embryologists who were involved in writing school books in biology for upper secondary schools and actively popularized their field of study (Leikola, 2003). Again, further research on secondary school teaching would reveal to what extent developmental biology was also emphasized outside the FME.
In the 1950s, the establishment of the exam's biological content may reflect the higher availability of secondary education to different societal segments, leading to an increase in teachers and academics and less random selection of questions (Kaarninen & Kaarninen, 2002). This would correspond to the progress in the US, where the increasing number of students and a reaction against highly specialized courses in zoology and botany were the leading factors for establishing the modern curriculum in biology in the 1950s and 1960s (Rosenthal, 1990). Moreover, advancements in various fields of ecology of the time likely initiated the renaissance of ecology in the exam in the same way as in the US (McComas, 2002; Odum & Barrett, 1971). One can also speculate whether the increased number of examinees and teachers forced the MEB to ask more questions in ecology and traditional natural history, as these were considered to be more familiar to both teachers and students (Kaarninen & Kaarninen, 2002). For example, Suomalainen and Segerstråle (1953) had redesigned their school book to start with ecology for this reason, suggesting that these ideas were common within the secondary school of the 1950s.
Interestingly, there was no significant change in the biological content of the exam in the 1960s, in contrast to mathematics and the physical sciences (Kaarninen & Kaarninen, 2002). This may be attributable to the fact that the pressure of the technological advancement of the Soviet Union in the beginning of the decade was seen mainly in physical sciences, while no comparable pressure was evident for biology (Graham, 1993). Furthermore, the exam had already been substantially reformed in the previous decade, which was likely deemed sufficient.
During the study period, we see a drastic change of assessment content from classical natural history to a more varied selection of topics in evolution, genetics and ecology. This is most evident in botany, as questions relating to plant sciences and especially plant systematics diminished from a major component of the questions in the 1920s to a marginal component in the 1950s and 1960s. This is mirrored in school curricula and activities, such as the gathering of student herbaria. Whereas in the 1920s each student was to collect around 200 species of plants for their personal herbarium, the number of species was lowered multiple times, and eventually in 1969, the collection of personal student herbaria was dropped from the Finnish school system (Saarinen et al., 2016; Virtanen & Kankaanrinta, 1989). A similar trend was seen in classical non-human zoology, which was increasingly replaced by human physiology during our study period. This trend is still evident in modern biology curricula and the FME. Some authors have raised concern about the poor species identification skills of contemporary students, which can in part result from this shift away from systematic botany and zoology (Immonen et al., 2006).
The time of introduction of biological novelties decreased during the study period. This illustrates the relationship between the FME and Finnish academia, and the fact that exam developers were not afraid of introducing novelties in exams soon after their academic establishment. The pattern is evident even considering that the estimation of the year of academic establishment of biological novelties is difficult and at times arguably subjective (Supplementary Material). The decrease in the time lag between academic establishment and inclusion in the FME can in part be explained by improved technology in information distribution and eventual electronic information distribution.

Trends in educational form – high standard from the beginning
This study contradicts the statement by Kaarninen and Kaarninen (2002) and Virta (2014) that the test battery in humanities and sciences would have tested only knowledge of factual details. This may apply to other subjects, but not to biology. In contrast, the exam in the 1930s was already rich in comprehensive and analytic components, and the educational standard was established in the 1940s, after which no significant improvements were made. Rostila (2014) and Lindholm (2017) report that about 20% of the questions in the modern FME in biology (2009–2015) were on level 3 or higher in Bloom's taxonomy, indicating that the exam from the 1930s to 1960s was mostly as cognitively demanding as the modern exam. The problem with analyzing historical exam questions is that although the question itself is cognitively demanding, it remains unclear how much cognitive input was required for a given grade in the end. Nonetheless, the inclusion of several applicative, analytic and evaluative questions in the FME proves that the MEB has been subconsciously aware of good forms of assessment before the conceptualization of Bloom's taxonomy.
The most cognitively demanding period of the exam was around 1940, which coincides with the ongoing academic debate on the Modern Synthesis. The most cognitively demanding questions were asked on evolution and genetics, which also reflects the potential influence of the Modern Synthesis on the exam. Furthermore, Lindholm (2017) did not find the cognitive demand of evolution and genetics higher than that of the other categories in contemporary exams, suggesting further that this pattern is specific to the given historical context. The science and concepts of ecology were still in their infancy before the 1950s; until then, ecology was more or less descriptive natural history, upon which it was hard to construct good analytic questions. The lack of applicative and analytic questions in the category Molecules to organisms may be explained by the lack of experimental instrumentation in schools, and probably also by the perceived technicality of the subject (Suomalainen & Segerstråle, 1953).

Conclusions
In conclusion, here we summarize for the first time the Finnish matriculation examination from a historical perspective. The FME in biology from 1921 to 1969 closely followed both international and national academic trends and transferred them to the exam within 10-20 years. The data show that the inclusion of biological questions in the exam follows a similar pattern: initial caution, excitement, and stabilization. Contrary to popular stereotypes, the old FME in biology had a high standard of assessment already from the 1930s onwards, comparable to the level of the modern exam. This shows that educators have been aware of good forms of assessment before their theoretical conceptualization. In addition, the cognitively most demanding questions were on evolution, proving that academic excitement in a given discipline may give rise to tasks of a high educational standard. The old FME questions may be used as an inspiration for devising good essay questions even for future generations of students.