Development and Validation of a High School Agricultural Literacy Assessment

The National Agricultural Literacy Outcomes (NALOs) are knowledge benchmarks for school-aged youth and are used to improve agricultural literacy (National Agriculture in the Classroom, 2014; National Center for Agricultural Literacy, 2017). Despite educational efforts, prior research indicated that high school populations remained at low or deficient literacy levels. Additionally, no agricultural literacy assessment instruments using the NALOs as a standardization benchmark have been developed for the 9-12th grades. The purpose of the study was to validate a summative NALO-centered assessment that could provide baseline data on agricultural


Introduction and Problem Statement
Agricultural literacy efforts prepare K-12 students to recognize and interpret information relevant for determining adult decisions regarding their health, global environment, public policy, and economic benefits (Hess & Trexler, 2011; Lawson & Weser, 1990; Redmond & Griffith, 2003). Agricultural literacy also influences positive perceptions and attitudes about agriculture (Specht et al., 2014). Due to its importance, literacy assessments were developed using a variety of benchmarks and methods for K-12 students (Frick, 1993; Leising et al., 1998, 2000; Powell et al., 2008). While these instruments provided relevant data, Brandt (2016) and Longhurst et al. (2020) noted that older frameworks and definitions did not meet current needs. The lack of consistency in instrumentation contributed to a lack of replication and limited the ability to compare results with other populations nationwide. Both suggested that using the National Agricultural Literacy Outcomes (NALOs) (Spielmaker & Leising, 2013) K-12 grade-level banded benchmarks and the National Agricultural Literacy Logic Model (Spielmaker et al., 2014), a validated educational framework, could provide the consistency and uniformity necessary for a benchmark-based agricultural literacy assessment instrument. Moving toward uniformity can provide a pathway for agricultural literacy assessment that is more reliable, comparable, and applicable across different studies, disciplines, and contexts.
The absence of a standardized tool to measure agricultural literacy during and after high school limits the understanding of agricultural literacy levels in different populations. Without a consistent assessment, it is difficult to identify national knowledge gaps, design targeted education interventions, and track progress in educational efforts. Consequently, this study aimed to develop a summative high school (12th grade) agricultural literacy assessment using the NALOs as a foundational tool for future research.

Theoretical and Conceptual Framework
The framework for this study is based on Longhurst et al. (2020), who developed and validated a 3-5th grade NALO-based agricultural literacy assessment. Their work provided a replication model for us; their success was centered upon two essential frameworks.

National Agricultural Literacy Outcomes Framework
Cosby et al. (2022) stated that the NALOs "provide the most comprehensive learning framework across the globe against which to measure student agricultural literacy…and they provide benchmarks to increase uniformity across the national education system in the USA" (p. 10). The NALO benchmarks were developed via Delphi using a rigorous integration of national grade level benchmarks and national standards for science, social studies, and health, organized through the lens of agricultural literacy (Spielmaker & Leising, 2013; see also Spielmaker et al., 2014). The NALOs align with the AAAE Research Values (American Association for Agricultural Education [AAAE], 2023), providing a measure for determining the impacts of educational outreach. Within this study, the NALOs guided instrument item development through the question: What must students know or be able to do with the information they have learned to be proficient in the NALO standards?

Programme for International Student Assessment (PISA) Framework
Generally, summative assessments are cumulative and determine what students do or do not know. A significant limitation is that they often yield only a passing or failing score, where a failing score may convey that the student lacks any understanding (Boud & Falchikov, 2006). The National Research Council (2009) suggested using assessments that show a progression sequence because such assessments identify what a person can do within stages of development. Therefore, the PISA framework served as a guide because it "assesses students and uses the outcomes of that assessment to produce estimates of students' proficiency in relation to the skills and knowledge being assessed in each domain" (OECD: Programme for International Student Assessment [OECD: PISA], 2016, p. 276). The PISA framework has well-defined parameters. The domains used within this framework were the five NALO themes. The domain skills (assessment items) were developed from very low levels of proficiency to very high levels. Following the structure, the easiest items focused on content knowledge and the relation to agricultural phenomena. The most difficult items drew upon interrelated ideas and concepts that required "an understanding of events, consequences, or processes" (OECD: PISA, 2016, p. 282). Within this context, a student's agricultural literacy level determined their place on a sliding proficiency scale, ranked by how frequently they answered questions correctly. This premise followed the central dogma of the PISA assessment: "If a student's proficiency level exceeds the item's difficulty, the probability that the student can complete that item is high, and if the student's proficiency is lower than what is required by the item, the probability for student success on that item is low" (OECD: PISA, 2016, p. 279). Most importantly, the NALOs were constructed in grade-banded levels that interrelated and overlapped, ensuring that students advanced from primary to advanced content, practice, and examples of complexities as they moved toward more sophisticated curricula. The integration of both frameworks established a well-defined model for (a) developing questions that represented an increase in skill and ability for better understanding student proficiency and (b) providing data that were representative of progression toward literacy.
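The PISA premise quoted above, that success on an item becomes more likely as proficiency exceeds item difficulty, is commonly formalized with a logistic item response function. The sketch below assumes a one-parameter (Rasch) form purely for illustration; the study does not state that an item response model was fitted, and the function name and the values used are hypothetical.

```python
import math


def rasch_probability(proficiency, difficulty):
    """Probability of a correct response under a one-parameter logistic model.

    Both arguments are assumed to sit on the same logit scale. This is an
    illustrative formalization, not the scoring method used in the study.
    """
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))


# Hypothetical item of average difficulty (0.0) and three hypothetical students.
item_difficulty = 0.0
for student_proficiency in (-1.0, 0.0, 1.0):
    p = rasch_probability(student_proficiency, item_difficulty)
    print(f"proficiency={student_proficiency:+.1f}  P(correct)={p:.3f}")
```

When proficiency equals difficulty the probability is exactly one half; it rises above one half when proficiency exceeds difficulty and falls below it otherwise, which mirrors the quoted statement.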

Purpose
Guided by the conceptual frameworks, the study aimed to develop and validate a summative high school (12th grade) NALO-centered instrument that could assess proficiency levels of agricultural literacy.
The study addressed the following research question: Is the instrument a valid and reliable measure of the five Grade 12 NALO themes and the proficiency stages of agricultural literacy (i.e., exposure, factual literacy, and applied proficiency)?

Methods
There were three phases for defining the quantitative development and validation of the agricultural literacy assessment instrument.
Phase One: Instrument Construction
Longhurst et al. (2020) showed effectiveness in determining instrument items via a Delphi model because of the complexity of the content. Goodman (1987) noted that if the experts participating in the development process were representative of the area of knowledge, then content validity and reliability could be assumed. Messick (1995) and Sireci (1998) clarified that content validation added verification and critical mechanisms of construct validity. Literature also indicated that committee selection was an essential part of the process because it determined the quality of the items (Jacobs, 1996; Judd, 1972; Taylor & Judd, 1989). Therefore, the consideration of experts who participated in the Delphi construction of items was paramount. Individuals were direct experts in secondary agricultural education, curriculum development, agricultural policy, communications, cooperative extension and outreach, agribusiness, and STEM education; they were selected from multiple states and possessed advanced degrees or teaching certificates. In all, twelve members participated in item construction. Delbecq et al. (1975) suggested that ten to fifteen members were sufficient if the background of the Delphi subjects was homogeneous. We promoted homogeneity in the subject selection to best represent the processing capability.

Phase Two: Data Collection
The convenience sample population for validation was N = 600 Utah State University students. Convenience sampling of college students is a prevalent approach for data collection in educational research (Hanel & Vione, 2016). Undergraduate student samples can be a legitimate solution when strongly justified, and problems can be minimized through conscientious research design and execution (Bello et al., 2009; Winton & Sabol, 2022). To carefully address these parameters, we incentivized college students with extra credit because obtaining data from end-of-senior-year high school students was extremely difficult due to our state ethics review opt-in-only rules for minors and the limited willingness of that population to participate in a survey at that specific time. We prioritized recruiting first-year, first-semester students but allowed older students to participate to ensure that the sample size reached the level (N = 500+) that Comrey and Lee (1992) rated as very good or excellent for factor analysis. Survey items were accessed via Qualtrics. We monitored the survey for three weeks; email reminders were sent to students weekly throughout the collection period.

Phase Three: Instrument Validation
We analyzed the data following procedures and processes outlined by Longhurst et al. (2020). First, data were organized, cleaned for non-response, and dummy coded. The highest and partial scores were calculated, and then an Exploratory Factor Analysis (EFA) was conducted in SAS (Version 9.4). The frequencies of the relationships between the proficiency stages determined the latent constructs. Following EFA, item analyses were conducted on items with varying frequencies. Ultimately, the best items were identified and analyzed using Confirmatory Factor Analysis (CFA) and Discriminant Analysis (DA). We concluded our analysis by identifying the final items to construct two separate instruments.

Limitations
College samples tend to be homogeneous rather than diverse, and students may fall within the higher spectrum of cognitive skills (Stevens, 2011). We underscored the importance of lived experiences that could contribute to agricultural literacy proficiency over time. Additionally, the survey items were directly associated with the NALO benchmarks, resulting in correlation, lack of independence, and multicollinearity risks. Measures of covariance among the latent variables were analyzed, but CFA results should be treated with caution. Finally, using DA enabled determining whether differences existed between the proficiency stages. The use of DA defined the degree to which the instrument differentiated between the constructs.

Findings
Phase One: Instrument Construction
The Delphi team developed survey items by integrating item content, relevance to the NALO demands, and effectiveness guidelines for summative assessment. Each team member was asked to create between three and five questions (including answers) for each NALO theme. The questions had to be identified by one of the three proficiency levels. The first three rounds identified which of the 64 questions best represented the NALO theme and the appropriate proficiency level. Table 1 shows an example of how a construct analysis clarified the requirements of the NALO benchmarks and the parameters for each proficiency level. It is an example of how the team combined the defined measures of the proficiency scale parameters from PISA and the proficiency level descriptors. The process provided a point-by-point evaluation for each determinant factor required for a valid summative assessment of the 12th-grade NALOs. From there, the fourth round eliminated questions with the lowest rankings and sent the remaining items to be refined by the group for the final two iterations.
Note. Proficiency levels adapted from the works of Joplin (1981), Roberts (2006), and the PISA Technical Report (OECD: PISA, 2016).
Thus, the team finalized 45 items (three questions for each proficiency level in each NALO theme). Longhurst et al. (2020) showed that 15 questions were sufficient for the final instrument, but more questions were reviewed to increase the probability of a valid question in each theme and proficiency level.
Based on the commitment to high-quality Delphi development, replication of successful methods established in prior research, and connection to the best practices for summative evaluation, we showed that the items summatively assessed the Grade-12 NALO benchmarks and provided content and construct validity for each survey item.

Phase Two: Data Collection
The undergraduate sampling resulted in 71% of participants having completed less than one year of college and 89% less than two years (n = 468), with only 11% (n = 47) having completed three to four years but being younger than 23 years old. Qualtrics reported that 580 students accessed the survey, N = 515 completed the survey, and 48 did not complete the survey (89% response rate). We proceeded toward validation based on Comrey and Lee (1992) and MacCallum et al. (2001), who determined that an acceptable level of N was dependent upon (a) the communality of the variables, (b) the degree of overdetermination of the factor, (c) the size of the loading, and (d) model fit (f). These boundaries provided a conservative measure for our sample size, with priority given to the requirements of the factor analysis due to its importance in the study.

Phase Three: Instrument Validation
We coded 1 or 0 for correct or non-correct responses. Each possible response option was also scored as correct (1) or non-correct (0) for items with more than one correct response. This allowed for the allocation of partial scores for each overall item based on the percentages of correct responses selected by a respondent. The 45 survey items were first measured for total correct response (max = 34, min = 4, M = 21.34, SD = 5.44, N = 515). A maximum score was used to determine initial participant proficiency stages based on PISA literature (OECD: PISA, 2016, pp. 280-281), testing parameters, and statistical best practices. Partial total correct scoring was used to determine if a survey item was too difficult or if there were only poor or too difficult portions. An item analysis, difficulty index, and correct partial percentages were critical indicators for establishing the baseline measures before factor and item analysis.
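The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SAS procedure: the partial score is assumed to be the share of an item's correct options that a respondent selected (with no penalty for extra selections), the difficulty index is assumed to be the conventional proportion of respondents who answered the item fully correctly, and the answer key and responses are hypothetical.

```python
def total_score(selected, key):
    """Return 1 only if the selected options exactly match the answer key, else 0."""
    return 1 if set(selected) == set(key) else 0


def partial_score(selected, key):
    """Assumed partial credit: fraction of the keyed correct options selected."""
    key = set(key)
    return len(key & set(selected)) / len(key)


def difficulty_index(item_scores):
    """Proportion of respondents with a total-correct score of 1 on the item."""
    return sum(item_scores) / len(item_scores)


# Hypothetical multiple-response item with two correct options.
answer_key = {"A", "C"}
responses = [{"A", "C"}, {"A"}, {"B"}]

totals = [total_score(r, answer_key) for r in responses]
partials = [partial_score(r, answer_key) for r in responses]
item_difficulty = difficulty_index(totals)
```

Comparing `totals` with `partials` for the same item shows how partial scoring can separate a respondent who recognized one of two correct options from one who recognized none, even though both receive a total-correct score of 0.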

Factor and Item Analysis
We used a structural linear equation model for EFA and CFA. Three latent factors representing the proficiency stages were analyzed against the items from each NALO theme. The factor loadings determined the influence of the proficiency groups on the scores associated with each survey item. The EFA measured the strength of the relationships between the proficiency stages (factors) and items using the percentage of correct and incorrect responses as indicator variables representing the NALO themes. Items that targeted EFA loading and frequency correct ranges were then examined with item analysis. This resulted in the construction of two 15-item assessment instruments. Each instrument contained three questions for each of the five NALO themes. Based on the allocated proficiency level, those three questions were staged from the easiest to the most challenging item. Following EFA, we determined questions that were too easy, difficult, or poor and eliminated them based on frequency results. Item analyses were then conducted on items with varying frequencies, which were also used to determine if the EFA frequencies improved when specific poor answer choices were removed. We carefully ensured that option changes did not affect the question context.
Separately, we conducted a CFA for each of the two 15-item assessment instruments to determine if the model fit was adequate. For each instrument, each item was loaded on its assigned factor (proficiency level) to determine if the underlying correlational structure of the independent variables (the five NALO themes) represented each latent factor. Table 2 shows that the linear structural equation estimation indicated that both instruments fit adequately. Additionally, the CFA determined that indicator variables significantly loaded on their respective proficiency stage factor (all p-values below .001), indicating that differences between loadings and zero were significant. Collectively, it identified an almost non-existent shared variance among the variables; that is, a considerable amount of unique variance was seen among them. Cronbach's alpha coefficient across proficiency stages was measured for Instrument I (N = 515): Exposure (Total α = .46, Partial α = .55); Literacy (Total α = .58, Partial α = .62); Proficiency (Total α = .37, Partial α = .65) and Instrument II (N = 515): Exposure (Total α = .48, Partial α = .50); Literacy (Total α = .47, Partial α = .54); Proficiency (Total α = .29, Partial α = .38). The reliability coefficients are low; however, Taber (2018) noted that alpha values vary greatly by discipline. Additionally, high reliability may indicate that items are redundant, and the length of the instrument (less than 20 items) limits the alpha and complicates the process of unpacking internal reliability. The partial scores have higher alpha measures because they have a greater range of possible responses. They are relevant because they show that when questions are not scored strictly right or wrong, they give a clearer picture of where respondent understanding stands. The alpha numbers are likely low due to multiple themes for each factor. We corroborated these results with Pearson's product-moment correlation and Difficulty Indices. Results indicated an acceptable internal consistency and reliability level for both instruments because our goal was to produce non-redundant instrument items that could discriminate skill levels. Ultimately, the CFA showed enough evidence to substantiate the model as fitting adequately with a small or weak relationship between the proficiency stages.
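For readers who want to reproduce a reliability coefficient of this kind on their own data, the standard Cronbach's alpha formula can be sketched as follows. The function name and the small data set are hypothetical, and the sketch uses sample variances from the Python standard library; it is not the SAS routine used in the study and does not reproduce the reported values.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 scores: five respondents, three items in one proficiency stage.
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
alpha = cronbach_alpha(example_scores)
```

Because the formula depends on the number of items k, a stage measured by only a few items can yield a low alpha even when the items behave sensibly, which is consistent with the caution about instrument length noted above.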

Discriminant Analysis & Summary
DA was used to clarify the CFA results. Table 3 indicates that the cross-validation percentages for both instruments were extremely accurate and well within the range of p < .05. Equally strong re-substitution percentages were as good as or better than the cross-validation results. The DA was the most definitive conclusion that the items aligned correctly for the five NALO themes, indicating that users can accurately administer either assessment to determine students' proficiency levels in agricultural literacy. Users, however, should not "mix and match" questions between instruments because both have been independently validated in this study. We concluded our analysis by finalizing items to construct two separate assessment instruments.

Findings Summary
The findings showed that both instruments were valid and reliable for measuring the 12th-grade NALO theme benchmarks and determining an agricultural literacy proficiency level. Future users need to understand how to use the instruments effectively.

Using the Instruments
Determining the proficiency level of a participant is an essential part of assessment analysis. Practitioners can identify the proficiency stages of the two instruments by listing participants with a score ≥ 12 (out of 15) as applicably proficient, those with a score of 8 to 11 as factually literate, and those with a score ≤ 7 at the exposure level.
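The cut scores above translate directly into a small classification routine. The sketch below is illustrative only: it reads the middle band as scores from 8 through 11, borrows the stage labels from the research question, and uses a hypothetical function name and hypothetical class scores.

```python
from collections import Counter


def proficiency_stage(score):
    """Map a total-correct score (0 to 15) on either instrument to a stage."""
    if not 0 <= score <= 15:
        raise ValueError("score must be between 0 and 15")
    if score >= 12:
        return "applied proficiency"
    if score >= 8:
        return "factual literacy"
    return "exposure"


# Hypothetical class of five students.
class_scores = [13, 9, 7, 12, 10]
stage_counts = Counter(proficiency_stage(s) for s in class_scores)
```

Tallying stages for a group, as `stage_counts` does, gives a quick view of how a class or program is distributed across the three stages alongside the group mean, median, or mode.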
Scores can be interpreted individually or using a group's mean, median, or mode. Total correct scores are as helpful as partial correct scores. Partial correct scores can be obtained by examining individual assessments to determine which NALO items were incorrect, then using that information to identify gaps in thematic content, misinformation, or analysis related to experience or agricultural exposure. If the NALO themes are used for program achievement goals, and students do not show consistent growth across all five themes, the score can indicate curricula or instructional gaps. These instruments were designed to show cumulative assessment for K-12 agricultural literacy development. Ideally, students who have been instructed throughout their primary and secondary education should be applicably proficient at the end of twelfth grade. Proficiency levels that are less than ideal for high school graduates give educators and agricultural stakeholders information that can be used to understand where adult consumers may need additional information to make informed agricultural decisions. Furthermore, although this study sought to provide summative assessment, educators' use of the tools as a formative measurement is encouraged. Using the instruments formatively, in combination with a qualitative interview, could be the most exact way to determine how participants perceived or misperceived a correct answer.

Conclusions, Discussion, and Recommendations
This study provided two standardized instruments that can measure agricultural literacy nationwide. There are now NALO-based assessments for elementary, middle, and high school students. By addressing the absence of a standardized tool for high school students and high school graduates, we fill a gap in existing literature and enhance the reliability, comparability, and applicability of future research in agricultural literacy. We recommend using these instruments to unify efforts to identify national knowledge gaps, better target educational initiatives, and increase study replication using consistent instrumentation. While not a comprehensive assessment, these instruments can impact how we implement and evaluate formal and nonformal agricultural education. Additionally, program planners and evaluators should use data from these assessments to determine the efficacy of their programs, hopefully leading to initiatives driven by program impacts rather than program outputs. Doerfert (2003) maintained that the true implications of agricultural literacy could only be seen as we study populations and programs over time. Instrument use within the same or similar programs can provide a roadmap of program efficacy that showcases which areas of agricultural literacy have improved over time and through which methods of instruction.
Practitioners should work with researchers to identify populations beyond K-12 students or formal classroom settings (Warnick, 2022). These instruments can improve learning opportunities for youth and adults in community-driven events associated with agritourism, 4-H, community gardening, farmers' markets, and career awareness fairs. The length of the assessment makes it digitally accessible in a variety of environments via a smartphone and may open opportunities for greater discussions on agricultural topics with event participants. Additionally, Minkler and Salvatore (2012) outlined that collaborative research and evaluation processes contributed to greater success within community-engaged programs. Researchers working in tandem with communities can assess agricultural literacy and tailor interventions based on feedback and data from community members. Land grant institutions and Cooperative Extension can fulfill pivotal roles in enhancing their communities by leveraging their resources, expertise, and outreach capabilities toward agricultural literacy assessment. Using these tools at local levels provides data on knowledge and fosters relationships that can promote trust, limit misinformation about agriculture, and encourage informed consumer choices. By bridging the gap between research and communities, assessment data can contribute to the prosperity of agricultural sectors through programs that empower individuals to make informed decisions that positively impact their well-being and the broader community.
The NALOs reflect prior research and five cross-disciplinary themes: (a) Agriculture and the environment, (b) Plants and animals for food, fiber & energy, (c) Food, health & lifestyle, (d) STEM, and (e) Culture, society, economy & geography (National Center for Agricultural Literacy [NCAL], 2017). They were designed as a logic model component for K-20 assessment and program evaluation for National Agriculture in the Classroom programs and the National Center for Agricultural Literacy (Spielmaker et al., 2014).

Table 1
Construct Analysis: Examples of Theme Two Items

Table 2
Confirmatory Factor Analysis Fit Summary Based on Total Correct Items

Table 3
Discriminant Analysis: Cross-validation Summary Using Linear Discriminant Functions