
Lessons Learned from a Decade of Authentic Assessment

In fall 2009, a reimagined, faculty-developed general education curriculum was introduced at Roanoke College. The Intellectual Inquiry (INQ) curriculum was the result of five years of faculty learning, planning, and negotiating to produce a cohesive, common-core learning opportunity for students. Through this critical inquiry curriculum, students develop cross-cutting skills while immersed in engaging themes and topics that challenge them to frame, ask, and seek answers to complex questions.

The goal of the curriculum is to produce graduates capable of addressing real-world problems in diverse and collaborative settings. The capstone experience, INQ 300 Contemporary Issues, requires cross-disciplinary groups of students to frame a contemporary issue as a problem in a local setting, apply their skills and knowledge to develop a proposal addressing that problem, and then orally defend their proposal. Students synthesize skills and knowledge from their majors with those developed in earlier Intellectual Inquiry courses, all of which focus on skill development. The curriculum’s entry point is INQ 110, a rigorous writing seminar taught by faculty across the college, in which students apply careful reading and writing skills to explore topics such as Formation of the Western Mind and Black Lives Matter. In the second seminar, INQ 120, students apply oral communication skills and ethical reasoning in courses such as Matters of Life and Death and Thinking Animals. A series of 200-level courses provides foundations in quantitative reasoning as well as the questions and methods of the natural sciences, social sciences, and humanities while continuing to develop students’ communication and reasoning skills. Students learn to integrate knowledge and skills from many disciplines to address complex problems both in their education and in their lives after graduation.

The design, development, and implementation of a redesigned curriculum in which each course has a unique topic was a massive undertaking, requiring hundreds of new courses to be developed and moved through faculty governance. By the end of the three-year curriculum rollout, the number of sections designed, reviewed, and approved equaled the number of full-time faculty. The course design and development effort continued as the curriculum launched, while faculty made intentional efforts to maintain the quality of the previous curriculum for existing students. Faculty were juggling their many academic responsibilities while heavily engaged in INQ tasks. To move forward with INQ implementation while planning and piloting general education assessment, faculty adopted guiding principles to facilitate decision-making: (1) the assessment process would map onto the approved curriculum (i.e., assessment would not require curricular change); (2) faculty would take lead roles in the design, piloting, and implementation of assessment tools; (3) assessment would not reduce the curriculum to “teaching to the test”; and (4) the assessment effort would be cost-effective.

A Natural Fit

The Roanoke College faculty spent considerable effort specifying how cross-cutting skills would be developed in the INQ curriculum. Although this work was not framed in the language of learning outcomes assessment, faculty spent time considering the types of assignments expected in courses and how students would engage in critical inquiry while applying their skills. The adoption of general education assessment was aided by faculty members’ focus on common expectations and assignment types instead of wordsmithing learning outcomes. For example, in INQ 110, faculty committees specified how much, and at what level, they wanted students to write. Using this information, the assessment director offered language for a simple learning outcome to act as an organizational guidepost for the more detailed work of the faculty. Rather than focus on the language of outcomes assessment, faculty focused on backward course design, assignment development, and criteria for assessing the quality of students’ writing.

In early conversations, faculty chose to focus assessment efforts on communication skills, reasoning skills, and the foundational methods and skills of the disciplines (across the natural sciences, social sciences, and humanities) rather than on the more specific disciplinary knowledge connected to course topics. For example, students were assessed on their ability to apply disciplinary theory appropriately to real-world problems, allowing flexibility in the specification of content and teaching methods. The choice to assess skills rather than content allowed faculty to map the curriculum, and subsequently the assessments, onto the approved courses.

Faculty-Led Assessment

In the first semester of curriculum implementation, a subset of INQ 110 instructors gathered over breakfast to design a rubric to pilot with their writing assignments. Because the group had limited expertise with rubric development, it decided to begin with the Written Communication VALUE Rubric, which emerged from the Liberal Education and America’s Promise (LEAP) initiative of the Association of American Colleges and Universities (AAC&U). Although the VALUE rubric was still in draft form at the time, it served as a helpful template and starting point for the committee’s deliberations. Following a suggestion from AAC&U leadership and staff, the Roanoke faculty customized the rubric to better fit the INQ 110 design by changing developmental levels, reorganizing columns, and rewording criteria. The faculty applied the rubric to students’ writing products from the last third of the course, aggregated their scores, and reconvened to discuss the pilot data and suggest rubric changes. Overall, the student writing data confirmed faculty members’ professional opinions about student writing skills (face validity), and the data appeared useful for identifying collective strengths and weaknesses in students’ written communication skills. However, healthy skepticism remained, and faculty questioned whether assessment data would be used to inform substantive changes in instruction at either the program or the course level.

Avoiding a Reductionist Approach

In addition to concerns about effective data use, faculty did not want their curriculum reform efforts reduced to “teaching to the test.” Faculty in the pilot group felt the assessment exercise was useful, but what would happen when it was rolled out across campus? Would faculty develop “throw-away” assignments simply to turn in rubric scores? Would faculty be forced to use the same assignment across sections even if it was not relevant to the course topic? Would assessment become an exercise in collecting data to create unused reports? Would student learning be reduced to a single percentage score?

Preventing every potential problem that could stem from reformed general education assessment is impossible, but the faculty took bold steps to proactively address these concerns: (1) all faculty teaching an INQ course would apply the rubric to an existing assignment in the last third of their course; (2) for that identified assignment, work from all students would be scored with the rubric; (3) scores would be aggregated at the criterion level rather than averaged across skills; and (4) the associate dean for general education would manage the data and report aggregate scores back to the teaching faculty.
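As a rough illustration of rule (3), the sketch below (in Python, using pandas) aggregates rubric scores by criterion without collapsing them into a composite; the column names and sample values are assumptions made for illustration, not the college’s actual data.

    # Hypothetical layout: one row per student per rubric criterion (4-point scale).
    # The column names and values here are illustrative assumptions.
    import pandas as pd

    scores = pd.DataFrame({
        "section":   ["INQ110-01", "INQ110-01", "INQ110-02", "INQ110-02"],
        "criterion": ["content", "evidence", "content", "evidence"],
        "score":     [3, 2, 4, 3],
    })

    # Aggregate at the criterion level only; no single composite average across skills.
    by_criterion = scores.groupby("criterion")["score"].agg(["mean", "count"])
    print(by_criterion)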

Assessment on a Budget

In the second year of INQ implementation, the written communication rubric and the oral communication rubric were used in all INQ 110 and INQ 120 classes. And while early assessment efforts in the first-year seminars moved forward relatively well, implementation decisions about the next steps of the assessment process raised questions about unanticipated costs: How could rubric norming sessions be conducted for all faculty teaching INQ courses? What would it cost to pay faculty over the summer to serve as second scorers of student work? Could the college afford traditional reliability and validity efforts? The INQ curriculum was already a costly endeavor for a small college, and the newly formed assessment office had an annual operating budget of under two thousand dollars. Given these constraints, an unconventional solution was necessary.

The Review Process

Before planning, organizing, and convening groups of faculty members to act as second evaluators and establish inter-rater reliability, the assessment office conducted a statistical analysis to see how many faculty scored student papers significantly higher or lower than their colleagues did. Using the four-point VALUE rubric structure, the office averaged each INQ section’s scores on the five criteria (content, organization, mechanics, etc.), resulting in five average scores per section. The director of institutional effectiveness and assessment assigned each section a random number to protect faculty identities and conducted a multivariate analysis of variance (MANOVA) to explore the magnitude of instructor effects on scores, with instructor as the independent variable and the average criterion scores as the dependent variables. Although this was far from a highly controlled research design, the MANOVA illuminated some interesting findings. Overall, most faculty scored their first-year students similarly on the specific criteria of the writing rubric.
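A minimal sketch of how such an analysis might be run in Python with statsmodels appears below; the file name, anonymized instructor IDs, and the five criterion column names are assumptions for illustration, not the office’s actual data or code.

    # Assumed input: one row per INQ section, with an anonymized instructor ID and
    # the section's mean score on each of five rubric criteria (illustrative names).
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    sections = pd.read_csv("section_means.csv")  # hypothetical file
    # Assumed columns: instructor_id, content, organization, evidence, style, mechanics

    manova = MANOVA.from_formula(
        "content + organization + evidence + style + mechanics ~ C(instructor_id)",
        data=sections,
    )
    print(manova.mv_test())  # multivariate tests (Wilks' lambda, Pillai's trace) for the instructor effect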

For all instructors, scores on the “use of evidence” criterion were significantly lower than scores on the other writing criteria. Additionally, six instructors had average scores at least two standard deviations higher or lower than those of their colleagues. The assessment office reviewed the raw data from these six instructors to see how the rubrics had been used. The four “easy graders” had scored nearly all of their first-year students’ writing projects as a 3 or a 4 in every area; the two “hard graders” had scored nearly all writing products as a 1 (below basic) in every area. After reflecting on the results, we determined that instructor variance in using the rubrics was not as large as we had anticipated, so plans for a large, expansive summer scoring effort were scaled back: a smaller group of twelve-month faculty members scored a reduced sample of student work and discussed any discrepancies.
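Below is a sketch of the kind of screening that could surface unusually lenient or severe scorers, assuming the same hypothetical section-level file as above; the two-standard-deviation cutoff comes from the process described here, while the column names and the code itself are illustrative.

    # Screening sketch: flag instructors whose average rubric scores fall at least
    # two standard deviations above or below those of their colleagues.
    import pandas as pd

    sections = pd.read_csv("section_means.csv")  # same hypothetical file as above
    criteria = ["content", "organization", "evidence", "style", "mechanics"]

    # Mean score per instructor across their sections and all five criteria.
    per_instructor = sections.groupby("instructor_id")[criteria].mean().mean(axis=1)
    z_scores = (per_instructor - per_instructor.mean()) / per_instructor.std()

    outliers = z_scores[z_scores.abs() >= 2]
    print(outliers)  # instructors whose raw rubric sheets merit a closer look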

Focusing on Collected Data

After reviewing the statistical analysis and the results of the summer scoring effort, the faculty committee decided to develop clear directions for rubric scoring rather than providing norming sessions for all faculty. In rubric-use instruction sessions held before the start of classes in August, the associate dean also shared assessment data. Rather than concentrating on a perfect assessment process, the associate dean and lead faculty instructed INQ faculty on how to use the data in concert with their professional judgment about student communication skills and discussed how to help students improve their performance on these important outcomes.

As INQ implementation continued, the faculty used AAC&U’s VALUE rubrics as a framework to develop new rubrics to address nearly all the skills and reasoning outcomes of the curriculum. Additionally, other assessment tools were designed and tested by faculty to determine how to effectively capture information about student learning in ways that would lead to productive conversations and continuous renewal of a high-quality general education program.

Lessons Learned

Roanoke College administrators and faculty learned a great deal about the assessment of student learning in the first ten years of the INQ curriculum. As we reflect on the past decade, it is evident that there are opportunity costs to nearly every decision related to the assessment process. Limited by constraints on faculty time and financial resources, college faculty had to decide which tasks they felt were most important for obtaining and using student learning data. Giving up some assurances of inter-rater reliability and scoring precision allowed assessment efforts to focus on faculty ownership of the process and use of the data. We chose to use assessment for discourse, affirmation, and proof of a rigorous curriculum rather than to build a perfect assessment engine. And although there is always room for improvement, this approach has kept continued attention on the quality of learning in the general education curriculum.


Kimberly Filer, Assistant Provost for Teaching and Learning, Virginia Tech; Former Director of Institutional Effectiveness and Assessment, Roanoke College; and Gail Steehler, Associate Dean for Academic Affairs and General Education, Roanoke College
