Using Program Evaluation to Enhance Student Success

Several years ago, when I was associate dean in the College of Humanities, Arts, and Social Sciences, a new senior administrator on campus expressed the view that one of our premier first-year experience programs in the college was too expensive and that a different model, based on an approach taken at the administrator’s previous institution, was cheaper and far superior. It was only after I engaged in a careful analysis of the program’s impact, and distributed the results to the administration and larger campus community, that the program was saved from likely extinction. I learned an important lesson from this experience, one that I carried with me as vice provost for undergraduate education, a position that placed me in charge of a number of academic support and cocurricular programs on campus. In this article, I describe the approach we take to program evaluation at the University of California–Riverside (UCR), and recount some of the ways in which we have utilized evaluation results to enhance student success.

Identifying factors that affect attainment of program goals

The starting point for an evaluation study is a clear statement of program goals. Increasing retention rates and boosting graduation rates are two commonly cited goals of academic support and cocurricular programs. However, many student support programs have less lofty goals—such as fostering student-faculty interaction or creating awareness of the support services available to students on campus—which may align with the larger objective of improving retention or graduation rates, or may be viewed as important contributors to campus-level learning outcomes. Whatever the goals of a program, it is imperative that they be clearly articulated (and enthusiastically embraced) by program providers.

Once the desired outcomes or goals of a program have been identified, it is useful to explore the various factors that influence those outcomes. For outcomes such as grades, retention rates, or time to degree, most institutions of higher education possess electronic student records that contain this information, along with student attributes such as high school grade point average, SAT score, and first-generation status that might be expected to influence these outcomes. Additional possible correlates—such as paid work hours or plans to attend graduate school—may be gleaned from student surveys and linked, via student identification numbers, to student records data to discover their impact on program goals.
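For readers who want to see what such a linkage might look like in practice, the following minimal Python sketch merges survey responses into an institutional records file by student identification number. The file names and column names are hypothetical, invented purely for illustration.

```python
# A minimal sketch of linking survey responses to student records.
# The file names and column names below are hypothetical.
import pandas as pd

# Institutional records: outcomes plus pre-entry student attributes.
records = pd.read_csv("student_records.csv")    # student_id, retained, hs_gpa, sat, first_gen, ...
# Survey data: additional possible correlates of those outcomes.
survey = pd.read_csv("survey_responses.csv")    # student_id, paid_work_hours, grad_school_plans, ...

# Link the two sources through the student identifier.
merged = records.merge(survey, on="student_id", how="left")

# Example: mean first-year retention by reported graduate school plans.
print(merged.groupby("grad_school_plans")["retained"].mean())
```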

To discern the most important factors influencing student attainment of a program goal, one might begin by exploring simple associations or correlations. Are first-generation students, for example, less likely to be retained than non-first-generation students? Answering this question amounts to little more than comparing mean retention rates for the two populations and asking whether the difference is statistically significant. A more sophisticated approach would relate outcomes such as retention to an array of possible determinative characteristics in a multivariate framework that allows the analyst to isolate the specific contribution of each characteristic as a distinct factor. One might find in simple correlations, for instance, that Hispanic students are less likely to be retained than Caucasian students and that low-income students are similarly at risk compared to non-low-income students. However, because Hispanic students are also more likely to be low-income, it is unclear which of these two characteristics is dominant. The multivariate framework helps answer this question.
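As a sketch of both steps, and continuing with the hypothetical merged data set from the earlier example, the code below first compares mean retention rates across the two populations and then estimates a multivariate logistic regression in which each coefficient isolates that characteristic's distinct contribution. The variable names are invented; this is an illustration of the general approach, not the specific models used at UCR.

```python
# A minimal sketch, assuming the hypothetical "merged" data frame above with
# 0/1 columns retained, first_gen, hispanic, low_income and numeric hs_gpa, sat.
from scipy import stats
import statsmodels.formula.api as smf

# Simple association: compare mean retention rates for the two populations.
first_gen = merged.loc[merged["first_gen"] == 1, "retained"]
others = merged.loc[merged["first_gen"] == 0, "retained"]
t_stat, p_value = stats.ttest_ind(first_gen, others)
print(first_gen.mean(), others.mean(), p_value)

# Multivariate framework: each coefficient isolates that characteristic's
# contribution to retention, holding the other characteristics constant.
model = smf.logit("retained ~ hispanic + low_income + first_gen + hs_gpa + sat",
                  data=merged).fit()
print(model.summary())
```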

While developing an understanding of the various factors that influence program goals is only a building block in a good program evaluation study, the results of such an analysis may be of value in and of themselves. For example, at UCR, we have discovered the following:

  • For our student population, high school grade point average is a far more important determinant of student success—including, for example, first-year college grade point average, retention, and likelihood of graduating—than SAT score, a discovery that has caused us to rethink the relative weight we give to these two criteria in recruitment and admissions decisions. (One of the historical reasons for introducing the SAT is that high school grades were unreliable predictors of student success. But, with the large population of traditionally underserved students at UCR, it is the SAT that appears to be unreliable.)
  • Living in student housing as a freshman has a positive and statistically significant impact on first-year retention. This insight has led us to initiate special programs for commuter students in hopes of lowering attrition rates for that population.
  • Students in different majors progress toward graduation at very different rates, a finding that has fostered a deeper analysis of the causes of these differences in time to degree and spurred us to add summer school courses that reduce some of the bottlenecks in degree progression for students in the most highly impacted majors.

Evaluating program effectiveness

Having explored the determinants of specific markers of student success, it is relatively straightforward to embark upon an analysis of program effectiveness. To begin, it must be possible to identify the participants in a particular program from among the larger student body, and then to link participation status to information on individual outcome measures, such as retention, and to determinative characteristics, such as high school grade point averages and SAT scores. Program evaluation entails a comparison of outcomes across participant (i.e., “treated”) and nonparticipant (i.e., “control”) groups of students who are similar in every relevant respect. If the two groups are indeed similar with respect to all important determinative characteristics, then the difference in mean outcome measures across the two groups constitutes an estimate of the impact of the program.

The chief challenge in program evaluation is ensuring that the treated and control populations are indeed “similar in every relevant respect.” A straightforward way of both adjusting for observed differences across the two groups and testing for program impact is to utilize the multivariate model of outcome determinants discussed above, adding to the list of determinative variables an indicator of whether the student participated in the specific program in question. The association between this dichotomous variable and the outcome measure in the multivariate framework constitutes an estimate of the mean difference in the outcome variable due to the program, holding all other observed determinative factors constant. In this way, by including student demographic and behavioral characteristics among the determinative variables, the estimated program impact compares treatment and control groups that are observationally equivalent.
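In code, and under the same hypothetical data assumptions as the earlier sketches, this amounts to adding a participation indicator to the regression. The column name "participated" is invented for the example.

```python
# A minimal sketch, assuming the hypothetical "merged" data frame above now
# also contains a 0/1 column "participated" flagging program participants.
import statsmodels.formula.api as smf

model = smf.logit(
    "retained ~ participated + hispanic + low_income + first_gen + hs_gpa + sat",
    data=merged,
).fit()

# The coefficient on "participated" estimates the program's association with
# retention, holding the observed determinative characteristics constant.
print(model.params["participated"], model.pvalues["participated"])
```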

However, just because two groups are equivalent with regard to “observable” characteristics does not mean that they are “similar in every relevant respect”; “unobservable” characteristics may interact with both program participation and outcome measures to confound the multivariate evaluation results. Suppose, for example, that students who are more motivated to succeed are also drawn disproportionately to an academic intervention program. Even if the treated and control comparison groups are equivalent with regard to observable characteristics such as high school grade point average and first-generation status, because we are rarely able to observe student “motivation,” the estimated association between program participation and the outcome measure will capture both program effects and the fact that program participants are simply more motivated to succeed. Of course, the latter is not a causal result of the program, and so it represents a confounding influence on the estimated program impact that is difficult to disentangle from the true program effect. In one case at UCR, we addressed this problem by randomly assigning students to one of our first-year experience programs. This ensured that the treated and control groups were indeed similar in every relevant respect (both observable and unobservable) and thus rendered a more accurate assessment of program impact.
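The mechanics of random assignment itself are simple, as in the hypothetical sketch below; the roster file, column names, and program capacity are invented for illustration.

```python
# A minimal sketch of random assignment to a program with limited capacity.
# The file name, column names, and capacity below are hypothetical.
import numpy as np
import pandas as pd

eligible = pd.read_csv("eligible_freshmen.csv")   # roster of eligible students, includes student_id
capacity = 200                                    # hypothetical number of program seats

rng = np.random.default_rng(seed=2011)
treated_ids = rng.choice(eligible["student_id"].to_numpy(), size=capacity, replace=False)
eligible["treated"] = eligible["student_id"].isin(treated_ids).astype(int)

# Because assignment is random, the treated and control groups are similar in
# every relevant respect in expectation, so once retention is observed a simple
# difference in group means estimates the program's impact.
```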

Insights from program evaluation at UCR

At UCR we have utilized multivariate program evaluation analysis to study the effectiveness of a variety of programs, including the impact of our supplemental instruction program on course grades, of our first-year learning communities on first-year retention, and of our summer bridge program on freshman-year grades and retention. (Supplemental instruction utilizes upper-division undergraduate students to provide additional academic support in courses with high D/F rates. Learning communities take a variety of forms on the UCR campus, but in every case they are freshman transition programs that offer an intimate learning environment to first-year students. Summer bridge is a prematriculation transition program for freshmen in which students take preparatory writing or mathematics courses that put them on track to succeed in college-level coursework as they begin their freshman year in the fall.) These analyses have yielded a number of important insights.

Program evaluation results can guide resource allocation decisions, providing campus units with information on how best to spend their limited resources to get “the biggest bang for the buck.” For example, after assessing the overall impact of the supplemental instruction program on grades, we proceeded to break out the results by course and found enormous variation in estimated impacts across courses. Armed with this knowledge, we reallocated the program budget toward those courses with the highest estimated impact on course grade.

Program evaluation results may be used to establish and then spread best-practice program features. For example, having analyzed the overall impact of our first-year learning communities on student retention, we then estimated the separate impacts by college. Historically, different colleges on campus have adopted different features for their learning communities, and our program evaluation findings suggested that certain features had more significant impacts on retention than did others. These findings led to conversations across the colleges, followed by experimentation and convergence around best-practice features, with the end result being improved overall effectiveness.

Program evaluation results that are disappointing may spur further research, followed by program tinkering or even radical reform in an effort to achieve improvements. For example, early evaluation results suggested that our summer bridge program was not as effective as we had hoped. Grades in subsequent math and writing courses were worse for students who had taken summer bridge coursework than for students who took these same preparatory courses in the fall quarter of their freshman year. Further analysis suggested that the length of the summer courses might be at fault. After lengthening the summer program from five to seven weeks, summer bridge students are now doing at least as well in subsequent courses as their counterparts who begin coursework in the fall.

Program evaluation results that are positive may spare programs from budget cuts. This is illustrated by the example I offered in the opening paragraph of this article. In this case, program evaluation results suggested that the first-year learning community program was in fact enormously successful, not just in enhancing student retention, but in promoting early declaration of a major by undeclared students and improving student writing skills. The program is now warmly embraced on campus, well funded, and widely advertised in recruiting events for the college in which it resides.

Additional benefits of program assessment

There are additional benefits from engaging in program assessment, quite apart from improving budgetary efficiency and enhancing overall program effectiveness. Possessing both a commitment to program evaluation and the capacity to engage in the analysis of program effectiveness significantly enhances the chances of attaining outside funding to support student services. Whether they be private foundations or federal agencies, funding organizations want to be assured that a careful assessment of funded programs will take place. Being able to write effectively about the history of program evaluation in your unit, and thereby to signal your capacity and commitment to similar evaluations of the new programs for which you are seeking funding, can convince agencies that their money would be well spent on your campus.

Moreover, program evaluation is part of a larger “culture of evidence” approach to decision making and quality assurance that regional accrediting agencies find attractive. Thus, engaging in careful program evaluation enhances the chances that campus accreditation or reaccreditation proceedings will be successful. Program evaluation is the student support services counterpart to learning outcomes assessment in the curricular realm. Both require that clear goals be articulated and that achievement of those goals be carefully assessed.

Assessing programs should become part of the cultural fabric of a unit and campus, altering the way every decision maker thinks about the work he or she does. Key program providers should be at the table when evaluation results are discussed, and should be encouraged to comment freely on possible explanations for the findings. If the evaluation results suggest that a program is not as successful as desired, all should be involved in brainstorming solutions. This is not a time for performance appraisal or criticism of program staff. Such an approach to program evaluation yields several additional benefits.

Familiarity with the practice of program evaluation in one area may lead staff to question program effectiveness in other areas. For example, having had some experience with program evaluation in several cocurricular activities, staff involved with math and writing placement exams began to question whether these exams were adequate placement mechanisms at UCR, and especially whether additional information, such as high school grades in math and English courses or SAT scores in these subject areas, could be used to bolster the information gleaned from placement exams alone.

An understanding of the methodological features of program evaluation—such as the need, when making causal inferences, to compare groups that are “similar in every relevant respect”—affects the way staff come to understand and interpret data. For example, when it was pointed out by system-wide administrators that UCR possesses a lower student participation rate in education abroad activities compared to other campuses in the University of California system, some staff wondered whether this reflected poor marketing and staff ineffectiveness (as some alleged) or, rather, the fact that UCR students disproportionately come from low-income families and find it difficult to afford the added expense of an educational experience abroad.

Evaluating the effectiveness of cocurricular programs and academic support services is the next frontier in the effort to ensure educational quality and student success in higher education. Institutions ahead of the curve in this regard can benefit enormously.


David Fairris is professor of economics at the University of California–Riverside. He served as vice provost for undergraduate education from 2007 to 2011.