An empirical analysis of factors affecting honors program completion rates.
Savage, Hallie; Raehsler, Rod D.; Fiedor, Joseph
INTRODUCTION
One of the most important issues in any educational environment is
identifying factors that promote academic success. A plethora of
research on such factors exists across most academic fields, involving a
wide range of student demographics, and the definition of student
success varies across the range of studies published. While much of the
research is devoted to looking at student performance in particular
courses and concentrates on examination scores and grades, many authors
have directed their attention to student success in the context of an
entire academic program; student success in this context usually centers
on program completion or graduation and student retention. The analysis
in this paper follows the emphasis of McKay on the importance of
conducting repeated research on student completion of honors programs at
different universities for different time periods. This paper uses a
probit regression analysis as well as the logit regression analysis
employed by McKay in order to determine predictors of student success in
the honors program at a small, public university, thus attempting to
answer McKay's call for a greater understanding of honors students
and factors influencing their success. The use of two empirical models
on completion data, employing different base distributions, provides
more robust statistical estimates than those observed in similar studies.
PREVIOUS LITERATURE
The early years of our research were concurrent with the work of
McKay, who studied the 2002-2005 entering honors classes at the
University of North Florida and published his work in 2009. The
development of our methodology was dependent on important previous work
in this area. Yang and Raehsler, in an article published in 2005,
described their use of an ordered probit model to show that the total
score on the Scholastic Aptitude Test (SAT), the cumulative grade point
average, and the choice of academic major significantly influenced
expected grades in an intermediate microeconomics course. This paper
follows that use of a probit model, which differs from the logit model
only in its underlying probability distribution, and also reports logit
model estimates.
Research in program effectiveness rather than success in a
particular class varies across many different student cohorts. In a 2007
qualitative analysis of field research, for instance, Creighton outlines
important factors influencing graduation rates among minority student
populations. The study concentrates equally on institutional factors,
personal factors, environmental factors, individual student attributes,
and socio-cultural characteristics to explain differences in graduation
rates for underrepresented student populations. The basic issues in that
study are complex, and unfortunately no clear empirical evidence is
provided. Zhang et al. do provide an earlier (2002) empirical analysis
of student success in engineering programs across nine universities for
the years 1987 through 2000. That paper boasted a sample of 39,277
students and used a multiple logistic regression model to show that high
school grade point average and mathematics scores on the Scholastic
Aptitude Test (SAT) were positively correlated with an increase in
graduation and retention rates among engineering students.
Interestingly, verbal scores on the SAT examination were negatively
correlated with graduation and retention rates among engineering
students in the longitudinal study. In 2007, Geiser and Santelices
described expanding this work in a study of the relevance of high school
GPAs to college GPAs among 80,000 students admitted to the University of
California system. Using a linear regression model, they found that high
school GPAs were consistently the strongest predictors of college grades
across all academic disciplines and campuses in the study. They
determined that this predictive power actually became stronger after the
freshman year.
McKay used a logit regression model to study retention in the
honors program at the University of North Florida. Using a sample of
1017 students in the honors program from 2002 through 2005, he found
that high school GPA was the best predictor of program completion. The
study also found that gender was a strong predictor of student success
in the honors program while SAT scores did not display a significant
relationship with program completion. Our study builds on this work by
employing a different model and incorporating the academic discipline of
each student in the analysis. We also split the SAT score into math
and verbal components, as in the 2002 Zhang et al. study.
In more recent work published in 2013, Keller and Lacy studied
student participation levels in the large honors program at Colorado
State University and found that female students and students majoring in
the liberal arts and natural sciences were more active in the program.
Male students, along with business and engineering majors, tended to be
less active in the program as measured by an index developed by the
authors. Also in 2013, Goodstein and Szarek discussed program completion
from an alternative view; rather than empirically studying factors
influencing program completion, the authors outlined common reasons why
students might not complete an honors program, especially the need for
extra time to study for professional school entrance examinations, an
inability to find a workable thesis topic, and additional coursework
required after adding another academic major. This area of inquiry is
interesting as it provides a possible future line of empirical research.
DATA
Data for this study came from Clarion University, a public
university in western Pennsylvania. Enrollment at Clarion University is
approximately 6,000, and the school is part of the Pennsylvania System
of Higher Education, a collection of fourteen universities that
collectively make up the largest higher education provider in the state
of Pennsylvania (106,000 students across all campuses). The sample of
449 individuals used for this study includes students who were admitted
to the Clarion University Honors Program for the years 2003 through
2013. Data for each student include whether or not the student
successfully completed the Honors Program (COMP), the college
affiliation of his or her academic major (using three dummy variables
named ARTSC for the College of Arts and Sciences, BUS for the College of
Business Administration, and EDUC for the College of Education), the
student's gender (GENDER), high school grade point average (HSGPA),
and both verbal and math SAT scores (VSAT and MSAT). The size of the
entering class (SIZE) is also included in the analysis. Dummy variables
included in the model all take values of either 0 or 1 and are meant to
distinguish between different qualitative characteristics of students in
the sample. The dependent variable in this analysis, COMP, takes on a
value of 1 if the student successfully completed the Clarion University
Honors Program and 0 otherwise. Likewise, GENDER is assigned a value of
1 when the student is male and a 0 when the student is female. ARTSC is
set at 1 if the student is in the College of Arts and Sciences (0
otherwise), BUS is 1 if the student is in the College of Business
Administration (0 otherwise), and EDUC is 1 if the student is in the
College of Education (0 otherwise).
Given differences in requirements and grading practices across
academic disciplines, there is some theoretical support for including
dummy variables on academic major (or the college of the academic major)
in the analysis. McKay found gender and high school GPA to be
significant predictors of success in honors program retention using a
slightly different empirical model. As a consequence, we include these
variables in our analysis. Table 1 below provides descriptive statistics
for each variable in the sample.
Descriptive statistics results show that a little over 66% of
students in the sample completed the Clarion University Honors Program
during the sample period. Approximately 32% in the sample are males.
Academic major by college affiliation of individuals in the sample
breaks down to approximately 45% in the College of Arts and Sciences,
13% in the College of Business Administration, and 42% in the College of
Education. Students in the sample have an average high school GPA of
3.82 with an average SAT score (combining math and verbal scores) of
1240. Since students in this sample are part of a university honors
program, average grades and test scores far exceed similar statistics
for the general university student population. The SIZE variable,
measuring the number of students in each entering class, averages nearly
42 students per year. With an average 66% completion rate, one would
anticipate seeing around 28 students complete the honors program each
year.
The measure of skewness provides information on how each variable
is distributed around the mean and introduces the first statistical test
in this analysis. A value of zero indicates a perfectly symmetric
distribution; the normal distribution is the classic example. A
significantly negative skewness value suggests a long tail (or
relatively few observations) in the lower part of the distribution. A
significantly positive skewness measure suggests the reverse. Critical
analysis of skewness statistics displayed in Table 1 will be conducted
at the beginning of the results section below.
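The significance markers on the skewness values in Table 1 can be illustrated with a rough large-sample check. The sketch below is only an assumption about how such a test might be run: it treats the sample skewness as approximately normal with standard error sqrt(6/n), a common large-sample approximation, and uses n = 449 from the sample description. The paper does not state which test was actually used, and the function name skewness_z_test is hypothetical.

```python
import math

def skewness_z_test(skewness, n):
    """Return (z, two-sided p) for the null hypothesis of zero skewness.

    Assumes the large-sample approximation SE = sqrt(6 / n); this is an
    illustrative choice, not necessarily the test used in the paper.
    """
    se = math.sqrt(6.0 / n)
    z = skewness / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p

n = 449  # sample size reported in the DATA section
table_1_skewness = [("SIZE", -0.73), ("VSAT", 0.06), ("MSAT", 0.06), ("HSGPA", -2.46)]
for name, skew in table_1_skewness:
    z, p = skewness_z_test(skew, n)
    print(f"{name:6s} skewness={skew:+.2f}  z={z:+.2f}  p={p:.4f}")
```

Under this approximation the SIZE and HSGPA values are far from zero relative to their standard error, while the SAT values are not, which agrees with the pattern of significance markers in Table 1.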
RESULTS AND DISCUSSION
Before looking at the empirical estimates of the logit and probit
models described in the appendix, it is worthwhile to look back at basic
statistics involving the distribution for the data set utilized.
Measures of skewness do not appear to provide surprising results in
Table 1. Entering high school GPA is highly skewed to the left
(skewness of -2.46), indicating that very few students admitted have low
GPAs. In addition to
summarizing descriptive statistics for variables used in this study, we
also need to look at how the measures are correlated with each other to
obtain a sense of what variables to consider in the final empirical
model. Table 2 displays a correlation matrix of all variables collected
in the sample. A positive correlation (0.188), significant at the 0.01
level, exists between high school GPA and completion of the honors
program. A weaker positive relation (0.082), significant at the 0.10
level, exists between the business student dummy variable and honors
program completion. Students with higher high school grades, and those
who chose business majors, are therefore more likely to complete the
honors program. No other variables are significantly correlated with
completion.
Other values in the correlation matrix are interesting from a pure
discussion standpoint and might be worthy of more detailed analysis in
the future. For example, some gender differences occur regarding SAT
performance and choice of academic major in this sample of honors
students. Male students in the sample seem significantly more likely to
score higher on the math portion of the SAT given the positive
correlation between GENDER and MSAT. Some slight negative correlation
between GENDER and VSAT suggests that female students are more likely to
score higher on the verbal section of the SAT, but this relationship is
not statistically significant. Likewise, male students are more likely
to choose an academic major in the College of Arts and Sciences
(positive correlation between GENDER and ARTSC) while females are more
likely to choose a major in education among students in this select
sample (negative correlation between GENDER and EDUC). High school GPA
has a significant positive correlation with scores in the math section
of the SAT in this sample but not with verbal scores. This is
interesting because the correlation matrix establishes a positive
correlation between HSGPA and COMP and between HSGPA and MSAT but not
between COMP and MSAT. A high GPA in high school among students
qualifying for the honors program thus helps predict both completion of
the program and higher scores on the math section of the SAT. High
scores on the math section of the SAT alone, however, do not help
predict completion rates in the honors program, suggesting that high
school grades capture something that the math portion of the SAT does
not. Some would argue that high school grades
incorporate a measure of effort that would positively link to completion
rates for any academic program. A specific empirical determination of
this linkage remains for future study.
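The significance levels attached to the correlations with COMP can be checked approximately from the coefficients and the sample size alone. The sketch below uses the standard statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) and, as a simplifying assumption, compares it with a standard normal distribution, which is reasonable with 447 degrees of freedom. The function name correlation_p_value is hypothetical, and the paper does not describe its own test.

```python
import math

def correlation_p_value(r, n):
    """Return (t, two-sided p) for the null hypothesis of zero correlation.

    The t statistic is compared with a standard normal distribution, an
    approximation that is adequate here because n - 2 is large.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p

n = 449  # sample size reported in the DATA section
for label, r in [("HSGPA-COMP", 0.188), ("BUS-COMP", 0.082), ("MSAT-COMP", 0.050)]:
    t, p = correlation_p_value(r, n)
    print(f"{label:11s} r={r:.3f}  t={t:.2f}  p={p:.4f}")
```

With these inputs the HSGPA correlation clears the 0.01 level, the BUS correlation falls between the 0.05 and 0.10 levels, and the MSAT correlation is not significant, matching the markers in Table 2.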
Figures 1 and 2 provide an illustrative example of how completion
rates differ across academic majors and genders in the sample used for
this analysis. Figure 1 clearly indicates that the average completion
rates among students with majors in the College of Business
Administration are substantially higher than honors program completion
rates for students in other colleges. Figure 2 illustrates that
completion rates are somewhat higher among female students in the honors
program than among male students in the program. While results across
gender are similar to those seen in McKay, the results concerning
academic majors are substantially different from those observed in
Keller and Lacy.
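As a consistency check on the figures, the group completion rates can be combined with the sample shares in Table 1 to recover the full-sample rate. The sketch below is illustrative only: the shares are the rounded means of ARTSC, BUS, EDUC, and GENDER from Table 1, the female share is taken as one minus the male share, and the helper name weighted_rate is hypothetical.

```python
# Completion rates (%) from Figures 1 and 2; sample shares from Table 1.
rates_by_college = {"ARTSC": 63.682, "BUS": 76.271, "EDUC": 66.138}
shares_by_college = {"ARTSC": 0.45, "BUS": 0.13, "EDUC": 0.42}

rates_by_gender = {"male": 62.759, "female": 68.092}
shares_by_gender = {"male": 0.32, "female": 0.68}  # female share = 1 - 0.32

def weighted_rate(rates, shares):
    """Share-weighted average of group completion rates."""
    return sum(rates[group] * shares[group] for group in rates)

overall_from_colleges = weighted_rate(rates_by_college, shares_by_college)
overall_from_gender = weighted_rate(rates_by_gender, shares_by_gender)
print(f"from colleges: {overall_from_colleges:.2f}%")
print(f"from gender:   {overall_from_gender:.2f}%")
```

Both weighted averages land within a few hundredths of a percentage point of the 66.370% full-sample rate in Figure 2; the small gaps come from rounding the shares to two decimals.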
A primary drawback to relying entirely on correlation data is that
the precise relation between program completion rate (COMP) and each of
the explanatory variables is hidden. For example, it is difficult to
predict how a change in the high school GPA will influence the
probability of honors program completion without a more detailed
empirical model. Clearly, the explanatory variables are linked, and
simple correlation will not typically provide a complete story of how
COMP is influenced by other measures in the sample. Also problematic is
a study of correlation values when the primary variable of interest is
qualitative (COMP takes on a value of either 0 or 1).
The virtues of the logit and probit models are described in the
appendix, and in Table 3 we present maximum likelihood estimates of the
latent regression in the most relevant logit and probit model
specifications. Logit model 1 includes all the variables in the
specification while logit model 2 includes only the most statistically
significant explanatory variables (using a 0.10 significance level as a
determinant). Likewise, probit model 1 and probit model 2 use the same
model specifications for the probit model estimation procedure. In both
general specifications, high school GPA is the most important predictor
of honors program completion rates while the business college dummy
variable (BUS) is significant at the 0.10 level. No other explanatory
variables were found to be statistically significant.
From a statistical standpoint, results of the latent regression
estimates fit the data well when observing the likelihood-ratio (LR)
statistic. All p-values for LR are well below 0.01, indicating that the
explanatory variables chosen in the analysis jointly help explain
variation in the program completion variable (COMP). As stated above,
high school GPA and the business college dummy variable are the most
significant. The positive sign on the
coefficient for HSGPA indicates that a higher high school GPA predicts a
higher probability of honors program completion. Likewise, the positive
sign of BUS suggests that students with majors in the College of
Business Administration are more likely to complete the program than
students with majors in other colleges. While SAT scores are used to
screen students wishing to enter the honors program, they do not help
predict completion rate probabilities in the program. Gender is also not
a significant predictor of program completion.
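The p-values reported for the LR statistics in Table 3 can be approximated with a chi-square tail probability. The sketch below assumes the statistic compares each model with an intercept-only model, so the degrees of freedom equal the number of slope coefficients (7 for model 1, 2 for model 2); the paper does not state the degrees of freedom, so this is an assumption. The function chi2_sf is a hypothetical helper built from the standard recurrence for the regularized upper incomplete gamma function at integer and half-integer arguments.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1), starting from
    Q(1, z) = exp(-z) for even df or Q(1/2, z) = erfc(sqrt(z)) for odd df.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# LR statistics from Table 3 with the assumed degrees of freedom.
lr_statistics = [
    ("Logit model 1", 22.82, 7),
    ("Logit model 2", 19.67, 2),
    ("Probit model 1", 22.86, 7),
    ("Probit model 2", 21.66, 2),
]
for label, lr, df in lr_statistics:
    print(f"{label:15s} LR={lr:.2f}  df={df}  p={chi2_sf(lr, df):.4f}")
```

Under these assumed degrees of freedom, the tail probabilities for the full specifications round to the .002 shown in Table 3, and those for the reduced specifications are below .001.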
For more precision, marginal effects of each variable on COMP using
the logit and probit model estimates need to be calculated. Estimates
above for the latent regression equations do not incorporate the
non-linear nature of probability. Using the cumulative logistic and
normal distributions, marginal effects are calculated for each of the
four specifications presented in Table 3. Empirical results matching the
marginal effects on program completion (COMP) with each change in
explanatory variable are presented in Table 4.
The variables that matter the most in Table 4 are high school GPA
and the business school dummy variable, so the logit model 2 and probit
model 2 are the primary specifications to consider. Results are provided
for changes in the high school GPA, including an increase of 0.2, an
increase of 0.5, and an increase of 1.0. Results for the logit model
specification show that an increase of HSGPA by 0.2 leads to an increase
in COMP of 0.067, or a 6.7% increase in the probability of program
completion. The probit model specification provides a similar estimate
of a 6.8 percent increase for the same grade point interval. When the
high school GPA is 0.5 higher, the program completion rates increase by
14.9% and 15.4% when using the logit and probit model estimates
respectively. A full increase of 1.0 points in the HSGPA variable
increases the probability of completion by 24.0% and 25.2% for logit and
probit model specifications respectively. Clearly a student's high
school GPA can effectively predict completion outcomes in the honors
program.
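The kind of calculation behind these HSGPA effects can be sketched from the model 2 coefficients in Table 3. The evaluation point is an assumption: the sketch starts from the sample mean HSGPA of 3.82 with BUS set to 0, whereas the paper does not report the values at which Table 4 was evaluated, so the output should be close to, but need not match, the published figures. All function names are hypothetical.

```python
import math

def logistic_cdf(z):
    """Cumulative logistic distribution used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Cumulative standard normal distribution used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Model 2 coefficients from Table 3 (CONSTANT, HSGPA, BUS).
LOGIT_2 = {"const": -5.58, "hsgpa": 1.61, "bus": 0.54}
PROBIT_2 = {"const": -3.38, "hsgpa": 0.98, "bus": 0.33}

def completion_prob(coefs, cdf, hsgpa, bus):
    """Predicted completion probability for given HSGPA and BUS values."""
    index = coefs["const"] + coefs["hsgpa"] * hsgpa + coefs["bus"] * bus
    return cdf(index)

def hsgpa_effect(coefs, cdf, delta, base_hsgpa=3.82, bus=0):
    """Change in predicted probability when HSGPA rises by delta.

    base_hsgpa = 3.82 (sample mean) and bus = 0 are assumed evaluation
    points; the paper does not report the ones behind Table 4.
    """
    high = completion_prob(coefs, cdf, base_hsgpa + delta, bus)
    low = completion_prob(coefs, cdf, base_hsgpa, bus)
    return high - low

for delta in (0.2, 0.5, 1.0):
    logit_change = hsgpa_effect(LOGIT_2, logistic_cdf, delta)
    probit_change = hsgpa_effect(PROBIT_2, normal_cdf, delta)
    print(f"HSGPA +{delta}: logit {logit_change:+.3f}, probit {probit_change:+.3f}")
```

The effect grows less than proportionally with the size of the GPA change, which is the non-linearity that the latent regression coefficients alone do not show.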
For the business college dummy variable (BUS), a value of 0 means
that the student is not in the business college while a value of 1 means
the student does have an academic major within the business college. The
0.111 estimate using logit model 2 means that, all else being equal, a
student deciding to select a major in the business college typically
displays an 11.1% higher completion rate than students with majors
outside the college. The estimate using probit model 2 provides an
identical 11.1 percent increase. This shows that the academic major
selection with respect to the College of Business Administration does
make a difference on predicted completion rates.
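For a dummy variable the effect is a discrete change in the predicted probability rather than a slope. As a worked illustration using the logit model 2 coefficients in Table 3, and assuming evaluation at the sample mean HSGPA of 3.82 (an assumption, since the paper does not state the evaluation point), the change from BUS = 0 to BUS = 1 is:

```latex
\Delta P_{\mathrm{BUS}}
  = \Lambda(-5.58 + 1.61 \times 3.82 + 0.54) - \Lambda(-5.58 + 1.61 \times 3.82)
  = \Lambda(1.1102) - \Lambda(0.5702)
  \approx 0.752 - 0.639
  = 0.113,
\qquad
\Lambda(z) = \frac{1}{1 + e^{-z}}
```

This is close to the 0.111 reported in Table 4; the small difference presumably reflects the evaluation point and coefficient rounding.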
Remaining variables in the analysis are displayed in logit model 1
and probit model 1. Since results are nearly identical, a cursory
analysis can be made by just looking at the probit model results. Female
students, for example, have a completion rate that is approximately
three percent higher than males in the sample. An increase in verbal SAT
score by 100 predicts a 0.1% higher completion rate while a 100-point
increase in the math SAT score predicts a 2.5% increase in completion.
Both results are relatively small when compared to high school GPA
results. Higher class size by an increment of ten and the choice to
select an academic major in the College of Arts and Sciences lead to
decreased predicted completion rates by 1.5% and 1.4% respectively.
Again, these results are not statistically significant.
CONCLUSION
This study serves as an important addition to the existing
literature in that it provides some empirical support for previous work
with some interesting variations. As McKay observed, we find that the
high school GPA for students in the honors program emerges as the most
significant predictor of program completion. The fact that SAT scores do
not significantly help predict expected completion rates suggests that
high school GPAs may include measures beyond the basic knowledge
indicated in standardized tests. A paradox is generated in that both
high school GPAs and SAT scores are used to determine whether entering
students qualify for the Clarion University Honors Program. One
explanation is that, while SAT scores provide a basis for determining
academic potential, high school GPAs include an individual's
overall work ethic and effort. We read of students who underperform in
high school yet score high on standardized tests. These types of
students, as predicted by this analysis, would not be as likely to
complete the honors program using the same level of effort in college.
An empirical establishment of what GPA measures would be an interesting
extension of this analysis. One possible policy implication of this
result is that, if an honors program or college wishes to increase its
completion or participation rate, a director or dean should target for
special scrutiny those individuals coming in with below-average high
school GPAs as they are more likely to drop the program.
Results in this analysis showing that business college students are
more likely than students in the arts and sciences or in education to
complete the honors program are different from previous studies. The
overall discussion in Goodstein and Szarek may support these findings.
Most students from the Clarion University College of Arts and Sciences
are natural science majors, typically in biology and physics. Most of
these students study for professional (especially medical) or graduate
school exams, and the prospect of working on a thesis at the same time
can be daunting. Likewise, students in our college of education are busy
with student teaching, which takes time away from the senior project.
Business students do not consistently face these obstacles, so they may
remain in the program, but additional work needs to be done to see if
this is the case. Future analysis will attempt to determine how
completion rates are influenced by student involvement and whether
differences exist among an expanded demographic of students enrolled in
the program.
APPENDIX
Because of the discrete nature of the dependent variable in this
study (COMP takes on a value of either 0 or 1), ordinary least squares
regression would be an inappropriate model. The two most common models
utilized when the dependent variable is discrete and binary are the
logit and the probit models. The logit model utilizes the logistic or
exponential function and is the model of choice in McKay (2009). The
probit model utilizes the standard normal distribution in developing
probabilities and is the additional method utilized in this analysis.
The underlying standard normal distribution allows for a more uniform
probability of obtaining a 0 or a 1 when compared to the exponential
function; however, both models tend to provide similar results for
relatively small changes in the independent variables. It is beneficial
to report results from both the logit and probit estimation procedures
in order to observe any possible variation in results. If the empirical
results show a great deal of variation, the model specification would be
placed in question as it is dependent on the assumed distribution of the
dependent variable. On the other hand, if the marginal impacts of
changes in each variable on the probability of program completion among
honors students are consistent, a robust quantitative estimate is
verified.
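One quick way to see that the two distributions lead to similar conclusions is to compare the coefficient estimates themselves. A common rule of thumb is that logit coefficients are roughly 1.6 to 1.8 times the corresponding probit coefficients, because the logistic distribution has a larger variance than the standard normal. The sketch below applies that check to the estimates in Table 3; the dictionary and function names are hypothetical.

```python
# Coefficient estimates copied from Table 3 as (logit, probit) pairs.
TABLE_3_PAIRS = {
    "CONSTANT (model 1)": (-5.82, -3.53),
    "HSGPA (model 1)": (1.58, 0.95),
    "BUS (model 1)": (0.53, 0.32),
    "CONSTANT (model 2)": (-5.58, -3.38),
    "HSGPA (model 2)": (1.61, 0.98),
    "BUS (model 2)": (0.54, 0.33),
}

def logit_to_probit_ratios(pairs):
    """Ratio of each logit coefficient to its probit counterpart."""
    return {name: logit / probit for name, (logit, probit) in pairs.items()}

ratios = logit_to_probit_ratios(TABLE_3_PAIRS)
for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:.2f}")
```

The ratios all fall between 1.6 and 1.7, so the two sets of estimates differ essentially by a rescaling, which is why the marginal effects in Table 4 are so close across models.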
The standard binary logit or probit model is widely used for this
dependent variable type and is built around a latent regression of the
following form:
(1) $y^{*} = x'\beta + e$
where $x$ and $\beta$ are standard variable and parameter matrices,
and $e$ is a vector of normally distributed error terms. The
initial model considered for the latent regression can be formulated as:
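(2) $y_i = \beta_0 + \beta_1 \mathrm{GENDER}_i + \beta_2 \mathrm{SIZE}_i + \beta_3 \mathrm{VSAT}_i + \beta_4 \mathrm{MSAT}_i + \beta_5 \mathrm{HSGPA}_i + \beta_6 \mathrm{ARTSC}_i + \beta_7 \mathrm{BUS}_i + e_i$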
The dummy variable EDUC is not included in the latent regression
model in order to avoid the dummy variable trap. For convenience, rather
than writing out the entire latent regression formula, the equation
above can also be written as:
(3) $y_i = \beta'x$
In both equation (2) and equation (3) the variable $y_i$ is the
COMP variable equal to 0 if student i did not finish the Clarion
University Honors Program and 1 if that student did successfully
complete the program. For the probit model, the probability that y=1 can
be calculated as
(4) $\int_{-\infty}^{\beta'x} \phi(t)\,dt = \Phi(\beta'x)$
where $\phi$ is the standard normal density function and $\Phi$
is the cumulative standard normal distribution function. For the logit
function, the same probability would be
(5) $e^{\beta'x} / (1 + e^{\beta'x})$
for each value of x. With a fair amount of calculation, the
coefficients on a binary logit or probit model can be easily
interpreted. Rather than treating the slope parameters in a linear
fashion, the marginal effect of each explanatory variable can be
calculated using the cumulative standard normal distribution in the case
of the probit model or the cumulative exponential function for logit
analysis. Using the notation above, the marginal effect of variable
$x_i$ on the dependent variable ($y$, or COMP in this analysis) can be
calculated using the following equation for the probit analysis:
(6) $\partial E(y \mid x) / \partial x_i = [dF(\beta'x) / d(\beta'x)] \times \beta_i = \Delta\Phi(\beta'x)\,\beta_i$
where $\Delta$ represents the change in the cumulative normal
distribution when $x_i$ is changed. Analysis of the marginal effect
of each explanatory variable provides a better empirical description of
how each variable influences the probability of a student completing the
Clarion University Honors Program given the value of all other
explanatory variables. Parameters for the probit model are attained
using standard maximum likelihood estimation. Simply put, the marginal
effects of any variable in a probit model are determined by calculating
the change observed in the cumulative normal distribution when the
variable in question incrementally changes.
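For reference, the two standard ways of expressing this probit effect can be written side by side. This is a sketch in the notation of equations (4) and (6), not a restatement of the exact computation behind Table 4: the first line is the effect of an infinitesimal change in $x_i$, and the second is the discrete change for a finite increment $\Delta x_i$ with all other variables held fixed.

```latex
\frac{\partial \Phi(\beta'x)}{\partial x_i} = \phi(\beta'x)\,\beta_i,
\qquad
\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}

\Delta P = \Phi(\beta'x + \beta_i\,\Delta x_i) - \Phi(\beta'x)
```

The discrete form is the one that corresponds to rows such as "HSGPA increase by 0.2" in Table 4, since those report the effect of a stated finite change.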
Likewise, marginal values for the logit model are obtained from the
following:
(7) $\partial E(y \mid x) / \partial x_i = [dF(\beta'x) / d(\beta'x)] \times \beta_i = \Delta[1 / (1 + e^{-\beta'x})]$
Maximum likelihood estimates are calculated in a similar fashion
for the logit model. Comparative statics for each variable can be done
to determine how each measure affects the probability students will
complete the Honors Program. Again, it is important to use both logit
and probit analyses since each assumes a different base distribution in
calculating probabilities. As with the probit model, the marginal
changes are calculated by looking at changes in the cumulative
exponential function due to changes in the variable of interest.
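To make the maximum likelihood step concrete, the sketch below fits a one-regressor logit model by Newton-Raphson iteration. It is a minimal illustration under stated assumptions, not the estimation routine used for Table 3: the twelve GPA and completion observations are hypothetical values invented for the example, only a single explanatory variable is used, and the function name fit_logit_newton is hypothetical.

```python
import math

# Hypothetical data for illustration only (not drawn from the study sample).
GPA = [3.0, 3.2, 3.4, 3.5, 3.6, 3.7, 3.8, 3.8, 3.9, 3.9, 4.0, 4.0]
COMPLETED = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]

def fit_logit_newton(xs, ys, iterations=25):
    """Maximum likelihood for P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^(-1) times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logit_newton(GPA, COMPLETED)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

At convergence the score equations are satisfied, meaning that the fitted probabilities sum to the number of completions. A probit fit follows the same logic with the normal distribution in place of the logistic one.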
REFERENCES
Creighton, L.M. (2007). Factors affecting the graduation rates of
university students from underrepresented populations. International
Electronic Journal for Leadership in Learning, 11(Article 7), 1-12.
Geiser, S. & Santelices, M.V. (2007). Validity of high school
grades in predicting student success beyond the freshman year:
High-school record vs. standardized tests as indicators of four-year
college outcomes. University of California-Berkeley Center for Studies
in Higher Education Research and Occasional Paper Series, CSHE.6.07.
Goodstein, L. & Szarek, P. (2013). They come but do they finish?
Journal of the National Collegiate Honors Council, Fall/Winter, 14 (2),
85-104.
Keller, R.R. & Lacy, M.G. (2013). Propensity score analysis of an
honors program's contribution. Journal of the National Collegiate
Honors Council, Fall/Winter, 14 (2), 73-84.
McKay, K. (2009). Predicting retention in honors programs. Journal
of the National Collegiate Honors Council, Spring/Summer, 10 (1), 77-87.
Yang, C.W. & Raehsler, R.D. (2005). An economic analysis on
intermediate microeconomics: An ordered probit model. Journal for
Economic Educators, 5 (3), 1-11.
Zhang, G., Anderson, T., Ohland, M., Carter, R., & Thorndyke,
B. (2002). Identifying factors influencing engineering student
graduation and retention: A longitudinal and cross-institutional study.
Proceedings of the Annual Conference and Exposition for the American
Society for Engineering Education.
HALLIE SAVAGE
Clarion University of Pennsylvania
and the National Collegiate Honors Council
ROD D. RAEHSLER
Clarion University of Pennsylvania
JOSEPH FIEDOR
Indiana University of Pennsylvania
Table 1: Summary of Descriptive Statistics
Variable   Mean     Standard    Minimum   Maximum   Skewness
                    Deviation
COMP       0.66     0.47        0         1         NA
SIZE       41.60    11.65       19        53        -0.73 ***
VSAT       620      55.95       480       800       0.06
MSAT       621      53.94       490       790       0.06
HSGPA      3.82     0.22        2.33      4.00      -2.46 ***
GENDER     0.32     0.47        0         1         NA
ARTSC      0.45     0.50        0         1         NA
BUS        0.13     0.34        0         1         NA
EDUC       0.42     0.49        0         1         NA
* significant at the 0.10 level
** significant at the 0.05 level
*** significant at the 0.01 level
Table 2: Correlation Matrix of Variables
          COMP         SIZE         VSAT         MSAT         HSGPA
COMP      1
SIZE      -.010        1
VSAT      -.006        -.019        1
MSAT      .050         .063         .018         1
HSGPA     .188 ***     .120 ***     .042         .178 ***     1
GENDER    -.053        .031         -.052        .283 ***     -.146 ***
ARTSC     -.051        .025         .151 ***     .121 ***     -.037
BUS       .082 *       .019         -.174 ***    .030         .021
EDUC      -.004        -.038        -.033        -.143 ***    .023

          GENDER       ARTSC        BUS          EDUC
GENDER    1
ARTSC     .173 ***     1
BUS       .056         -.350 ***    1
EDUC      -.213 ***    -.768 ***    -.332 ***    1
* significant at the 0.10 level
** significant at the 0.05 level
*** significant at the 0.01 level
Table 3: Logit and Probit Model Equation Estimates
Variable           Logit      Logit      Probit     Probit
or Measure         Model 1    Model 2    Model 1    Model 2
CONSTANT           -5.82      -5.58      -3.53      -3.38
                   (.006)     (.007)     (.006)     (.000)
GENDER             -0.14                 -0.08
                   (.567)                (.569)
SIZE (x 10^2)      -0.71                 -0.41
                   (.427)                (.456)
VSAT (x 10^5)      478
                   (.977)                (.967)
MSAT (x 10^3)      1.13                  0.70
                   (.582)                (.572)
HSGPA              1.58       1.61       0.95       0.98
                   (.000)     (.000)     (.000)     (.000)
ARTSC              -0.06                 -0.04
                   (.783)                (.774)
BUS                0.53       0.54       0.32       0.33
                   (.133)     (.099)     (.130)     (.091)
LR STATISTIC       22.82      19.67      22.86      21.66
                   (.002)     (.000)     (.002)     (.000)
p-values are in parentheses
Table 4: Marginal Probability Effects on Completion
Probability for Logit and Probit Models
Marginal change          Logit      Logit      Probit     Probit
                         Model 1    Model 2    Model 1    Model 2
GENDER 0 to 1            -0.030                -0.030
SIZE increase by 10      -0.016                -0.015
VSAT increase by 50      +0.000                +0.001
VSAT increase by 100     +0.001                +0.001
MSAT increase by 50      +0.009                +0.009
MSAT increase by 100     +0.024                +0.025
HSGPA increase by 0.2    +0.065     +0.067     +0.066     +0.068
HSGPA increase by 0.5    +0.147     +0.149     +0.150     +0.154
HSGPA increase by 1.0    +0.237     +0.240     +0.248     +0.252
ARTSC 0 to 1             -0.014                -0.014
BUS 0 to 1               +0.109     +0.111     +0.108     +0.111
Figure 1: Completion by Academic Major
Academic Major    Completion Rate (%)
Arts & Science    63.682
Business          76.271
Education         66.138
Note: Table made from bar graph.
Figure 2: Completion by Gender
Data Sample       Completion Rate (%)
Full Sample       66.370
Males             62.759
Females           68.092
Note: Table made from bar graph.