A Comparative Study of Machine Learning Techniques for College Student Success Prediction

Authors

  • Zaiyong Tang, Salem State University
  • Anurag Jain, Salem State University
  • Fernando E. Colina, Salem State University

DOI:

https://doi.org/10.33423/jhetp.v24i1.6764

Keywords:

higher education, student success, prediction, model comparison, logistic regression, random forest

Abstract

This study compares the performance of several machine learning models for predicting student persistence. It begins with a historical review of student retention research and the evolution of predictive models in the field, and highlights why predicting persistence matters to both educational institutions and individuals. It then describes a dataset obtained from ResearchGate: anonymized undergraduate student data collected between 2008 and 2018, with 37 features and 4,424 records. Ten machine learning algorithms are considered, and two popular ones, Logistic Regression and Random Forest classification, are compared in more detail. Evaluation metrics include prediction accuracy, precision, recall, and F1-score. Results show that the Random Forest model outperforms Logistic Regression in predicting student outcomes, particularly when the synthetic minority oversampling technique (SMOTE) is used to address class imbalance. Overall, the study contributes to student retention research and provides insights for developing targeted support measures to enhance student success in higher education.
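
The evaluation metrics and the oversampling idea named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `classification_metrics` and `smote_oversample`, the toy labels, and the toy minority points are all hypothetical and are not taken from the study's dataset. The second function shows only the core interpolation step of SMOTE (a synthetic point placed between a minority sample and one of its nearest minority neighbours), assuming numeric features and at least two minority samples.

```python
import math
import random


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def smote_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy labels (hypothetical): 1 marks the minority class of interest.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)

# Toy minority samples (hypothetical) with two numeric features each.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_oversample(minority_points, n_new=4)
```

On imbalanced data, accuracy alone can look strong even when the minority class is predicted poorly, which is why precision, recall, and F1 are reported alongside it. Oversampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.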

Published

2024-01-29

How to Cite

Tang, Z., Jain, A., & Colina, F. E. (2024). A Comparative Study of Machine Learning Techniques for College Student Success Prediction. Journal of Higher Education Theory and Practice, 24(1). https://doi.org/10.33423/jhetp.v24i1.6764

Section

Articles