Reproducing Predictive Learning Analytics in CS1

Toward Generalizable and Explainable Models for Enhancing Student Retention

Authors

Zhidkikh, D., Heilala, V., Van Petegem, C., Dawyndt, P., Järvinen, M., Viitanen, S., De Wever, B., Mesuere, B., Lappalainen, V., Kettunen, L., & Hämäläinen, R.

DOI:

https://doi.org/10.18608/jla.2024.7979

Keywords:

predictive learning analytics, CS1, retention, privacy, self-reported data, trace data, research paper

Abstract

Predictive learning analytics has been widely explored in educational research as a means to improve student retention and academic success in introductory programming courses in computer science (CS1). However, dropout predictions that are both generalizable and interpretable remain a challenge. Our study aims to reproduce and extend the data analysis of a privacy-first student pass–fail prediction approach proposed by Van Petegem and colleagues (2022) in a different CS1 course. Using student submission and self-report data, we investigated the reproducibility of the original approach, the effect of adding self-reports to the model, and the interpretability of the model features. The results showed that the original dropout-prediction approach could be successfully reproduced in a different course context and that adding self-report data to the model improved prediction accuracy during the first four weeks of the course. We also identified features associated with dropout in the CS1 course, such as timely submission of tasks and iterative problem solving. Submission data and self-report data were found to complement each other when analyzing student behaviour. The results highlight the importance of transparency and generalizability in learning analytics, as well as the need for future research to identify factors beyond self-reported aptitude measures and student behaviour that could further enhance dropout prediction.
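
To make the prediction setup concrete, below is a minimal Python sketch of a weekly snapshot pass/fail classifier in the spirit of the approach studied here: a classifier is trained on per-student features aggregated from submission logs up to a given course week. The feature names, the synthetic data, and the choice of logistic regression are illustrative assumptions for exposition, not the exact pipeline used in the paper.

    # Minimal sketch of a weekly pass/fail snapshot model (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n_students = 200

    # Hypothetical per-student features aggregated over the first N weeks:
    # submission volume, share of tasks submitted before the deadline, and
    # mean attempts per exercise (a proxy for iterative problem solving).
    X = np.column_stack([
        rng.poisson(20, n_students),        # total submissions
        rng.uniform(0, 1, n_students),      # fraction submitted on time
        rng.gamma(2.0, 1.5, n_students),    # mean attempts per exercise
    ])
    y = rng.integers(0, 2, n_students)      # 1 = passed, 0 = dropped out (synthetic)

    # One snapshot model; in practice a separate classifier is fitted per
    # week so predictions can be updated as the course progresses, and the
    # fitted coefficients can be inspected for interpretability.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
    print(f"balanced accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")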

Author Biography

Denis Zhidkikh, University of Jyväskylä

Faculty of Information Technology

References

Abdollahi, B., & Nasraoui, O. (2018). Transparency in fair machine learning: The case of explainable recommender systems. In J. Zhou & F. Chen (Eds.), Human and machine learning: Visible, explainable, trustworthy and transparent (pp. 21–35). Springer International Publishing. https://doi.org/10.1007/978-3-319-90403-0_2

Astrachan, O., & Briggs, A. (2012). The CS principles project. ACM Inroads, 3(2), 38–42. https://doi.org/10.1145/2189835.2189849

Becker, B. A., & Quille, K. (2019). 50 years of CS1 at SIGCSE: A review of the evolution of introductory programming education research. In Proceedings of the 50th ACM technical symposium on computer science education (SIGCSE 2019), 27 February–2 March 2019, Minneapolis, Minnesota, USA (pp. 338–344). ACM. https://doi.org/10.1145/3287324.3287432

Bergin, S., & Reilly, R. (2005). Programming: Factors that influence success. In Proceedings of the 36th SIGCSE technical symposium on computer science education (SIGCSE 2005), 23–27 February 2005, St. Louis, Missouri, USA (pp. 411–415). ACM. https://doi.org/10.1145/1047344.1047480

Biderman, S., & Raff, E. (2022). Fooling MOSS detection with pretrained language models. In Proceedings of the 31st ACM international conference on information & knowledge management (CIKM 2022), 17–21 October 2022, Atlanta, Georgia, USA (pp. 2933–2943). ACM. https://doi.org/10.1145/3511808.3557079

Bosch, T. (Ed.). (2022). “You are not expected to understand this”: How 26 lines of code changed the world. Princeton University Press. https://www.degruyter.com/document/doi/10.1515/9780691230818/html

Bouvier, D., Lovellette, E., & Matta, J. (2021). Overnight feedback reduces late submissions on programming projects in CS1. In Proceedings of the 23rd Australasian computing education conference (ACE 2021), 2–4 February 2021, online (pp. 176–180). ACM. https://doi.org/10.1145/3441636.3442319

Breaux, T., & Moritz, J. (2021). The 2021 software developer shortage is coming. Communications of the ACM, 64(7), 39–41. https://doi.org/10.1145/3440753

Castro-Wunsch, K., Ahadi, A., & Petersen, A. (2017). Evaluating neural networks as a method for identifying students in need of assistance. In Proceedings of the 2017 ACM SIGCSE technical symposium on computer science education (SIGCSE 2017), 8–11 March 2017, Seattle, Washington, USA (pp. 111–116). ACM. https://doi.org/10.1145/3017680.3017792

Choi, H., Winne, P. H., Brooks, C., Li, W., & Shedden, K. (2023). Logs or self-reports? Misalignment between behavioral trace data and surveys when modeling learner achievement goal orientation. In Proceedings of the 13th international learning analytics and knowledge conference (LAK 2023), 13–17 March 2023, Arlington, Texas, USA (pp. 11–21). ACM. https://doi.org/10.1145/3576050.3576052

Clow, D. (2012). The learning analytics cycle: Closing the loop effectively. In Proceedings of the second international conference on learning analytics and knowledge (LAK 2012), 29 April–2 May 2012, Vancouver, British Columbia, Canada (pp. 134–138). ACM. https://doi.org/10.1145/2330601.2330636

Dawson, S., Jovanovic, J., Gašević, D., & Pardo, A. (2017). From prediction to impact: Evaluation of a learning analytics retention program. In Proceedings of the seventh international conference on learning analytics and knowledge (LAK 2017), 13–17 March 2017, Vancouver, British Columbia, Canada (pp. 474–478). ACM. https://doi.org/10.1145/3027385.3027405

Deho, O. B., Zhan, C., Li, J., Liu, J., Liu, L., & Le, T. D. (2022). How do the existing fairness metrics and unfairness mitigation algorithms contribute to ethical learning analytics? British Journal of Educational Technology, 53(4), 822–843. https://doi.org/10.1111/bjet.13217

Denzin, N. K. (2009). The research act: A theoretical introduction to sociological methods. Routledge. https://doi.org/10.4324/9781315134543

Duncan, T. G., & McKeachie, W. J. (2005). The making of the Motivated Strategies for Learning Questionnaire. Educational Psychologist, 40(2), 117–128. https://doi.org/10.1207/s15326985ep4002_6

Ellis, R. A., Han, F., & Pardo, A. (2017). Improving learning analytics—Combining observational and self-report data on student learning. Journal of Educational Technology & Society, 20(3), 158–169.

CC2020 Task Force. (2020). Computing curricula 2020 (CC2020): Paradigms for global computing education. ACM/IEEE. https://doi.org/10.1145/3467967

Foster, E., & Siddle, R. (2020). The effectiveness of learning analytics for identifying at-risk students in higher education. Assessment & Evaluation in Higher Education, 45(6), 842–854. https://doi.org/10.1080/02602938.2019.1682118

Gašević, D., Jovanović, J., Pardo, A., & Dawson, S. (2017). Detecting learning strategies with analytics: Links with self-reported measures and academic performance. Journal of Learning Analytics, 4(2), 113–128. https://doi.org/10.18608/jla.2017.42.10

Gašević, D., Kovanović, V., & Joksimović, S. (2017). Piecing the learning analytics puzzle: A consolidated model of a field of research and practice. Learning: Research and Practice, 3(1), 63–78. https://doi.org/10.1080/23735082.2017.1286142

Gonyea, R. M. (2005). Self-reported data in institutional research: Review and recommendations. New Directions for Institutional Research, 2005(127), 73–89. https://doi.org/10.1002/ir.156

Hawlitschek, A., Köppen, V., Dietrich, A., & Zug, S. (2019). Drop-out in programming courses—Prediction and prevention. Journal of Applied Research in Higher Education, 12(1), 124–136. https://doi.org/10.1108/JARHE-02-2019-0035

Heilala, V. (2022). Learning analytics with learning and analytics: Advancing student agency analytics [Doctoral dissertation, University of Jyväskylä]. JYU Dissertations 512. https://jyx.jyu.fi/handle/123456789/80877

Heilala, V., Jääskelä, P., Kärkkäinen, T., & Saarela, M. (2020). Understanding the study experiences of students in low agency profile: Towards a smart education approach. In A. E. Moussati, K. Kpalma, M. G. Belkasmi, M. Saber, & S. Guégan (Eds.), Advances in smart technologies: Applications and case studies (pp. 498–508). Springer International Publishing. https://doi.org/10.1007/978-3-030-53187-4_54

Heilala, V., Jääskelä, P., Saarela, M., Kuula, A.-S., Eskola, A., & Kärkkäinen, T. (2022). “Sitting at the stern and holding the rudder”: Teachers’ reflections on action in higher education based on student agency analytics. In L. Chechurin (Ed.), Digital teaching and learning in higher education: Developing and disseminating skills for blended learning (pp. 71–91). Springer International Publishing. https://doi.org/10.1007/978-3-031-00801-6_4

Herodotou, C., Rienties, B., Boroowa, A., Zdrahal, Z., & Hlosta, M. (2019). A large-scale implementation of predictive learning analytics in higher education: The teachers’ role and perspective. Educational Technology Research and Development, 67(5), 1273–1306. https://doi.org/10.1007/s11423-019-09685-0

Ifenthaler, D., Schumacher, C., & Kuzilek, J. (2022). Investigating students’ use of self‐assessments in higher education using learning analytics. Journal of Computer Assisted Learning, 39(1), 255–268. https://doi.org/10.1111/jcal.12744

Ifenthaler, D., & Yau, J. Y.-K. (2020). Reflections on different learning analytics indicators for supporting study success. International Journal of Learning Analytics and Artificial Intelligence for Education (iJAI), 2(2), 4–23. https://doi.org/10.3991/ijai.v2i2.15639

Ihantola, P., Vihavainen, A., Ahadi, A., Butler, M., Börstler, J., Edwards, S. H., Isohanni, E., Korhonen, A., Petersen, A., Rivers, K., Rubio, M. Á., Sheard, J., Skupas, B., Spacco, J., Szabo, C., & Toll, D. (2015). Educational data mining and learning analytics in programming: Literature review and case studies. In Proceedings of the 2015 innovation and technology in computer science education conference on working group reports (ITiCSE 2015), 4–8 July 2015, Vilnius, Lithuania (pp. 41–63). ACM. https://doi.org/10.1145/2858796.2858798

Isomöttönen, V., Lakanen, A.-J., & Lappalainen, V. (2019). Less is more! Preliminary evaluation of multi-functional document-based online learning environment. In Proceedings of the 2019 IEEE frontiers in education conference (FIE 2019), 16–19 October 2019, Cincinnati, Ohio, USA (pp. 1–5). IEEE. https://doi.org/10.1109/FIE43999.2019.9028353

Jansen, R. S., van Leeuwen, A., Janssen, J., & Kester, L. (2020). A mixed method approach to studying self-regulated learning in MOOCs: Combining trace data with interviews. Frontline Learning Research, 8(2), 35–64. https://doi.org/10.14786/flr.v8i2.539

Jarvis, P. (2010). Assessing and evaluating. In Adult education and lifelong learning (4th ed., pp. 210–226). Routledge. https://doi.org/10.4324/9780203718100

Jayaprakash, S. M., Moody, E. W., Lauría, E. J., Regan, J. R., & Baron, J. D. (2014). Early alert of academically at-risk students: An open source analytics initiative. Journal of Learning Analytics, 1(1), 6–47. https://doi.org/10.18608/jla.2014.11.3

Kinnunen, P., & Malmi, L. (2006). Why students drop out CS1 course? In Proceedings of the second international workshop on computing education research (ICER 2006), 9–10 September 2006, Canterbury, UK (pp. 97–108). ACM. https://doi.org/10.1145/1151588.1151604

Kokoç, M., Akçapınar, G., & Hasnine, M. N. (2021). Unfolding students’ online assignment submission behavioral patterns using temporal learning analytics. Educational Technology & Society, 24(1), 223–235.

Kori, K., Pedaste, M., & Must, O. (2017). Integration of Estonian higher education information technology students and its effect on graduation-related self-efficacy. In P. Zaphiris & A. Ioannou (Eds.), Learning and collaboration technologies. Technology in education (pp. 435–448). Springer International Publishing. https://doi.org/10.1007/978-3-319-58515-4_33

Kovacic, Z. (2012). Predicting student success by mining enrolment data. Research in Higher Education Journal, 15. http://hdl.handle.net/11072/1486

Lacave, C., Molina, A. I., & Cruz-Lemus, J. A. (2018). Learning Analytics to identify dropout factors of Computer Science studies through Bayesian networks. Behaviour & Information Technology, 37(10–11), 993–1007. https://doi.org/10.1080/0144929X.2018.1485053

Lakanen, A.-J., & Isomöttönen, V. (2023). CS1: Intrinsic motivation, self-efficacy, and effort. Informatics in Education, 22(4), 651–670. https://doi.org/10.15388/infedu.2023.26

Lepri, B., Oliver, N., Letouzé, E., Pentland, A., & Vinck, P. (2018). Fair, transparent, and accountable algorithmic decision-making processes. Philosophy & Technology, 31(4), 611–627. https://doi.org/10.1007/s13347-017-0279-x

Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research, 18(185), 1–52.

Liz-Dominguez, M., Caeiro-Rodriguez, M., Llamas-Nistal, M., & Mikic-Fonte, F. (2019). Predictors and early warning systems in higher education—A systematic literature review. In M. Caeiro-Rodríguez, Á. Hernández-García, & P. Muñoz-Merino (Eds.), Learning Analytics Summer Institute Spain 2019: Learning analytics in higher education. https://ceur-ws.org/Vol-2415/paper08.pdf

Maertens, R., Van Petegem, C., Strijbol, N., Baeyens, T., Jacobs, A. C., Dawyndt, P., & Mesuere, B. (2022). Dolos: Language-agnostic plagiarism detection in source code. Journal of Computer Assisted Learning, 38(4), 1046–1061. https://doi.org/10.1111/jcal.12662

Mangaroska, K., Sharma, K., Gasevic, D., & Giannakos, M. (2020). Multimodal learning analytics to inform learning design: Lessons learned from computing education. Journal of Learning Analytics, 7(3), 79–97. https://doi.org/10.18608/jla.2020.73.7

Marco-Galindo, M.-J., Minguillón, J., García-Solórzano, D., & Sancho-Vinuesa, T. (2022). Why do CS1 students become repeaters? IEEE Revista Iberoamericana de Tecnologias Del Aprendizaje, 17(3), 245–253. https://doi.org/10.1109/RITA.2022.3191288

World Economic Forum. (2023). Markets of tomorrow report 2023: Turning technologies into new sources of global growth. https://www.weforum.org/reports/markets-of-tomorrow-report-2023-turning-technologies-into-new-sources-of-global-growth/

Mathrani, A., Susnjak, T., Ramaswami, G., & Barczak, A. (2021). Perspectives on the challenges of generalizability, transparency and ethics in predictive learning analytics. Computers and Education Open, 2, 100060. https://doi.org/10.1016/j.caeo.2021.100060

Niu, L. (2020). A review of the application of logistic regression in educational research: Common issues, implications, and suggestions. Educational Review, 72(1), 41–67. https://doi.org/10.1080/00131911.2018.1483892

Olney, T., Walker, S., Wood, C., & Clarke, A. (2021). Are we living in LA (P)LA land? Journal of Learning Analytics, 8(3), 45–59. https://doi.org/10.18608/jla.2021.7261

Omer, U., Tehseen, R., Farooq, M. S., & Abid, A. (2023). Learning analytics in programming courses: Review and implications. Education and Information Technologies, 28(9), 11221–11268. https://doi.org/10.1007/s10639-023-11611-0

Petersen, A., Craig, M., Campbell, J., & Tafliovich, A. (2016). Revisiting why students drop CS1. In Proceedings of the 16th Koli Calling international conference on computing education research (Koli Calling 2016), 24–27 November 2016, Koli, Finland (pp. 71–80). ACM. https://doi.org/10.1145/2999541.2999552

Porter, L., & Zingaro, D. (2014). Importance of early performance in CS1: Two conflicting assessment stories. In Proceedings of the 45th ACM technical symposium on computer science education (SIGCSE 2014), 5–8 March 2014, Atlanta, Georgia, USA (pp. 295–300). ACM. https://doi.org/10.1145/2538862.2538912

Postareff, L., Mattsson, M., Lindblom-Ylänne, S., & Hailikari, T. (2017). The complex relationship between emotions, approaches to learning, study success and study progress during the transition to university. Higher Education, 73(3), 441–457. https://doi.org/10.1007/s10734-016-0096-7

Prenkaj, B., Velardi, P., Stilo, G., Distante, D., & Faralli, S. (2020). A survey of machine learning approaches for student dropout prediction in online courses. ACM Computing Surveys, 53(3), 1–34. https://doi.org/10.1145/3388792

Quille, K., & Bergin, S. (2016). Programming: Further factors that influence success. In Proceedings of the 27th annual workshop of the Psychology of Programming Interest Group (PPIG 2016), 7–10 September 2016, Cambridge, UK. https://www.ppig.org/papers/2016-ppig-27th-quille/

Quille, K., & Bergin, S. (2018). Programming: Predicting student success early in CS1. A re-validation and replication study. In Proceedings of the 23rd annual ACM conference on innovation and technology in computer science education (ITiCSE 2018), 2–4 July 2018, Larnaca, Cyprus (pp. 15–20). ACM. https://doi.org/10.1145/3197091.3197101

Quille, K., & Bergin, S. (2019). CS1: How will they do? How can we help? A decade of research and practice. Computer Science Education, 29(2–3), 254–282. https://doi.org/10.1080/08993408.2019.1612679

Roscher, R., Bohn, B., Duarte, M. F., & Garcke, J. (2020). Explainable machine learning for scientific insights and discoveries. IEEE Access, 8, 42200–42216. https://doi.org/10.1109/ACCESS.2020.2976199

Roth, A., Ogrin, S., & Schmitz, B. (2016). Assessing self-regulated learning in higher education: A systematic literature review of self-report instruments. Educational Assessment, Evaluation and Accountability, 28(3), 225–250. https://doi.org/10.1007/s11092-015-9229-2

Rountree, N., Rountree, J., & Robins, A. (2002). Predictors of success and failure in a CS1 course. ACM SIGCSE Bulletin, 34(4), 121–124. https://doi.org/10.1145/820127.820182

Rousseeuw, P. J., & van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85(411), 633–639. https://doi.org/10.1080/01621459.1990.10474920

Saarela, M., Heilala, V., Jääskelä, P., Rantakaulio, A., & Kärkkäinen, T. (2021). Explainable student agency analytics. IEEE Access, 9, 137444–137459. https://doi.org/10.1109/ACCESS.2021.3116664

Sghir, N., Adadi, A., & Lahmer, M. (2022). Recent advances in Predictive Learning Analytics: A decade systematic review (2012–2022). Education and Information Technologies, 28, 8299–8333. https://doi.org/10.1007/s10639-022-11536-0

Tek, F. B., Benli, K. S., & Deveci, E. (2018). Implicit theories and self-efficacy in an introductory programming course. IEEE Transactions on Education, 61(3), 218–225. https://doi.org/10.1109/TE.2017.2789183

Tempelaar, D., Rienties, B., Mittelmeier, J., & Nguyen, Q. (2018). Student profiling in a dispositional learning analytics application using formative assessment. Computers in Human Behavior, 78, 408–420. https://doi.org/10.1016/j.chb.2017.08.010

Tempelaar, D., Rienties, B., & Nguyen, Q. (2020). Subjective data, objective data and the role of bias in predictive modelling: Lessons from a dispositional learning analytics application. PLOS ONE, 15(6), e0233977. https://doi.org/10.1371/journal.pone.0233977

Van Petegem, C., Deconinck, L., Mourisse, D., Maertens, R., Strijbol, N., Dhoedt, B., De Wever, B., Dawyndt, P., & Mesuere, B. (2022). Pass/Fail prediction in programming courses. Journal of Educational Computing Research, 61(1), 68–95. https://doi.org/10.1177/07356331221085595

Van Petegem, C., Maertens, R., Strijbol, N., Van Renterghem, J., Van der Jeugt, F., De Wever, B., Dawyndt, P., & Mesuere, B. (2023). Dodona: Learn to code with a virtual co-teacher that supports active learning. SoftwareX, 24, 101578. https://doi.org/10.1016/j.softx.2023.101578

Vatrapu, R. (2011). Cultural considerations in learning analytics. In Proceedings of the first international conference on learning analytics and knowledge (LAK 2011), 27 February–1 March 2011, Banff, Alberta, Canada (pp. 127–133). ACM. https://doi.org/10.1145/2090116.2090136

Veenman, M. V. J. (2013). Assessing metacognitive skills in computerized learning environments. In R. Azevedo & V. Aleven (Eds.), International handbook of metacognition and learning technologies (pp. 157–168). Springer. https://doi.org/10.1007/978-1-4419-5546-3_11

Viberg, O., Jivet, I., & Scheffel, M. (2023). Designing culturally aware learning analytics: A value sensitive perspective. In O. Viberg & Å. Grönlund (Eds.), Practicable learning analytics (pp. 177–192). Springer International Publishing. https://doi.org/10.1007/978-3-031-27646-0_10

Viberg, O., Mutimukwe, C., & Grönlund, Å. (2022). Privacy in LA research. Journal of Learning Analytics, 9(3), 169–182. https://doi.org/10.18608/jla.2022.7751

Waheed, H., Hassan, S.-U., Nawaz, R., Aljohani, N. R., Chen, G., & Gašević, D. (2023). Early prediction of learners at risk in self-paced education: A neural network approach. Expert Systems with Applications, 213, 118868. https://doi.org/10.1016/j.eswa.2022.118868

Watson, C., & Li, F. W. B. (2014). Failure rates in introductory programming revisited. In Proceedings of the 2014 conference on innovation & technology in computer science education (ITiCSE 2014), 21–25 June 2014, Uppsala, Sweden (pp. 39–44). ACM. https://doi.org/10.1145/2591708.2591749

Wise, A. F., Knight, S., & Ochoa, X. (2021). What makes learning analytics research matter. Journal of Learning Analytics, 8(3), 1–9. https://doi.org/10.18608/jla.2021.7647

Zhidkikh, D., Saarela, M., & Kärkkäinen, T. (2023). Measuring self‐regulated learning in a junior high school mathematics classroom: Combining aptitude and event measures in digital learning materials. Journal of Computer Assisted Learning, 39(6), 1834–1851. https://doi.org/10.1111/jcal.12842

Zhou, M., & Winne, P. H. (2012). Modeling academic achievement by self-reported versus traced goal orientation. Learning and Instruction, 22(6), 413–419. https://doi.org/10.1016/j.learninstruc.2012.03.004

Published

2024-01-26

How to Cite

Zhidkikh, D., Heilala, V., Van Petegem, C., Dawyndt, P., Järvinen, M., Viitanen, S., De Wever, B., Mesuere, B., Lappalainen, V., Kettunen, L., & Hämäläinen, R. (2024). Reproducing Predictive Learning Analytics in CS1: Toward Generalizable and Explainable Models for Enhancing Student Retention. Journal of Learning Analytics, 11(1), 132–150. https://doi.org/10.18608/jla.2024.7979
