Reproducing Predictive Learning Analytics in CS1
Toward Generalizable and Explainable Models for Enhancing Student Retention
DOI: https://doi.org/10.18608/jla.2024.7979

Keywords: predictive learning analytics, CS1, retention, privacy, self-reported data, trace data, research paper

Abstract
Predictive learning analytics has been widely explored in educational research as a means to improve student retention and academic success in introductory programming courses in computer science (CS1). However, producing dropout predictions that are both generalizable across contexts and interpretable remains a challenge. Our study aims to reproduce and extend the data analysis of a privacy-first student pass–fail prediction approach proposed by Van Petegem and colleagues (2022) in a different CS1 course. Using student submission and self-report data, we investigated the reproducibility of the original approach, the effect of adding self-reports to the model, and the interpretability of the model features. The results showed that the original approach to student dropout prediction could be successfully reproduced in a different course context and that adding self-report data to the prediction model improved accuracy for the first four weeks. We also identified relevant features associated with dropout in the CS1 course, such as timely submission of tasks and iterative problem solving. When analyzing student behaviour, submission data and self-report data were found to complement each other. The results highlight the importance of transparency and generalizability in learning analytics and the need for future research to identify further factors, beyond self-reported aptitude measures and student behaviour, that can enhance dropout prediction.
References
Abdollahi, B., & Nasraoui, O. (2018). Transparency in fair machine learning: The case of explainable recommender systems. In J. Zhou & F. Chen (Eds.), Human and machine learning: Visible, explainable, trustworthy and transparent (pp. 21–35). Springer International Publishing. https://doi.org/10.1007/978-3-319-90403-0_2
Astrachan, O., & Briggs, A. (2012). The CS principles project. ACM Inroads, 3(2), 38–42. https://doi.org/10.1145/2189835.2189849
Becker, B. A., & Quille, K. (2019). 50 years of CS1 at SIGCSE: A review of the evolution of introductory programming education research. In Proceedings of the 50th ACM technical symposium on computer science education (SIGCSE 2019), 27 February–2 March 2019, Minneapolis, Minnesota, USA (pp. 338–344). ACM. https://doi.org/10.1145/3287324.3287432
Bergin, S., & Reilly, R. (2005). Programming: Factors that influence success. In Proceedings of the 36th SIGCSE technical symposium on computer science education (SIGCSE 2005), 23–27 February 2005, St. Louis, Missouri, USA (pp. 411–415). ACM. https://doi.org/10.1145/1047344.1047480
Biderman, S., & Raff, E. (2022). Fooling MOSS detection with pretrained language models. In Proceedings of the 31st ACM international conference on information & knowledge management (CIKM 2022), 17–21 October 2022, Atlanta, Georgia, USA (pp. 2933–2943). ACM. https://doi.org/10.1145/3511808.3557079
Bosch, T. (Ed.). (2022). “You are not expected to understand this”: How 26 lines of code changed the world. Princeton University Press. https://www.degruyter.com/document/doi/10.1515/9780691230818/html
Bouvier, D., Lovellette, E., & Matta, J. (2021). Overnight feedback reduces late submissions on programming projects in CS1. In Proceedings of the 23rd Australasian computing education conference (ACE 2021), 2–4 February 2021, online (pp. 176–180). ACM. https://doi.org/10.1145/3441636.3442319
Breaux, T., & Moritz, J. (2021). The 2021 software developer shortage is coming. Communications of the ACM, 64(7), 39–41. https://doi.org/10.1145/3440753
Castro-Wunsch, K., Ahadi, A., & Petersen, A. (2017). Evaluating neural networks as a method for identifying students in need of assistance. In Proceedings of the 2017 ACM SIGCSE technical symposium on computer science education (SIGCSE 2017), 8–11 March 2017, Seattle, Washington, USA (pp. 111–116). ACM. https://doi.org/10.1145/3017680.3017792
Choi, H., Winne, P. H., Brooks, C., Li, W., & Shedden, K. (2023). Logs or self-reports? Misalignment between behavioral trace data and surveys when modeling learner achievement goal orientation. In Proceedings of the 13th international learning analytics and knowledge conference (LAK 2023), 13–17 March 2023, Arlington, Texas, USA (pp. 11–21). ACM. https://doi.org/10.1145/3576050.3576052
Clow, D. (2012). The learning analytics cycle: Closing the loop effectively. In Proceedings of the second international conference on learning analytics and knowledge (LAK 2012), 29 April–2 May 2012, Vancouver, British Columbia, Canada (pp. 134–138). ACM. https://doi.org/10.1145/2330601.2330636
Dawson, S., Jovanovic, J., Gašević, D., & Pardo, A. (2017). From prediction to impact: Evaluation of a learning analytics retention program. In Proceedings of the seventh international conference on learning analytics and knowledge (LAK 2017), 13–17 March 2017, Vancouver, British Columbia, Canada (pp. 474–478). ACM. https://doi.org/10.1145/3027385.3027405
Deho, O. B., Zhan, C., Li, J., Liu, J., Liu, L., & Duy Le, T. (2022). How do the existing fairness metrics and unfairness mitigation algorithms contribute to ethical learning analytics? British Journal of Educational Technology: Journal of the Council for Educational Technology, 53(4), 822–843. https://doi.org/10.1111/bjet.13217
Denzin, N. K. (2009). The research act: A theoretical introduction to sociological methods. Routledge. https://doi.org/10.4324/9781315134543
Duncan, T. G., & McKeachie, W. J. (2005). The making of the Motivated Strategies for Learning Questionnaire. Educational Psychologist, 40(2), 117–128. https://doi.org/10.1207/s15326985ep4002_6
Ellis, R. A., Han, F., & Pardo, A. (2017). Improving learning analytics—Combining observational and self-report data on student learning. Journal of Educational Technology & Society, 20(3), 158–169.
CC2020 Task Force. (2020). Computing curricula 2020 (CC2020): Paradigms for global computing education. ACM, IEEE. https://doi.org/10.1145/3467967
Foster, E., & Siddle, R. (2020). The effectiveness of learning analytics for identifying at-risk students in higher education. Assessment & Evaluation in Higher Education, 45(6), 842–854. https://doi.org/10.1080/02602938.2019.1682118
Gašević, D., Jovanović, J., Pardo, A., & Dawson, S. (2017). Detecting learning strategies with analytics: Links with self-reported measures and academic performance. Journal of Learning Analytics, 4(2), 113–128. https://doi.org/10.18608/jla.2017.42.10
Gašević, D., Kovanović, V., & Joksimović, S. (2017). Piecing the learning analytics puzzle: A consolidated model of a field of research and practice. Learning: Research and Practice, 3(1), 63–78. https://doi.org/10.1080/23735082.2017.1286142
Gonyea, R. M. (2005). Self-reported data in institutional research: Review and recommendations. New Directions for Institutional Research, 2005(127), 73–89. https://doi.org/10.1002/ir.156
Hawlitschek, A., Köppen, V., Dietrich, A., & Zug, S. (2019). Drop-out in programming courses—Prediction and prevention. Journal of Applied Research in Higher Education, 12(1), 124–136. https://doi.org/10.1108/JARHE-02-2019-0035
Heilala, V. (2022). Learning analytics with learning and analytics: Advancing student agency analytics [Doctoral dissertation, University of Jyväskylä]. JYU Dissertations 512. https://jyx.jyu.fi/handle/123456789/80877
Heilala, V., Jääskelä, P., Kärkkäinen, T., & Saarela, M. (2020). Understanding the study experiences of students in low agency profile: Towards a smart education approach. In A. E. Moussati, K. Kpalma, M. G. Belkasmi, M. Saber, & S. Guégan (Eds.), Advances in smart technologies: Applications and case studies (pp. 498–508). Springer International Publishing. https://doi.org/10.1007/978-3-030-53187-4_54
Heilala, V., Jääskelä, P., Saarela, M., Kuula, A.-S., Eskola, A., & Kärkkäinen, T. (2022). “Sitting at the stern and holding the rudder”: Teachers’ reflections on action in higher education based on student agency analytics. In L. Chechurin (Ed.), Digital teaching and learning in higher education: Developing and disseminating skills for blended learning (pp. 71–91). Springer International Publishing. https://doi.org/10.1007/978-3-031-00801-6_4
Herodotou, C., Rienties, B., Boroowa, A., Zdrahal, Z., & Hlosta, M. (2019). A large-scale implementation of predictive learning analytics in higher education: The teachers’ role and perspective. Educational Technology Research and Development: ETR & D, 67(5), 1273–1306. https://doi.org/10.1007/s11423-019-09685-0
Ifenthaler, D., Schumacher, C., & Kuzilek, J. (2022). Investigating students’ use of self‐assessments in higher education using learning analytics. Journal of Computer Assisted Learning, 39(1), 255–268. https://doi.org/10.1111/jcal.12744
Ifenthaler, D., & Yau, J. Y.-K. (2020). Reflections on different learning analytics indicators for supporting study success. International Journal of Learning Analytics and Artificial Intelligence for Education (iJAI), 2(2), 4–23. https://doi.org/10.3991/ijai.v2i2.15639
Ihantola, P., Vihavainen, A., Ahadi, A., Butler, M., Börstler, J., Edwards, S. H., Isohanni, E., Korhonen, A., Petersen, A., Rivers, K., Rubio, M. Á., Sheard, J., Skupas, B., Spacco, J., Szabo, C., & Toll, D. (2015). Educational data mining and learning analytics in programming: Literature review and case studies. In Proceedings of the 2015 innovation and technology in computer science education conference on working group reports (ITiCSE 2015), 4–8 July 2015, Vilnius, Lithuania (pp. 41–63). ACM. https://doi.org/10.1145/2858796.2858798
Isomöttönen, V., Lakanen, A.-J., & Lappalainen, V. (2019). Less is more! Preliminary evaluation of multi-functional document-based online learning environment. In Proceedings of the 2019 IEEE frontiers in education conference (FIE 2019), 16–19 October 2019, Cincinnati, Ohio, USA (pp. 1–5). IEEE. https://doi.org/10.1109/FIE43999.2019.9028353
Jansen, R. S., van Leeuwen, A., Janssen, J., & Kester, L. (2020). A mixed method approach to studying self-regulated learning in MOOCs: Combining trace data with interviews. Frontline Learning Research, 8(2), 35–64. https://doi.org/10.14786/flr.v8i2.539
Jarvis, P. (2010). Assessing and evaluating. In Adult education and lifelong learning (4th ed., pp. 210–226). Routledge. https://doi.org/10.4324/9780203718100
Jayaprakash, S. M., Moody, E. W., Lauría, E. J., Regan, J. R., & Baron, J. D. (2014). Early alert of academically at-risk students: An open source analytics initiative. Journal of Learning Analytics, 1(1), 6–47. https://doi.org/10.18608/jla.2014.11.3
Kinnunen, P., & Malmi, L. (2006). Why students drop out CS1 course? In Proceedings of the second international workshop on computing education research (ICER 2006), 9–10 September 2006, Canterbury, UK (pp. 97–108). ACM. https://doi.org/10.1145/1151588.1151604
Kokoç, M., Akçapınar, G., & Hasnine, M. N. (2021). Unfolding students’ online assignment submission behavioral patterns using temporal learning analytics. Educational Technology & Society, 24(1), 223–235.
Kori, K., Pedaste, M., & Must, O. (2017). Integration of Estonian higher education information technology students and its effect on graduation-related self-efficacy. In P. Zaphiris & A. Ioannou (Eds.), Learning and collaboration technologies. Technology in education (pp. 435–448). Springer International Publishing. https://doi.org/10.1007/978-3-319-58515-4_33
Kovacic, Z. (2012). Predicting student success by mining enrolment data. Research in Higher Education Journal, 15. http://hdl.handle.net/11072/1486
Lacave, C., Molina, A. I., & Cruz-Lemus, J. A. (2018). Learning Analytics to identify dropout factors of Computer Science studies through Bayesian networks. Behaviour & Information Technology, 37(10–11), 993–1007. https://doi.org/10.1080/0144929X.2018.1485053
Lakanen, A.-J., & Isomöttönen, V. (2023). CS1: Intrinsic motivation, self-efficacy, and effort. Informatics in Education, 22(4), 651–670. https://doi.org/10.15388/infedu.2023.26
Lepri, B., Oliver, N., Letouzé, E., Pentland, A., & Vinck, P. (2018). Fair, transparent, and accountable algorithmic decision-making processes. Philosophy & Technology, 31(4), 611–627. https://doi.org/10.1007/s13347-017-0279-x
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research, 18(185), 1–52.
Liz-Dominguez, M., Caeiro-Rodriguez, M., Llamas-Nistal, M., & Mikic-Fonte, F. (2019). Predictors and early warning systems in higher education—A systematic literature review. In M. Caeiro-Rodríguez, Á. Hernández-García, & P. Muñoz-Merino (Eds.), Learning Analytics Summer Institute Spain 2019: Learning analytics in higher education. https://ceur-ws.org/Vol-2415/paper08.pdf
Maertens, R., Van Petegem, C., Strijbol, N., Baeyens, T., Jacobs, A. C., Dawyndt, P., & Mesuere, B. (2022). Dolos: Language-agnostic plagiarism detection in source code. Journal of Computer Assisted Learning, 38(4), 1046–1061. https://doi.org/10.1111/jcal.12662
Mangaroska, K., Sharma, K., Gasevic, D., & Giannakos, M. (2020). Multimodal learning analytics to inform learning design: Lessons learned from computing education. Journal of Learning Analytics, 7(3), 79–97. https://doi.org/10.18608/jla.2020.73.7
Marco-Galindo, M.-J., Minguillón, J., García-Solórzano, D., & Sancho-Vinuesa, T. (2022). Why do CS1 students become repeaters? IEEE Revista Iberoamericana de Tecnologias Del Aprendizaje, 17(3), 245–253. https://doi.org/10.1109/RITA.2022.3191288
World Economic Forum. (2023). Markets of tomorrow report 2023: Turning technologies into new sources of global growth. https://www.weforum.org/reports/markets-of-tomorrow-report-2023-turning-technologies-into-new-sources-of-global-growth/
Mathrani, A., Susnjak, T., Ramaswami, G., & Barczak, A. (2021). Perspectives on the challenges of generalizability, transparency and ethics in predictive learning analytics. Computers and Education Open, 2, 100060. https://doi.org/10.1016/j.caeo.2021.100060
Niu, L. (2020). A review of the application of logistic regression in educational research: Common issues, implications, and suggestions. Educational Review, 72(1), 41–67. https://doi.org/10.1080/00131911.2018.1483892
Olney, T., Walker, S., Wood, C., & Clarke, A. (2021). Are we living in LA (P)LA land? Journal of Learning Analytics, 8(3), 45–59. https://doi.org/10.18608/jla.2021.7261
Omer, U., Tehseen, R., Farooq, M. S., & Abid, A. (2023). Learning analytics in programming courses: Review and implications. Education and Information Technologies, 28(9), 11221–11268. https://doi.org/10.1007/s10639-023-11611-0
Petersen, A., Craig, M., Campbell, J., & Tafliovich, A. (2016). Revisiting why students drop CS1. In Proceedings of the 16th Koli Calling international conference on computing education research (Koli Calling 2016), 24–27 November 2016, Koli, Finland (pp. 71–80). ACM. https://doi.org/10.1145/2999541.2999552
Porter, L., & Zingaro, D. (2014). Importance of early performance in CS1: Two conflicting assessment stories. In Proceedings of the 45th ACM technical symposium on computer science education (SIGCSE 2014), 5–8 March 2014, Atlanta, Georgia, USA (pp. 295–300). ACM. https://doi.org/10.1145/2538862.2538912
Postareff, L., Mattsson, M., Lindblom-Ylänne, S., & Hailikari, T. (2017). The complex relationship between emotions, approaches to learning, study success and study progress during the transition to university. Higher Education, 73(3), 441–457. https://doi.org/10.1007/s10734-016-0096-7
Prenkaj, B., Velardi, P., Stilo, G., Distante, D., & Faralli, S. (2020). A survey of machine learning approaches for student dropout prediction in online courses. ACM Computing Surveys, 53(3), 1–34. https://doi.org/10.1145/3388792
Quille, K., & Bergin, S. (2016). Programming: Further factors that influence success. PPIG 2016—27th Annual Workshop, 7–10 September 2016, Cambridge, UK. https://www.ppig.org/papers/2016-ppig-27th-quille/
Quille, K., & Bergin, S. (2018). Programming: Predicting student success early in CS1. A re-validation and replication study. In Proceedings of the 23rd annual ACM conference on innovation and technology in computer science education (ITiCSE 2018), 2–4 July 2018, Larnaca, Cyprus (pp. 15–20). ACM. https://doi.org/10.1145/3197091.3197101
Quille, K., & Bergin, S. (2019). CS1: How will they do? How can we help? A decade of research and practice. Computer Science Education, 29(2–3), 254–282. https://doi.org/10.1080/08993408.2019.1612679
Roscher, R., Bohn, B., Duarte, M. F., & Garcke, J. (2020). Explainable machine learning for scientific insights and discoveries. IEEE Access : Practical Innovations, Open Solutions, 8, 42200–42216. https://doi.org/10.1109/ACCESS.2020.2976199
Roth, A., Ogrin, S., & Schmitz, B. (2016). Assessing self-regulated learning in higher education: A systematic literature review of self-report instruments. Educational Assessment, Evaluation and Accountability, 28(3), 225–250. https://doi.org/10.1007/s11092-015-9229-2
Rountree, N., Rountree, J., & Robins, A. (2002). Predictors of success and failure in a CS1 course. ACM SIGCSE Bulletin, 34(4), 121–124. https://doi.org/10.1145/820127.820182
Rousseeuw, P. J., & van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85(411), 633–639. https://doi.org/10.1080/01621459.1990.10474920
Saarela, M., Heilala, V., Jääskelä, P., Rantakaulio, A., & Kärkkäinen, T. (2021). Explainable student agency analytics. IEEE Access : Practical Innovations, Open Solutions, 9, 137444–137459. https://doi.org/10.1109/ACCESS.2021.3116664
Sghir, N., Adadi, A., & Lahmer, M. (2022). Recent advances in Predictive Learning Analytics: A decade systematic review (2012–2022). Education and Information Technologies, 28, 8299–8333. https://doi.org/10.1007/s10639-022-11536-0
Tek, F. B., Benli, K. S., & Deveci, E. (2018). Implicit theories and self-efficacy in an introductory programming course. IEEE Transactions on Education, 61(3), 218–225. https://doi.org/10.1109/TE.2017.2789183
Tempelaar, D., Rienties, B., Mittelmeier, J., & Nguyen, Q. (2018). Student profiling in a dispositional learning analytics application using formative assessment. Computers in Human Behavior, 78, 408–420. https://doi.org/10.1016/j.chb.2017.08.010
Tempelaar, D., Rienties, B., & Nguyen, Q. (2020). Subjective data, objective data and the role of bias in predictive modelling: Lessons from a dispositional learning analytics application. PloS One, 15(6), e0233977. https://doi.org/10.1371/journal.pone.0233977
Van Petegem, C., Deconinck, L., Mourisse, D., Maertens, R., Strijbol, N., Dhoedt, B., De Wever, B., Dawyndt, P., & Mesuere, B. (2022). Pass/Fail prediction in programming courses. Journal of Educational Computing Research, 61(1), 68–95. https://doi.org/10.1177/07356331221085595
Van Petegem, C., Maertens, R., Strijbol, N., Van Renterghem, J., Van der Jeugt, F., De Wever, B., Dawyndt, P., & Mesuere, B. (2023). Dodona: Learn to code with a virtual co-teacher that supports active learning. SoftwareX, 24, 101578. https://doi.org/10.1016/j.softx.2023.101578
Vatrapu, R. (2011). Cultural considerations in learning analytics. In Proceedings of the first international conference on learning analytics and knowledge (LAK 2011), 27 February–1 March 2011, Banff, Alberta, Canada (pp. 127–133). ACM. https://doi.org/10.1145/2090116.2090136
Veenman, M. V. J. (2013). Assessing metacognitive skills in computerized learning environments. In R. Azevedo & V. Aleven (Eds.), International handbook of metacognition and learning technologies (pp. 157–168). Springer. https://doi.org/10.1007/978-1-4419-5546-3_11
Viberg, O., Jivet, I., & Scheffel, M. (2023). Designing culturally aware learning analytics: A value sensitive perspective. In O. Viberg & Å. Grönlund (Eds.), Practicable learning analytics (pp. 177–192). Springer International Publishing. https://doi.org/10.1007/978-3-031-27646-0_10
Viberg, O., Mutimukwe, C., & Grönlund, Å. (2022). Privacy in LA research. Journal of Learning Analytics, 9(3), 169–182. https://doi.org/10.18608/jla.2022.7751
Waheed, H., Hassan, S.-U., Nawaz, R., Aljohani, N. R., Chen, G., & Gašević, D. (2023). Early prediction of learners at risk in self-paced education: A neural network approach. Expert Systems with Applications, 213, 118868. https://doi.org/10.1016/j.eswa.2022.118868
Watson, C., & Li, F. W. B. (2014). Failure rates in introductory programming revisited. In Proceedings of the 2014 conference on innovation & technology in computer science education (ITiCSE 2014), 21–25 June 2014, Uppsala, Sweden (pp. 39–44). ACM. https://doi.org/10.1145/2591708.2591749
Wise, A. F., Knight, S., & Ochoa, X. (2021). What makes learning analytics research matter. Journal of Learning Analytics, 8(3), 1–9. https://doi.org/10.18608/jla.2021.7647
Zhidkikh, D., Saarela, M., & Kärkkäinen, T. (2023). Measuring self‐regulated learning in a junior high school mathematics classroom: Combining aptitude and event measures in digital learning materials. Journal of Computer Assisted Learning, 39(6), 1834–1851. https://doi.org/10.1111/jcal.12842
Zhou, M., & Winne, P. H. (2012). Modeling academic achievement by self-reported versus traced goal orientation. Learning and Instruction, 22(6), 413–419. https://doi.org/10.1016/j.learninstruc.2012.03.004
Copyright (c) 2024 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.