Using Keystroke Analytics to Improve Pass-Fail Classifiers

Authors

  • Kevin Casey, Maynooth University

DOI:

https://doi.org/10.18608/jla.2017.42.14

Keywords:

Learning analytics, keystroke analytics, data mining, virtual learning environments, student behaviour, early intervention

Abstract

Learning analytics offers insights into student behaviour and the potential to detect poor performers before they fail exams. If the activity is primarily online (for example, computer programming), a wealth of low-level data can be made available, allowing unprecedented accuracy in predicting which students will pass or fail. In this paper, we present a classification system for early detection of poor performers based on student effort data, such as the complexity of the programs they write, and show how it can be improved by the use of low-level keystroke analytics.

Author Biography

Kevin Casey, Maynooth University

Kevin is a lecturer at Maynooth University. Before his current position, he served as a lecturer at Dublin City University (DCU) for 3 years and at Griffith College Dublin (GCD) for 9 years. Having completed a BSc and MSc in Computer Science at University College Dublin, he began lecturing at GCD in 1994. He left in 2002 to take up a position as a researcher in the School of Computer Science and Statistics at Trinity College Dublin. He continues his involvement with his research group at DCU in the area of data analytics (including learning analytics) and IoT. Kevin holds a PhD in Computer Science from Trinity College Dublin.

Published

2017-07-05

How to Cite

Casey, K. (2017). Using Keystroke Analytics to Improve Pass-Fail Classifiers. Journal of Learning Analytics, 4(2), 189–211. https://doi.org/10.18608/jla.2017.42.14