The Effects of Explanations in Automated Essay Scoring Systems on Student Trust and Motivation
DOI: https://doi.org/10.18608/jla.2023.7801
Keywords: explainable artificial intelligence, automated essay scoring systems, trust, motivation, academic writing, research paper
Abstract
Ethical considerations, including transparency, play an important role when using artificial intelligence (AI) in education. Explainable AI has been proposed as a way to provide more insight into the inner workings of AI algorithms. However, carefully designed user studies on how to design explanations for AI in education remain scarce. The current study aimed to identify the effect of explanations of an automated essay scoring system on students' trust and motivation. The explanations were designed on the basis of a needs-elicitation study with students, combined with guidelines and frameworks from explainable AI. Two types of explanations were tested: full-text global explanations and an accuracy statement. The results showed that neither type of explanation had an effect on student trust or motivation compared to no explanation. Interestingly, the grade provided by the system, and especially the difference between the student's self-estimated grade and the system grade, had a large influence. Hence, the effect of the system's outcome (here, the grade) should be taken into account when considering the effect of explanations of AI in education.
License
Copyright (c) 2023 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.