NLP-Based Management of Large Multiple-Choice Test Item Repositories
DOI: https://doi.org/10.18608/jla.2023.7897
Keywords: natural language processing, multiple choice question management, deep learning, similarity computation, graph visualisation, learning analytics, research paper
Abstract
Multiple-choice questions (MCQs) are widely used in educational assessments and professional certification exams. Managing large repositories of MCQs, however, poses several challenges due to the high volume of questions and the need to maintain their quality and relevance over time. One of these challenges is the presence of questions that test the same concept as other questions but are worded differently. Such questions can evade syntactic checks, yet they add no value to the repository.
In this paper, we focus on this specific challenge and propose a workflow for discovering and managing potential duplicate questions in large MCQ repositories. The workflow comprises three main steps: MCQ preprocessing, similarity computation, and a graph-based exploration and analysis of the resulting similarity values. For the preprocessing phase, we consider three strategies: (i) removing the list of candidate answers from each question, (ii) augmenting each question with its correct answer, or (iii) augmenting each question with all of its candidate answers. We then use deep learning–based natural language processing (NLP) techniques built on the Transformer architecture to compute semantic similarities between MCQs. Finally, we propose a new approach to graph exploration, based on graph communities, to analyze the similarities and relationships between MCQs. We illustrate the approach with a case study of the Competenze Digitali program, a large-scale assessment project by the Italian government.
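The three-step workflow can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the `MCQ` record, the function names, the sample questions, and the 0.6 threshold are all hypothetical. Two simplifications are deliberate. First, a bag-of-words vector stands in for the Transformer sentence embeddings the workflow actually relies on (a real pipeline would swap in an encoder such as Sentence-BERT). Second, candidate duplicate groups are read off as connected components of the thresholded similarity graph, a simpler stand-in for the community-based exploration described above.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class MCQ:
    """A multiple-choice question (hypothetical record layout)."""
    qid: str
    stem: str
    options: list[str]
    correct: int  # index of the correct answer in `options`


def preprocess(q: MCQ, strategy: str) -> str:
    """Build the text to compare, following one of the three strategies."""
    if strategy == "stem":     # (i) drop the candidate answers
        return q.stem
    if strategy == "correct":  # (ii) stem + correct answer
        return f"{q.stem} {q.options[q.correct]}"
    if strategy == "all":      # (iii) stem + all candidate answers
        return " ".join([q.stem, *q.options])
    raise ValueError(f"unknown strategy: {strategy}")


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag of words (NOT a Transformer model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_graph(questions: list[MCQ], strategy: str,
                     threshold: float) -> dict[str, set[str]]:
    """Undirected graph: an edge joins two MCQs whose similarity >= threshold."""
    vecs = {q.qid: embed(preprocess(q, strategy)) for q in questions}
    adj: dict[str, set[str]] = {q.qid: set() for q in questions}
    for a, b in combinations(vecs, 2):
        if cosine(vecs[a], vecs[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def groups(adj: dict[str, set[str]]) -> list[list[str]]:
    """Connected components with more than one node: candidate duplicates."""
    seen: set[str] = set()
    out: list[list[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.append(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        if len(comp) > 1:
            out.append(sorted(comp))
    return out


# Hypothetical sample: q1 and q2 ask the same thing in different words.
QUESTIONS = [
    MCQ("q1", "What does the acronym PDF stand for?",
        ["Portable Document Format", "Printed Data File", "Public Digital Form"], 0),
    MCQ("q2", "The acronym PDF stands for what?",
        ["Printed Data File", "Portable Document Format", "Personal Data Folder"], 1),
    MCQ("q3", "Which protocol secures web traffic?", ["HTTPS", "FTP", "SMTP"], 0),
]

for strategy in ("stem", "correct"):
    sim = cosine(embed(preprocess(QUESTIONS[0], strategy)),
                 embed(preprocess(QUESTIONS[1], strategy)))
    print(strategy, f"{sim:.2f}")  # stem 0.77, then correct 0.84

print(groups(similarity_graph(QUESTIONS, "stem", threshold=0.6)))  # [['q1', 'q2']]
```

With the stem-only strategy, the two reworded PDF questions score about 0.77 and land in the same group, while the unrelated q3 stays isolated; appending the correct answer (strategy ii) raises their score to about 0.84 because both now share the answer text. Connected components can chain loosely related questions together through intermediate nodes, which is one reason a community-based partition is the more robust choice for real repositories.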
Copyright (c) 2023 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.