NLP-Based Management of Large Multiple-Choice Test Item Repositories

Valentina Albano; Donatella Firmani; Luigi Laura; Jerin George Mathew; Anna Lucia Paoletti; Irene Torrente

doi:10.18608/jla.2023.7897

Authors

Valentina Albano Dip. Funzione Pubblica
Donatella Firmani Sapienza University https://orcid.org/0000-0003-0358-3208
Luigi Laura Uninettuno University https://orcid.org/0000-0001-6880-8477
Jerin George Mathew Sapienza University https://orcid.org/0000-0002-4626-826X
Anna Lucia Paoletti Dip. Funzione Pubblica
Irene Torrente Formez

DOI:

https://doi.org/10.18608/jla.2023.7897

Keywords:

natural language processing, multiple choice question management, deep learning, similarity computation, graph visualisation, learning analytics, research paper

Abstract

Multiple-choice questions (MCQs) are widely used in educational assessments and professional certification exams. Managing large repositories of MCQs, however, poses several challenges due to the high volume of questions and the need to maintain their quality and relevance over time. One of these challenges is the presence of questions that test the same concept but are worded differently. Such questions can elude purely syntactic checks, yet they add no value to the repository.

In this paper, we focus on this specific challenge and propose a workflow for the discovery and management of potential duplicate questions in large MCQ repositories. The workflow comprises three main steps: MCQ preprocessing, similarity computation, and a graph-based exploration and analysis of the resulting similarity values. For the preprocessing phase, we consider three strategies: (i) removing the list of candidate answers from each question, (ii) augmenting each question with the correct answer, or (iii) augmenting each question with all candidate answers. We then use deep learning–based natural language processing (NLP) techniques built on the Transformer architecture to compute semantic similarities between MCQs. Finally, we propose a new approach to graph exploration, based on graph communities, to analyze the similarities and relationships between MCQs in the graph. We illustrate the approach with a case study of the Competenze Digitali program, a large-scale assessment project by the Italian government.
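The three steps of the workflow can be illustrated with a minimal sketch. It is not the authors' implementation: to stay self-contained it swaps the Transformer-based sentence encoder for a plain bag-of-words vector, and it swaps community detection for connected components of the thresholded similarity graph. All function names (`preprocess`, `embed`, `similarity_graph`, `groups`), the strategy labels, the 0.6 threshold, and the sample questions are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def preprocess(mcq, strategy):
    """Step 1: build the text to compare, under one of the three strategies."""
    if strategy == "question_only":   # (i) drop the candidate answers
        return mcq["question"]
    if strategy == "with_correct":    # (ii) append the correct answer
        return mcq["question"] + " " + mcq["answers"][mcq["correct"]]
    if strategy == "with_all":        # (iii) append all candidate answers
        return mcq["question"] + " " + " ".join(mcq["answers"])
    raise ValueError(f"unknown strategy: {strategy}")


def embed(text):
    """Assumption: bag-of-words stand-in for a Transformer sentence embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(mcqs, strategy, threshold):
    """Step 2: keep an edge (i, j) when the pair's similarity reaches the threshold."""
    vectors = [embed(preprocess(m, strategy)) for m in mcqs]
    edges = {}
    for i, j in combinations(range(len(mcqs)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            edges[(i, j)] = score
    return edges


def groups(n, edges):
    """Step 3 (simplified): connected components, a crude proxy for communities."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    components = {}
    for x in range(n):
        components.setdefault(find(x), []).append(x)
    return sorted(components.values())


# Hypothetical sample items, invented for this sketch.
mcqs = [
    {"question": "Which protocol encrypts web traffic?",
     "answers": ["HTTPS", "FTP", "SMTP"], "correct": 0},
    {"question": "Which protocol is used to encrypt web traffic?",
     "answers": ["HTTPS", "Telnet", "POP3"], "correct": 0},
    {"question": "What does a spreadsheet cell contain?",
     "answers": ["A value or formula", "A folder", "A printer"], "correct": 0},
]
edges = similarity_graph(mcqs, "with_correct", threshold=0.6)
print(groups(len(mcqs), edges))
```

In this toy run the first two questions fall into the same group and would be flagged for manual review as potential duplicates. With a real sentence encoder and a community detection algorithm in place of the stand-ins, the same structure would apply, but the appropriate threshold would have to be chosen empirically.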
