De-identification is Insufficient to Protect Student Privacy, or – What Can a Field Trip Reveal?

Authors

E. Yacobson, O. Fuhrman, S. Hershkovitz, G. Alexandron

DOI:

https://doi.org/10.18608/jla.2021.7353

Keywords:

learning analytics, privacy, re-identification

Abstract

Learning analytics have the potential to improve teaching and learning in K–12 education, but as student data is increasingly collected and transferred for analysis, measures must be taken to protect student privacy. A common approach is de-identification: removing personal details that can reveal a student's identity. However, as we demonstrate, de-identification alone is not a complete solution. We show that sensitive information about students can be discovered by linking de-identified datasets with publicly available school data, using unsupervised machine learning techniques. This finding underlines that de-identification is insufficient on its own if learning analytics in K–12 is to advance without compromising student privacy.
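
The kind of linkage described in the abstract can be illustrated with a toy sketch. This is not the authors' method or data. It assumes a hypothetical de-identified log of daily activity per student and a hypothetical public calendar listing the day each school went on a field trip. A small hand-written k-means (one common unsupervised technique, chosen here only for illustration) groups students with similar activity patterns, and each group is then matched to the school whose published trip day coincides with the group's quietest day.

```python
# Toy illustration only: all data, names, and the matching rule are hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Minimal k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical de-identified data: 1 = student was active on that day (days 0..4).
activity = {
    "s01": [1, 1, 0, 1, 1],
    "s02": [1, 1, 0, 1, 1],
    "s03": [1, 0, 0, 1, 1],
    "s04": [1, 1, 1, 1, 0],
    "s05": [1, 1, 1, 1, 0],
    "s06": [0, 1, 1, 1, 0],
}

# Hypothetical public information: the day each school was away on a field trip.
public_trip_day = {"School X": 2, "School Y": 4}
day_to_school = {day: school for school, day in public_trip_day.items()}

student_ids = list(activity)
points = [activity[s] for s in student_ids]
labels, centers = kmeans(points, k=2)

# Link each cluster to a school through the cluster's least active day.
guessed_school = {}
for sid, label in zip(student_ids, labels):
    center = centers[label]
    quiet_day = center.index(min(center))
    guessed_school[sid] = day_to_school.get(quiet_day)

print(guessed_school)
```

Even though the log contains no names or school identifiers, the shared gap in activity is enough to attach a school to every pseudonymous student in this toy setting, which is the risk the abstract points to.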

Published

2021-09-03

How to Cite

Yacobson, E., Fuhrman, O., Hershkovitz, S., & Alexandron, G. (2021). De-identification is Insufficient to Protect Student Privacy, or – What Can a Field Trip Reveal? Journal of Learning Analytics, 8(2), 83–92. https://doi.org/10.18608/jla.2021.7353

Issue

Vol. 8 No. 2 (2021)

Section

Special Section: Learning Analytics for Primary and Secondary Schools