A User’s Guide to Topological Data Analysis
DOI:
https://doi.org/10.18608/jla.2017.42.6Keywords:
Topological Data Analysis, Persistent Homology, MapperAbstract
Topological data analysis (TDA) is a collection of powerful tools that can quantify shape and structure in data in order to answer questions from the data’s domain. This is done by representing some aspect of the structure of the data in a simplified topological signature. In this article, we introduce two of the most commonly used topological signatures. First, the persistence diagram represents loops and holes in the space by considering connectivity of the data points for a continuum of values rather than a single fixed value. The second topological signature, the mapper graph, returns a 1-dimensional structure representing the shape of the data, and is particularly good for exploration and visualization of the data. While these techniques are based on very sophisticated mathematics, the current ubiquity of available software means that these tools are more accessible than ever to be applied to data by researchers in education and learning, as well as all domain scientists.
References
Ayasdi. (2016). Ayasdi, Inc. http://www.ayasdi.com.
Adams, H., & Carlsson, G. (2015). Evasion paths in mobile sensor networks. The International Journal of Robotics Research, 34(1):90–104.
Adcock, A., Carlsson, E., & Carlsson, G. (2016). The ring of algebraic functions on persistence bar codes. Homology, Homotopy and Applications, 18(1):381–402.
Bauer, U. (2016). Ripser. https://github.com/Ripser/ripser.
Biasotti, S., Giorgi, D., Spagnuolo, M., & Falcidieno, B. (2008). Reeb graphs for shape analysis and applications. Theoretical Computer Science: Computational Algebraic Geometry and Applications, 392(13):5 – 22.
Boriah, S., Chandola, V., & Kumar, V. (2008). Similarity Measures for Categorical Data: A Comparative Evaluation, chapter 21, pages 243–254.
Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16:77–102.
(Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308. Survey.
Carlsson, G., Ishkhanov, T., de Silva, V., & Zomorodian, A. (2008). On the local behavior of spaces of natural images. International Journal of Computer Vision, 76:1–12. 10.1007/s11263-007-0056-x.
Carlsson, G., & Zomorodian, A. (2009). The theory of multidimensional persistence. Discrete & Computational Geometry, 42(1):71–93.
Chan, J. M., Carlsson, G., & Rabadan, R. (2013). Topology of viral evolution. Proceedings of the National Academy of Sciences.
Chazal, F., & Sun, J. (2014). Gromov-hausdorff approximation of filament structure using reeb-type graph. In Proceedings of the Thirtieth Annual Symposium on Computational Geometry, SOCG’14, pages 491:491–491:500, New York, NY, USA. ACM.
Cohen-Steiner, D., Edelsbrunner, H., & Harer, J. (2007). Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120.
Dabaghian, Y., Mémoli, F., Frank, L., & Carlsson., G. (2012). A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Comput Biol, 8(8):e1002581.
de Silva, V., & Ghrist, R. (2007). Coverage in sensor networks via persistent homology. Algebraic & Geometric Topology, 7:339–358.
de Silva, V., Morozov, D., & Vejdemo-Johansson, M. (2011). Persistent cohomology and circular coordinates. Discrete & Computational Geometry, 45(4):737–759.
de Silva, V., Munch, E., & Patel, A. (2016). Categorified Reeb graphs. Discrete & Computational Geometry, pages 1–53.
Edelsbrunner, Letscher, & Zomorodian (2002). Topological persistence and simplification. Discrete & Computational Geometry, 28(4):511–533.
Edelsbrunner, H., & Harer, J. (2010). Computational Topology: An Introduction. American Mathematical Society.
Emrani, S., Gentimis, T., & Krim, H. (2014). Persistent homology of delay embeddings and its application to wheeze detection. Signal Processing Letters, IEEE, 21(4):459–463.
Escolano, F., Hancock, E., & Biasotti, S. (2013). Complexity fusion for indexing Reeb digraphs. In Wilson, R., Hancock, E., Bors, A., & Smith, W., editors, Computer Analysis of Images and Patterns, volume 8047 of Lecture Notes in Computer Science, pages 120–127. Springer Berlin Heidelberg.
Fasy, B. T., Kim, J., Lecci, F., & Maria, C. (2014a). Introduction to the R package TDA. arXiv: 1411.1830.
Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., & Singh, A. (2014b). Confidence sets for persistence diagrams. Annals of Statistics, 42(6):2301–2339.
Ge, X., Safa, I. I., Belkin, M., & Wang, Y. (2011). Data skeletonization via Reeb graphs. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems 24, pages 837–845.
Ghrist, R. (2008). Barcodes: The persistent topology of data. Bulletin of the American Mathematical Society, 45:61–75. Survey.
Ghrist, R. (2014). Elementary Applied Topology.
Giusti, C., Pastalkova, E., Curto, C., & Itskov, V. (2015). Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences.
Hatcher, A. (2002). Algebraic Topology. Cambridge University Press.
Hilaga, M., Shinagawa, Y., Kohmura, T., & Kunii, T. L. (2001). Topology matching for fully automatic similarity estimation of 3D shapes. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, SIGGRAPH ’01, pages 203–212, New York, NY, USA. ACM.
Kaczynski, T., Mischaikow, K., & Mrozek, M. (2006). Computational homology, volume 157. Springer Science & Business Media.
Kerber, M., Morozov, D., & Nigmetov, A. (2016). Geometry Helps to Compare Persistence Diagrams, chapter 8, pages 103–112.
Khasawneh, F. A., & Munch, E. (2016). Chatter detection in turning using persistent homology. Mechanical Systems and Signal Processing, 70-71:527–541.
Lesnick, M., & Wright, M. (2015). Interactive visualization of 2-D persistence modules. arXiv preprint arXiv:1512.00180.
Lesnick, M., & Wright, M. (2016). Rivet: the rank invariant visualization and exploration tool. http://rivet.online/.
Li, L., Cheng, W.-Y., Glicksberg, B. S., Gottesman, O., Tamler, R., Chen, R., Bottinger, E. P., & Dudley, J. T. (2015). Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Science Translational Medicine, 7(311):311ra174–311ra174.
Mischaikow, K., & Nanda, V. (September 2013). Morse theory for filtrations and efficient computation of persistent homology. Discrete & Computational Geometry, 50(2):330–353.
Morozov, D. (2015). Dionysus. http://www.mrzv.org/software/dionysus/.
Müllner, D., & Babu, A. (2013). Python mapper: An open-source toolchain for data exploration, analysis and visualization. http://danifold.net/mapper.
Munch, E., Shapiro, M., & Harer, J. (2012). Failure filtrations for fenced sensor networks. The International Journal of Robotics Research, 31(9):1044–1056.
Munch, E., Turner, K., Bendich, P., Mukherjee, S., Mattingly, J., & Harer, J. (2015). Probabilistic Fréchet means for time varying persistence diagrams. Electron. J. Statist., 9:1173–1204.
Munch, E., & Wang, B. (2016). Convergence between categorical representations of Reeb space and Mapper. In Fekete, S. and Lubiw, A., editors, 32nd International Symposium on Computational Geometry (SoCG 2016), volume 51 of Leibniz International Proceedings in Informatics (LIPIcs), pages 53:1–53:16, Dagstuhl, Germany. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
Munkres, J. R. (1993). Elements of Algebraic Topology. Addison Wesley.
Nanda, V. (2015). Perseus. http://www.sas.upenn.edu/ ~ vnanda/perseus/.
Nicolau, M., Levine, A. J., & Carlsson, G. (2011). Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences, 108(17):7265–7270.
Nielson, J. L., Paquette, J., Liu, A. W., Guandique, C. F., Tovar, C. A., Inoue, T., Irvine, K.-A., Gensel, J. C., Kloke, J., Petrossian, T. C., et al. (2015). Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury. Nature communications, 6.
Niyogi, P., Smale, S., & Weinberger, S. (2011). A topological view of unsupervised learning from noisy data. SIAM Journal on Computing, 40:646–663.
Otter, N., Porter, M. A., Tillmann, U., Grindrod, P., & Harrington, H. A. (2015). A roadmap for the computation of persistent homology. arXiv: 1506.08903.
Perea, J. A., & Carlsson, G. (2014). A Klein-bottle-based dictionary for texture representation. International journal of computer vision, 107(1):75–97.
Perea, J. A., Deckard, A., Haase, S. B., & Harer, J. (2015). SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics, 16(1).
Robins, V., Wood, P. J., & Sheppard, A. P. (2011). Theory and algorithms for constructing discrete Morse complexes from grayscale digital images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1646–1658.
Singh, G., Mémoli, F., & Carlsson, G. (2007). Topological methods for the analysis of high dimensional data sets and 3D object recognition. In Eurographics Symposium on Point-Based Graphics.
Torres, B. Y., Oliveira, J. H. M., Thomas Tate, A., Rath, P., Cumnock, K., & Schneider, D. S. (2016). Tracking resilience to infections by mapping disease space. PLoS Biol, 14(4):1–19.
Tralie, C. (2016). Fast Rips in the browser. http://www.ctralie.com/Software/jsTDA/.
Turner, K., Mileyko, Y., Mukherjee, S., & Harer, J. (2014). Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry, 52(1):44–70.
Vejdemo-Johansson, M. & Skraba, P. (2016). Big Data Optimization: Recent Developments and Challenges, chapter Topology, Big Data and Optimization, pages 147–176. Springer International Publishing, Cham.
Wood, Z., Hoppe, H., Desbrun, M., & Schröder, P. (2004). Removing excess topology from isosurfaces. ACM Trans. Graph., 23(2):190–208.
Yao, Y., Sun, J., Huang, X., Bowman, G. R., Singh, G., Lesnick, M., Guibas, L. J., Pande, V. S., & Carlsson, G. (2009). Topological methods for exploring low-density states in biomolecular folding pathways. The Journal of Chemical Physics, 130(14):–.
Zomorodian, A., & Carlsson, G. (2004). Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274.
Published
How to Cite
Issue
Section
License
Copyright (c) 2017 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
TEST