Accelerating Seismic Redatuming Using Tile Low-Rank Approximations on NEC SX-Aurora TSUBASA

Authors

  • Yuxi Hong, King Abdullah University of Science and Technology
  • Hatem Ltaief, King Abdullah University of Science and Technology
  • Matteo Ravasi, King Abdullah University of Science and Technology
  • Laurent Gatineau, NEC Deutschland GmbH, HPC Division
  • David Keyes, King Abdullah University of Science and Technology

DOI:

https://doi.org/10.14529/jsfi210201

Abstract

With the aim of imaging subsurface discontinuities, seismic data recorded at the surface of the Earth must be numerically re-positioned inside the subsurface where reflections have originated, a process referred to as redatuming. The recently developed Marchenko method is able to handle full-wavefield data including multiple arrivals. A downside of this approach is that a multi-dimensional convolution operator must be repeatedly evaluated to solve an expensive inverse problem. Because this operator applies multiple dense matrix-vector multiplications (MVM), we identify and leverage the data sparsity structure of each frequency matrix and propose to accelerate the MVM step using tile low-rank (TLR) matrix approximations. We study the impact of TLR on the MVM time-to-solution for different accuracy thresholds while assessing the quality of the resulting subsurface seismic wavefields, and show that TLR leads to minimal degradation in signal-to-noise ratio on a 3D synthetic dataset. We mitigate the load imbalance overhead and provide performance evaluation on two distributed-memory systems. Our MPI+OpenMP TLR-MVM implementation reaches up to 3X performance speedup against the dense MVM counterpart from the NEC scientific library on 128 NEC SX-Aurora TSUBASA cards. Thanks to the second generation of high bandwidth memory technology, it further attains up to 67X performance speedup compared to the dense MVM from Intel MKL when running on 128 dual-socket 20-core Intel Cascade Lake nodes with DDR4 memory. This corresponds to 110 TB/s of aggregated sustained bandwidth for our TLR-MVM implementation, without suffering deterioration in the quality of the reconstructed seismic wavefields.
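
The TLR-MVM idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' MPI+OpenMP implementation. Everything specific here is an assumption made for illustration: the smooth synthetic kernel standing in for a frequency matrix, the tile size `nb`, the relative threshold `tol`, the use of a truncated SVD to compress each tile, and the helper names `compress_tlr`, `tlr_mvm`, and `stored_entries`.

```python
import numpy as np


def compress_tlr(A, nb, tol):
    """Split A into nb x nb tiles and keep a truncated SVD of each tile.

    Truncated SVD is an illustrative compression choice, not necessarily
    the one used in the paper.
    """
    m, n = A.shape
    tiles = {}
    for i in range(0, m, nb):
        for j in range(0, n, nb):
            U, s, Vh = np.linalg.svd(A[i:i + nb, j:j + nb], full_matrices=False)
            # Keep singular values above tol relative to the largest one in
            # this tile (always keep at least one).
            k = max(1, int(np.sum(s > tol * s[0])))
            # Fold the singular values into the left factor: tile ~= U_k @ V_k.
            tiles[(i, j)] = (U[:, :k] * s[:k], Vh[:k, :])
    return tiles


def tlr_mvm(tiles, x, m):
    """Compute y ~= A @ x from the compressed tiles, one tile at a time."""
    y = np.zeros(m, dtype=complex)
    for (i, j), (U, V) in tiles.items():
        # Two thin products replace one dense tile-vector product.
        y[i:i + U.shape[0]] += U @ (V @ x[j:j + V.shape[1]])
    return y


def stored_entries(tiles):
    """Number of matrix entries stored in the compressed representation."""
    return sum(U.size + V.size for U, V in tiles.values())


# Hypothetical data-sparse matrix: a smooth oscillatory kernel (assumption).
n, nb, tol = 256, 64, 1e-6
pts = np.linspace(0.0, 1.0, n)
d = np.sqrt((pts[:, None] - pts[None, :]) ** 2 + 1.0)
A = np.exp(2j * d) / d

rng = np.random.default_rng(0)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

tiles = compress_tlr(A, nb, tol)
y_dense = A @ x
y_tlr = tlr_mvm(tiles, x, n)

# Error scaled by ||A||_F * ||x||, which the per-tile truncation bounds.
scaled_err = np.linalg.norm(y_tlr - y_dense) / (np.linalg.norm(A) * np.linalg.norm(x))
print("scaled MVM error:", scaled_err)
print("stored entries:", stored_entries(tiles), "of", A.size)
```

Loosening `tol` lowers the tile ranks, so fewer entries are read per multiplication, at the cost of a larger approximation error. This is the accuracy-versus-time-to-solution trade-off the abstract refers to. Because MVM is memory-bound, reading fewer entries is what would translate into speedup on high-bandwidth hardware.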

Published

2021-09-14

How to Cite

Hong, Y., Ltaief, H., Ravasi, M., Gatineau, L., & Keyes, D. (2021). Accelerating Seismic Redatuming Using Tile Low-Rank Approximations on NEC SX-Aurora TSUBASA. Supercomputing Frontiers and Innovations, 8(2), 6–26. https://doi.org/10.14529/jsfi210201
