On the Inversion of Multiple Matrices on GPU in Batched Mode

Authors

  • Nikolay M. Evstigneev, Federal Research Center "Informatics and Control", Institute for System Analysis, Russian Academy of Sciences
  • Oleg I. Ryabkov, Federal Research Center "Informatics and Control", Institute for System Analysis, Russian Academy of Sciences
  • Eugene A. Tsatsorin, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University

DOI:

https://doi.org/10.14529/jsfi180203

Abstract

In this work we benchmark batched matrix inversion and the batched solution of linear systems. The inversion of many matrices that share the same sparsity pattern typically arises in fluid mechanics problems with chemistry. In this case the system is stiff, and an implicit method is required to solve it. The core of such a method is the inversion of multiple matrices. We benchmark different methods based on the cuSPARSE and MAGMA libraries, as well as a CPU LAPACK version, depending on the matrix fill. We also provide our own experimental code that implements Gauss–Jordan elimination on the GPU using register shuffle. It is shown that the fastest method is QR matrix inversion for single-precision calculations. We also show that the suggested Gauss–Jordan elimination method looks promising, being about 8–10 times faster than the cuSPARSE QR method. Finally, we demonstrate the application of batch solvers to a coupled reactive flow problem.
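To make the notion of batched inversion concrete, the following is a minimal CPU reference sketch of Gauss–Jordan elimination applied independently to every matrix in a batch. It is an illustration under stated assumptions, not the authors' GPU code: the names `gauss_jordan_inverse` and `batched_inverse` are hypothetical, partial pivoting is an assumed choice, and the register-shuffle GPU implementation mentioned in the abstract is not reproduced here.

```python
def gauss_jordan_inverse(a):
    """Invert one small dense matrix (list of rows) by Gauss-Jordan elimination.

    Partial pivoting is an assumption of this sketch, not a detail from the paper.
    """
    n = len(a)
    # Build the augmented matrix [A | I].
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate the column from every other row (above and below).
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of [I | A^-1] is the inverse.
    return [row[n:] for row in m]


def batched_inverse(batch):
    """Invert every matrix in the batch; each inversion is independent."""
    return [gauss_jordan_inverse(a) for a in batch]


if __name__ == "__main__":
    batch = [[[2.0, 0.0], [0.0, 4.0]],
             [[1.0, 2.0], [3.0, 4.0]]]
    for inv in batched_inverse(batch):
        print(inv)
```

Because the matrices in a batch do not depend on each other, the outer loop in `batched_inverse` is the part that a GPU implementation would run in parallel, with one small matrix assigned to each group of threads.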

Published

2018-07-16

How to Cite

Evstigneev, N. M., Ryabkov, O. I., & Tsatsorin, E. A. (2018). On the Inversion of Multiple Matrices on GPU in Batched Mode. Supercomputing Frontiers and Innovations, 5(2), 23–42. https://doi.org/10.14529/jsfi180203