On the Inversion of Multiple Matrices on GPU in Batched Mode
DOI: https://doi.org/10.14529/jsfi180203
Abstract
In this research we benchmark batched matrix inversion and the batched solution of linear systems. The problem of inverting many matrices with the same sparsity pattern usually arises in fluid mechanics problems with chemistry. In this case the system is stiff, and an implicit method is required to solve it. The core of such a method is the inversion of multiple matrices. We benchmark different methods based on the cuSPARSE and MAGMA libraries, as well as a CPU LAPACK version, as a function of the matrix fill. We also provide our own experimental code that implements Gauss–Jordan elimination on GPU using register shuffle. It is shown that the fastest method is QR matrix inversion for single-precision calculations. We also show that the suggested Gauss–Jordan elimination method looks promising, being about 8–10 times faster than the cuSPARSE QR method. Finally, we demonstrate the application of batched solvers to a coupled reactive flow problem.
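To make the batched setting concrete, the following is a minimal CPU reference sketch of Gauss–Jordan inversion applied to a batch of small dense matrices stored back to back. It is not the authors' GPU kernel: it does not use register shuffle, and the function name `batched_gauss_jordan_inverse`, the row-major layout, and the exact-zero singularity test are illustrative assumptions. It only shows the structure that makes the problem parallel: every matrix in the batch is processed independently, so on a GPU the outer loop over `b` is what would be distributed across threads, warps, or blocks.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (illustrative sketch): inverts `batch` row-major n x n
// matrices stored back to back in `A`, writing the inverses to `Ainv`.
// Uses Gauss-Jordan elimination with partial pivoting on the augmented
// matrix [M | I]. Returns false if any matrix has an exactly zero pivot;
// the inverse slot of such a matrix is left filled with zeros.
bool batched_gauss_jordan_inverse(const std::vector<double>& A,
                                  std::vector<double>& Ainv,
                                  std::size_t n, std::size_t batch) {
    assert(A.size() == batch * n * n);
    Ainv.assign(batch * n * n, 0.0);
    const std::size_t w = 2 * n;          // row stride of the augmented matrix
    std::vector<double> aug(n * w);       // scratch [M | I], reused per matrix
    bool all_ok = true;

    // Each matrix is an independent problem; this is the loop a GPU
    // implementation would parallelize.
    for (std::size_t b = 0; b < batch; ++b) {
        const double* M = A.data() + b * n * n;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                aug[i * w + j] = M[i * n + j];
                aug[i * w + n + j] = (i == j) ? 1.0 : 0.0;
            }

        bool ok = true;
        for (std::size_t k = 0; k < n; ++k) {
            // Partial pivoting: choose the row with the largest |entry| in column k.
            std::size_t p = k;
            for (std::size_t i = k + 1; i < n; ++i)
                if (std::fabs(aug[i * w + k]) > std::fabs(aug[p * w + k])) p = i;
            if (aug[p * w + k] == 0.0) { ok = false; break; }  // a real code would use a tolerance
            if (p != k)
                for (std::size_t j = 0; j < w; ++j) std::swap(aug[k * w + j], aug[p * w + j]);

            // Scale the pivot row so the pivot becomes 1.
            const double inv_piv = 1.0 / aug[k * w + k];
            for (std::size_t j = 0; j < w; ++j) aug[k * w + j] *= inv_piv;

            // Gauss-Jordan step: eliminate column k from all other rows,
            // both above and below the pivot.
            for (std::size_t i = 0; i < n; ++i) {
                if (i == k) continue;
                const double f = aug[i * w + k];
                if (f == 0.0) continue;
                for (std::size_t j = 0; j < w; ++j) aug[i * w + j] -= f * aug[k * w + j];
            }
        }
        if (!ok) { all_ok = false; continue; }

        // The right half of [I | M^{-1}] now holds the inverse.
        double* R = Ainv.data() + b * n * n;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) R[i * n + j] = aug[i * w + n + j];
    }
    return all_ok;
}
```

For example, a batch holding [[4, 7], [2, 6]] and [[2, 0], [0, 0.5]] yields [[0.6, -0.7], [-0.2, 0.4]] and [[0.5, 0], [0, 2]]. Unlike LU-based inversion, Gauss–Jordan eliminates above and below the pivot in a single sweep, which avoids a separate triangular-solve phase; that uniform per-step work is one plausible reason it maps well onto small per-matrix GPU kernels.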
License
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 3.0 License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.