The High-Q Club: Experience with Extreme-scaling Application Codes

Authors

  • Dirk Brömmel, Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH
  • Wolfgang Frings, Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH
  • Brian J. N. Wylie, Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH
  • Bernd Mohr, Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH
  • Paul Gibbon, Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH
  • Thomas Lippert, Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH

DOI:

https://doi.org/10.14529/jsfi180104

Abstract

Jülich Supercomputing Centre (JSC) started running (extreme) scaling workshops with its first IBM Blue Gene supercomputer, a series that eventually spanned three generations, each bringing an increase in the number of cores and available threads. Over the years, this workshop series attracted numerous international code teams and resulted in many applications capable of running on all available cores of each system.
This article reviews some of the knowledge gained from running and tuning highly scalable applications, focussing on JUQUEEN, the IBM Blue Gene/Q at JSC. The ability to execute successfully on all 458,752 cores with up to 1.8 million processes or threads may qualify codes for the High-Q Club, which serves as a showcase for diverse codes scaling to the entire 28 racks and effectively defines a collection of the highest-scaling codes on JUQUEEN. The intention was to encourage other developers to invest in tuning and scaling their codes while identifying the key aspects necessary to reach that goal.
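For orientation, the quoted figures follow from JUQUEEN's Blue Gene/Q configuration of 28 racks with 1,024 compute nodes per rack, 16 user cores per node and 4 hardware threads per core (details not restated in this abstract):

    28 racks × 1,024 nodes × 16 cores = 458,752 cores
    458,752 cores × 4 hardware threads = 1,835,008 ≈ 1.8 million processes or threads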
As this era closes, it is timely to compare the characteristics of the 32 High-Q Club member codes, considering their strong and/or weak scaling, their exploitation of hardware threading, and whether and how intra-node multi-threading is combined with message-passing. We also identify obstacles to scaling, such as inefficient use of the limited compute-node memory and of file I/O, as key governing factors. Overall, the analysis provides guidance as to how applications may (need to) be designed in the future to exploit the expected exascale computer systems.
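The combination of message-passing with intra-node multi-threading referred to above is typically realised as hybrid MPI + OpenMP: a few MPI ranks per node, each spawning OpenMP threads to occupy the remaining hardware threads, so that the per-process memory footprint is not multiplied. The following minimal sketch illustrates that general pattern only; it is not taken from any High-Q Club code.

/* Minimal hybrid MPI + OpenMP sketch: threads do the intra-node work,
 * a single collective handles inter-node communication. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Request threaded MPI; FUNNELED means only the main thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;

    /* Intra-node parallelism: threads share the rank's memory. */
    #pragma omp parallel reduction(+:local)
    {
        local += 1.0;   /* stand-in for per-thread compute work */
    }

    /* Inter-node parallelism: one collective across all ranks,
     * called outside the parallel region (FUNNELED). */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d total work units=%g\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}

On a Blue Gene/Q node, for example, the ranks-per-node and thread counts would be chosen so that their product covers the 64 available hardware threads (16 cores × 4 SMT threads).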

Published

2018-04-23

How to Cite

Brömmel, D., Frings, W., Wylie, B. J. N., Mohr, B., Gibbon, P., & Lippert, T. (2018). The High-Q Club: Experience with Extreme-scaling Application Codes. Supercomputing Frontiers and Innovations, 5(1), 59–78. https://doi.org/10.14529/jsfi180104