Long Distance Geographically Distributed InfiniBand Based Computing
DOI: https://doi.org/10.14529/jsfi200202
Abstract
Collaboration between multiple computing centres, referred to as federated computing, is becoming an important pillar of High Performance Computing (HPC) and will be one of its key components in the future. To test the technical possibilities of such collaboration over a 100 Gb optical fiber link (900 km long, with a 9 ms round-trip time, RTT), we prepared two scenarios of operation. In the first one, the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) in Warsaw and the Centre of Informatics - Tricity Academic Supercomputer & networK (CI-TASK) in Gdańsk prepared a long-distance, geographically distributed computing cluster. The system consisted of 14 nodes (10 at the ICM facility and 4 at the CI-TASK facility) connected using InfiniBand. Our tests demonstrate that computationally intensive data analysis can be performed on systems of this class without a substantial drop in performance for certain types of workloads. Additionally, we show that it is feasible to use High Performance ParalleX (HPX) [1], a high-level abstraction library for distributed computing, to develop software for such geographically distributed computing resources and maintain the desired efficiency.
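The link figures quoted above can be put in perspective with a short back-of-the-envelope calculation. The sketch below is only an illustration: it assumes that "100 Gb" denotes a rate of 100 Gbit/s and uses a typical refractive index for silica fiber (1.468), neither of which is stated in the abstract. The function names are hypothetical.

```python
# Rough link arithmetic. Only the length, RTT and nominal capacity come from
# the abstract; every other constant is an assumption made for illustration.
LINK_LENGTH_KM = 900.0          # from the abstract
RTT_S = 9e-3                    # from the abstract
LINK_RATE_BIT_S = 100e9         # assumes "100 Gb" means 100 Gbit/s

C_VACUUM_KM_S = 299_792.458     # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.468  # assumed typical value for silica fiber


def propagation_rtt_s(length_km, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Round-trip light propagation time over a fiber of the given length."""
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return 2.0 * length_km / speed_km_s


def bandwidth_delay_product_bytes(rate_bit_s, rtt_s):
    """Bytes that must be in flight to keep the link busy for one RTT."""
    return rate_bit_s * rtt_s / 8.0


if __name__ == "__main__":
    rtt = propagation_rtt_s(LINK_LENGTH_KM)
    bdp = bandwidth_delay_product_bytes(LINK_RATE_BIT_S, RTT_S)
    print(f"propagation-only RTT: {rtt * 1e3:.2f} ms")
    print(f"bandwidth-delay product: {bdp / 1e6:.1f} MB")
```

Under these assumptions, propagation alone accounts for roughly 8.8 ms of the reported 9 ms RTT, and on the order of 100 MB must be in flight to keep the link saturated, which is why latency-tolerant workloads are the natural fit for such a cluster.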
In the second scenario, we prepared a distributed simulation-postprocessing-visualization workflow using ADIOS2 [2] and two programming languages (C++ and Python). In this test, we demonstrate the capability to perform different parts of the analysis at separate sites.
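The staged structure of such a workflow can be sketched with plain Python generators. This is a stand-in written for illustration only: it does not use the ADIOS2 API, all three stages run in a single process rather than at separate sites, and the toy diffusion model and every function name are hypothetical.

```python
# Step-wise hand-off between simulation, postprocessing and visualization,
# mimicked with generators. NOT the ADIOS2 API; all names are hypothetical.
from typing import Dict, Iterator, List


def simulate(n_steps: int, n_cells: int = 8) -> Iterator[Dict]:
    """Producer: run a toy 1-D diffusion and publish the field after each step."""
    field = [0.0] * n_cells
    field[n_cells // 2] = 1.0  # initial spike in the middle of the domain
    for step in range(n_steps):
        nxt = field[:]
        for i in range(n_cells):
            left = field[i - 1]               # index -1 wraps: periodic boundary
            right = field[(i + 1) % n_cells]
            nxt[i] = field[i] + 0.25 * (left - 2.0 * field[i] + right)
        field = nxt
        yield {"step": step, "u": field}


def postprocess(steps: Iterator[Dict]) -> Iterator[Dict]:
    """Middle stage: reduce each published step to summary statistics."""
    for data in steps:
        u = data["u"]
        yield {"step": data["step"], "total": sum(u), "peak": max(u)}


def visualize(stats: Iterator[Dict]) -> List[str]:
    """Final stage: render one text line per step (placeholder for plotting)."""
    return [
        f"step {s['step']}: total={s['total']:.3f} peak={s['peak']:.3f}"
        for s in stats
    ]


if __name__ == "__main__":
    for line in visualize(postprocess(simulate(n_steps=3))):
        print(line)
```

Each stage consumes one step at a time and never needs the full simulation history. That property is what makes it plausible to place the stages at different sites and connect them with a streaming I/O layer.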
References
Kaiser, H., Lelbach aka wash, B.A., Heller, T., Bergé, A., et al.: STEllAR-GROUP/hpx: HPX V1.3.0: The C++ Standards Library for Parallelism and Concurrency (2019), DOI: 10.5281/zenodo.3189323
The Adaptable Input Output System version 2, https://github.com/ornladios/ADIOS2/, accessed: 2020-02-08
Orlowski, L., Deng, Y., Michalewicz, M.: Galaxies of supercomputers and their underlying interconnect topologies hierarchies. In: International Supercomputer Conference, Leipzig, Germany (2014), DOI: 10.13140/2.1.4798.2728
Michalewicz, M., Southwell, D., Tan, T., Poppe, Y., et al.: InfiniCortex: concurrent supercomputing across the globe utilising trans-continental InfiniBand and Galaxy of Supercomputers. In: Supercomputing 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA (2014), DOI: 10.13140/2.1.3267.7444
Michalewicz, M.T., Lian, T.G., Seng, L., Low, J., et al.: InfiniCortex: Present and Future Invited Paper. In: Proceedings of the ACM International Conference on Computing Frontiers, May 2016, Como, Italy. pp. 267–273. Association for Computing Machinery, New York, NY, USA (2016), DOI: 10.1145/2903150.2912887
Noaje, G., Davis, A., Low, J., Lim, S., et al.: InfiniCortex – From Proof-of-concept to Production. Supercomputing Frontiers and Innovations 4(2), 87–102 (2017), DOI: 10.14529/jsfi170207
Obsidian Strategics Inc., https://www.cybersecurityintelligence.com/obsidian-strategics-106.html, accessed: 2020-06-01
Obsidian Strategics Inc., https://obsidianstrategics.com/index.html, accessed: 2020-06-01
Vcinity Inc., https://vcinity.io/, accessed: 2020-06-01
Mellanox MetroX R-2 Systems, https://www.mellanox.com/products/long-haul, accessed: 2020-06-01
Obsidian Longbow Campus Solutions Extend Its Columbia Supercomputer across Multiple NASA Locations, https://www.militaryaerospace.com/home/article/16725502/obsidian-longbow-campus-solutions-extend-its-columbia-supercomputer-across-multiple-nasa-locations, accessed: 2020-06-01
Eikenberry, S., Lindekugel, K., Stanzione, D.: Long Haul InfiniBand Technology: Implications for Cluster Computing, Arizona State University (2006), https://obsidianstrategics.com/archives/2006/asustanzione ccs.pdf, accessed: 2020-06-28
El-Harake, H.N., Gamboni, C., Gorini, S., Schoenemeyer, T.: Evaluation of infiniband range extension offered by obsidian (2011)
Richling, S., Kredel, H., Hau, S., Kruse, H.G.: A long-distance infiniband interconnection between two clusters in production use. In: State of the Practice Reports, November 2011, Seattle, Washington. Association for Computing Machinery, New York, NY, USA (2011), DOI: 10.1145/2063348.2063368
Ban, K., Chrzeszczyk, J., Howard, A., Li, D., Tan, T.W.: InfiniCloud: Leveraging the Global InfiniCortex Fabric and OpenStack Cloud for Borderless High Performance Computing of Genomic Data. Supercomputing Frontiers and Innovations 2(3), 14–27 (2015), DOI: 10.14529/jsfi150302
Chrzeszczyk, J., Howard, A., Chrzeszczyk, A., Swift, B., Davis, P., Low, J., Tan, T.W., Ban, K.: InfiniCloud 2.0: distributing High Performance Computing across continents. Supercomputing Frontiers and Innovations 3(2), 54–71 (2016), DOI: 10.14529/jsfi160204
Antypas, K.: Superfacility: How new workflows in the DOE Office of Science are influencing storage system requirements? (2016), https://storageconference.us/2016/Slides/KatieAntypas.pdf, accessed: 2020-06-01
NERSC Superfacility, https://www.nersc.gov/research-and-development/superfacility/, accessed: 2020-06-01
Creating Super-facilities: a Coupled Facility Model for Data-Intensive Science, Internet 2 Global Summit 2015, http://meetings.internet2.edu/2015-global-summit/detail/10003679/, accessed: 2020-06-01
Bell, G.: The Energy Sciences Network: Overview, Update, Impact (DoE) - presentation, https://science.osti.gov/-/media/ascr/ascac/pdf/meetings/20150324/Bell ESNet.pdf?la=en&hash=46C0168F7ADAB232EC32E4452C49A159453859C9, accessed: 2020-06-01
Fenix Research Infrastructure, https://fenix-ri.eu/about-fenix, accessed: 2020-06-01
Noaje, G.: InfiniCortex, InfiniBand nation-wide and world-wide, a talk given at Journee Scientifique ROMEO’2016, Reims, France (2016), https://romeo.univ-reims.fr/news/208/Journee Scientifique ROMEO 2016 le 9 juin 2016 a REIMS, accessed: 2020-06-01
Proficz, J., Sumionka, P., Skomiał, J., Semeniuk, M., Niedzielewski, K., Walczak, M.: Investigation into MPI All-Reduce Performance in a Distributed Cluster with Consideration of Imbalanced Process Arrival Patterns. In: International Conference on Advanced Information Networking and Applications, 15-17 April, Caserta, Italy. pp. 817–829. Springer (2020), DOI: 10.1007/978-3-030-44041-1_72
Niedzielewski, K., Marchwiany, M.E., Piliszek, R., Michalewicz, M., Rudnicki, W.: Multidimensional feature selection and high performance parallex. SN Computer Science 1(1), 40 (2020), DOI: 10.1007/s42979-019-0037-5
Open MPI: Open source high performance computing, https://www.open-mpi.org/, accessed: 2020-02-08
Guyon, I., Gunn, S., Ben-Hur, A., Dror, G.: Result analysis of the nips 2003 feature selection challenge. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 545–552. MIT Press (2005), http://papers.nips.cc/paper/2728-result-analysis-of-the-nips-2003-feature-selection-challenge.pdf
Dua, D., Graff, C.: UCI machine learning repository (2017), http://archive.ics.uci.edu/ml
Application examples for the ADIOS2 I/O library, https://github.com/ornladios/ADIOS2-Examples, accessed: 2020-02-08
Pearson, J.E.: Complex Patterns in a Simple System. Science 261(5118), 189–192 (1993), DOI: 10.1126/science.261.5118.189
License
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-Non Commercial 3.0 License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.