Simultac Fonton: A Fine-Grain Architecture for Extreme Performance beyond Moore's Law
DOI: https://doi.org/10.14529/jsfi170203
Abstract
With the arrival of nano-scale technology and the end of Moore's Law, architectural advances serve as the principal means of achieving enhanced efficiency and scalability into the exascale era. Ironically, the field that has demonstrated the greatest leaps of technology in the history of humankind has retained its roots in its earliest strategy, the von Neumann architecture model, which imposes tradeoffs that were suitable through the 1980s but are no longer valid for today's semiconductor technologies. Essentially all commercial computers, including HPC systems, have been and are von Neumann derivatives. The bottlenecks imposed by this heritage are the emphasis on ALU/FPU utilization, single instruction issue and sequential consistency, and the separation of memory and processing logic (the "von Neumann bottleneck"). Here the authors explore the possibility and implications of one class of non-von Neumann architecture based on cellular structures, asynchronous multi-tasking, distributed shared memory, and message-driven computation. "Continuum Computer Architecture" (CCA) is introduced as a genus of ultra-fine-grained architectures in which complexity of operation is an emergent behavior of simplicity of design combined with highly replicated elements. An exemplar species of CCA, "Simultac", is considered; it comprises billions of simple elements, "fontons", which merge the properties of data storage and movement with logical transformations. Employing the ParalleX execution model and a variation of the HPX+ runtime system software, the Simultac may provide the path to cost-effective data analytics and machine learning as well as dynamic adaptive simulations in the trans-exaOPS performance regime.
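To make the idea of message-driven computation over many simple, replicated cells more concrete, the following toy sketch may help. It is not the Simultac design, the fonton instruction set, or any ParalleX/HPX+ API; the names `Cell`, `Message`, and `run_sum`, the ring topology, and the single "add local value and forward" operation are all assumptions chosen purely for illustration. Each cell holds one value and knows one neighbor, and work happens only when a message arrives at a cell.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical message: carries work to the cell that owns the data."""
    target: int     # index of the destination cell
    acc: int        # running sum carried by the message
    hops_left: int  # number of further cells still to visit


class Cell:
    """Hypothetical simple element: local storage plus one small operation."""

    def __init__(self, index, value, n_cells):
        self.index = index
        self.value = value                  # local data storage
        self.right = (index + 1) % n_cells  # single neighbor on a ring

    def handle(self, msg):
        """Combine local data with the message, then finish or forward."""
        acc = msg.acc + self.value
        if msg.hops_left == 0:
            return acc, None
        return None, Message(self.right, acc, msg.hops_left - 1)


def run_sum(values):
    """Sum all cell values by routing one message around the ring."""
    if not values:
        return 0
    n = len(values)
    cells = [Cell(i, v, n) for i, v in enumerate(values)]
    pending = deque([Message(target=0, acc=0, hops_left=n - 1)])
    result = None
    # Event loop: no cell does anything until a message is delivered to it.
    while pending:
        msg = pending.popleft()
        done, forward = cells[msg.target].handle(msg)
        if forward is not None:
            pending.append(forward)
        else:
            result = done
    return result


if __name__ == "__main__":
    print(run_sum([3, 1, 4, 1, 5]))
```

No cell in this sketch has a global view: the sum emerges from many identical local steps, which is the flavor of "complexity from replicated simplicity" the abstract describes. A real fine-grain machine would have many messages in flight concurrently rather than a single sequential queue.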
License
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-Non Commercial 3.0 License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.