NEXTGenIO is developing a prototype high performance computing (HPC) and high performance data analytics (HPDA) system that integrates byte-addressable storage class memory (SCM) into a standard compute cluster to provide greatly increased I/O performance for computational simulation and data analytics tasks.
To develop a prototype that can be used by a wide range of computational simulation applications and data analytics tasks, we have undertaken a requirements-driven design process to create hardware and software architectures for the system. These architectures both outline the components and integration of the prototype system and define our vision of what is required to integrate and exploit SCM, so that a generation of Exascale systems has sufficient I/O performance to support a wide range of workloads.
The hardware architecture, which is designed to scale up to an ExaFLOP system, uses high performance processors coupled with SCM in NVRAM (non-volatile random access memory) form, traditional DRAM, and an Omni-Path high performance network to provide a set of complete compute nodes that can undertake both high performance computation and high performance data analytics.
The systemware (system software), which supports the hardware in the system, will enable parallel I/O using the SCM technology, provide a multi-node filesystem for users to exploit, enable use of object storage techniques, and provide automatic checkpointing if desired by a user. These features, along with other systemware components, will enable the system to support traditional parallel applications with high efficiency, as well as newer computing modes such as high performance data analytics.
The architectures defined in this document are also demonstrated through descriptions of relevant use cases, which illustrate which systemware components user applications will use to carry out various actions on the hardware.
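As a rough illustration of the byte-addressable access model referred to above, the following minimal sketch writes and reads a small checkpoint record through a memory mapping rather than through block-style read and write calls. It is not the NEXTGenIO systemware or its API: the function names, the record layout, and the file path are hypothetical, and an ordinary temporary file stands in for a file on an SCM-backed filesystem.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: (iteration counter, simulation value).
RECORD = struct.Struct("<Qd")


def write_checkpoint(path, iteration, value):
    """Store one fixed-size record through a memory mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Size the file so the mapping covers exactly one record.
        os.ftruncate(fd, RECORD.size)
        with mmap.mmap(fd, RECORD.size) as region:
            # Plain in-memory store into the mapped region.
            RECORD.pack_into(region, 0, iteration, value)
            # Ask the OS to write the mapped bytes back to the file.
            region.flush()
    finally:
        os.close(fd)


def read_checkpoint(path):
    """Load the record back through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), RECORD.size, access=mmap.ACCESS_READ) as region:
            return RECORD.unpack_from(region, 0)


if __name__ == "__main__":
    # Assumption: a temporary file stands in for SCM-backed storage.
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint_path = os.path.join(workdir, "checkpoint.bin")
        write_checkpoint(checkpoint_path, 42, 3.5)
        print(read_checkpoint(checkpoint_path))
```

On a conventional filesystem the mapping is still served through the page cache, so this sketch shows only the programming model, not the performance characteristics of SCM.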