The NEXTGenIO project addresses a key challenge not only for Exascale, but also for HPC and data intensive computing in general: the challenge of I/O performance.
As core counts have massively increased over the past few years, the performance of I/O subsystems has struggled to keep up with computational performance and has become a key bottleneck on today’s largest systems. NEXTGenIO will develop a prototype computing platform that uses on-node non-volatile memory, bridging the latency gap between DRAM and disk, thus removing this bottleneck. In addition to the hardware that will be built as part of the project, NEXTGenIO will develop the software stack (from OS and runtime support to programming models and tools) that goes hand-in-hand with this new hardware architecture. Two particular focal points are a data- and power-aware job scheduling system, and an I/O workload and workflow simulator that will allow us to stress-test our hardware and software developments. We believe that the new platform that is being developed in NEXTGenIO will be capable of delivering transformational performance across high performance and data intensive computing.
NEXTGenIO is organised into eight work packages:
WP1 Management
The Management work package deals with the day-to-day running of the project, and is responsible for the coordination of all technical tasks and the quality control of the project outputs.
WP1 is led by EPCC, the Project Coordinator.
WP2 Architecture
The Architecture work package focusses on defining the architecture of the complete NEXTGenIO solution. This involves, as a first step, capturing the requirements that the architecture needs to satisfy.
WP2 is led by EPCC.
WP3 Application Co-Design
The Application Co-Design work package looks at the I/O needs of the applications and workloads on HPC systems and provides feedback for the co-design process from the perspective of the applications and workloads that must eventually run on the NEXTGenIO architecture. Three end users will be considered: a general purpose HPC service provider (EPCC); a global numerical weather centre (ECMWF); and a commercial cloud service provider (Arctur).
WP3 is led by ECMWF.
WP4 Tools and Application Enabling
The Tools and Application Enabling work package is extending performance analysis and debugging tools, namely TUD's Vampir and Allinea's DDT and MAP, to exploit the non-volatile memory for their own performance, and to enable them to give feedback on user applications that run on an architecture with non-volatile memory.
WP4 is led by TU Dresden.
WP5 Systemware
The Systemware work package is developing the software layer that sits between the hardware and the user-level application (excluding OS and drivers). This work includes: extending the SLURM workload manager to be both energy and data locality aware; developing a data scheduler that moves data through the storage hierarchy; and developing the software layer that deals with storing data inside the NVRAM layer.
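As a rough illustration of what energy and data locality aware placement could mean in practice, the toy sketch below scores candidate nodes by how much of a job's input data they already hold in node-local NVRAM, minus a penalty for estimated power draw. Every name here (`Node`, `Job`, `score`, `pick_node`, `energy_weight`) is hypothetical and chosen for illustration only; this is not SLURM's plugin API and not the scheduler developed in the project.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cores: int
    watts_per_core: float  # assumed, simplified power model
    # Datasets assumed to be already resident in this node's NVRAM.
    resident_datasets: set = field(default_factory=set)


@dataclass
class Job:
    name: str
    cores: int
    datasets: set  # input datasets the job will read


def score(node, job, energy_weight=0.01):
    """Higher is better: reward local data, penalise estimated power draw."""
    local_hits = len(job.datasets & node.resident_datasets)
    estimated_power = node.watts_per_core * job.cores
    return local_hits - energy_weight * estimated_power


def pick_node(nodes, job):
    """Return the best-scoring node with enough free cores, or None."""
    candidates = [n for n in nodes if n.free_cores >= job.cores]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, job))
```

With two otherwise identical nodes, `pick_node` prefers the one that already holds more of the job's datasets, which avoids moving that data through the storage hierarchy before the job starts. The weighting between locality and energy is an arbitrary assumption; a real scheduler would have to calibrate it against measured data-movement and power costs.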
WP5 is led by BSC.
WP6 Prototype Hardware Development
The Prototype Hardware Development work package will build the NEXTGenIO hardware prototype using Intel's Optane DC Persistent Memory™ technology and develop the low-level system software (e.g. drivers) necessary to use the hardware.
WP6 is led by Fujitsu.
WP7 Testing and Evaluation
The Testing and Evaluation work package is in charge of conducting systematic tests and a thorough evaluation of the project results, especially from WPs 4, 5 and 6, and uses the results of WP3 to define and maintain a suite of test cases and benchmarks. The work package will take care to give clear and constructive feedback to the development of the prototype system, of the OS-level and systemware components, and of the tools and APIs.
WP7 is led by Intel.
WP8 Dissemination and Exploitation
The Dissemination and Exploitation work package is responsible for ensuring that the project's target audience is aware of NEXTGenIO, and that its research outcomes can be exploited and have the desired impact.
WP8 is led by EPCC.