
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System

Author: Florian Rathgeber

Florian Rathgeber of ECMWF presented this talk at the Platform for Advanced Scientific Computing 2016 (PASC’16) in Lausanne on 9th June 2016.

As the resolution of the forecasts produced by ECMWF's Integrated Forecasting System (IFS) is refined, the amount of data involved continues to grow geometrically. Current peak loads already require a parallel storage filesystem (Lustre) that would otherwise be oversized. The data volume is expected to grow 6-fold by 2020, reaching 120 TB/day, concentrated in short one-hour bursts. Moreover, this data requires post-processing to create the final forecast products sent to end users, introducing a further I/O bottleneck. Recognizing these challenges, ECMWF's Scalability Programme aims to redesign the data workflow to minimize I/O in the time-critical path whilst retaining resilience to failures. The authors are investigating multiple solutions to tackle issues of data locality, data volume and overall resilience. These range from a novel NVRAM hardware co-design effort within the EU-funded NEXTGenIO project to the use of distributed object storage technologies and a new dynamic worker-broker solution for managing the post-processing workload.

You can see slides from Florian's talk below.