Project develops ‘game-changing’ I/O capabilities in supercomputing


Over the last four years, the EU-funded Horizon 2020 NEXTGenIO project has designed and built a prototype hardware platform that promises massive gains in input/output (I/O) capabilities in supercomputing.

ECMWF’s contribution has included developing associated software that will be useful far beyond the project. The results of the project will feed into the Centre’s Scalability Programme.

“I/O limitations have become a bottleneck in supercomputing as applications require the rapid throughput of ever larger volumes of data,” says ECMWF project lead Tiago Quintino.

“The hardware that has been developed is game-changing compared to today’s systems. If you think of the storage systems in ECMWF’s high-performance computing facility as a big, slow truck, then the new hardware is a nimble racing car by comparison.”

The project ended on 1 October 2019 after a successful final workshop and hackathon hosted by ECMWF from 25 to 27 September.


The workshop brought together the user community, hardware vendors and software tool providers.

New technology

A few years ago, ECMWF learnt of a new technology being developed by Intel and Micron Technology: a novel type of non-volatile, very high-density memory (NVRAM).

“We seized on this as I/O requirements in the supercomputers used at ECMWF for numerical weather prediction are set to increase dramatically over the years to come,” Tiago explains.

The new technology promised an increase in memory capacity from hundreds of gigabytes in today’s systems to up to 6 terabytes. Most importantly, the combination of high density with high throughput makes this technology a veritable storage-class memory.

To help explore its potential, ECMWF became one of eight partners in the NEXTGenIO project, which was coordinated by EPCC, the supercomputing centre at the University of Edinburgh.

The goal was to build a computer server blade with the NVRAM technology for the high-performance computing (HPC) market. A prototype was delivered by project partner Fujitsu earlier this year and has since been tested successfully.


The new hardware was built by project partner Fujitsu.

ECMWF’s contribution

The Centre contributed to the project in three ways.

In the initial phase, it helped to define the requirements for the hardware from the point of view of a potential user. “This was important because it enabled us to influence the design,” Tiago says.

The second contribution was to develop software to test and use the hardware.

Developing application software involved making changes to the I/O stack connected to ECMWF’s Integrated Forecasting System (IFS) so it could use the technology.

This led to the development of the fields database version 5 (FDB5), a new multi-I/O library that can work with the new hardware and that was made operational earlier this year.


The hardware has been installed at EPCC in Edinburgh, where it will stay as a supported research platform for the next three years. ECMWF has developed application software that can work with it.

To measure how useful the new technology is, ECMWF built a workflow simulator, called Kronos, which can test whether a piece of hardware meets certain performance benchmarks for specified workflows.

“This has already been used beyond the project to help define benchmarks for the procurement of our next high-performance computing facility,” Tiago notes.

Finally, ECMWF helped to evaluate the new hardware platform in terms of time to solution and energy to solution. “Performance is difficult to compare with current systems as the new hardware does not necessarily fulfil the same function,” Tiago says. “But node by node we found a performance increase by a factor of up to 30.”

Final workshop

The final project workshop in September brought together the user community, hardware vendors and software tool providers.

One of the keynote speakers was Balint Fleischer, who had led the team at Intel that helped to develop the technology and who now works for Micron Technology.


Balint Fleischer opened the workshop with a talk on ‘Accelerating time to insight through advanced memory-centric system architectures’.

“I really enjoyed this workshop,” Tiago says. “The fact that it was relatively small meant that we had excellent discussions on where this technology is going and how it’s going to make a difference.”

On the last day of the workshop, internal and external participants came together for a hackathon to work with the new prototype. This was an opportunity for interested scientists to learn how to make the most of the new technology.

Further details on the workshop and all presentations and recordings can be found on the workshop page on the ECMWF website.


Internal and external participants took part in the NEXTGenIO hackathon at ECMWF on 27 September.

Link to artificial intelligence

Tiago is upbeat about the prospects for the new technology.

“This is really just the beginning. What we are seeing is a shift in technology potentially similar in size to that brought about by the shift from CPUs to GPUs in supercomputing architectures.”

He adds that the new hardware is particularly well suited to artificial intelligence models, fast access to large datasets, and computations close to the data.

Beyond the availability of powerful new technology, participation in the project has brought wider benefits for ECMWF.

“Our requirements have become better known in the world of supercomputing innovation, and this will help to ensure that new HPC developments take into account the needs of numerical weather prediction,” Tiago says.

“Last but not least, ECMWF is now much better prepared for the arrival of this type of disruptive technology.”

For more information on NEXTGenIO, visit the project website.