The Copernicus Climate Data Store (CDS) and Copernicus Atmosphere Data Store (ADS) are the backbone technical components supporting the implementation of the EU’s Copernicus Climate Change Service (C3S) and Atmosphere Monitoring Service (CAMS), both run by ECMWF. The CDS and ADS have recently been modernised and brought together in a single Data Stores Service. The Service encompasses a wide range of software and data products, and it now integrates with, benefits from and contributes to the more comprehensive ECMWF Software EnginE (ESEE). The ESEE provides transversal data provision and workflow services across the portfolios of ECMWF, Copernicus, and the EU’s Destination Earth initiative, to which ECMWF contributes. The Data Stores Service is now hosted and running on the ECMWF Common Cloud Infrastructure (CCI). The modernised Data Stores have evolved from the original CDS infrastructure and build on the experience and know-how gained from it. At the same time, they rely on open-source, cutting-edge technologies that are fully aligned with ECMWF’s Strategy.
Key components
The operational Data Stores Service is split into two main layers, which have different functions and cloud requirements: Data Repositories and Software Services. These layers are integrated by modular components that share common interfaces and act jointly to deliver the full range of capabilities exposed to users. Like other services provided at ECMWF, such as the European Weather Cloud (EWC), the modernised Data Stores Service is deployed and runs on the CCI. This ensures the elastic provision of resources so that individual components can scale further, facilitates integrated management across hardware and software, and strengthens synergies with EU cloud-related initiatives in which both ECMWF and EUMETSAT participate, such as WEkEO. The two layers can be described as follows:
- Data Repositories: These are the foundation of the Data Stores Service. This layer encompasses a broad range of distributed data products made available to users through the Service’s catalogue. External repositories are those hosted outside the CCI, while internal repositories are hosted within the CCI. Internal repositories are managed as part of the Data Stores and are optimised for access by its different components. They include an instance of ECMWF’s online Meteorological Archival and Retrieval System (MARS), to which a subset of the most requested variables from the core MARS archive is regularly uploaded. They also include an observations repository; other small on-disk datasets from C3S and CAMS; and an experimental ARCO (Analysis Ready, Cloud Optimized) data lake, which is intended to improve visualisation and interactivity of catalogued data on the WEkEO platform and, by extension, to address the needs of demanding machine learning/artificial intelligence (ML/AI) and visual-interactive applications.
- Software Services: This layer integrates all the software components which deploy and run together to support the operational functioning of the Data Stores Service. The functional baseline of the Data Stores is to provide a seamless user journey from searching, discovering and sub-setting to retrieving data, via interactive and programmatic interfaces. These services encompass different software applications, which deploy and run on dedicated servers and clusters within the CCI. They include the Core Data Stores Engine, the Evaluation and Quality Control (EQC) function, monitoring and metrics, the Copernicus Services web portals, interactive applications, different micro-services deployed and operated by third parties as part of contractual agreements, and soon a JupyterHub development environment where users will be able to perform computation and visualisation on top of the data, using a set of preconfigured expert tools.
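As an illustration of the programmatic side of that user journey, the sub-setting and retrieval steps might look roughly like the sketch below. It is only a sketch: the request keys (`variable`, `area`, `data_format` and so on) are assumptions modelled on typical CDS requests, and the helper names `build_request` and `retrieve` are hypothetical. Each dataset's catalogue entry defines the keys it actually accepts. The download step uses the `cdsapi` client, which needs the package installed and valid credentials, so it is defined but not called here.

```python
from __future__ import annotations

from typing import Any


def build_request(variable: str, year: int, month: int, day: int,
                  area: tuple[float, float, float, float]) -> dict[str, Any]:
    """Assemble a sub-setting request as a plain dictionary.

    The key names are assumptions for illustration; the catalogue entry
    of each dataset lists the keys and values it really supports.
    """
    north, west, south, east = area
    if north <= south:
        raise ValueError("area must be given as (north, west, south, east)")
    return {
        "variable": [variable],
        "year": [f"{year:04d}"],
        "month": [f"{month:02d}"],
        "day": [f"{day:02d}"],
        "time": ["12:00"],
        "area": [north, west, south, east],
        "data_format": "netcdf",
    }


def retrieve(dataset: str, request: dict[str, Any], target: str) -> None:
    """Submit the request through the cdsapi client and download the result.

    Requires the cdsapi package and configured credentials.
    """
    import cdsapi  # imported lazily so the rest of the sketch runs without it

    client = cdsapi.Client()
    client.retrieve(dataset, request, target)


# Build (but do not submit) a request for one day over a European box.
request = build_request("2m_temperature", 2024, 1, 5,
                        area=(60.0, -10.0, 35.0, 30.0))
print(request)
```

With credentials in place, a call such as `retrieve("reanalysis-era5-single-levels", request, "t2m.nc")` would submit the request; the dataset identifier and file name are given here purely as examples.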
The Data Stores software infrastructure is based on a plug-in architecture, which makes it possible both to share components to power other platforms and to integrate third-party components that complement functional areas of the system. This layer supports interactive and programmatic interfaces (APIs), which are highly configurable and serve as users’ entry point to the data repositories and other standard services offered by the system. Most of this software has been developed directly on the CCI as part of the modernisation project. Data Stores software components are cloud-oriented and ready to deploy and run in different clouds as backend engines, ‘powering’ other services and platforms.
The CDS for C3S, the ADS for CAMS, and the recently launched Early Warning Data Store (EWDS) for the EU’s Copernicus Emergency Management Service (CEMS) are the well-known public-facing interfaces of the Data Stores Service.
On the periphery of the Data Stores, but closely interlinked with them, there is a layer made of components which complement the content and functional scope of the Data Stores. Of special relevance are the following:
- earthkit: the ECMWF open-source Python code repository offering a broad set of expert libraries optimised to work with ECMWF and Data Stores data resources. Within the Data Stores, earthkit libraries are used to support data formatting, processing and visualisation capabilities.
- Visual Interactive Content (VIC): this includes a broad set of user-oriented applications and training material that showcases or makes use of the full range of data resources and capabilities of the Data Stores and earthkit via friendly interfaces. VICs can deploy and run anywhere. This enables user communities to discover, learn and interact with the available data and functionalities. The recently launched Copernicus Interactive Climate Atlas is an example of a VIC.
- External platforms: this includes a very broad ecosystem of platforms and infrastructures ‘powered by’ the Data Stores. These platforms may interact with or consume data resources via the exposed interfaces, embed technical components or integrate VICs as part of their portfolio. An example of such a platform is the Copernicus WEkEO DIAS Platform, a partnership of ECMWF, the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), Mercator Ocean and the European Environment Agency (EEA).
Like an iceberg
The ECMWF Data Stores Service is a complex, multi-layer system that can be conceptually likened to an iceberg: simple interfaces on the surface, with a robust and scalable backend underneath providing seamless access to a broad set of catalogued resources.
The modernisation of the former Data Stores has touched every layer of the architecture. It was needed to overcome the obsolescence of former components and to allow the system to evolve in a changing and highly demanding environment. The Data Stores Service will thus play a central role as an efficient, versatile and trusted link between big, distributed and heterogeneous data sources and increasingly sophisticated user requirements, driven by the development of new technologies and the need for immediate data and information.