Tuesday, July 16, 2024

Los Alamos Partners with AirMettle for Efficient In-Storage Data Analysis


AirMettle and Los Alamos have published an open-source reference design with APIs for utilizing analytics in computational storage devices, enabling further scalability and efficiency.

A partnership between Los Alamos National Laboratory (LANL) and AirMettle offers a way to efficiently analyze high-dimensional data sets from large-scale simulation campaigns while protecting the stored data. Performing parts of the analysis near the storage reduces the amount of data that must be moved, which lowers both the cost of analytics and the time to scientific insight.

“Our scientific large-scale simulations can generate hundreds of petabytes of highly dimensional floating-point data,” said Gary Grider, High-Performance Computing division leader at Los Alamos. “But the data associated with a scientific feature of interest can be orders of magnitude smaller than the written data. So, a key challenge is quickly and efficiently finding what’s relevant in this sea of data. To optimize this process, we’ve been drawn towards computational storage — processing data in-place and near storage — to eliminate unnecessary data movement while maintaining parallelism and adequate data protection.”

Building on AirMettle’s Real-Time Smart Data Lake (RT-SDL) architecture, Los Alamos and AirMettle have defined a common application programming interface (API) that extends the Non-Volatile Memory Express (NVMe) standard for computational storage devices so that they can support in-place analytics. RT-SDL enables scalable analytics near storage using standard interfaces such as the S3 object storage interface and standard data formats such as Apache Parquet, while integrating rigorous data protection through erasure coding.
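The benefit of filtering near storage can be illustrated with a small sketch. This is not AirMettle's or LANL's API: the `StorageNode` class, its `select_greater_than` method, and the sample values are hypothetical stand-ins, and the "device" is just an in-memory byte buffer. The sketch only compares how many bytes leave the node when the filter runs on the client versus on the node.

```python
import struct


class StorageNode:
    """Hypothetical stand-in for a storage device holding float64 samples."""

    def __init__(self, values):
        # Keep the data as packed 8-byte little-endian floats, as it might sit on disk.
        self._blob = struct.pack(f"<{len(values)}d", *values)

    def read_all(self):
        """Conventional path: ship every stored byte to the client."""
        return self._blob

    def select_greater_than(self, threshold):
        """In-storage path: filter on the node and ship only the matches."""
        count = len(self._blob) // 8
        values = struct.unpack(f"<{count}d", self._blob)
        hits = [v for v in values if v > threshold]
        return struct.pack(f"<{len(hits)}d", *hits)


def decode(blob):
    """Turn a packed float64 buffer back into a Python list."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


node = StorageNode([0.1, 0.2, 9.5, 0.3, 12.0, 0.4, 0.05, 0.6])

# Client-side filtering: all data crosses the "network" first.
moved_client_side = node.read_all()
client_hits = [v for v in decode(moved_client_side) if v > 5.0]

# In-storage filtering: only the matching values cross the "network".
moved_in_storage = node.select_greater_than(5.0)
storage_hits = decode(moved_in_storage)

print("bytes moved, client-side filter:", len(moved_client_side))
print("bytes moved, in-storage filter: ", len(moved_in_storage))
```

Both paths return the same two values, but the in-storage path moves a quarter of the bytes in this toy case. In real simulation output, where the feature of interest may be orders of magnitude smaller than the written data, the gap would be correspondingly larger.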

Scalable and Cost-Efficient Data Processing

In extending that technology, computational tasks will be delegated down to the device level so that data can be processed in a far more scalable and power-efficient way. Because the data is reduced near storage, a smaller analytics processing capability can also be used. These enhancements build on the benefits of AirMettle’s existing unique architecture.

“Accelerating analytics of vast volumes of experiment and simulation data is a key requirement and challenge for the scientific community,” said Donpaul Stephens, founder and CEO of AirMettle, Inc. “AirMettle’s RT-SDL is the first computational storage service with highly scalable in-place processing to accelerate analytics by 100 times or more and significantly reduce network traffic. Users can easily store and retrieve data in our object store via standard APIs. AirMettle stripes this data across hundreds of storage nodes, eliminating hot spots for traditional storage access and high-speed parallel analytics.”
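As a rough illustration of why striping helps parallel analytics, the sketch below deals values round-robin across a handful of simulated nodes, lets each node reduce its own shard, and combines the small per-node summaries. It is an assumption-laden toy, not RT-SDL's actual placement or query logic: the function names, the round-robin layout, the node count of 8, and the use of threads to stand in for storage nodes are all illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(values, node_count):
    """Deal values round-robin across node_count hypothetical nodes."""
    return [values[i::node_count] for i in range(node_count)]


def node_summary(shard):
    """Work done on one node: reduce its shard to (count, total, max)."""
    return len(shard), sum(shard), max(shard)


def combine(summaries):
    """Merge the small per-node summaries into a global (count, mean, max)."""
    count = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    peak = max(s[2] for s in summaries)
    return count, total / count, peak


values = [float(i % 17) for i in range(1000)]
shards = stripe(values, node_count=8)

# Each thread plays the role of one storage node working on its own shard.
with ThreadPoolExecutor(max_workers=8) as pool:
    summaries = list(pool.map(node_summary, shards))

count, mean, peak = combine(summaries)
print(count, mean, peak)
```

Because the shards are nearly equal in size, no single node becomes a hot spot, and only three numbers per node travel back to the client instead of the full data set.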

Working with Los Alamos, AirMettle recently published an open-source reference design with APIs for utilizing analytics in computational storage devices, enabling further scalability and efficiency. 
