Soda, the open source data reliability tools provider, has announced the release of Cloud Metrics Store, which provides advanced testing-as-code capabilities to help data teams get ahead of data issues.
Available to all users of Soda’s Open Source (OSS) tools, Cloud Metrics Store is said to capture historical information about the health of data to support intelligent data testing across every workload.
According to the company, without a clear strategy for monitoring data quality, many organisations fail to catch problems that can leave their systems exposed and cause serious downstream issues. Inspired by modern software engineering principles, Soda is giving data teams the tools to build a culture and community of good data practice through a combination of the Soda Cloud Data Observability Platform and its OSS data reliability tools, built by and for data engineers.
Soda’s global data community already counts Disney, HelloFresh, and Udemy among the major contributors that have deployed its data reliability tools.
With this latest OSS release, Cloud Metrics Store gives data and analytics engineers the ability to test and validate the health of data against its previous values. These historical metrics give data tests a baseline understanding of what good data looks like, so that bad data can be quarantined for inspection before it impacts data products or downstream consumers. Alerts are sent via popular on-call tools or Slack, so data teams are the first to know when data issues arise and can resolve them swiftly.
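To make the idea of testing "based on previous values" concrete, here is a minimal sketch in plain Python. It is not Soda's implementation or API: the function names `check_against_history` and `quarantine`, the z-score rule, and the threshold of 3 standard deviations are all illustrative assumptions. The first function compares today's value of a metric (for example a row count) with the history of that metric; the second splits a batch into rows that pass a rule and rows set aside for inspection.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def check_against_history(current: float, history: Sequence[float], max_z: float = 3.0) -> bool:
    """Hypothetical baseline test: pass if `current` is within `max_z`
    sample standard deviations of the mean of past values."""
    if len(history) < 2:
        # Too little history to form a baseline; accept the value (an assumption).
        return True
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly stable history: only an identical value passes.
        return current == baseline
    return abs(current - baseline) / spread <= max_z


def quarantine(rows: Sequence[T], is_valid: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split rows into (good, quarantined) by a validity rule."""
    good: List[T] = []
    bad: List[T] = []
    for row in rows:
        (good if is_valid(row) else bad).append(row)
    return good, bad


if __name__ == "__main__":
    daily_row_counts = [1000, 1020, 990, 1010, 1005]
    print(check_against_history(1008, daily_row_counts))  # True: close to the baseline
    print(check_against_history(400, daily_row_counts))   # False: far below the baseline
    good, bad = quarantine(
        [{"email": "a@example.com"}, {"email": None}],
        lambda r: r["email"] is not None,
    )
    print(len(good), len(bad))  # 1 1
```

A z-score rule is only one possible choice of baseline; a real system might use rolling windows, seasonality, or percentage-change thresholds instead.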
Soda’s data reliability tools work across the data product lifecycle. This means data engineers can easily test data at ingestion using Soda, and data product managers can validate data before it is consumed in tools such as Snowflake, the firm said in a press release.
All checks can be written ‘as-code’ in an easy-to-learn configuration language. Configuration files are version-controlled and determine which tests run each time new data arrives in a data platform. Soda supports every data workload, including data infrastructure, science, analysis, and streaming workloads, both on-premises and in the cloud.
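The "checks as code" idea can be illustrated with a small, self-contained Python sketch. This is not Soda's configuration language or API; the JSON layout, the metric names `row_count` and `missing_count`, and the `run_checks` function are hypothetical stand-ins. The point is that the checks live in a plain-text file that can be version-controlled, and the same file is evaluated against each new batch of data.

```python
import json
import operator
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical check file; in practice this text would sit in a version-controlled repo.
CONFIG = """
{
  "table": "orders",
  "checks": [
    {"metric": "row_count", "op": ">", "value": 0},
    {"metric": "missing_count", "column": "email", "op": "=", "value": 0}
  ]
}
"""

# Comparison operators the hypothetical format allows.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def compute_metric(check: Dict[str, Any], rows: List[Dict[str, Any]]) -> int:
    """Compute the metric a check refers to over the new batch of rows."""
    metric = check["metric"]
    if metric == "row_count":
        return len(rows)
    if metric == "missing_count":
        column = check["column"]
        return sum(1 for row in rows if row.get(column) is None)
    raise ValueError(f"unsupported metric: {metric}")


def run_checks(config_text: str, rows: List[Dict[str, Any]]) -> List[Tuple[str, int, bool]]:
    """Evaluate every configured check; return (label, actual value, passed)."""
    config = json.loads(config_text)
    results = []
    for check in config["checks"]:
        actual = compute_metric(check, rows)
        passed = OPS[check["op"]](actual, check["value"])
        label = check["metric"] + (f"({check['column']})" if "column" in check else "")
        results.append((label, actual, passed))
    return results


if __name__ == "__main__":
    new_batch = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
    for label, actual, passed in run_checks(CONFIG, new_batch):
        print(label, actual, "PASS" if passed else "FAIL")
```

Keeping checks declarative like this means a change to what "good data" means shows up as a reviewable diff to the configuration file rather than as a change to pipeline code.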
“It’s advantageous for data teams to unify around a common language that allows them to specify what good data looks like across the data value chain from ingestion to consumption, irrespective of roles, skills, or subject matter expertise. Most data teams are organised by domain, and when creating data products, they often depend on each other to provide timely, accurate, and complete data,” explains Maarten Masschelein, CEO of Soda.