- This open source project is the culmination of a strategy by NVIDIA and BlazingSQL.
- Processing data at scale is expensive, slow and incredibly complex; BlazingSQL aims to address these common challenges around the analytics pipelines.
- BlazingSQL provides GPU-accelerated results in seconds, allowing data scientists to quickly iterate over new models.
BlazingSQL, a GPU-accelerated SQL engine built on the RAPIDS ecosystem, has now become an open source project. It aims to support large scale data science workflows and enterprise datasets.
RAPIDS is a suite of open source software libraries for executing end-to-end data science and analytics pipelines entirely on GPUs.
“BlazingSQL is not a database, which is why we changed our original name of BlazingDB to BlazingSQL. It is a SQL engine that processes (almost) any data you want,” Rodrigo Aramburu, CEO of BlazingSQL, wrote in a blog post.
Processing data at scale is expensive, slow and incredibly complex. According to the team, BlazingSQL was built to address these common challenges their customers face around their analytics pipelines.
“BlazingSQL addresses these customer concerns not only with an incredibly fast, distributed GPU SQL engine, but also a zealous focus on simplicity,” Rodrigo said.
“With a few lines of code, BlazingSQL can query your raw data, wherever it resides and interoperate with your existing analytics stack and RAPIDS,” he added.
NVIDIA -BlazingSQL partnership
Rodrigo shared that this open source project is part of a strategy between NVIDIA’s RAPIDS team and BlazingSQL that brought more than 100 developers to contribute to the SQL Engine.
“NVIDIA and the RAPIDS ecosystem are delighted that BlazingSQL is open-sourcing their SQL engine built on RAPIDS,” said Josh Patterson, GM of data science at NVIDIA.
“By leveraging Apache Arrow on GPUs and integrating with Dask, BlazingSQL will extend open-source functionality, and drive the next wave of interoperability in the accelerated data science ecosystem,” he added.
How BlazingSQL solves major challenges in analytics pipelines
- Expensive — Customers cluster thousands of servers together for data science at scale. BlazingSQL + RAPIDS requires a small fraction of the infrastructure to run at an equivalent scale.
- Slow — Workloads and queries can take hours or days on large data sets. BlazingSQL + RAPIDS provides GPU-accelerated results in seconds, allowing data scientists to quickly iterate over new models.
- Complex — Workloads are prototyped at small scale and then rebuilt for distributed systems. BlazingSQL + RAPIDS enables users to write code once and dynamically change the scale of distribution with a single line of code.
“BlazingSQL, in addition to contributing heavily to the RAPIDS ecosystem, will focus on the services and support agreements necessary to make RAPIDS + BlazingSQL deployments successful and accessible to all,” wrote Aramburu.
The source code of BlazingSQL has been released on Github under Apache 2.0 license.