The Apache Software Foundation has announced that Apache MADlib has graduated to a top-level project. MADlib delivers scalable in-database analytics. The project grew out of discussions among database engine developers, data scientists, IT architects and academics who were looking for more advanced methods of data analysis.
Apache MADlib provides parallel implementations of machine learning, graph, mathematical and statistical methods for structured and unstructured data. The project was previously developed in the Apache Incubator.
“During the incubation process, the MADlib community worked very hard to develop high-quality software for in-database analytics, in an open and inclusive manner in accordance with the Apache Way,” said Aaron Feng, vice president of Apache MADlib.
MADlib has been deployed across industry verticals ranging from automotive and consumer to finance and government. It lets users run detailed analytics on both structured and unstructured data using SQL. This capability makes the open source library a valuable option for a wide range of machine learning projects.
“We have seen our customers successfully deploy MADlib on large-scale data science projects across a wide variety of industry verticals,” said Elisabeth Hendrickson, vice president of R&D for Data at Pivotal.
Apache MADlib is available under the Apache License 2.0. A project management committee (PMC) guides its day-to-day operations and community development.