The Smart Cube offers a range of custom research and analytics services to its clients, and relies heavily on open source to do so. The UK-headquartered company has a global presence with major bases in India, Romania, and the US, and employs more than 650 analysts around the world.
“The whole analytics ecosystem is shifting towards open source,” says Nitin Aggarwal, vice president of data analytics, The Smart Cube. Aggarwal leads a team of over 150 developers in India, of whom 100 work specifically on open source deployments. The company’s customers range from big corporations to financial services institutions and management consulting firms, and it relies primarily on open source to serve them.
Aggarwal tells Open Source For You that open source has made analytics development more agile and collaborative. “We work as a true extension of our clients’ teams, and open source allows us to implement quite a high degree of collaboration. Open source solutions also make it easy to operationalise analytics, to meet the daily requirements of our clients,” Aggarwal states.
Apart from increasing collaboration and helping deliver operationalised results, open source reduces the overall cost of analytics for The Smart Cube, and provides a higher return on investment for its clients. The company does have some proprietary solutions, but it uses an optimal mix of open and closed source software to cater to a wide variety of industries, business problems and technologies.
“Our clients often have an existing stack that they want us to use. But certain problems create large-scale complex analytical workloads that can only be managed using open source technologies. Similarly, a number of problems are best solved using algorithms that are better researched and developed in open source, while many descriptive or predictive problems are easily solved using proprietary solutions like Tableau, QlikView or SAS,” says Aggarwal.
The Smart Cube team also monitors market trends and seeks customer inputs at various levels to evaluate new technologies and tools, adjusting the mix of open and closed source software as per requirements.
The challenges with analytics
Data analysis comes with its share of hurdles. Beyond the problem-solving skills that analytics professionals need, service providers must resolve several technical challenges before they can examine data. Aggarwal says that standardising data drawn from structured and unstructured sources has become challenging. Likewise, obtaining a sufficient number of good training sets is hard, and determining the right technology stack to balance cost and performance is equally difficult.
Community solutions to help extract data
Aggarwal describes various community-backed solutions that jointly power the data extraction process and help resolve the technical challenges of data analysis. To serve hundreds of clients in a short span of time, The Smart Cube has built a custom framework that offers data collection and management solutions based on open source. It uses Apache Nutch and Kylo to enable data lake management, and Apache Beam to design the whole data collection process.
The Smart Cube leverages open source offerings, including Apache Spark and Hadoop, to analyse the bulk of extracted structured and unstructured data. “We deal with data at the terabyte scale, and analysis of such massive data sets is beyond the capability of a single commodity hardware. Traditional RDBMS (relational database management systems) also cannot manage many types of unstructured data like images and videos. Thus, we leverage Apache Spark and Hadoop,” Aggarwal says.
The Smart Cube is one of the leading service providers in the nascent field of predictive analytics. This type of analytics has become vital for companies operating in a tough competitive environment. Making predictions isn’t easy. But open source helps on that front as well.
“A wide variety of predictive analytics problems can be solved using open source. We take support from open source solutions to work on areas like churn prediction, predictive maintenance, recommendation systems and video analytics,” says Aggarwal. The company uses scikit-learn with Python, along with Keras and Google’s TensorFlow, to build predictive analytics and deep learning solutions for major prediction problems.
Additionally, in September 2017, The Smart Cube launched ‘Concept Lab’, which allows the firm to experiment at a faster pace, and to develop and test solution frameworks for client problems. “This approach, enabled by opting for open source, has gained us a lot of traction with our corporate clients, because we are able to provide the flexibility and agility that they cannot achieve internally,” Aggarwal affirms.
Open source is projected to help data analytics companies in the future, too. “We expect open source to dominate the future of the analytics industry,” says Aggarwal.
The Smart Cube foresees good growth with open source deployments. Aggarwal believes that open source will continue to become more mainstream for data analytics companies and will gradually replace proprietary solutions. “Most of the new R&D in analytics will continue to be on open source frameworks. The market for open source solutions will also consolidate over time as there is a huge base of small players at present, which sometimes confuses customers,” he states.
According to NASSCOM, India will become one of the top three markets in the data analytics space in the next three years. The IT trade body also predicts that the Big Data analytics sector in the country will witness eightfold growth by 2025, from the current US$ 2 billion to a whopping US$ 16 billion.
Companies like The Smart Cube are an important part of India’s growth journey in the analytics market, and will influence more businesses to opt for open source in the future.