‘Data is more valuable than gold’. This is the mantra of modern computing. Enormous amounts of data are being generated every minute throughout the world. The entry of AI and ML has facilitated the processing of this data and its use in the enterprise as well as in various other fields. Here is a bird’s eye view of trending open source tools for AI and ML.
2018 may well be remembered as the year when data first demonstrated its dominance, with a visible impact not only on science and technology but also on global politics and socioeconomic conflict, especially in developing nations. We witnessed the trouble it can foment, and were made painfully aware of the terrible costs incurred when these tools are used unethically. On the whole, this article seeks to adopt an objective view as we look at how artificial intelligence (AI) is maturing, backed by large scale research efforts across the world, from Silicon Valley in the West to China in the East.
Top machine learning (ML) frameworks
The stalwart across the field remains Google’s TensorFlow, which provides an enterprise-grade system to train, test and deploy deep neural networks at scale. It has grown steadily and is supported by an ecosystem of visualisation, data manipulation and interpretability tools that make it a ubiquitous solution when it comes to scalable machine learning. With the added support of Keras integration, Google is now trying to shorten the time developers need to learn TensorFlow.
Last year saw the emergence of PyTorch as a framework preferred by many machine learning researchers, who often chose it over the dominant TensorFlow for the flexibility and features of this younger, lightweight, open source deep learning library, which is supported and extensively used by Facebook. Most comparisons of state-of-the-art frameworks focus on TensorFlow and PyTorch, arguably because of their strong adoption in academia and industry, overshadowing others like Caffe, Theano and Microsoft’s Cognitive Toolkit (CNTK). There’s also the Apache MXNet project with the Gluon interface, which seeks to provide simple building blocks that allow users to quickly prototype deep learning models.
Scikit-learn remains a widely used open source framework to prototype and deploy machine learning classifiers, but is more focused on providing a ‘workbench’ that avoids the boilerplate code which makes frameworks like TensorFlow and PyTorch harder to pick up.
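To make the ‘workbench’ idea concrete: scikit-learn estimators share a common convention in which a model is trained with fit() and queried with predict(). The sketch below mimics that convention in plain Python so that it runs without any installation. The class TinyNearestCentroid, the helper accuracy() and the toy data are all hypothetical, written only for this illustration; this is not scikit-learn code.

```python
# Illustrative only: a plain-Python stand-in for the fit/predict convention
# that scikit-learn estimators follow. TinyNearestCentroid is a hypothetical
# class written for this sketch, not part of scikit-learn.
from collections import defaultdict
from math import dist


class TinyNearestCentroid:
    """Classify a point by the closest class mean (centroid)."""

    def fit(self, X, y):
        # Group the training points by label.
        groups = defaultdict(list)
        for point, label in zip(X, y):
            groups[label].append(point)
        # Store one centroid per label: the column-wise mean of its points.
        self.centroids_ = {
            label: tuple(sum(col) / len(points) for col in zip(*points))
            for label, points in groups.items()
        }
        return self  # returning self allows clf.fit(...).predict(...)

    def predict(self, X):
        # For each point, pick the label whose centroid is nearest.
        return [
            min(self.centroids_, key=lambda label: dist(point, self.centroids_[label]))
            for point in X
        ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up data: two well-separated clusters in 2D.
X_train = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]
y_train = ["small", "small", "small", "large", "large", "large"]
X_test = [(0.9, 1.1), (5.1, 5.0)]
y_test = ["small", "large"]

clf = TinyNearestCentroid().fit(X_train, y_train)
predictions = clf.predict(X_test)
print(predictions, accuracy(y_test, predictions))
```

Because every model exposes the same two methods, swapping one classifier for another is usually a one-line change, which is much of what keeps the boilerplate down. With scikit-learn itself, code of the same shape applies to its own estimators, such as LogisticRegression.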
Spark MLlib and CNTK are also in use across enterprises. Netron, a popular visualisation tool for neural networks, now also supports CNTK, while Spark MLlib is seeing steady adoption as companies start building scalable data streaming pipelines. Alongside Mahout and its other projects for Big Data management and architecture, Apache has released SystemML as an addition to its repertoire of open source tools at the intersection of Big Data and machine learning.
Libraries such as Fast.ai’s recently released software have advanced the state-of-the-art in some disciplines within natural language processing. Edward has been released as a probabilistic programming toolkit built atop TensorFlow (soon to be integrated within it), while LIME is another library supporting greater interpretability for deep neural networks. All these are seeing increased use as issues of privacy, ethics, and understanding of biases in data acquire greater importance within the industry. Many traditional applications also rely on machine learning capabilities in Java, via frameworks such as Deeplearning4j, and in R. Overall, the AI and ML space is bustling with developments that one needs to follow, and change seems to be the only constant.
Top tools for artificial intelligence in the cloud
‘Artificial Intelligence-as-a-Service’ is trending, especially because small-scale companies do not wish to do the heavy lifting of setting up end-to-end data pipelines, and would prefer to focus on the individual stages of preprocessing, training and deployment. For instance, Amazon’s and Google’s cloud platforms offer a set of endpoints to address machine learning on streaming data. In fact, their recent offerings like Google Cloud AutoML and Amazon Web Services’ SageMaker focus on transferring control into the user’s hands by introducing more interpretability; but they still have some distance to travel in terms of the level of automation and performance across heterogeneous data sets.
Following Rekognition in 2017, Amazon has ramped up focus on natural language processing, automatic speech recognition, text-to-speech services, and neural machine translation technologies as managed services in the cloud. The company introduced video and image analysis using DeepLens, making it easier for developers to access these as desired.
Top Web frameworks for machine learning
Web frameworks have been all the rage in machine learning as neural networks have shrunk in size while reaching accuracy that is acceptable by human standards. One of the forerunners in this space is Andrej Karpathy’s ConvNetJS. This inspired similar work and parallel lines of thought that later resulted in JavaScript-based libraries, which can be run as part of server-side or client-side scripts. The recent release of TensorFlow.js is a solid step forward in this direction, extending the ecosystem for developers seeking to bring the machine learning experience to the browser. There are other frameworks, including ml5.js, focused on offering a complete set of in-browser machine learning capabilities.
Top ML tools for mobile app developers
Mobile app developers increasingly want to integrate advanced analytics, including machine learning systems, into their data processing, to generate more accurate insights and make decisions based on the ‘big picture’ of the user statistics within their apps. This makes it a very lucrative market to capture, as app developers seek custom solutions for their use cases that keep the user experience unchanged in spite of a huge amount of processing in the backend.
Google is capitalising on its expertise in developing and supporting TensorFlow by releasing ML Kit, which caters to the Android market and its specific requirements of low memory impact and low-resource learning on the fly. It comprises specific libraries that address text recognition, face detection, bar code scanning and image labelling, and will soon see a foray into natural language processing, with support for the smart reply feature seen in Google’s other products, including Gmail.
Apple, meanwhile, is playing catch-up with its Core ML library. One benefit of the competition within this space has been the release of a huge repository of lightweight models optimised for mobile devices, which allow the end user to continue to have a streamlined experience on handheld devices.
Overall, the machine learning landscape is growing increasingly crowded, with new frameworks emerging and older ones fading; but a great deal of effort is going into making these tools work together, which promotes a much healthier environment for developers and researchers. This bodes well for end users, as we note the emergence of mature, capable and interpretable open source machine learning tools for use cases spanning the tech landscape and beyond.