The Complete Magazine on Open Source

Facebook releases open source FastText to let anyone classify text in bulk



Taking its artificial intelligence (AI) work to the next level, Facebook has released FastText to the public. The open source library learns representations of words and makes text classification fast and simple.

FastText is not new to the social network: it already improves the experience of over a billion Facebook users through natural language processing and machine learning. The library classifies text from large datasets and learns word vector representations. It also uses a hierarchical softmax that exploits the unbalanced distribution of classes to speed up text classification.

“With FastText, we were often able to cut training times from several days to just a few seconds, and achieve state-of-the-art performance on many standard problems, such as sentiment analysis or tag prediction,” the Facebook AI Research (FAIR) lab team wrote in a blog post.

FastText uses a hierarchical classifier to work efficiently on datasets with a very large number of categories. This reduces the time complexity of training and testing text classifiers from linear to logarithmic in the number of classes. Additionally, the library uses the Huffman algorithm to build the tree of categories, so that frequent classes sit closer to the root than rare ones.
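To see why a Huffman tree helps with unbalanced classes, here is a minimal Python sketch. It is not FastText's own code: the function name `huffman_depths` and the class counts are made up for illustration. It builds a Huffman tree from class frequencies and reports how deep each class ends up, which roughly corresponds to how many binary decisions a hierarchical classifier would need to reach it.

```python
import heapq
from itertools import count


def huffman_depths(class_counts):
    """Return {label: depth} for a Huffman tree built from class frequencies.

    Illustrative helper only; FastText's internal implementation may differ.
    """
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(freq, next(tie), {label: 0}) for label, freq in class_counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every label inside them
        # moves one level further from the root.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {label: depth + 1 for label, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


# Hypothetical, deliberately unbalanced label counts.
counts = {"sports": 5000, "politics": 3000, "tech": 1500, "cooking": 400, "chess": 100}
depths = huffman_depths(counts)

total = sum(counts.values())
average_depth = sum(counts[label] * depths[label] for label in counts) / total

for label in sorted(depths, key=depths.get):
    print(f"{label:10s} depth {depths[label]}")
print(f"average decisions per example: {average_depth:.2f}")
```

With these toy counts the most frequent label sits one step from the root while the rarest sit four steps away, so the average example needs fewer than two decisions instead of a comparison against all five classes.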

FastText is claimed to train on more than a billion words in less than ten minutes using a standard multicore CPU. It can also classify half a million sentences among more than 300,000 categories in less than five minutes. The library is not limited to English and works with other languages such as Czech, French and German.

You can get FastText for your next project directly from GitHub. Python, NumPy and SciPy are needed to run the bundled word-similarity evaluation script.
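For readers wondering what a word-similarity evaluation measures, the sketch below shows the general idea. It is an assumption-laden illustration, not the script shipped with FastText: the vectors, word pairs and human scores are invented, and it uses only the Python standard library rather than NumPy and SciPy. It scores word pairs by the cosine similarity of their vectors and then checks, with a Spearman rank correlation, how well that ordering agrees with human judgements.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 3-dimensional word vectors; real models use far more dimensions.
vectors = {
    "cat": [1.0, 0.9, 0.1],
    "dog": [0.9, 1.0, 0.2],
    "car": [0.1, 0.2, 1.0],
    "truck": [0.2, 0.1, 0.9],
}

# Hypothetical human similarity judgements on a 0-10 scale.
human = [("cat", "dog", 9.0), ("car", "truck", 8.5), ("cat", "car", 1.5), ("dog", "truck", 2.0)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in human]
human_scores = [score for _, _, score in human]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation with human judgements: {rho:.3f}")
```

A correlation close to 1 means the vectors order the word pairs much as people do. In practice the same calculation is usually done with NumPy arrays and SciPy's `scipy.stats.spearmanr`.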