Facebook Open-Sources LASER for Faster Natural Language Processing Development


The code for LASER, posted on GitHub, provides an “encoder-decoder” neural network that’s built using Long Short-Term Memory neural nets, a workhorse of speech and text processing.


Facebook Inc. has taken another big step to accelerate the transfer of natural language processing (NLP) applications to many more languages.

On Tuesday, the social media giant announced that it is open-sourcing a new PyTorch toolkit called LASER, which stands for Language-Agnostic SEntence Representations.


The toolkit works with more than 90 languages, written in 28 different alphabets, according to Facebook.

Facebook researcher Holger Schwenk wrote in a blog post, “LASER achieves these results by embedding all languages jointly in a single shared space (rather than having a separate model for each). We are now making the multilingual encoder and PyTorch code freely available, along with a multilingual test set for more than 100 languages.”

One single model to handle a variety of languages

In December, Facebook released a research report titled “Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond.” The report described how Facebook engineers trained a single neural network model to represent the structure of 93 languages, belonging to more than 30 language families and written in 28 different scripts.

Facebook eventually developed a “single representation,” a mathematical transformation of sentences into vectors that captures structural similarities across the 93 languages. That shared representation was then used on multiple tasks, such as matching sentences between pairs of languages that had never been paired during training (for example, Russian and Swahili), a feat known in the trade as “zero-shot” learning.
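Matching sentences across languages in a shared space amounts to a nearest-neighbour search: each sentence becomes an embedding vector, and its translation should be the closest vector from the other language. The sketch below illustrates this idea with cosine similarity in plain Python. The function names `cosine` and `match_sentences` and the toy three-dimensional vectors are hypothetical, not part of LASER's API; real LASER embeddings are 1024-dimensional and would come from the released encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_sentences(source_vecs, target_vecs):
    """For each source embedding, return the index of the most similar target embedding."""
    matches = []
    for s in source_vecs:
        scores = [cosine(s, t) for t in target_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches

# Hypothetical toy embeddings standing in for encoder output.
russian = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
swahili = [[0.1, 0.1, 0.95], [0.85, 0.2, 0.05]]

print(match_sentences(russian, swahili))  # [1, 0]
```

Because both languages live in the same vector space, no Russian-Swahili parallel data is needed at matching time; the quality of the result depends entirely on how well the encoder places translations near each other.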

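LASER's encoder is a bidirectional LSTM, and a sentence's embedding is obtained by max-pooling over the encoder's per-token outputs, which yields a vector of the same size no matter how long the sentence is. The following is a minimal plain-Python sketch of that pooling step only, assuming the per-token hidden states have already been computed; the `max_pool` name and the tiny four-dimensional states are illustrative, not LASER code.

```python
def max_pool(token_states):
    """Collapse per-token encoder outputs into one fixed-size sentence vector
    by taking the element-wise maximum over time steps."""
    if not token_states:
        raise ValueError("need at least one token")
    # zip(*...) walks the vectors dimension by dimension.
    return [max(column) for column in zip(*token_states)]

# Hypothetical encoder outputs for a three-token sentence (4 dims each).
states = [
    [0.1, -0.5, 0.3, 0.0],
    [0.4, -0.2, -0.1, 0.2],
    [-0.3, 0.6, 0.2, 0.1],
]
print(max_pool(states))  # [0.4, 0.6, 0.3, 0.2]
```

A two-token or twenty-token sentence would produce a vector of the same length, which is what lets sentences of any length and language be compared directly.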

According to Schwenk, LASER opens the door to performing zero-shot transfer of NLP models from one language, such as English, to scores of others — including languages where training data is extremely limited.
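Zero-shot transfer in this setting means training a classifier only on English sentence embeddings and then applying it unchanged to embeddings of sentences in another language, relying on the shared space to place translations close together. The sketch below uses a simple nearest-centroid classifier, chosen here for brevity rather than taken from LASER; the toy two-dimensional vectors, labels and function names are hypothetical.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def train_nearest_centroid(examples):
    """Group (embedding, label) pairs by label and store one centroid per label."""
    by_label = {}
    for vec, label in examples:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, vec):
    """Return the label whose centroid is closest (Euclidean) to vec."""
    return min(model, key=lambda label: math.dist(model[label], vec))

# Hypothetical English training embeddings (toy 2-D vectors) with labels.
english_train = [
    ([0.9, 0.1], "positive"),
    ([0.8, 0.2], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.2, 0.7], "negative"),
]
model = train_nearest_centroid(english_train)

# Hypothetical Swahili embeddings: no Swahili examples were used in training.
swahili_test = [[0.75, 0.25], [0.2, 0.85]]
predictions = [predict(model, v) for v in swahili_test]
print(predictions)  # ['positive', 'negative']
```

Any classifier that consumes fixed-size vectors could replace the nearest-centroid rule; the transfer works only to the extent that the encoder maps sentences with the same meaning in different languages to nearby points.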

“LASER is the first such library to use one single model to handle this variety of languages, including low-resource languages, like Kabyle and Uighur, as well as dialects such as Wu Chinese,” he said.
