Diffusion-LM, a non-autoregressive generative language model that allows for fine-grained control of the model’s output text, has been open sourced by Stanford University researchers. Diffusion-LM outperforms existing methods on controlled text generation tasks.
The model and experiments were described in a paper published on arXiv. Diffusion-LM is a generative language model that uses a plug-and-play control scheme: the language model itself is fixed, and generation is guided by an external classifier that scores how well the generated text matches the desired parameters. Users can specify the required parts of speech, syntax tree, and sentence length for the desired output. During generation, Diffusion-LM iteratively denoises a set of latent vectors, with the external classifier providing gradient updates that steer the latent vectors toward the desired output. According to the study’s findings, Diffusion-LM “significantly” outperformed baseline methods on a set of control tasks.
Many autoregressive generative language models (LMs), such as GPT-3, generate text by predicting the next word in a sequence, appending that word to the sequence, and then using the updated sequence as input for the next prediction. These models can generate text that is indistinguishable from human-written text, and they can solve a variety of problems ranging from question-answering to interactive chat. However, giving users any control over the generated output, such as a desired sentence length, structure, or sentiment, is difficult.
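As an illustration only, not code from the paper, the autoregressive loop described above can be sketched as follows; `predict_next_token` is a hypothetical stand-in for any trained next-token model such as GPT-2 or GPT-3.

```python
# Minimal sketch of autoregressive (left-to-right) text generation.
# `predict_next_token` is a hypothetical stand-in for a trained LM
# that returns the most likely next token given the tokens so far.

def generate(prompt_tokens, predict_next_token, max_len=50, eos="<eos>"):
    tokens = list(prompt_tokens)
    for _ in range(max_len):
        next_token = predict_next_token(tokens)  # condition on everything generated so far
        if next_token == eos:
            break
        tokens.append(next_token)                # feed the extended sequence back in
    return tokens
```

Because each token is committed to as soon as it is produced, there is no obvious place in this loop to impose global constraints such as overall length or syntactic structure, which is the difficulty the Stanford team set out to address.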
One possible solution is to fine-tune the LM to accept an additional control input, but this can be computationally expensive and may not generalize to multiple control parameters. Another approach is a plug-and-play technique that freezes the LM’s parameters and steers generation with an external classifier that evaluates how close the generated output is to the desired parameters. However, attempts to steer autoregressive models this way have proven difficult.
Instead of attempting to steer an autoregressive LM, the Stanford researchers chose a new language generation technique: a diffusion model. These models have performed well in computer vision and other continuous domains, but have not been applied to text generation, which is a discrete domain. The team claims that Diffusion-LM is the first diffusion model for text generation.
The team modified the standard diffusion model in two ways to make Diffusion-LM work. First, they defined an embedding function that maps words into vectors in the diffusion model’s continuous latent space. Second, they defined a “rounding” method for converting these vectors to discrete words. To generate text, the model starts with a random vector in the latent space, which is treated as a noisy version of the embedding of the output sentence. The model then denoises it iteratively, passing the embedding to an external classifier at each step, which produces a gradient update of the embedding for the next iteration. After all iterations, the rounding method converts the final embedding to text output.
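The PyTorch-style sketch below is our own illustration of that generation loop under stated assumptions, not the released implementation; `denoise_step`, `classifier_log_prob`, and `round_to_words` are hypothetical stand-ins for the paper’s denoiser, control classifier, and rounding step.

```python
import torch

def controlled_generate(denoise_step, classifier_log_prob, round_to_words,
                        seq_len=64, embed_dim=16, num_steps=200, step_size=0.1):
    # Start from a random vector in the continuous embedding (latent) space,
    # treated as a fully noisy version of the output sentence's embedding.
    x = torch.randn(seq_len, embed_dim)

    for t in reversed(range(num_steps)):
        # 1. Denoising: the fixed diffusion LM predicts a less noisy embedding.
        x = denoise_step(x, t)

        # 2. Plug-and-play control: nudge the embedding along the gradient of the
        #    frozen classifier's log-probability that the text meets the constraint.
        x = x.detach().requires_grad_(True)
        score = classifier_log_prob(x)   # scalar: how well does this latent match the target?
        score.backward()
        x = (x + step_size * x.grad).detach()

    # 3. Rounding: map the final continuous embeddings back to discrete words.
    return round_to_words(x)
```

Because the control signal is applied as a gradient on the continuous latent vectors at every denoising step, rather than on discrete tokens one at a time, constraints on the whole sentence (length, syntax tree, parts of speech) can shape the output throughout generation.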
Diffusion-LM was tested on five classifier-guided text generation control tasks and compared to baseline methods using a GPT-2 autoregressive LM, with both plug-and-play and fine-tuning. Diffusion-LM outperformed the other plug-and-play methods on all five tasks; it also outperformed fine-tuning on two tasks while performing “similarly” on the other three. Diffusion-LM was also tested on an unguided text-infilling task against three different baseline models; it outperformed two of them and performed “comparably” to an autoregressive model specifically trained for infilling.