EleutherAI, a collective of volunteer researchers, has open-sourced GPT-J, a six-billion-parameter natural language processing (NLP) model designed to be comparable to OpenAI's GPT-3. Developed by Aran Komatsuzaki, the model is said to be trained on an 800GB open-source text dataset.
The developer has released GPT-J-6B, a JAX-based (Mesh) Transformer LM, on GitHub. He has noted that GPT-J performs nearly on par with the 6.7B-parameter GPT-3 on various zero-shot downstream tasks. The model was trained on EleutherAI's Pile dataset using a Google Cloud TPU v3-256, with training taking approximately five weeks.
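For readers who want to try the model without the JAX and TPU stack, the sketch below shows minimal zero-shot generation. It assumes the later Hugging Face Transformers port of the weights (model id EleutherAI/gpt-j-6B), which is separate from the original JAX release described here, and the generation settings are illustrative only.

```python
# Minimal zero-shot generation sketch using the Hugging Face port of GPT-J.
# Assumes the `transformers` and `torch` packages and the hosted checkpoint
# "EleutherAI/gpt-j-6B"; the original release runs on JAX + TPUs instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "EleutherAI's GPT-J is a 6B-parameter language model that"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; sampling parameters are illustrative defaults.
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```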
“GPT-J allows more flexible and faster inference than Tensorflow + TPU counterparts. This project required a substantially smaller amount of person-hours than other large-scale model developments did, which demonstrates that JAX + xmap + TPUs is the right set of tools for quick development of large-scale models,” reads the developer's blog post.
In response to a Twitter user's query about hardware requirements, Komatsuzaki replied, “For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.”
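The 12GB figure is consistent with holding the six billion parameters in half precision (6B parameters × 2 bytes ≈ 12GB). As a hedged illustration, the sketch below loads the Hugging Face port in float16 on a single GPU; actual memory use will vary with hardware and library versions, and this is not the TPU path Komatsuzaki describes.

```python
# Illustrative half-precision load of the Hugging Face GPT-J port on a GPU.
# Rationale: fp16 weights cost 6B params * 2 bytes, roughly the "bit more
# than 12GB of memory" mentioned in the quote above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halve memory versus float32
).to("cuda")                     # assumes a GPU with more than 12GB of memory

inputs = tokenizer("Hello, GPT-J.", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```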
When concerns were raised about potential misuse of the model, EleutherAI co-founder Connor Leahy responded, “There is significant, important safety research that can only be done with access to large, pretrained models. We would like to make such research possible and easy for low-resource researchers. It is very unclear if and when such models will start to exhibit far more powerful and dangerous capabilities.”
The GPT-J code and model are available on GitHub.