Over the last decade, Artificial Intelligence (AI) has made significant progress and has become pervasive in our daily lives. Its widespread adoption can be attributed to several factors, including Deep Learning (DL), i.e., modern artificial neural networks, the availability of large volumes of data, and the compute power needed to train DL models. More recently, Generative AI has caught the attention of the general public, thanks to OpenAI and the building of scalable, performant Large Language Models (LLMs). Generative AI has been used to produce text, images, videos, programming code, and music. Multimodal models can generate images from text descriptions (e.g., DALL·E) and vice versa, and such innovations are likely to keep growing rapidly.
Advances in Generative AI
One important breakthrough in the application of DL was demonstrated in 2012 [1], when a DL model classified images into many different categories (ImageNet Large Scale Visual Recognition Challenge 2010). DL was then applied to similar classification tasks in text and speech, where the models significantly improved upon previously established benchmarks. These models were trained for specialized tasks and delivered state-of-the-art performance. The prospect of using DL to generate a wide range of outputs has long attracted AI researchers. Generative Adversarial Networks [2], the landmark work in this direction, were introduced in 2014 and generated realistic-looking images of human faces and handwritten digits. This led to further research on Generative AI techniques in other domains.
Modeling language has been a challenging task for AI. The goal of a language model is to predict the next word given a sequence of words. The use of DL to pre-train language models on large text corpora was demonstrated in 2019 [3]. Generative pre-trained transformers (GPT) are the underlying technology that powers ChatGPT. These models have been trained on huge volumes of text data, expending enormous compute power on Graphics Processing Units (GPUs). The results of GPT-3 and GPT-4 on tasks such as text summarization, question answering, and code generation have been impressive.
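To make the idea of next-word prediction concrete, the sketch below counts which word most often follows each word in a tiny corpus and uses those counts to predict. It is only a toy bigram model, not how GPT or any LLM works internally; the corpus and the function names `train_bigram_model` and `predict_next_word` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follower_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with the word that comes right after it.
        for current_word, next_word in zip(words, words[1:]):
            follower_counts[current_word][next_word] += 1
    return follower_counts


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, chosen only for illustration.
toy_corpus = [
    "the model predicts the next word",
    "the model learns from data",
    "the model generates text",
]
model = train_bigram_model(toy_corpus)
print(predict_next_word(model, "the"))   # "model" follows "the" most often
print(predict_next_word(model, "next"))  # "word" is the only follower of "next"
```

An LLM replaces these raw counts with a neural network that conditions on a much longer context, but the task it is trained on has the same shape: given the words so far, score the candidates for the next one.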
Challenges for Generative AI models
DL models learn from training data, setting the parameters of artificial neural networks to capture the view of the world represented in that data. These models are generally many orders of magnitude larger than traditional machine learning (ML) models. The size of these networks can become a challenge when the amount of data available for training is small. Most real-world datasets have imbalanced classes and may carry inherent (and non-obvious) bias. Techniques to train DL models that overcome these challenges continue to be developed. Without them, the models are prone to memorizing the training data, also known as overfitting, and may fail to generalize to unseen data or may produce biased results.
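One common way to counter class imbalance is to weight each class inversely to its frequency, so that errors on rare classes count more during training. The sketch below computes such weights with the widely used heuristic n_samples / (n_classes * class_count); it is an illustrative example rather than a technique this article prescribes, and the labels and the function name `inverse_frequency_weights` are assumptions.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 90 examples of one class, 10 of the other.
labels = ["cat"] * 90 + ["dog"] * 10
weights = inverse_frequency_weights(labels)

# The rare class receives the larger weight.
for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Weights like these are typically passed to the loss function so that each example's contribution is scaled by the weight of its class. Reweighting addresses imbalance only; it does not by itself remove bias that is present in how the data was collected.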
Generative AI models are also subject to the challenges inherent to DL techniques. In addition, their generative nature can introduce artifacts in the generated data. For example, AI image generators struggle with hands and can produce odd-looking images that are hard to explain. Several approaches have been proposed to overcome these challenges [4]. LLMs, whose job is to predict the next word, face similar issues: they can produce wrong completions or wrong answers, depending on the data on which they were trained. Hence, care must be taken to ensure that guardrails are in place, particularly when the models respond to human queries.
Paving the way to innovative applications
The early success of DL was demonstrated on specific tasks such as classification, where the models were trained to be deep and narrow. In contrast, Generative AI models tend to be broad and shallow. The initial applications of DL were designed to deliver the higher accuracy demanded by business requirements, and AI researchers focused on improving those metrics. Generative AI has opened up possibilities for using AI in creative fields such as fashion design, creative writing, and art generation. This will lead to broader use of AI in skill-intensive areas that it has not touched so far. Further research will be guided by how these communities adapt to the use of AI, which can spur the growth of innovative applications.
Disclaimer: The views reflected in this article are the views of the author and do not necessarily reflect the views of the global EY organization or its member firms.
References
[1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: 1106-1114.
[2] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio: Generative Adversarial Nets. NIPS 2014: 2672-2680.
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT (1) 2019: 4171-4186.
[4] Vishnu Makkapati, Arun Patro: Enhancing Symmetry in GAN Generated Fashion Images. In: M. Bramer, M. Petridis (eds.), Artificial Intelligence XXXIV. SGAI 2017. Lecture Notes in Computer Science, vol. 10630. Springer, Cham, 2017.