In all of our previous tasks, we trained a neural network to perform a specific task using a labeled dataset. With large transformer models such as BERT, we use language modeling in a self-supervised fashion to build a language model, which is then specialized for a specific downstream task with further domain-specific training. However, it has been demonstrated that large language models can also solve many tasks without any domain-specific training. A family of models capable of doing that is called GPT: Generative Pre-Trained Transformer.
The idea of a neural network being able to perform general tasks without downstream training is presented in the paper Language Models are Unsupervised Multitask Learners. The main idea is that many other tasks can be modeled using text generation, because understanding text essentially means being able to produce it. Because the model is trained on a huge amount of text that encompasses human knowledge, it also becomes knowledgeable about a wide variety of subjects.
Understanding and being able to produce text also entails knowing something about the world around us. People also learn by reading to a large extent, and the GPT network is similar in this respect.
Text generation networks work by predicting the probability of the next word, given the words that come before it.
You can read more about probabilities in our Data Science for Beginners Curriculum.
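To make next-word prediction concrete, here is a minimal sketch that estimates the probability of the next word from word-pair counts in a tiny corpus. This is a toy bigram model used only as a stand-in: GPT uses a transformer network conditioned on a much longer context, not counts. The function names and the sample corpus are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Count how often each word follows each previous word."""
    follow_counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follow_counts[prev][nxt] += 1
    return follow_counts

def next_word_probabilities(follow_counts, prev):
    """Estimate P(next | prev) from relative frequencies."""
    counts = follow_counts.get(prev, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the previous word was never seen with a successor
    return {word: count / total for word, count in counts.items()}

# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
```

In this corpus, "the" is followed by "cat" twice and "mat" once, so the estimate puts two thirds of the probability on "cat" and one third on "mat".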
The quality of a language-generating model can be measured using perplexity. It is an intrinsic metric that allows us to measure model quality without any task-specific dataset. It is based on the notion of the probability of a sentence: the model assigns high probability to a sentence that is likely to be real (i.e. the model is not perplexed by it), and low probability to sentences that make less sense (e.g. Can it does what?). When we give our model sentences from a real text corpus, we expect them to have high probability and low perplexity. Mathematically, it is defined as the normalized inverse probability of the test set: $$ \mathrm{Perplexity}(W) = \sqrt[N]{1\over P(W_1,...,W_N)} $$
You can experiment with text generation using the GPT-powered text editor from Hugging Face. In this editor, you start writing your text, and pressing [TAB] offers several completion options. If they are too short, or you are not satisfied with them, press [TAB] again to get more options, including longer pieces of text.
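The perplexity formula can be sketched in a few lines of Python. This sketch assumes we already have the probability the model assigned to each of the N tokens in a sequence, so that by the chain rule their product is the joint probability P(W_1,...,W_N). The function name and the probability values below are hypothetical. The computation is done in log space to avoid numerical underflow on long sequences.

```python
import math

def perplexity(token_probs):
    """Perplexity = (1 / product of token probabilities) ** (1 / N)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    n = len(token_probs)
    # Sum of logs is the log of the joint probability of the sequence.
    log_prob = sum(math.log(p) for p in token_probs)
    # exp(-log_prob / n) equals the N-th root of the inverse joint probability.
    return math.exp(-log_prob / n)

# Hypothetical per-token probabilities for two four-token sentences.
likely = [0.5, 0.5, 0.5, 0.5]      # the model finds each token plausible
unlikely = [0.1, 0.1, 0.1, 0.1]    # the model is "perplexed" by each token
print(perplexity(likely))
print(perplexity(unlikely))
```

The first sentence yields a perplexity of about 2 and the second about 10. One way to read the number: a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k words at each step.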
GPT is not a single model, but rather a collection of models developed and trained by OpenAI.
The GPT family includes:
GPT-2 | GPT-3 | GPT-4 |
---|---|---|
Language model with up to 1.5 billion parameters. | Language model with up to 175 billion parameters. | Parameter count not publicly disclosed by OpenAI; accepts both image and text inputs and outputs text. |
The GPT-3 and GPT-4 models are available as a cognitive service from Microsoft Azure and through the OpenAI API.
Because GPT models have been trained on vast volumes of data to understand language and code, they provide outputs in response to inputs (prompts). Prompts are inputs or queries in which you give a model instructions for the task it should complete next. To elicit a desired outcome, you need an effective prompt, which involves selecting the right words, formats, phrases, or even symbols. This approach is called prompt engineering.
This documentation provides you with more information on prompt engineering.
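As a rough illustration of how wording and format shape a prompt, the sketch below assembles a prompt string from an instruction, optional worked examples, and a query. It only builds text and does not call any model or API. The `build_prompt` helper, the `Input:`/`Output:` labels, and the sentiment task are hypothetical choices made for this example, not a prescribed format.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a text prompt from an instruction, optional examples, and a query."""
    lines = [instruction.strip(), ""]
    # Each worked example shows the model the expected input/output format.
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # End with an open "Output:" cue so the model completes the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

instruction = "Classify the sentiment of the input as positive or negative."

# No examples: the model must rely on the instruction alone.
zero_shot = build_prompt(instruction, "I loved this movie.")

# A few examples: the model sees demonstrations before the real query.
few_shot = build_prompt(
    instruction,
    "I loved this movie.",
    examples=[
        ("The food was awful.", "negative"),
        ("What a great day!", "positive"),
    ],
)

print(zero_shot)
print("---")
print(few_shot)
```

A prompt with no demonstrations is often called zero-shot, and one with a handful of demonstrations is called few-shot. Ending the prompt with an open `Output:` cue is one common way to steer a text-completion model toward producing just the answer.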
✍️ Example Notebook: Playing with OpenAI-GPT
Continue your learning in the following notebooks:
New general pre-trained language models not only model language structure, but also contain a vast amount of natural language. Thus, they can be effectively used to solve some NLP tasks in zero-shot or few-shot settings.