All You Need to Know About OpenAI’s New AI Called GPT-3

OpenAI is an AI development and deployment company with the mission to ensure that AI benefits all of humanity. A little over a year ago, OpenAI stunned the world with GPT-3 (the third-generation Generative Pre-trained Transformer), which showed a dramatic leap in the ability of computers to form natural-language sentences, to handle tasks such as completing a sentence or answering a question, and to write long passages of text that people found fairly human.

What exactly is GPT-3?

GPT-3 is so popular these days

OpenAI’s GPT-3 is one of the biggest innovations of 2020 and, arguably, one of the biggest advances in AI so far. GPT-3 is not open source; it is a large language model that OpenAI makes available through an API. It is trained on a vast data bank of text, most of it English, using a very large neural network. The model’s main job is to spot patterns and learn its own rules of how language operates.

GPT-3 as a successor to GPT-2

GPT-2 (a successor to GPT) was a direct scale-up of GPT, with more than 10x the parameters and trained on more than 10x the amount of data. It was trained simply to predict the next word in 40 GB of Internet text.

Due to OpenAI’s concerns about malicious applications of the technology, they did not release the trained model. They instead released a much smaller model for researchers to experiment with, as well as a technical paper.


GPT-2 was a large transformer-based language model with 1.5 billion parameters, trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset caused this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains.
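To make the "predict the next word" objective concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not a transformer and not OpenAI's code: it looks only at the single previous word rather than at all previous words, and the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for previous, current in zip(words, words[1:]):
        followers[previous][current] += 1
    return followers


def predict_next_word(model, previous_word):
    """Return the most frequent follower of previous_word, or None if unseen."""
    candidates = model.get(previous_word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical toy corpus, standing in for the 40 GB of Internet text.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_model(corpus)

print(predict_next_word(model, "the"))    # "cat" follows "the" most often
print(predict_next_word(model, "on"))     # "the" is the only word seen after "on"
print(predict_next_word(model, "zebra"))  # None: the word never appeared
```

A model like GPT-2 replaces the counting table with a neural network and conditions on the whole preceding text, but the training signal is the same idea: guess the next word, then compare the guess with what actually came next.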

Why is GPT-3 so powerful?

General observations about GPT-3

According to the researchers in the original paper, GPT-3 achieves strong performance on many NLP datasets without any fine-tuning, including translation, question-answering, and cloze tasks. It can also accomplish several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.


Surprisingly enough, it also shows what the authors describe as “meta-learning”: the GPT-3 neural network does not need to be re-trained in order to perform a new task such as sentence completion; a description of the task and a few examples in the prompt are enough. The team also finds that “GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans.”
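Performing a task without re-training comes down to putting the task into the text the model is asked to continue. The following Python sketch only builds such a prompt; it does not call any model. The function name and the "Input:" / "Output:" layout are assumptions chosen for illustration, not a format prescribed by OpenAI.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a text prompt: instruction, worked examples, then the new input."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The prompt ends with an open "Output:" for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical demonstrations for a translation task.
examples = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", examples, "cat")
print(prompt)
```

The model's weights stay untouched; the only thing that changes from task to task is the text in the prompt.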

Let’s illustrate its power with an example

If you’ve been skimming Twitter these days, you have most probably run across Sharif Shameem, who posted a video in which he simply asks the machine, in English, for ‘the Google logo, a search box, and two light grey buttons that say “Search Google” and “I’m Feeling Lucky”’. Soon after the request, GPT-3 appears to create code for what it “thinks” this should look like (GPT-3 doesn’t think the way a human brain does). When rendered, the result resembled the Google homepage from ten years ago.

Viral GPT-3 tweet

How does GPT-3 work?

The logic behind GPT-3

To achieve such outstanding results, GPT-3 uses 175 billion parameters. In this context, a parameter is a learned numeric value in a neural network that gives greater or lesser weight to some aspect of the data. GPT-3’s size and nature make it adaptable to all sorts of different tasks that involve any sort of language. Because it’s still such early days, there’s no way to tell which sorts of fields might benefit the most.
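To give a feel for what 175 billion parameters means, the short Python sketch below counts the parameters of a single fully connected layer and estimates the storage needed to hold GPT-3's parameters. The bytes-per-parameter values are assumptions (common 16-bit and 32-bit number formats); the article's sources do not say how the model is actually stored.

```python
def dense_layer_parameter_count(inputs, outputs):
    """One weight per input-output pair, plus one bias per output."""
    return inputs * outputs + outputs


def memory_gigabytes(parameter_count, bytes_per_parameter):
    """Storage needed just to hold the parameters, in gigabytes (10**9 bytes)."""
    return parameter_count * bytes_per_parameter / 10**9


GPT3_PARAMETERS = 175_000_000_000  # figure quoted in the text

print(dense_layer_parameter_count(3, 2))     # 8: 3*2 weights + 2 biases
print(memory_gigabytes(GPT3_PARAMETERS, 2))  # 350.0, assuming 2 bytes per parameter
print(memory_gigabytes(GPT3_PARAMETERS, 4))  # 700.0, assuming 4 bytes per parameter
```

Under either assumption the parameters alone are far too large for a single ordinary computer, which is why models of this size are run on many machines working together.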


Looking at the results as a whole, GPT-3 is evaluated on the SuperGLUE benchmark with mixed outcomes: it does well on tasks such as COPA and ReCoRD, but falls short on WiC (word-in-context) analysis.

SuperGLUE benchmark

SuperGLUE (Super General Language Understanding Evaluation) offers a single-number metric that summarizes progress on a diverse set of language understanding tasks. It was introduced because performance on its predecessor, the GLUE benchmark, had come close to the level of non-expert humans, suggesting limited headroom for further research. GPT-3’s performance on SuperGLUE increases with model size and with the number of examples in context.
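A "single-number metric" simply means that the scores on the individual tasks are combined into one value. The Python sketch below assumes the combination is an unweighted average; the task names and numbers are placeholders for illustration only and are not real results.

```python
def overall_score(task_scores):
    """Unweighted mean of per-task scores (assumed aggregation for this sketch)."""
    if not task_scores:
        raise ValueError("at least one task score is required")
    return sum(task_scores.values()) / len(task_scores)


# Placeholder numbers for illustration only; these are not real results.
hypothetical_scores = {"task_a": 90.0, "task_b": 80.0, "task_c": 50.0, "task_d": 76.0}
print(overall_score(hypothetical_scores))  # 74.0
```

One consequence of averaging is visible even in this toy case: a weak result on a single task (here the placeholder `task_c`) pulls the headline number down but can be hidden by strong results elsewhere, so the per-task scores are still worth reading.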

COPA benchmark

COPA (Choice of Plausible Alternatives) provides researchers with a tool for assessing progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally into development and test sets of 500 questions each. Each question is composed of a premise and two alternatives. The task is to select the alternative that more plausibly has a causal relation with the premise. The position of the correct alternative is randomized so that the expected performance of random guessing is 50%.
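The question format described above can be sketched as a small data structure with an accuracy function. This is an illustrative Python sketch: the class and field names are assumptions, and the four questions are made up in the COPA shape rather than taken from the real dataset.

```python
import random
from dataclasses import dataclass


@dataclass
class CopaQuestion:
    premise: str
    alternative_1: str
    alternative_2: str
    correct: int  # 1 or 2


def accuracy(questions, choose):
    """Fraction of questions where choose(question) picks the correct alternative."""
    hits = sum(1 for q in questions if choose(q) == q.correct)
    return hits / len(questions)


# Made-up questions in the COPA shape; not taken from the real dataset.
questions = [
    CopaQuestion("The ground was wet.", "It had rained.", "The sun was shining.", 1),
    CopaQuestion("The glass fell off the table.", "It floated.", "It shattered.", 2),
    CopaQuestion("She forgot her umbrella.", "She stayed dry.", "She got soaked.", 2),
    CopaQuestion("The alarm rang.", "He woke up.", "He fell asleep.", 1),
]

rng = random.Random(0)  # fixed seed so the run is repeatable


def random_guess(question):
    return rng.choice([1, 2])


def always_first(question):
    return 1


print(accuracy(questions, always_first))  # 0.5: the correct answer is first in half the cases
print(accuracy(questions, random_guess))  # varies around 0.5 on larger sets
```

Because the correct alternative is equally likely to be in either position, a strategy that ignores the text, such as always picking the first option, cannot beat 50% in expectation. Any score well above that has to come from actually relating the premise to the alternatives.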

ReCoRD benchmark

ReCoRD (Reading Comprehension with Commonsense Reasoning Dataset) is a large-scale reading comprehension dataset which requires commonsense reasoning. It consists of queries which are automatically generated from CNN/Daily Mail news articles; the answer to each query is a text span from a summarizing passage of the corresponding news. The goal of ReCoRD is to evaluate a machine’s ability to apply commonsense reasoning in reading comprehension. ReCoRD contains 120,000+ queries from 70,000+ news articles, and a large portion of the queries require commonsense reasoning. It therefore presents a good challenge for future research to bridge the gap between human and machine commonsense reading comprehension.

WIC analysis

WiC (word-in-context) analysis: By design, word embeddings are unable to model the dynamic nature of words’ semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. Existing models have surpassed the performance ceiling of the standard evaluation dataset for the purpose. To address the lack of a suitable benchmark, the dataset’s authors put forward a large-scale Word in Context dataset, called WiC, based on annotations curated by experts, for generic evaluation of context-sensitive representations.

A huge data bank requires high-performance computing

Researchers, scientists, and developers can accelerate their HPC (High Performance Computing) applications using specialized libraries, directive-based approaches, and language-based models. HPC is one of the most essential tools fueling the advancement of science. By leveraging GPU-powered parallel processing across multiple compute nodes, it can run advanced, large-scale application programs efficiently, reliably, and quickly. This acceleration delivers a dramatic boost in throughput and cost savings, paving the way to scientific discovery.

HPC usage is growing

What's in store for the future?

As stated by Forbes: GPT-3 is ultimately a correlative tool. It cannot reason; it does not understand the language it generates.

“The GPT-3 hype is way too much … AI is going to change the world, but GPT-3 is just a very early glimpse.”– Sam Altman, OpenAI CEO

Want to check how GPT-3 could help grow your business? Let’s talk about how we could improve it.

Simon Sovič


Simon is a full stack startupper with several years of startup experience, ranging from being in the core team of a startup building a dual-sided marketplace with over 100k of investment, to working as a business consultant in an incubator helping startups grow and develop, to co-organizing the biggest startup conference in the Alps Adriatic region.