Large models like GPT-3 can perform a variety of tasks with little instruction. That said, one of the challenges in working with these models is determining the right way to do something.
GPT-3 has acquired knowledge from its training data as well as another kind of “intelligence” from learning the various relationships between concepts in that data. While it learned that bats are a kind of mammal from reading articles and educational content, it also learned the concept of taxonomies – that one thing can be part of a larger group:
Bat -> Mammal -> Animal
In psychology, the ability to reason about how things are connected and to draw conclusions from those connections is sometimes called “fluid” intelligence, a distinction drawn by the psychologist Raymond Cattell.
Hard factual information like ‘bats are mammals’ is what Cattell called “crystallized” intelligence. The underlying facts may change, but our ability to make logical deductions about information doesn’t. Our fluid intelligence is what lets us adapt to new information.
Like us, GPT-3 has its own form of fluid and crystallized intelligence. The crystallized part is the body of facts it has accumulated, and the fluid part is its ability to make logical deductions based on the relationships it has learned between things.
In theory, you could build an intelligent and logical AI by training it only on fiction. Our bat example works equally well with something fictional, like Harry Potter. Harry is a member of Gryffindor, and Gryffindor is a house at a school called Hogwarts:
Harry Potter -> Gryffindor -> Hogwarts
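To make the parallel concrete, here is a toy Python sketch in which one lookup rule deduces group membership whether the stored facts are real or fictional. The `facts` dictionary and the `groups_of` helper are hypothetical illustrations of the kind of deduction described above; they are not a description of how GPT-3 stores or reasons about knowledge.

```python
# Toy illustration only (not how GPT-3 represents knowledge):
# each entry links an item to the larger group it belongs to.
facts = {
    "Bat": "Mammal",
    "Mammal": "Animal",
    "Harry Potter": "Gryffindor",
    "Gryffindor": "Hogwarts",
}


def groups_of(item: str, facts: dict[str, str]) -> list[str]:
    """Follow the chain of 'belongs to' links upward from item."""
    chain = []
    seen = {item}
    while item in facts:
        item = facts[item]
        if item in seen:  # stop if the links accidentally form a cycle
            break
        seen.add(item)
        chain.append(item)
    return chain


print(groups_of("Bat", facts))           # ['Mammal', 'Animal']
print(groups_of("Harry Potter", facts))  # ['Gryffindor', 'Hogwarts']
```

The same rule produces a correct chain for both examples, because the deduction depends only on how the facts are linked, not on whether they describe the real world.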
Give a model enough fictional books to consume and it will become pretty smart, teaching itself how to make logical deductions about information. However, it won’t be very knowledgeable about our world.
But calling it unintelligent wouldn’t be accurate. Ben Franklin was an extremely intelligent person, but he’d be the last person you’d want to ask about 20th-century history, unless you gave him a few books to catch up on.
The same can be said for models like GPT-3. They’re trained on a large number of facts and have quite a bit of knowledge, but they’re limited by the information they were trained on, when they were trained, and the fact that their universe includes both Harry Potter and scientific knowledge, sometimes intermingled.
If you want reliable crystallized knowledge, you need to either include the basic facts in the text you send the model or fine-tune it on a dataset, a method that continues training the model on your own examples and adds a new layer of understanding on top of what it already knows.
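Both routes start from plain text. The Python sketch below shows one way to place facts directly in the prompt ahead of a question, and how the same prompt could be saved as a training example for fine-tuning. The helper names, the prompt wording, and the `prompt`/`completion` field names are assumptions chosen for illustration; check the documentation of the fine-tuning service you use for its exact data format. No model is called here.

```python
import json


def build_prompt(facts: list[str], question: str) -> str:
    """Place the supplied facts ahead of the question so the model can draw on them."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer the question using only the facts below.\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def to_finetune_record(prompt: str, completion: str) -> str:
    """Serialize one training example as a JSON Lines record (field names assumed)."""
    return json.dumps({"prompt": prompt, "completion": completion})


facts = ["Bats are mammals.", "Mammals are animals."]
prompt = build_prompt(facts, "Is a bat an animal?")
print(prompt)
print(to_finetune_record(prompt, " Yes."))
```

Including facts in the prompt works immediately but must be repeated with every request; fine-tuning on many such records bakes the information into the model at the cost of a training step.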
With methods like these, GPT-3 does extremely well at grasping the context of new information: it can answer questions about it and even summarize it for different audiences, using its fluid intelligence to expand its crystallized intelligence.
Like someone from another planet or a different time who doesn’t know about our world, large models usually need more context. But once they have that, they can be extremely smart and do very interesting things.