How large is GPT-3’s short-term memory?

TL;DR: In a single API call, GPT-3 can recall details from a 1,500-word article and even repeat passages verbatim. It can also reproduce a list of over 250 items as it generates a completion.

The concept of memory in a large language model can be a little fuzzy. There’s the information the model absorbed during training (GPT-3’s total size is 175 billion parameters), and then there’s what it can process within a single API call. Let’s take a look at what GPT-3 can remember from the text sent to the API.

While GPT-3 can handle a total of 2,048 tokens (roughly 1,500 words) and pick up information from anywhere in that text, making this work in an application requires understanding how GPT-3 sees lists and blocks of text.
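If you want a sense of whether a prompt will fit before you send it, you can estimate the token count locally. Here’s a minimal sketch in Python, under the assumption that GPT-3 uses the same byte-pair encoding as GPT-2 (the Hugging Face transformers library ships that tokenizer):

# Estimate how many of GPT-3's 2,048 tokens a prompt will use.
# Assumption: GPT-3's tokenizer matches GPT-2's BPE.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def count_tokens(text: str) -> int:
    return len(tokenizer.encode(text))

article = open("wired_article.txt").read()  # hypothetical file holding the article
print(count_tokens(article))  # keep this well under 2,048 to leave room for the completion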

Information recall

You can experiment with GPT-3’s recall capabilities with a few different tests. If I give it a 1,500-word article from WIRED that it never had the opportunity to train on and ask it questions about the content and who said what, GPT-3 recalls details with precision: it can easily tell you which university a researcher works for, the names of key people, and even quoted figures.

Beyond pulling small facts from the text, you can even get GPT-3 to recall specific passages by quoting sections and asking it what comes next. It’s actually a bit spooky how much GPT-3 is able to recall with the right trigger.

If you have access to GPT-3, you can test this by giving it an article and then asking questions about the content, or by adding a query like this to the end:

Question: What sentence follows: “The team found numerous artifacts:”
Answer: rings, scarabs, pottery vessels, debris from thousands of statues, and a large number of tools, possibly used for spinning or weaving and casting molds. 

You might have to play with the temperature a bit to find the spot where GPT-3 gives you the right response, but the information is in there somewhere, waiting for the right trigger.
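If you’re scripting this, the call might look like the sketch below. It uses the openai Python package’s Completion endpoint; the engine name, file name, and parameter values are stand-ins you’d tune for your own setup:

import openai

openai.api_key = "YOUR_API_KEY"  # or set the OPENAI_API_KEY environment variable

article = open("wired_article.txt").read()  # hypothetical file holding the article
prompt = (
    article
    + '\n\nQuestion: What sentence follows: "The team found numerous artifacts:"'
    + "\nAnswer:"
)

response = openai.Completion.create(
    engine="davinci",    # base GPT-3 model
    prompt=prompt,
    max_tokens=64,
    temperature=0.0,     # start low for recall; nudge up if the answer is off
    stop=["\n"],         # cut the completion at the end of the answer line
)
print(response.choices[0].text.strip())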

When you’re building question-and-answer applications and get stuck, keep in mind that the trick is getting GPT-3 to understand what you want.

Long lists

Taking a more methodical approach, you can also test GPT-3’s capabilities by giving it really long lists. I’ve found that GPT-3 can tell you which name appears at which position in a list over 250 entries long, using a prompt like this:

(I truncated the lists so they’d fit on this page.)

Memory test. Remember who is number 133.

1. Robert De Niro
2. Al Pacino
3. Tom Hanks
4. Johnny Depp
5. Jack Nicholson
6. Marlon Brando
7. Meryl Streep

...

244. Spike Lee
245. Frank Darabont
246. Michael Bay
247. Brian De Palma
248. Mel Brooks
249. Guillermo del Toro
250. Michelangelo Antonioni


Question: Who is at number 133?
Answer: Douglas Adams
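To run this test yourself, you can build the prompt programmatically. The sketch below makes the same assumptions as before (Completion endpoint, plus a hypothetical names.txt holding one name per line):

import openai  # assumes OPENAI_API_KEY is set in your environment

names = open("names.txt").read().splitlines()  # hypothetical file with 250 names
numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(names, start=1))
prompt = (
    "Memory test. Remember who is number 133.\n\n"
    + numbered
    + "\n\nQuestion: Who is at number 133?\nAnswer:"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=8,
    temperature=0.0,
    stop=["\n"],
)
print(response.choices[0].text.strip())  # should print the name at position 133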

The fact that it can process the list and remember who is at a given number is fairly impressive for a language model, but what’s even more interesting is that you can ask it to repeat the entire list, and GPT-3 will do so verbatim until it runs out of tokens.

You can also have it recall part of the list by giving it a query like this after the list:

List...

What comes after?
133. Douglas Adams
134. Thomas Hardy
135. Herman Melville
136. Dante Alighieri
137. Harper Lee
138. Joseph Conrad
139. Jack Kerouac
140. Emily Bronte
141. Marcel Proust
142. Jules Verne
143. W. Somerset Maugham
144. Roald Dahl
145. Philip Pullman
146. Aldous Huxley
147. Anton Chekhov
148. Jack London
149. H. G. Wells

...

241. Francois Truffaut
242. Wes Anderson
243. Vittorio De Sica
244. Spike Lee
245. Frank Darabont
246. Michael Bay
247. Brian De Palma
248. Mel Brooks
249. Guillermo del Toro
250. Michelangelo Antonioni
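Scripted, the continuation query is the same call with a bigger completion budget, since GPT-3 will keep listing until it runs out of tokens. This sketch reuses the tokenizer trick from earlier to spend whatever is left of the 2,048-token window:

import openai  # assumes OPENAI_API_KEY is set in your environment
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
names = open("names.txt").read().splitlines()  # hypothetical: the same 250 names
numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(names, start=1))

prompt = numbered + "\n\nWhat comes after?\n133."
budget = 2048 - len(tokenizer.encode(prompt))  # the completion can use the rest of the window

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=budget,
    temperature=0.0,
)
print("133." + response.choices[0].text)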

GPT-3 has complete recall of every item in the list and is even able to pull items out from random spots, similar to how it can answer questions about an article included in a prompt.

Working memory

The fact that GPT-3 can recall an entire 250-item list and pull names from any point in it shows incredible recall. To make use of this in an application, the key is to craft a prompt that emphasizes recall over confabulation.

As I said earlier, the information is in there; you just have to guide it along.