Grounding: The Holy Grail of Natural Language Processing, and why 99% misunderstand what ChatGPT is all about
2023-02-12
I ran into a friend of mine over the weekend, and he was asking me about ChatGPT. (What else? It really is everywhere.) He told me a story: a scientist friend of his had asked ChatGPT to recommend relevant articles in his field. The answers looked excellent, with multiple relevant yet previously unknown papers from highly cited authors.
There was only one problem.
It is quite common in citations for first names to be abbreviated. His suspicion was raised when a highly cited author with a very distinctive surname appeared with a different initial than usual. The suspicion was confirmed when he searched for the article titles in the relevant journals.
They were non-existent.
ChatGPT had confidently made up a bunch of titles, and invented a couple of highly-cited-sounding professors to go with them.
But we all know that ChatGPT makes things up. (TBH, this is a __huge__ and potentially commercially blocking issue, but off-topic for this article. More on this later; follow me if you want to be notified.) That alone wouldn't be news worth a LinkedIn post.
I started writing because of his next question:
“Why doesn’t it just look it up in the database it learnt?”
Because there is no database.
First, ChatGPT doesn’t memorise data in the rote-learning sense. There is nothing in its structure (and I avoid using the term memory, which is ephemeral) similar to a book index, which would allow it to check its hallucinations against a reference and correct them accordingly.
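To make it concrete, here is a minimal sketch of what such a reference check could look like if it existed. The `KNOWN_PAPERS` set and the `verify_citation` helper are invented for this illustration and stand in for a real bibliographic index; nothing like this lookup happens inside ChatGPT.

```python
# Hypothetical sketch: a "book index" style check that a generated citation
# actually exists. KNOWN_PAPERS stands in for a real bibliographic database.

KNOWN_PAPERS = {
    ("a survey of example methods", "smith, j."),
    ("another real paper title", "doe, a."),
}

def verify_citation(title: str, author: str) -> bool:
    """Return True only if the (title, author) pair exists in the reference index."""
    return (title.lower(), author.lower()) in KNOWN_PAPERS

# A generated citation would be checked before being shown to the user:
if not verify_citation("Made-up Paper About Everything", "Famous, P."):
    print("Citation not found in the index -- likely hallucinated.")
```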
But this doesn’t exist. And to understand why, we need to understand a common term in Natural Language Processing:
“Grounding”
“Cognitive Science defines ‘grounding’ as the process of establishing what mutual information is required for successful communication between two interlocutors.”
It essentially means that when you see the term “Jeff Bezos”, you know that he is the founder of Amazon. You also understand that “founder” (in this context) means someone who starts a company from scratch, that Jeff Bezos is a person, that Amazon is a company (not a legendary female warrior), that a company is a commercial entity (and so on, all the way down).
These terms are said to be “grounded” by you. When you see the 10 characters (9 letters + 1 space) of “Jeff Bezos” (which is called a “surface form” in NLP), you link this information in your head to a specific person (called an “entity”). Entities can have multiple surface forms (e.g., Jeff Bezos has “Jeff” and “Bezos”).
This is not an easy process. If you see just “Jeff”, you will have a much harder time. If the text earlier talked about Bezos or Amazon, you can make a good guess that this “Jeff” is still grounded to the same person. If the text previously talked about Google AI and its head, Jeff Dean, you would ground this “Jeff” to Jeff Dean and not Bezos.
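For illustration, here is a toy sketch of dictionary-based entity linking with context disambiguation. The surface-form table, the `CONTEXT_CLUES`, and the overlap scoring below are simplified assumptions for this example, not how any production system actually works:

```python
# Toy entity linking: map a surface form to an entity, using document context
# to disambiguate ambiguous mentions like "Jeff".

SURFACE_FORMS = {
    "jeff bezos": ["Jeff_Bezos"],
    "bezos": ["Jeff_Bezos"],
    "jeff dean": ["Jeff_Dean"],
    "jeff": ["Jeff_Bezos", "Jeff_Dean"],   # ambiguous surface form
}

CONTEXT_CLUES = {
    "Jeff_Bezos": {"amazon", "founder", "commerce"},
    "Jeff_Dean": {"google", "ai", "tensorflow"},
}

def ground(mention: str, context: str) -> str:
    """Pick the candidate entity whose clue words overlap most with the context."""
    candidates = SURFACE_FORMS.get(mention.lower(), [])
    if len(candidates) == 1:
        return candidates[0]
    words = set(context.lower().split())
    # Score each candidate by how many of its clue words appear in the context.
    return max(candidates, key=lambda entity: len(CONTEXT_CLUES[entity] & words))

print(ground("Jeff", "Google AI and its head were presenting"))  # -> Jeff_Dean
print(ground("Jeff", "the founder talked about Amazon"))         # -> Jeff_Bezos
```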
What happens if Google organises a conference with Amazon about AI, and both of them are there speaking? (And they talk about the movie “Wonder Woman”, but that would be a trolling example.) Well, the author of the text is expected to communicate clearly: not abbreviate either name to just “Jeff”, and use their full names each time to help the reader.
In communication, the two parties (sender and receiver) make an attempt to understand each other. In writing, this is hard because there is only one shot, so you want to be careful. But in a live conversation, you can ask back when you can’t ground something: “Which Jeff did you mean? Jeff Dean or Jeff Bezos?” And you can continue the grounding and your understanding of the situation.
Now back to ChatGPT:
LLMs don’t do grounding. All they do is predict the next word in the sentence. They can pick up higher-level natural language features that help them with this prediction in a general sense, but not deliberately. There is clearly some emergent behaviour indicating that grounding happens to a certain extent, but not enough (and it cannot be enough). As you can see from the nuances above, unless there is a deliberate attempt to resolve the edge cases, there will be embarrassing consequences.
When the scientist asked for articles, in his head, he was expecting articles that were grounded to real ones so he could go and read them. ChatGPT doesn’t do grounding. It doesn’t make an attempt to help you ground the text you are receiving. In fact, it doesn’t even “know” what grounding is, and that communication is supposed to be about that.
For us, it is so natural that we don’t even think about it. When someone talks to us, we ground automatically. With simple facts, we do it instantly, using strategies to help us. With more complex information, we ground deliberately. When someone is BSing you, you try to ground parts of the text and then validate the truth value of the BSer’s statements. This is why phrases like “Many people say” are misleading: you can’t ground “Many people” to a concrete set of people. You are supposed to ask back, “Which people? You? Me? Every software engineer in the UK? All dentists in the world?” But, of course, you can’t do that on most platforms.
But ChatGPT doesn’t have a concept of this. It makes a best effort to remember as many surface forms as possible so they can be reused when it needs to predict the next token. That doesn’t mean it will make an effort to communicate in a way that can be grounded, because there is nothing to ground to.
And that makes any communication very difficult. When you see “Bezos”, you think of the person. ChatGPT just knows that after “Bezo”, you probably expect an “s”.
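You can observe this next-token view of the world directly with an open model such as GPT-2, used here as a stand-in since ChatGPT’s own model isn’t publicly inspectable; the exact top candidates depend on the model and the prompt:

```python
# Sketch: what a causal language model actually produces -- a probability
# distribution over next tokens, not a link to any real-world entity.
# Requires: pip install torch transformers

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The founder of Amazon is Jeff Bezo"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Probabilities for the token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```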
Using ChatGPT for any kind of factual information exposes you to whether, during training, ChatGPT happened to benefit from remembering some pseudo-grounded version of that information. If a fact occurs rarely enough that dropping it doesn’t affect overall performance, you will probably get hallucinated facts instead.
But communication is about the specific. You don’t talk in generics because they have little value; everyone is supposed to know them already. And if a system focuses on generics, I don’t see how it will add a lot of value. I am also not sure how this problem can be resolved in the future. It is a fundamental problem of statistical natural language processing.
PageRank vs ChatGPT
Google solved this problem a long time ago by figuring out that guessing is less important than being correct. They used a relatively simple algorithm (PageRank) to do the initial guessing, but then did a hard search over real documents to make sure they returned grounded information. Of course, they have since moved on from PageRank to far more complicated ranking and messed up the top of the search results with ads, but you don’t expect hallucinated results.
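For reference, the core of the original algorithm really is relatively simple. Here is a sketch of the classic PageRank power iteration over a toy link graph; the graph and parameter values are made up for illustration:

```python
# Minimal PageRank power iteration.
# links[page] = list of pages that `page` links to.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                 # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(toy_web))   # "C" ends up with the highest score
```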
ChatGPT uses a complicated algorithm to make an initial guess but doesn’t make any effort to ground it. This will be clearly untenable in the future.
Couldn't RLHF be considered grounding of sorts? After all, you are incorporating human feedback.