TRANSCRIPT: Getting to know generative AI with Gary Marcus
Gary Marcus:
Large language models are actually special in their unreliability. They're arguably the most versatile AI technique that's ever been developed, but they're also the least reliable AI technique that's ever gone mainstream.
Ian Bremmer:
Hello and welcome to the GZERO World podcast. This is where you'll find extended versions of interviews for my weekly show on public television. I'm Ian Bremmer, and today we are diving back into the world of artificial intelligence, and specifically generative AI. I'm talking about those AI-powered tools you know, like the text-to-text generator ChatGPT or the text-to-image generator DALL-E. As of now, they can do magical things like write your college term paper for you in Middle English or instantly generate nine images of a slice of bread ascending to heaven. But many of the smartest people I know think these tools will soon reshape the way we live, in both good ways and bad. And that's the subject of my interview today with psychologist, cognitive scientist, and NYU professor Gary Marcus. Let's get to it.
Announcer:
The GZERO World Podcast is brought to you by our lead sponsor, Prologis. Prologis helps businesses across the globe scale their supply chains with an expansive portfolio of logistics real estate, and the only end-to-end solutions platform addressing the critical initiatives of global logistics today. Learn more at Prologis.com.
Ian Bremmer:
Gary Marcus, thanks for joining us today.
Gary Marcus:
Thanks for having me, Ian.
Ian Bremmer:
So AI genius, I have so many things I want to talk to you about today. I want to maybe start with the fact that we've had AI voice assistants on our phones for about a decade now, but no one's really been all that excited about them. And then suddenly, now it's all about ChatGPT and all of this other stuff. Is it really all that different, and how, if it is?
Gary Marcus:
Well, the underlying technology is actually pretty different. And your question is a reminder that there are actually lots of kinds of AI, some of which we can trust and some of which we probably shouldn't. Siri was very carefully engineered to do only a few things and do them really well, and it has APIs to hook out into the world, so if you say, "Lock the door," and you have-
Ian Bremmer:
API. Explain API for everybody.
Gary Marcus:
Application programming interface. It's jargon for hooking your machine up to the world. And Siri only does a few things. It'll control your lights if you have the right kind of light switches, it will lock your door, but it doesn't just make stuff up. It only really works on a few things. It doesn't let you talk about just anything. Every once in a while they roll out an update and now we can ask it about sports scores or movie scores. At first you couldn't. But it's very narrowly engineered. Whereas the large language models that are popular now are kind of like jacks of all trades but masters of none. They pretend to do everything, but if you tell them to move your money in your bank account, they might not have a way of actually connecting to your bank account, and they might say, "Yes, I moved your money," and then you might be disappointed when they didn't actually do it.
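For readers curious what "hooking your machine up to the world" looks like in code, here is a minimal sketch of an assistant issuing a narrowly scoped API call. The endpoint, token, and device name are hypothetical placeholders, not any real vendor's interface; the point is simply that the command either succeeds or fails against the outside world, rather than being generated as text.

```python
import requests

# Hypothetical smart-lock API: the URL, token, and path are illustrative
# placeholders, not a real vendor's interface.
API_BASE = "https://api.example-smarthome.com/v1"
TOKEN = "replace-with-your-access-token"

def lock_door(device_id: str) -> bool:
    """Send one narrowly scoped, well-defined command to a device."""
    response = requests.post(
        f"{API_BASE}/devices/{device_id}/lock",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=5,
    )
    # The assistant either succeeds or reports failure; it cannot "make up"
    # a locked door, because the API's response is the ground truth.
    return response.status_code == 200

if __name__ == "__main__":
    print("Door locked:", lock_door("front-door"))
```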
So there's a kind of appearance or an illusion of great power that Siri never tried to give you. Siri tried not to oversell what it could do. Whereas, in a certain sense, the large language models are constantly overselling what they're doing. They give this illusion of being able to do anything. They'll give you medical advice, but that doesn't mean they really understand medicine or that you should trust them. But it has completely changed the world that these things are widespread. The underlying technology, or the main part of it, existed before. Some of it goes back decades, and the main technical advance was in 2017. People were playing around with it in the field, but nobody knew it would catch on this way. And now that you have these kind of unreliable tools that give the illusion of incredible versatility and everybody's using them, that actually changes the fabric of society.
Ian Bremmer:
I think it'd be helpful if you explain to people a little bit how large language models, what we commonly think of today as the AI we interact with, how it works.
Gary Marcus:
They are analyzing something, but what they're analyzing is the relationship between words, not the relationship between concepts or ideas or entities in the world. They're just analyzing relations between words, and so they're basically like auto complete on steroids. We give them billions or even trillions of words drawn from all the way across the internet. Some people say they now are trained on a large fraction of the internet. And they're just doing autocomplete. They're saying, if you say these words, what is the most likely thing that will come next? And that's surprisingly handy, but it's also unreliable. A good example of this that I gave in my TED Talk was one of these systems saying on March 18th, 2018, Tesla CEO Elon Musk died in a fatal car accident. Well, we know that a system that could actually analyze the world wouldn't say this. We have enormous data that Elon Musk is still alive. He didn't die in 2018. He tweets every day, he's in the news every day. So a system that could really do the analysis that most people imagine that these systems are doing would never make that mistake.
Ian Bremmer:
Could not return that result.
Gary Marcus:
Could not return that result. It could have looked in Wikipedia, it could look in Twitter. Somebody tweeted, they're probably still alive. There's so many inferences it could make. The only inference it's really making is that these words go together in this big soup of words that it's been trained on. So it turns out that other people died in Teslas and he's the CEO of Tesla. But it doesn't understand that the relationship between being CEO of Tesla is not the same as owning a particular Tesla that was in a fatal vehicle accident. It just does not understand those relationships.
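To make the "autocomplete on steroids" description concrete, here is a minimal sketch of next-word prediction from raw word statistics. Real large language models use neural networks trained on huge corpora rather than the toy bigram counts below, but the objective is the same: given the words so far, emit a statistically likely next word, with no model of the entities being described.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for "billions or trillions of words from the internet".
corpus = (
    "the ceo of tesla is elon musk . "
    "a tesla was in a fatal car accident . "
    "elon musk tweets every day ."
).split()

# Count which word follows which: a crude stand-in for the statistical
# machinery inside a large language model.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Sample a likely next word using only word statistics, not meaning."""
    counts = following.get(word)
    if not counts:
        return "."
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a continuation: fluent-looking, but with no model of the world,
# so "tesla" can drift toward "fatal car accident" regardless of the facts.
word, output = "tesla", ["tesla"]
for _ in range(8):
    word = predict_next(word)
    output.append(word)
print(" ".join(output))
```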
Ian Bremmer:
But so Gary, why can't you combine these two things in one of these apps? In other words, you have this very, very powerful predictive analytics tool for assessing the relationships between words and data, and then instantaneously, right after it returns that, just do a quick search against Google, Wikipedia, whatever, so that if it returns something that's stupid or obviously false, the app doesn't then pass along the thing that's obviously false. Why isn't that possible? Why isn't that happening?
Gary Marcus:
Well, people are trying, but the reality is, it's kind of like apples and oranges. They both look like fruit, but they're really different things. In order to do this "quick search," what you really need to do is analyze the output of the large language model, basically take sentences and translate them into logic. If you could translate them into logic (and that's a really hard problem that people have been struggling with for 75 years), then you could do that reasoning, provided you had the right databases to tell you all this stuff and you hooked them all up.
But that's what AI has been trying to do for 75 years. We don't really know how to do it. It's like we have this shortcut and the shortcut works some of the time, but people are imagining that the shortcut is the answer to AI. These things have nothing really to do with the central things that people have been trying to do in AI for 75 years, which is to take language and translate it into logical form, into database facts that can be verified, that can be analyzed. We still don't really know how to do that, and it's much harder than it looks, which is why you have things like Bing giving you references where the references say the opposite of what actually happened. It can't actually read those things. Another way to put it is that these things are actually illiterate. We don't know how to build an AI system that can actually read with high comprehension, really understand the things that are being discussed.
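Ian's suggestion of a built-in "quick search" amounts to the verification step sketched below: pull a claim out of the model's output, look it up in a structured fact store, and flag contradictions. The fact table, the claim, and the single hand-written extraction pattern are all illustrative assumptions; as Gary notes, translating arbitrary language into checkable logic is precisely the part nobody knows how to do in general.

```python
import re

# A tiny structured "database" of verified facts; illustrative only.
facts = {
    ("elon musk", "status"): "alive",
    ("elon musk", "role"): "ceo of tesla",
}

def extract_claim(sentence: str):
    """Translate one narrow sentence pattern into a (subject, attribute, value)
    triple. General language-to-logic translation is the unsolved part."""
    match = re.search(r"(.+?) died in (?:a )?fatal", sentence.lower())
    if match:
        return (match.group(1).strip(), "status", "dead")
    return None

def verify(model_output: str) -> str:
    """Check an extracted claim against the fact store and flag contradictions."""
    claim = extract_claim(model_output)
    if claim is None:
        return "no checkable claim extracted"
    subject, attribute, value = claim
    known = facts.get((subject, attribute))
    if known is None:
        return "no fact on record"
    return "consistent" if known == value else f"contradicts record: {known}"

print(verify("Elon Musk died in a fatal car accident."))
# -> contradicts record: alive
```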
Ian Bremmer:
So Gary, I get that they're different things. I get that apples and oranges do not grow on the same trees, but you and I can eat them on the same plate. We're capable of ordering a fruit salad. So what I'm trying to understand is this: I understand that the AI large language model is not capable of recognizing the output as information and translating it, but Google as a search engine is. So again, why can't you put that, essentially, into a separate search engine that you and I aren't going to see in the interaction, but that's actually doing those two different things?
Gary Marcus:
So I'm going to give you two different answers here, two different ways of thinking about it. One is that Google doesn't actually do all this stuff by itself. It requires you to do it. So you put in a search, it gives you a bunch of garbage, and you sort through that garbage. So a typical use of Google actually has, as we would describe it, a human in the loop. To do what people want to do now, you'd have to take the human out of the loop. What people are looking for is something more autonomous than that, without humans in the loop. You want to be able to type whatever into ChatGPT, have it run a search, and figure out the answer for itself, not consult you or some farm of humans off in some other country who are underpaid and too slow to actually do this right now. You want a system that can actually do it autonomously for itself. Google's not actually good enough to do that. It's really just matching a bunch of keywords; it gives you a bunch of stuff and you as a human sort it out. So it's a nice idea to say just pass it through Google, and that's kind of what they're actually doing with Bard, and it kind of doesn't work. So that's one way to think about it.
Another way to think about it is the Nobel laureate Daniel Kahneman's distinction between system one-
Ian Bremmer:
A psychologist. Yes.
Gary Marcus:
... and system two. Cognition.
Ian Bremmer:
Thinking.
Gary Marcus:
So system one is like doing stuff by reflex, and system two is more deliberate reasoning. Well, pretty much every, I don't know, multicellular animal on the planet has some system one cognition, but it took a long time to evolve system two cognition. Only a few species can really do it. I mean, maybe crows can do it when they make a new tool, maybe some primates can do it, but mostly it's just humans, and even humans have to be trained to do it well. So you have to recreate this sort of thing that's very rare in evolution. Well, it's hard to reconstruct that, and it's just a different thing from the system one cognition that current systems are doing. People are imagining that because the systems can do the reflexive stuff, they'll be able to do the deliberative reasoning. But that's just not the case.
Ian Bremmer:
I remember when I was reading early Ray Kurzweil, who was talking about trying to reverse engineer the human brain, which of course is one way a lot of people early on thought AI might have breakthroughs: when you get to certain levels of compute, you can only replicate the number of neural connections of an earthworm, and then you get to a rat, and then you get to a crow, and then you get to a monkey, and eventually it takes off really, really fast, but it takes a very long time to get there. That's not remotely what's happening here when it comes to artificial intelligence. You are not at all replicating even low-level brain interaction or cognition. You're doing something that is radically different. A computer is engaging in an incredibly high amount of pattern recognition, and prediction on the basis of that, using words, pictures, images, other types of data.
Gary Marcus:
That's right. There's almost nothing in common between how an animal, let's say a dog, understands the world and how a large language model does. There's no simple ladder of life. Biologists have ruled that out. But take an animal like a dog. What it's doing is it's trying to understand the world in terms of other agents, other animals and also other objects. What can I jump on? What can I run through? Why is this person doing this? I think they're doing some analysis of human beings. They don't need any data in the form of sentences scraped from the web to do that. It's a completely different paradigm from a large language model, which is just looking at the sentences that people have seen and trying to mimic those sentences. And the large language model is not building an understanding of other people, it's not building an understanding of other machines, it's not building an understanding of anything. It's just tabulating statistics of words, and that's just not how any creature actually works.
The other side of the Kurzweil argument is that so far we still don't even know how to actually model the nervous system of a worm. There's a worm whose wiring diagram we've known for almost 40 years, and we can't even get a good simulation of that. So we're nowhere near understanding neuroscience well enough to simply replicate-
Ian Bremmer:
I was actually about to ask you that. Given our incredible levels of compute, which are very advanced, staggeringly fast, can deal with lots of data, why haven't we been able to model, effectively model, the brain or nervous system of an earthworm?
Gary Marcus:
There's a soft truth and a hard truth. The soft truth is it's just really hard. And the hard truth is I think we're looking in the wrong places. I think we have a few ideas about neuroscience and they're just wrong. Sometimes scientists make mistakes. They usually correct them. Science is self-correcting in the long run, but it can take decades when people are attached to ideas they think are right that aren't really right. I'll give you an example of this. For the first 40 years or so of the last century, people thought that the genetic material was some kind of protein, and they were just wrong. They knew there was heredity, they were trying to find the molecular basis, and everybody was like, "Is it this protein? Is it that protein?" It turns out genes are not made of proteins. But proteins did so many things that everybody was convinced it was one protein or another, and so they chased the wrong idea until Oswald Avery did the right process-of-elimination experiment in the 1940s.
And once he proved that it couldn't be a protein, he zeroed in and found that it was this weird sticky acid nobody understood called DNA. And then it wasn't long after he sorted that out that Watson and Crick figured out the structure of it, borrowing some ideas from Rosalind Franklin. And it was only after that that things started going really quickly. So sometimes we're just in a scientific blind alley, and I think that's the case with neuroscience. Everybody thinks it's all about the connectivity, but there may be other things that are going on. We don't really know. And it's the same thing with neural networks right now, which are AI simulations of brains that we don't really understand. And there's one good idea there which is about statistical association, but we're probably missing some other ideas.
Ian Bremmer:
So if we take that challenge, how could we potentially create AI that is really "truthful," that's going to be factual to the extent that you and I would believe it? The equivalent of being prepared to let the autonomous driver take over instead of driving ourselves, not because it never gets into accidents but because, you know what, it's good enough that we have confidence. What needs to happen before a chatbot or another LLM system will be something that you and I can have confidence in?
Gary Marcus:
I think we're fairly far away. But the main thing is, I would think of it in terms of climbing mountains in the Himalayas. You're at one peak and you see that there's this other peak that's higher than you, and the only way you're going to get there is if you climb back down, and that's emotionally painful. You think you've gotten yourself to the top. You haven't really. You realize you're going to have to go all the way back down and then all the way up another mountain, and nobody wants to do that. And add in the economics: you're making money where you are right now, and you're going to have to give up the money that you're making now in order to make a long-term commitment to doing something that feels foreign and different. Nobody really wants to do that.
I think there's a chance, maybe, that we finally have the right economic incentive, which is that people want what I call chat search to work, where you type a search into ChatGPT. It doesn't work that well right now, but everybody can see how useful and valuable that would be, and maybe that will generate enough money, and enough frustration because it's not working, to get people to turn the boat. You need to wind up in a different part of the solution space. And the real question is, what's going to motivate people to look there?
I think it starts with people, first of all, taking seriously old-fashioned AI, sometimes called good old-fashioned AI. It's totally out of favor right now, but it was, I think, dismissed prematurely. Good old-fashioned AI looks like computer programming or logic or mathematics. You have symbols that stand for things and you manipulate those symbols like you would in an equation. We need to combine elements of old-fashioned AI, symbolic AI, with neural networks, and we don't really know how to do it. Old-fashioned AI, it turns out, is much better with truth. If you give it a limited set of facts, it can reason relative to those facts and it won't hallucinate. It won't just make stuff up the way neural networks do.
And so we would like to be able to use something like that, but it doesn't learn as quickly and it's more cumbersome to work with, and so people have really abandoned it for 30 years. We're going to need to come back and say, "Look, we made a lot of progress, we're proud of ourselves, but there were some ideas that people had in the 1960s and '70s that were actually pretty good, and we have to stop being so hostile to one another in this field."
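As a rough illustration of the symbolic, "good old-fashioned AI" style Gary describes, here is a minimal sketch of reasoning over explicit facts and one rule. The facts and the rule are toy placeholders; the open problem he points to is combining this kind of machinery, which answers only from what it can derive, with a neural network's statistical learning.

```python
# Facts: symbols standing for things in the world (toy examples).
facts = {("socrates", "is_a", "human"), ("human", "mortal", "yes")}

def derive(known):
    """Forward-chain one simple rule over symbolic facts:
    if X is_a Y and Y is mortal, then X is mortal."""
    derived = set(known)
    for (x, relation, y) in known:
        if relation == "is_a" and (y, "mortal", "yes") in known:
            derived.add((x, "mortal", "yes"))
    return derived

def query(known, triple):
    """Answer strictly from stated or derived facts; refuse to guess."""
    if triple in derive(known):
        return "yes"
    # A symbolic system says "unknown" instead of making something up.
    return "unknown"

print(query(facts, ("socrates", "mortal", "yes")))   # -> yes
print(query(facts, ("socrates", "ceo_of", "tesla"))) # -> unknown
```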
Ian Bremmer:
So I take it, and I want you to confirm this, that what you're saying is that we're going to get ChatGPT 5 and 6, which will be vastly faster, and we'll be much more confident in our interactions with them if we don't question them. But does that mean that you don't believe the hallucinations are going to get materially better?
Gary Marcus:
So I wrote an essay called "What to Expect When You're Expecting GPT-4." I made seven predictions and they were all right, and one of them was that GPT-4 would continue to hallucinate. And I will go on record now as saying GPT-5 will, unless it involves some radical new machinery. If it's just a bigger version trained on more data, it will continue to hallucinate. And same with GPT-6.
Ian Bremmer:
To the same degree, though? Because again, I've seen a lot of newspaper headlines, though I haven't dug into the underlying studies, showing that there's actually a lot more accuracy with GPT-4 than GPT-3. Is that not true?
Gary Marcus:
That improvement is actually marginal. NewsGuard did a systematic study looking at different misinformation tropes, and they actually found that GPT-4 was worse at promulgating those tropes than GPT-3.5 was.
Ian Bremmer:
Then why has GPT-4 been seen to be so much better at, say, being able to pass an Advanced Placement exam or the bar exam or things like that?
Gary Marcus:
I suspect that actually has to do with what's in the training data, and that it's not very general. So first of all, somebody found that on the law school exam, the percentile wasn't as high as initially reported. It was like the 65th rather than the 95th, which is a world of difference. And second of all, I don't think they measured contamination correctly. If you read the paper carefully, they're basically asking, was this question in there word for word? But the systems can do a bunch of synonymy; they can recognize things with slightly altered wording. If you wanted to do this as science rather than PR, you would want stricter measures of what actually counts as contamination in the data. I suspect that the reason it does as well as it does on a law school exam is because they probably bought 80 old LSATs or something like that, which no human is able to do. They made commercial agreements that are not disclosed. I suspect that a lot of the progress is like that. There is some real progress in some dimensions for GPT-4 relative to, let's say, GPT-3.5. But it's not at all clear that it's any better on the hallucination problem. This problem of inventing stuff that isn't there persists in GPT-4. It's not clear that it's actually better.
Ian Bremmer:
Given what the money is presently being spent on and the exponential growth in capacity that we've experienced in the last couple of years, play it out two, three, five years: where do you think the biggest advances are going to be?
Gary Marcus:
There's no guarantee that all this money going in is going to lead to an output. It might. There'll probably be some output. But just as a cautionary tale, I'll remind you that driverless cars have been around for a long time. In 2016, I said that even though these things look good right now, I'm not sure they're going to be commercialized anytime soon because there's an outlier problem: there are always some scenarios you haven't seen before, and the driverless cars continue to be plagued by this stuff seven years later. I gave an example then of a problem Google had maybe just solved at that point, recognizing piles of leaves on the road, and I said there's going to be a huge number of these problems, we're never going to solve them, and we still haven't. So there was a Tesla that ran into a jet not that long ago, because a jet wasn't in its training set. It didn't know what to do with it. It hasn't learned the abstract idea that driving involves not running into large objects. So when there's a particular large object that isn't in the training set, it doesn't know what to do.
So what happened there is a hundred billion dollars went into this. Everybody was sure there was going to be a ton of money to be made, and still there hasn't been. The driverless car industry so far has just sucked in money and not actually produced driverless cars that you can use in a reliable way. Or take medicine. It's not clear that even GPT-5 is going to be a reliable source of medical advice. GPT-4 is actually better than GPT-3, but is it reliable enough? That's an interesting question. Each year's driverless car is better than the last, but is it reliable enough? Not so far. And so whether you'll reach that threshold in a domain that's really safety critical is unclear. Already, GPT-4 is great at writing boilerplate text. GPT-5 will be even better at writing boilerplate text. Erik Brynjolfsson argues that there'll be productivity improvements, and I think at least in some places, like customer service where there are still some humans in the loop, we'll see even more of that. So there are definitely going to be things we get out of it, but there are also risks.
Ian Bremmer:
What I am hearing pretty consistently, and I think it certainly aligns with what you're telling me today, is that humans in the loop remain essential for almost all of the uses of AI that we're really benefiting from.
Gary Marcus:
Yeah, that's true for large language models. You could argue that the routing systems you use in your GPS just work. You don't need a human in the loop there. Same with a chess computer. You can just play the chess computer. You don't need a human there. Large language models are actually special in their unreliability. They're arguably the most versatile AI technique that's ever been developed, but they're also the least reliable AI technique that's ever gone mainstream. And so for everything where we're using large language models, we do for now need humans in the loop.
Ian Bremmer:
Before we close, I want to ask you just for a moment about the "what do we do about this" side, which is the governance side. We don't really have a regulatory environment yet; the government actors don't know a lot about the stuff that you know a lot about, yet, maybe ever. The architecture's not there, the institutions aren't there. Give me, just for a moment, where you think the beginnings of effective governance or regulation would come from in this environment.
Gary Marcus:
First thing is, I think every nation has to have its own AI agency or cabinet-level position, something like that, in recognition of how fast things are moving and in recognition of the fact that you can't just do this with your left hand. You can't just tell all the existing agencies, "Yeah, you can just do a little bit more and handle AI." There's so much going on. Somebody's job needs to be to look at all of the moving parts and say, "What are we doing well? What are we not doing well? What are the risks, the cybercrime, the misinformation? How are we handling these kinds of things?" So we need some centralization there. Not to eliminate existing agencies, which still have a big role to play, but to coordinate them and figure out what to do.
We also, I think, need global AI governance. We don't really want to have, and the companies don't really want to have, different systems in every single country. For example, it's very expensive to train these models. If you had 193 countries with 193 different regimes, each requiring its own training runs, with all the damage that does to our climate, and maybe updating them every month or whatever, that would just be a climate disaster. So we want some international coordination. And then another thing that I think is important for each nation, and maybe we do this globally, is that we need to move to something like an FDA model, where if you're going to deploy something at wide scale, you have to make a safety case. So sure, you can do research in your own labs; Google doesn't have to tell us everything they're doing, OpenAI doesn't have to tell us everything they're doing. But if they're going to put something out for a hundred million users, we really want to make sure it's safe and ask: what are the risks here? What are you doing about those risks?
Right now the companies are doing those things internally, but we need external scientists who can say, "Hey, wait a minute." I'll give you an example. There's something called ChatGPT plugins, which has now given rise to something called AutoGPT, which can access your files and the internet and even other human beings. Any external cybersecurity expert would say there's a lot of risk here. But the company, OpenAI, went ahead and said, "It's fine. We can do this." Whereas Apple says, "We have to sandbox every application. We have to limit what its access is." So OpenAI has done something completely at odds with the best practice we know from elsewhere in the software industry, and there's no constraint on that, and it could actually lead to pretty serious harm.
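The sandboxing practice attributed to Apple can be illustrated with a minimal sketch: before a plugin-style tool reads a file, confine it to one explicitly allowed directory. The directory and helper function here are hypothetical, and real sandboxes are enforced at the operating-system level, but the principle of limiting access up front is the same.

```python
from pathlib import Path

# Hypothetical allow-list: the only directory a plugin-style tool may touch.
SANDBOX_ROOT = Path("/tmp/agent-sandbox").resolve()

def safe_read(requested_path: str) -> str:
    """Read a file only if it resolves inside the sandboxed directory."""
    target = Path(requested_path).resolve()
    if SANDBOX_ROOT not in target.parents and target != SANDBOX_ROOT:
        raise PermissionError(f"Access outside sandbox refused: {target}")
    return target.read_text()

# An unconstrained agent would happily read anything it was asked to;
# the sandbox turns "access your files" into "access only these files".
try:
    safe_read("/etc/passwd")
except PermissionError as err:
    print(err)
```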
There are cases like these where we really need some external advisory body that can say, "This isn't good enough." Sort of like peer review: you don't just publish a paper, you have people examine it. We need to do the same thing. You can't just put something out. And if you affect a hundred million users, you really affect everybody. So just as one example, these systems are going to affect people's political opinions. So everybody, whether they signed up or not, is going to be affected by what these systems do. We have no transparency, we don't know what data they're trained on, we don't know how they're going to influence the political process, and so that affects everybody. And so we should have some oversight of that.
Ian Bremmer:
Gary Marcus, thanks so much for joining us.
Gary Marcus:
Thanks a lot for having me.
Ian Bremmer:
That's it for today's edition of the GZERO World podcast. Do you like what you heard? Of course you did. Well, why don't you check us out at GZEROMedia.com and take a moment to sign up for our newsletter? It's called GZERO Daily.
Announcer:
The GZERO World podcast is brought to you by our lead sponsor, Prologis. Prologis helps businesses across the globe scale their supply chains with an expansive portfolio of logistics real estate, and the only end-to-end solutions platform addressing the critical initiatives of global logistics today. Learn more at Prologis.com.
Subscribe to the GZERO World Podcast on Apple Podcasts, Spotify, Stitcher, or your preferred podcast platform to receive new episodes as soon as they're published.