Can OpenAI stand to “reason”?
OpenAI has unveiled its latest AI model, once code-named Strawberry and now officially dubbed o1. The company behind ChatGPT claims that this model represents a significant leap forward in artificial intelligence capabilities, specifically that it can perform human-like reasoning and tackle complex problems in ways that previous models, such as GPT-4, could not.
But can we really call what o1 is doing “reasoning,” or is that simply marketing-speak for more sophisticated pattern-matching?
According to OpenAI, o1 uses a novel processing approach similar to chain-of-thought, a technique for prompting chatbots. Chain-of-thought is essentially a set of instructions aimed at getting previous generations of large language models to work through questions step-by-step rather than all at once. Large language models are best at guessing the next word in a sequence, and they’re not great at answering complex questions accurately, so this technique helps a model break the task into simple steps to minimize error.
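For a sense of what that looks like in practice, here is a minimal sketch of classic chain-of-thought prompting, assuming the official openai Python package; the model name ("gpt-4o") and the sample question are illustrative placeholders, not details from OpenAI’s o1 announcement. The point is that the step-by-step instruction lives in the prompt, whereas OpenAI’s pitch for o1 is that this kind of deliberation happens inside the model.

```python
# A minimal sketch of chain-of-thought prompting, assuming the official
# openai Python package (v1+). The model name and the sample question are
# illustrative placeholders, not details from OpenAI's o1 announcement.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

question = "A train leaves at 2:40 pm and arrives at 5:05 pm. How long is the trip?"

# Direct prompt: the model answers in one pass, which is where next-word
# guessers tend to slip on multi-step problems.
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought prompt: the same question, plus an instruction to work
# through the problem step-by-step before committing to an answer.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question
        + "\nWork through this step by step, then give the final answer on its own line.",
    }],
)

print("Direct:", direct.choices[0].message.content)
print("Chain-of-thought:", cot.choices[0].message.content)
```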
Gadjo Sevilla, senior analyst for technology at the market research firm eMarketer, told GZERO that o1 “integrates human-level reasoning that can more carefully analyze prompts and requests and generate more analytical responses.” It’s better at things like complex mathematics, chemistry, and computer science, he said.
Sevilla noticed that o1 takes longer to respond than previous models, and it notes how long it’s been “thinking” and “formulating a solution,” along with other indicators that it’s answering more carefully. Sevilla says the reasoning function still feels very early, which is why OpenAI has released it as a “preview.”
“More than anything, the timing of the release seems to coincide with OpenAI's current bid to increase its valuation,” he noted. OpenAI is reportedly seeking a monster valuation of around $150 billion in its upcoming funding round, making it one of the most valuable private companies in the world.
But by using terms like “thinking” and “reasoning,” OpenAI is employing human language for its models, a common marketing technique for AI companies that seem to want to suggest that their products are capable of human levels of intelligence — even when they’re just really, really good guessers.
Welcome to your AI video fever dream
Generative AI lets people craft sprawling essays, create detailed images, and even clone their own voice with remarkable precision. But taking an AI video-generation service for a spin made me realize that the technology is still far from producing convincing or cinematic video. In fact, the entire experience was surreal.
Luma AI’s Dream Machine, a free text-to-video service, warns users that due to high demand they’re limited to 10 videos per day and 30 per month — unless they pay at least $29.99 a month for the starting subscription tier. But I only needed to wait a couple of minutes to get my first prompts turned into … very, very strange videos.
I started with a simple request: Can you generate a video of a baseball player hitting a ball out of the park?
The results were astonishingly bizarre. Instead of a smooth, realistic depiction of a home run, what I got was a fever dream. The video featured an old man contorting his body in impossible ways, simultaneously attempting to swing a bat and prepare to throw (or catch?) a ball. While the stadium background looked reasonably accurate, the player’s movements were distorted, his jersey number blurred, and his face twisted unnaturally as he moved. Meanwhile, the bat morphed in size as he swung, and the words on the stadium signs were incoherent.
Determined to achieve a more precise outcome, I decided to try a prompt generated by ChatGPT. Sometimes the robots are best at talking to other robots.
The prompt described a sunny afternoon at a modern baseball stadium filled with cheering fans, detailing vibrant team colors and the batter’s white uniform with blue pinstripes. I requested a pitcher in a dark blue uniform throwing a fastball, a batter’s level swing, a monster home run, and the crowd’s roaring applause.
The result was even more disconcerting. The batter appeared to be hugging himself while morphing into a strange creature. Fans inexplicably sat near home plate, which transformed into an arch shape with some strange object on top. The batter was facing the wrong direction — or was that the catcher?
Given the perennial fear of deepfake videos and misinformation, I prompted the model to give me videos of Joe Biden, Donald Trump, Pope Francis, and Barack Obama giving speeches — but it refused. It did, however, agree to create a video of basketball star Michael Jordan giving a speech in a school gym.
The video showed a figure who kind of looked like Jordan for a split second before inexplicably morphing into a completely different-looking person. Meanwhile, another figure shuffled by like a zombie in ill-fitting pants. The gym setting was almost right, except for a riser cutting off someone’s legs, incorrect basketball markings on the floor, and a basketball hoop seemingly painted on the wall.
My editor Matt Kendrick, an Emmy-nominated TV producer in a former life, also gave it a try. His first effort to work up a thrilling historical drama set in medieval Mongolia resulted in a somewhat disturbing reverse-centaur situation.
But maybe the software is designed for the format of a proper Hollywood script, something like, say, the 2004 Kal Penn/John Cho opus “Harold & Kumar Go to White Castle.” Alas, pasting in that finely crafted script resulted in nothing more than a clip of a man taking a phone call in an indecipherable language while sitting at a desk spruced up with the flag of the Belarusian democratic movement and some rather phallic decorations.
Text-to-video models like Luma AI’s Dream Machine or OpenAI’s still-under-wraps Sora promise to make lifelike scenes — but the technical challenges we saw in our initial test suggest that this technology is still a ways away. The glitchiness, blurriness, and jarring incoherence were not evidence of a model that could fool anyone — at least not without serious improvement. So Hollywood shouldn’t be worried just yet.
The bar for success is high but not impossible — and regulators should plan ahead. If video generation technology becomes cheap and powerful, it could be used to scam people, deceive them, and even disrupt elections. Earlier this year, an employee at a bank in Hong Kong was defrauded into paying over $25 million by deepfakes of the company’s chief financial officer on a video call. And AI-generated recordings, photos, avatars, and text have played a role in influencing politics this year — so it’s only a matter of time before AI-generated video causes a stir.
Nick Reiners, senior analyst for geotechnology at Eurasia Group, says that while regulators haven’t cracked down on text-to-video models, a major global focus is transparency – “so you know you’re looking at deepfakes,” he said. That’s a principle of the European Union’s AI Act, the G7’s Hiroshima Process, and the Biden administration’s executive order on AI.
Reiners sees hesitation from major AI companies in releasing these models and chalks it up more to negative societal externalities than to technically underwhelming products. “You look at the amount of progress that image generators have had in recent years, and you'd assume we see a similar improvement curve with video,” he said.
The two big issues, in Reiners’ view, are disinformation and sexual abuse material, and he thinks the latter might be addressed first: “There’s a big push on both sides of the aisle to protect children.” When video models improve, it may be deepfakes of an obscene or indecent nature that cause a ruckus before the technology can help throw an election one way or another.