Welcome to your AI video fever dream
Generative AI lets people craft sprawling essays, create detailed images, and even clone their own voice with remarkable precision. But taking an AI video-generation service for a spin made me realize that the technology is still far from producing convincing or cinematic video. In fact, the entire experience was surreal.
Luma AI’s Dream Machine, a free text-to-video service, warns users that high demand limits them to 10 videos per day and 30 per month, unless they pay at least $29.99 a month for the entry-level subscription tier. But I only needed to wait a couple of minutes to get my first prompts turned into … very, very strange videos.
I started with a simple request: Can you generate a video of a baseball player hitting a ball out of the park?
The results were astonishingly bizarre. Instead of a smooth, realistic depiction of a home run, what I got was a fever dream. The video featured an old man contorting his body in impossible ways, simultaneously attempting to swing a bat and prepare to throw (or catch?) a ball. While the stadium background looked reasonably accurate, the player’s movements were distorted, his jersey number blurred, and his face twisted unnaturally as he moved. Meanwhile, the bat morphed in size as he swung, and the words on the stadium signs were incoherent.
Determined to achieve a more precise outcome, I decided to try a prompt generated by ChatGPT. Sometimes the robots are best at talking to other robots.
The prompt described a sunny afternoon at a modern baseball stadium filled with cheering fans, detailing vibrant team colors and the batter’s white uniform with blue pinstripes. I requested a pitcher in a dark blue uniform throwing a fastball, a batter’s level swing, a monster home run, and the crowd’s roaring applause.
The result was even more disconcerting. The batter appeared to be hugging himself while morphing into a strange creature. Fans inexplicably sat near home plate, which transformed into an arch shape with some strange object on top. The batter was facing the wrong direction — or was that the catcher?
Given the perennial fear of deepfake videos and misinformation, I prompted the model to give me videos of Joe Biden, Donald Trump, Pope Francis, and Barack Obama giving speeches — but it refused. It did, however, agree to create a video of basketball star Michael Jordan giving a speech in a school gym.
The video showed a figure who kind of looked like Jordan for a split second before inexplicably morphing into a completely different-looking person. Meanwhile, another figure shuffled by like a zombie in ill-fitting pants. The gym setting was almost right, except for a riser cutting off someone’s legs, incorrect basketball markings on the floor, and a basketball hoop seemingly painted on the wall.
My editor Matt Kendrick, an Emmy-nominated TV producer in a former life, also gave it a try. His first effort to work up a thrilling historical drama set in medieval Mongolia resulted in a somewhat disturbing reverse-centaur situation.
But maybe the software is designed for the format of a proper Hollywood script, something like, say, the 2004 Kal Penn/John Cho opus “Harold and Kumar Go to White Castle.” Alas, pasting in that finely crafted script resulted in nothing more than a clip of a man taking a phone call in an indecipherable language while sitting at a desk spruced up with the flag of the Belarusian democratic movement and some rather phallic decorations.
Text-to-video models like Luma AI’s Dream Machine or OpenAI’s still-under-wraps Sora promise lifelike scenes, but the technical shortcomings we saw in our initial test suggest that the technology is still a ways away. The glitchiness, blurriness, and jarring incoherence were not evidence of a model that could fool anyone, at least not without serious improvement. So Hollywood shouldn’t be worried just yet.
The bar for success is high but not impossible, and regulators should plan ahead. If video-generation technology becomes cheap and powerful, it could be used to scam people, deceive them, and even disrupt elections. Earlier this year, an employee at a multinational firm in Hong Kong was duped into paying out over $25 million after a video call with deepfakes of the company’s chief financial officer. And AI-generated recordings, photos, avatars, and text have already played a role in influencing politics this year, so it’s only a matter of time before AI-generated video causes a stir.
Nick Reiners, senior analyst for geotechnology at Eurasia Group, says that while regulators haven’t cracked down on text-to-video models, a major global focus is transparency: “so you know you’re looking at deepfakes,” he said. That’s a principle of the European Union’s AI Act, the G7’s Hiroshima Process, and the Biden administration’s executive order on AI.
Reiners sees major AI companies hesitating to release these models, and he chalks that up more to the negative societal externalities than to the products being technically underwhelming. “You look at the amount of progress that image generators have had in recent years, and you'd assume we see a similar improvement curve with video,” he said.
The two big issues, in Reiners’ view, are disinformation and sexual abuse material, and he thinks the latter might be addressed first: “There’s a big push on both sides of the aisle to protect children.” When video models improve, it may be deepfakes of an obscene or indecent nature that cause a ruckus before the technology can help throw an election one way or another.
We’re Sora-ing, flying
OpenAI, the buzzy startup behind the ChatGPT chatbot, has begun previewing its next tool: Sora. Just as OpenAI’s DALL-E lets users type out a text prompt and generate an image, Sora will give customers the same ability with video.
Want a cinematic clip of dinosaurs walking through Central Park? Sure. How about kangaroos hopping around Mars? Why not? These are the kinds of imaginative things that Sora can theoretically generate from just a short prompt. So far, the software has been tested by only a select group of people, and the reviews are mixed: It’s groundbreaking but often struggles with things like scale and visual glitches.
AI-generated images have already posed serious problems, including the spread of photorealistic deepfake pornography and convincing-but-fake political images. (For example, Florida Gov. Ron DeSantis’ presidential campaign used AI-generated images of former President Donald Trump hugging Anthony Fauci in a video, and the Republican National Committee did something similar with fake images of Joe Biden.)
While users may not yet have access to movie-quality video generators, they soon might — something that’ll almost certainly supercharge the issues presented by AI-generated images. The World Economic Forum recently named disinformation, especially that caused by artificial intelligence, as the biggest global short-term risk. “Misinformation and disinformation may radically disrupt electoral processes in several economies over the next two years,” according to the WEF. “A growing distrust of information, as well as media and governments as sources, will deepen polarized views – a vicious cycle that could trigger civil unrest and possibly confrontation.”
Eurasia Group, GZERO’s parent company, also named “Ungoverned AI” as one of its Top Risks for 2024. “In a year when four billion people head to the polls, generative AI will be used by domestic and foreign actors — notably Russia — to influence electoral campaigns, stoke division, undermine trust in democracy, and sow political chaos on an unprecedented scale,” according to the report. “A crisis in global democracy is today more likely to be precipitated by AI-created and algorithm-driven disinformation than any other factor.”