That robot sounds just like you

First, OpenAI tackled text with ChatGPT, then images with DALL-E. Next, it announced Sora, its text-to-video platform. But perhaps the most pernicious technology is what might come next: text-to-voice. Not just audio — but specific voices.

A group of OpenAI clients is reportedly testing a new tool called Voice Engine, which can mimic a person’s voice based on a 15-second recording, according to the New York Times. And from there it can translate the voice into any language.

The report outlined a series of potential abuses: spreading disinformation, allowing criminals to impersonate people online or over phone calls, or even breaking voice-based authenticators used by banks.

In a blog post on its own site, OpenAI seems all too aware of the potential for misuse. Its usage policies mandate that anyone using Voice Engine obtain consent before impersonating someone else and disclose that the voices are AI-generated, and OpenAI says it’s watermarking all audio so third parties can detect it and trace it back to the original maker.

But the company is also using this opportunity to warn everyone else that this technology is coming, including urging financial institutions to phase out voice-based authentication.

AI voices have already wreaked havoc in American politics. In January, thousands of New Hampshire residents received a robocall from a voice pretending to be President Joe Biden, urging them not to vote in the Democratic primary election. It was generated using simple AI tools and paid for by an ally of Biden's primary challenger Dean Phillips, who has since dropped out of the race.

In response, the Federal Communications Commission clarified that AI-generated robocalls are illegal, and New Hampshire’s legislature passed a law on March 28 that requires disclosures for any political ads using AI.

So, what makes this so much more dangerous than any other AI-generated media? The imitations are convincing. The Voice Engine demonstrations so far shared with the public sound indistinguishable from the human-uttered originals — even in foreign languages. But even the Biden robocall, which its maker admitted was made for only $150 with tech from the company ElevenLabs, was a good enough imitation.

But the real danger lies in the absence of other indicators that the audio is fake. With every other form of AI-generated media, there are clues for the discerning viewer or reader. AI text can feel clumsily written, hyper-organized, and chronically unsure of itself, often refusing to give real recommendations. AI images often have a cartoonish or sci-fi sheen, depending on their maker, and are notorious for getting human features wrong: extra teeth, extra fingers, and ears without lobes. AI video, still relatively primitive, is infinitely glitchy.

It’s conceivable that each of these applications of generative AI will improve to the point where it’s indistinguishable from the real thing, but for now, AI voices are the only one that feels like it could become utterly undetectable without proper safeguards. And even if OpenAI, often the first to market, behaves responsibly, that doesn’t mean all actors will.

As such, the announcement of Voice Engine, which doesn’t have a set release date, feels less like a product launch and more like a warning shot.
