That robot sounds just like you

First, OpenAI tackled text with ChatGPT, then images with DALL-E. Next, it announced Sora, its text-to-video platform. But perhaps the most pernicious technology is what might come next: text-to-voice. Not just audio — but specific voices.

A group of OpenAI clients is reportedly testing a new tool called Voice Engine, which can mimic a person’s voice based on a 15-second recording, according to the New York Times. And from there it can translate the voice into any language.

The report outlined a series of potential abuses: spreading disinformation, allowing criminals to impersonate people online or over phone calls, or even breaking voice-based authenticators used by banks.

In a blog post on its own site, OpenAI seems all too aware of the potential for misuse. Its usage policies mandate that anyone using Voice Engine obtain consent before impersonating someone else and disclose that the voices are AI-generated, and OpenAI says it’s watermarking all audio so third parties can detect it and trace it back to the original maker.

But the company is also using this opportunity to warn everyone else that this technology is coming, including urging financial institutions to phase out voice-based authentication.

AI voices have already wreaked havoc in American politics. In January, thousands of New Hampshire residents received a robocall from a voice pretending to be President Joe Biden, urging them not to vote in the Democratic primary election. It was generated using simple AI tools and paid for by an ally of Biden's primary challenger Dean Phillips, who has since dropped out of the race.

In response, the Federal Communications Commission clarified that AI-generated robocalls are illegal, and New Hampshire’s legislature passed a law on March 28 that requires disclosures for any political ads using AI.

So, what makes this so much more dangerous than any other AI-generated media? The imitations are convincing. The Voice Engine demonstrations so far shared with the public sound indistinguishable from the human-uttered originals — even in foreign languages. But even the Biden robocall, which its maker admitted was made for only $150 with tech from the company ElevenLabs, was a good enough imitation.

But the real danger lies in the absence of other indicators that the audio is fake. With every other kind of AI-generated media, there are clues for the discerning viewer or reader. AI text can feel clumsily written, hyper-organized, and chronically unsure of itself, often refusing to give real recommendations. AI images often have a cartoonish or sci-fi sheen, depending on their maker, and are notorious for getting human features wrong: extra teeth, extra fingers, and ears without lobes. AI video, still relatively primitive, is infinitely glitchy.

It’s conceivable that each of these applications of generative AI will improve to the point where it is indistinguishable from the real thing, but for now, AI voices are the only iteration that feels like it could become utterly undetectable without proper safeguards. And even if OpenAI, often the first to market, acts responsibly, that doesn’t mean all actors will.

The announcement of Voice Engine, which has no set release date, therefore feels less like a product launch and more like a warning shot.
