That robot sounds just like you

First, OpenAI tackled text with ChatGPT, then images with DALL-E. Next, it announced Sora, its text-to-video platform. But perhaps the most pernicious technology is what might come next: text-to-voice. Not just audio, but specific voices.

A group of OpenAI clients is reportedly testing a new tool called Voice Engine, which can mimic a person’s voice from a 15-second recording, according to the New York Times. From there, it can make that voice speak in other languages.

The report outlined a series of potential abuses: spreading disinformation, allowing criminals to impersonate people online or over the phone, and even defeating the voice-based authentication used by banks.

In a blog post on its own site, OpenAI seems all too aware of the potential for misuse. Its usage policies require anyone using Voice Engine to obtain consent before impersonating someone else and to disclose that the voices are AI-generated. OpenAI also says it’s watermarking all audio so that third parties can detect it and trace it back to the original maker.

But the company is also using this opportunity to warn everyone else that this technology is coming, including urging financial institutions to phase out voice-based authentication.

AI voices have already wreaked havoc in American politics. In January, thousands of New Hampshire residents received a robocall from a voice impersonating President Joe Biden, urging them not to vote in the Democratic primary. It was generated using simple AI tools and paid for by an ally of Biden’s primary challenger, Dean Phillips, who has since dropped out of the race.

In response, the Federal Communications Commission clarified that AI-generated robocalls are illegal, and New Hampshire’s legislature passed a law on March 28 that requires disclosures for any political ads using AI.

So, what makes this so much more dangerous than other AI-generated media? The imitations are convincing. The Voice Engine demonstrations shared with the public so far sound indistinguishable from the human-uttered originals, even in foreign languages. And even the Biden robocall, which its maker admitted was made for only $150 with tech from the company ElevenLabs, was a good enough imitation.

The real danger, though, lies in the absence of other indicators that the audio is fake. With every other form of AI-generated media, there are clues for the discerning viewer or reader. AI text can feel clumsily written, hyper-organized, and chronically unsure of itself, often refusing to give real recommendations. AI images often have a cartoonish or sci-fi sheen, depending on their maker, and are notorious for getting human features wrong: extra teeth, extra fingers, and ears without lobes. AI video, still relatively primitive, is endlessly glitchy.

It’s conceivable that each of these applications of generative AI will improve to the point where it is indistinguishable from the real thing, but for now, AI voices are the only iteration that feels like it could become utterly undetectable without proper safeguards. And even if OpenAI, often the first to market, acts responsibly, that doesn’t mean all actors will.

As such, the announcement of Voice Engine, which doesn’t have a set release date, feels less like a product launch and more like a warning shot.
